MindsDB: First Try

MindsDB: First Try

·

4 min read

Hi All!
Today has been a great day. Specially because I got to explore something fun and had got me back to setting up MySQL database on EC2 Instance. It has been a while I had done that. So, let's begin!

First off, exploration of MindsDB was completely out of curiosity as I had randomly stumbled upon it on Twitter. It seemed interesting at first glance, so I want to give it a try.

Briefly, MindsDB is aiming to bring power of machine learning closer to where your data resides by making In-Database machine learning possible. They have 40+ integrations, more details on their github.

image.png

SETUP

I had to create a new database, borrow a dataset from Kaggle as I already didn't have one. But if you already have a database you might as well use it directly.

I'll still record the steps as I did them for my future reference and if you'd like to start fresh too.

  • Create an EC2 Instance of type at least t2.medium of capacity 2vCPU, 4 GB RAM. If you already have a VM, or want to create one from another provider, make sure it at least has 4GB RAM.
  • Opt for any OS of your choice, I picked Ubuntu 22.04.
  • Installation of Docker & MySQL
sudo apt update
sudo apt install docker.io
sudo apt install mysql-server
  • We'll use docker solution of MindsDB, I haven't been successful in getting their pip package installed. In case you'd like to try, follow the steps here.
docker run -p 47334:47334 -p 47335:47335 mindsdb/mindsdb
  • Once the mindsdb docker container is up, you can try remote connection to it, from any of your preferred SQL IDEs, I used DBeaver (Universal Database Manager) to establish the connection with the below config:
"mysql": {
       "host": "<your-vm-ip-address>",
       "password": "",
       "port": "47335",
       "user": "mindsdb",
       "database": "mindsdb",
       "ssl": true
  • Once you are able to successfully connect to the mindsdb, you should see the following databases: image.png
  • Now that we are ready with mindsdb setup, we can go back to loading datasets/table to our new database on EC2.
sudo systemctl start mysql.service
sudo mysql

once you are inside the mysql prompt

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
exit

Log in again with:

mysql -u root -p
  • Now, we'll create a new user to connect to our database
CREATE USER 'mindsdb-user'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'mindsdb-user'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;

I granted all privilege for this use case alone, please be careful when you grant this privilege elsewhere. If you want to restrict the connection from a specific host or IP range, mention exact host IP or use wildcard %

CREATE USER 'mindsdb-user'@'123.123.%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'mindsdb-user'@'123.123.%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
  • If you are getting connection refused, it could be either your firewall settings or bind address of VM localhost. You can modify it on /etc/mysql/mysql.conf.d/mysql.cnf. You would need to change the value from 127.0.0.1 to 0.0.0.0
  • Once you are able make a connection to your database, create a new database
CREATE DATABASE order_info;
USE order_info;
  • Import your dataset or create your own custom tables. I used Swiggy-Zomato Order Information Dataset from Kaggle
  • Once you have all the tables ready, we'll need to import that table/dataset to MindDB database from your database.
  • On your MindsDB MySQL connection, run
    CREATE DATABASE swiggy_zomato_data
    WITH ENGINE = "mysql",
    PARAMETERS = { 
    "user": "<user-created-above>",
    "password": "<password>",
    "host": "<your-vm-ip-address>",
    "port": "3306",
    "database": "order_info"
    };
    
  • Run the below query to check if the data was loaded as expected
SELECT * 
FROM swiggy_zomato_data.<table-name>
LIMIT 10;
  • Now, let's create a predictor! With this, I am asking the model to predict lifetime_order_count from our dataset.
    CREATE PREDICTOR mindsdb.<predictor-name>
    FROM swiggy_zomato_data
    (SELECT * FROM <table-name>)
    PREDICT lifetime_order_count;
    
  • Give it some time for the training to happen, you can check the status by running the below query:
SELECT status
FROM mindsdb.predictors
WHERE name='<predictor-name>';
  • If status is not training, you can proceed further. Now, let's ask the model to estimate lifetime_order_count given conditions:
SELECT alloted_orders, delivered_orders, lifetime_order_count, lifetime_order_count_explain
FROM mindsdb.lifetime_order_predictor
WHERE delivered_orders ='70' and alloted_orders='200'
  • You might run into the below error, in you are using > or < operators, as only "=" is allowed
    SQL Error [1221] [HY000]: Only 'and' and '=' operations allowed in WHERE clause,
    
  • In case, you want the model to give more info on anomaly, lower and upper bound values, use _explain like the above query or in the below picture. image.png

Although, the predictors are of confidence just 0.4, I'm happy that I got a chance to try something new today. I'm sure to try this out with other datasets too. Hope you'll check out possibility of using MindsDB in your space and share your thoughts in the comments. If you are AI/ML enthusiast and would like to participate in MindDB challenge, check out their integrations contest!

That's it from me for today, see you again!