MindsDB NLP Supported Tasks

MindsDB lets you create models that use features provided by OpenAI GPT-3. Currently, there are three operation modes, illustrated in the sketch after this list:

  • Answering Questions without Context
  • Answering Questions with Context
  • Prompt Completion
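
Assuming an OpenAI engine named openai_engine has been created (as shown later on this page), here is a sketch of how these three modes map to model parameters. The model names and the column names question, article, and review are placeholders for your own projects and input columns, and the parameter mapping should be verified against the OpenAI handler documentation.

-- ANSWERING QUESTIONS WITHOUT CONTEXT
CREATE MODEL mindsdb.question_answering_model
PREDICT answer
USING
  engine = 'openai_engine',
  question_column = 'question';

-- ANSWERING QUESTIONS WITH CONTEXT
CREATE MODEL mindsdb.context_answering_model
PREDICT answer
USING
  engine = 'openai_engine',
  question_column = 'question',
  context_column = 'article';

-- PROMPT COMPLETION
CREATE MODEL mindsdb.prompt_completion_model
PREDICT sentiment
USING
  engine = 'openai_engine',
  prompt_template = 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral';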

Currently, MindsDB’s NLP engine is powered by Hugging Face and OpenAI. But we plan to expand to other NLP options in the future, so stay tuned!

Fine-tune an OpenAI model with MindsDB

All OpenAI models belong to the group of Large Language Models (LLMs). By definition, these are pre-trained on large amounts of data. However, it is possible to fine-tune these models with a task-specific dataset for a defined use case.

OpenAI supports fine-tuning of some of its models, as listed here. With MindsDB, you can easily fine-tune an OpenAI model, making it more applicable to your specific use case.
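
For reference, fine-tuning follows the same SQL-first workflow via the FINETUNE statement. Below is a minimal sketch, assuming a model named openai_model already exists and a connected data source called files exposes an openai_learning table with prompt and completion columns; adjust these names to your setup.

FINETUNE mindsdb.openai_model        -- ASSUMED MODEL NAME, REPLACE WITH YOURS
FROM files                           -- ASSUMED DATA SOURCE CONNECTION
  (SELECT prompt, completion FROM openai_learning);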

How to Bring the OpenAI Model to MindsDB

We use the CREATE ML_ENGINE and CREATE MODEL statements to bring the OpenAI models to MindsDB.

We first create the openai engine by providing the api_key:

CREATE ML_ENGINE openai_engine
FROM openai
USING
  api_key = 'YOUR_OPENAI_API_KEY';
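
To confirm that the engine was registered, you can list the available ML engines (the exact output columns may vary across MindsDB versions):

SHOW ML_ENGINES;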

Next, use this engine to create a model as follows:

CREATE MODEL project_name.predictor_name        -- AI TABLE TO STORE THE MODEL
PREDICT target_column                           -- NAME OF THE COLUMN TO STORE PREDICTED VALUES
USING
  engine = 'openai_engine',                     -- USING THE OPENAI ENGINE
  prompt_template = 'prompt/task
                    {{input_column}}',          -- PROMPT TEMPLATE WITH PLACEHOLDERS FOR INPUT COLUMNS
  model_name = 'model_name';                    -- OPTIONAL, DEFAULT IS THE gpt-3.5-turbo MODEL

Follow these instructions to set up the OpenAI integration in MindsDB.

Example

For more examples and explanations, visit our doc page on OpenAI.

Example using SQL

Let’s go through a sentiment classification example to better understand how to bring OpenAI models to MindsDB as AI tables.

CREATE MODEL mindsdb.sentiment_classifier
PREDICT sentiment
USING
  engine = 'openai_engine',
  prompt_template = 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral';

On execution, we get:

Query successfully completed

Where:

Expressions        Values
project_name       mindsdb
predictor_name     sentiment_classifier
target_column      sentiment
engine             openai_engine
prompt_template    predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral

In the prompt_template parameter, we use a placeholder for a text value that comes from the review column, that is, text:{{review}}. At prediction time, this placeholder is replaced with the review value of each input row before the prompt is sent to the model.

Before querying for predictions, we should verify the status of the sentiment_classifier model.

DESCRIBE sentiment_classifier;

On execution, we get:

+--------------------+------+-------+-------+--------+--------+---------+-------------+---------------+------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|NAME                |ENGINE|PROJECT|VERSION|STATUS  |ACCURACY|PREDICT  |UPDATE_STATUS|MINDSDB_VERSION|ERROR |SELECT_DATA_QUERY|TRAINING_OPTIONS                                                                                                                                       |TAG    |
+--------------------+------+-------+-------+--------+--------+---------+-------------+---------------+------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|sentiment_classifier|openai|mindsdb|1      |complete|[NULL]  |sentiment|up_to_date   |22.12.4.3      |[NULL]|[NULL]           |{'target': 'sentiment', 'using': {'prompt_template': 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral'}}|[NULL] |
+--------------------+------+-------+-------+--------+--------+---------+-------------+---------------+------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+

Once the status is complete, we can query for predictions.

SELECT output.sentiment, input.review
FROM example_db.demo_data.amazon_reviews AS input
JOIN mindsdb.sentiment_classifier AS output
LIMIT 3;

Don’t forget to create the example_db database before using one of its tables, like in the query above.

CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "3.220.66.106",
    "port": "5432",
    "database": "demo"
    };

On execution, we get:

+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sentiment                              | review                                                                                                                                                                                                                                                                                                                                                                            |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| positive                               | Late gift for my grandson. He is very happy with it. Easy for him (9yo ).                                                                                                                                                                                                                                                                                                         |
| The sentiment of the text is positive. | I'm not super thrilled with the proprietary OS on this unit, but it does work okay and does what I need it to do. Appearance is very nice, price is very good and I can't complain too much - just wish it were easier (or at least more obvious) to port new apps onto it. For now, it helps me see things that are too small on my phone while I'm traveling. I'm a happy buyer.|
| positive                               | I purchased this Kindle Fire HD 8 was purchased for use by 5 and 8 yer old grandchildren. They basically use it to play Amazon games that you download.                                                                                                                                                                                                                           |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
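
You can also query for a single prediction by providing the input value directly in the WHERE clause, mirroring the Mongo example later on this page:

SELECT review, sentiment
FROM mindsdb.sentiment_classifier
WHERE review = 'It is ok.';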

For the full library of supported examples, please go here.

Example using MQL

Now let’s go through a sentiment classification example using MongoDB syntax.

We have a sample Mongo database that you can connect to your MindsDB Cloud account by running this command in Mongo Shell:

use mindsdb

Followed by:

db.databases.insertOne({
    'name': 'mongo_test_db',
    'engine': 'mongodb',
    'connection_args': {
        "host": "mongodb+srv://admin:201287aA@cluster0.myfdu.mongodb.net/admin?authSource=admin&replicaSet=atlas-5koz1i-shard-0&readPreference=primary&appname=MongoDB%20Compass&ssl=true",
        "database": "test_data"
        }
})

We use this sample database throughout the example.

The next step is to create a connection between Mongo and MindsDB. Follow the instructions to connect MindsDB via Mongo Compass or Mongo Shell.

Now, we are ready to create an OpenAI model.

db.models.insertOne({
    name: 'sentiment_classifier_openai_mql',
    predict: 'sentiment',
    training_options: {
        engine: 'openai',
        prompt_template: 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral'
    }
})

On execution, we get:

{ acknowledged: true,
  insertedId: ObjectId("63c19c3fe1d9855caa931df6") }

Before querying for predictions, we should verify the status of the sentiment_classifier_openai_mql model.

db.getCollection('models').find({'name': 'sentiment_classifier_openai_mql'})

On execution, we get:

{ NAME: 'sentiment_classifier_openai_mql',
  ENGINE: 'openai',
  PROJECT: 'mindsdb',
  VERSION: 1,
  STATUS: 'complete',
  ACCURACY: null,
  PREDICT: 'sentiment',
  UPDATE_STATUS: 'up_to_date',
  MINDSDB_VERSION: '22.12.4.3',
  ERROR: null,
  SELECT_DATA_QUERY: null,
  TRAINING_OPTIONS: '{\'target\': \'sentiment\', \'using\': {\'prompt_template\': \'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral\'}}',
  TAG: null,
  _id: ObjectId("000000000000002836398080") }

Once the status is complete, we can query for a single prediction.

db.sentiment_classifier_openai_mql.find({review: 'It is ok.'})

On execution, we get:

{
  sentiment: 'The sentiment of the text is neutral.',
  review: 'It is ok.'
}

You can also query for batch predictions. Here we use the mongo_test_db database connected earlier in this example.

db.sentiment_classifier_openai_mql.find(
    {'collection': 'mongo_test_db.amazon_reviews'},
    {'sentiment_classifier_openai_mql.sentiment': 'sentiment',
     'amazon_reviews.review': 'review'
    }
)

On execution, we get:

{
  sentiment: 'positive',
  review: 'Late gift for my grandson. He is very happy with it. Easy for him (9yo ).'
}
{
  sentiment: 'The sentiment of the text is positive.',
  review: "I'm not super thrilled with the proprietary OS on this unit, but it does work okay and does what I need it to do. Appearance is very nice, price is very good and I can't complain too much - just wish it were easier (or at least more obvious) to port new apps onto it. For now, it helps me see things that are too small on my phone while I'm traveling. I'm a happy buyer."
}
{
  sentiment: 'positive',
  review: 'I purchased this Kindle Fire HD 8 was purchased for use by 5 and 8 yer old grandchildren. They basically use it to play Amazon games that you download.'
}
...

For the full library of supported examples, please go here.

Parameter descriptions

MindsDB lets you customize models using parameters provided by OpenAI. Currently, there are eleven parameters that you can optionally modify, as shown in the sketch after this list:

  • model_name: An optional string that identifies the model to use. It defaults to the gpt-3.5-turbo model. For a list of available models and their descriptions, visit the Model overview.
  • max_tokens: The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model’s context length.
  • temperature: What sampling temperature to use. Higher values mean the model will take more risks.
  • top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
  • n: How many completions to generate for each prompt.
  • stop: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
  • presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
  • frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
  • best_of: Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n.
  • logit_bias: Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. The exact effect will vary per model, but values between -1 and 1 should decrease or increase the likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
  • user: A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
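
As a sketch of how these parameters are passed, here is the earlier sentiment classifier with a few of them set explicitly. The model name sentiment_classifier_tuned and the parameter values are illustrative only, not recommendations.

CREATE MODEL mindsdb.sentiment_classifier_tuned   -- HYPOTHETICAL MODEL NAME FOR ILLUSTRATION
PREDICT sentiment
USING
  engine = 'openai_engine',
  prompt_template = 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral',
  model_name = 'gpt-3.5-turbo',                   -- ILLUSTRATIVE PARAMETER VALUES
  max_tokens = 16,
  temperature = 0;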

What’s Next?

Have fun while trying it out yourself!

If this tutorial was helpful, please give us a GitHub star here.