MindsDB NLP Supported Tasks
MindsDB lets you create models that utilize features provided by OpenAI GPT-3. Currently, there are three operation modes:
- Answering Questions without Context
- Answering Questions with Context
- Prompt Completion
Currently, MindsDB’s NLP engine is powered by Hugging Face and OpenAI. But we plan to expand to other NLP options in the future, so stay tuned!
Fine-tune an OpenAI model with MindsDB
All OpenAI models belong to the group of Large Language Models (LLMs). By definition, these are pre-trained on large amounts of data. However, it is possible to fine-tune these models with a task-specific dataset for a defined use case.
OpenAI supports fine-tuning of some of its models. And with MindsDB, you can easily fine-tune an OpenAI model, making it more applicable to your specific use case.
How to Bring the OpenAI Model to MindsDB
We use the CREATE ML_ENGINE and CREATE MODEL statements to bring the OpenAI models to MindsDB.
We first create the openai engine by providing the api_key:
CREATE ML_ENGINE openai_engine
FROM openai
USING
  api_key = 'YOUR_OPENAI_API_KEY';
Then we create the model with the CREATE MODEL statement:
CREATE MODEL project_name.predictor_name        -- AI TABLE TO STORE THE MODEL
PREDICT target_column                           -- NAME OF THE COLUMN TO STORE PREDICTED VALUES
USING
  engine = 'openai_engine',                     -- USING THE OPENAI ENGINE
  prompt_template = 'prompt/task
                    {{input_column}}',          -- PROMPT TEMPLATE WITH PLACEHOLDERS FOR INPUT COLUMNS
  model_name = 'model_name';                    -- OPTIONAL, DEFAULT IS THE gpt-3.5-turbo MODEL
Example
Example using SQL
Let’s go through a sentiment classification example to understand better how to bring OpenAI models to MindsDB as AI tables.
CREATE MODEL mindsdb.sentiment_classifier
PREDICT sentiment
USING
  engine = 'openai_engine',
  prompt_template = 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral';
Query successfully completed
| Expressions | Values |
|---|---|
| project_name | mindsdb |
| predictor_name | sentiment_classifier |
| target_column | sentiment |
| engine | openai_engine |
| prompt_template | predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral |
In the prompt_template parameter, we use a placeholder for a text value that
comes from the review column, that is, text:{{review}}.
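To make the placeholder mechanism concrete, here is a minimal Python sketch of how a {{column}} placeholder could be filled from a row of input data. It is an illustration only, not MindsDB's actual implementation: the render_prompt function and the PLACEHOLDER pattern are hypothetical names, and tolerating whitespace inside the braces is an assumption.

```python
import re

# Hypothetical pattern: matches {{name}} and also {{ name }} (whitespace tolerated).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {{column}} placeholder with the matching value from `row`."""

    def substitute(match):
        column = match.group(1)
        if column not in row:
            # Fail loudly instead of sending a half-filled prompt to the model.
            raise KeyError(f"no input column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "predict the sentiment of the text:{{review}} "
    "exactly as either positive or negative or neutral"
)
prompt = render_prompt(template, {"review": "It is ok."})
print(prompt)
```

Each input row produces its own prompt, which is why the column name inside the braces must match a column available at prediction time.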
Before querying for predictions, we should verify the status of the sentiment_classifier model.
DESCRIBE sentiment_classifier;
+--------------------+------+-------+-------+--------+--------+---------+-------------+---------------+------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|NAME                |ENGINE|PROJECT|VERSION|STATUS  |ACCURACY|PREDICT  |UPDATE_STATUS|MINDSDB_VERSION|ERROR |SELECT_DATA_QUERY|TRAINING_OPTIONS                                                                                                                                       |TAG    |
+--------------------+------+-------+-------+--------+--------+---------+-------------+---------------+------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
|sentiment_classifier|openai|mindsdb|1      |complete|[NULL]  |sentiment|up_to_date   |22.12.4.3      |[NULL]|[NULL]           |{'target': 'sentiment', 'using': {'prompt_template': 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral'}}|[NULL] |
+--------------------+------+-------+-------+--------+--------+---------+-------------+---------------+------+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
Once the status is complete, we can query for predictions.
SELECT output.sentiment, input.review
FROM example_db.demo_data.amazon_reviews AS input
JOIN mindsdb.sentiment_classifier AS output
LIMIT 3;
Don’t forget to create the example_db database before using one of its tables, like in the query above.
CREATE DATABASE example_db
WITH ENGINE = "postgres",
PARAMETERS = {
    "user": "demo_user",
    "password": "demo_password",
    "host": "3.220.66.106",
    "port": "5432",
    "database": "demo"
    };
 +----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sentiment                              | review                                                                                                                                                                                                                                                                                                                                                                            |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| positive                               | Late gift for my grandson. He is very happy with it. Easy for him (9yo ).                                                                                                                                                                                                                                                                                                         |
| The sentiment of the text is positive. | I'm not super thrilled with the proprietary OS on this unit, but it does work okay and does what I need it to do. Appearance is very nice, price is very good and I can't complain too much - just wish it were easier (or at least more obvious) to port new apps onto it. For now, it helps me see things that are too small on my phone while I'm traveling. I'm a happy buyer.|
| positive                               | I purchased this Kindle Fire HD 8 was purchased for use by 5 and 8 yer old grandchildren. They basically use it to play Amazon games that you download.                                                                                                                                                                                                                           |
+----------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
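Note that the model does not always answer with the bare label: the second row above reads "The sentiment of the text is positive." rather than "positive". If downstream code needs a clean label, one option is to post-process the predictions. The following Python sketch shows one possible approach; normalize_sentiment is a hypothetical helper, not part of MindsDB or OpenAI, and returning None for ambiguous answers is a design assumption.

```python
import re
from typing import Optional

# The three labels requested in the prompt template.
LABELS = ("positive", "negative", "neutral")
LABEL_PATTERN = re.compile(r"\b(" + "|".join(LABELS) + r")\b", re.IGNORECASE)


def normalize_sentiment(raw: str) -> Optional[str]:
    """Return the single label mentioned in `raw`, or None if unclear."""
    found = {label.lower() for label in LABEL_PATTERN.findall(raw)}
    # Accept the answer only when exactly one distinct label appears.
    return found.pop() if len(found) == 1 else None


for answer in ("positive", "The sentiment of the text is positive."):
    print(repr(answer), "->", normalize_sentiment(answer))
```

Tightening the prompt or lowering the temperature parameter may also reduce such variation, but a post-processing step keeps the downstream data consistent either way.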
Example using MQL
Now let’s go through a sentiment classification using Mongo database syntax.
We have a sample Mongo database that you can connect to your MindsDB Cloud account by running this command in Mongo Shell:
Followed by:
db.databases.insertOne({
    'name': 'mongo_test_db',
    'engine': 'mongodb',
    'connection_args': {
        "host": "mongodb+srv://admin:201287aA@cluster0.myfdu.mongodb.net/admin?authSource=admin&replicaSet=atlas-5koz1i-shard-0&readPreference=primary&appname=MongoDB%20Compass&ssl=true",
        "database": "test_data"
        }
})
Next, we create an OpenAI model by inserting a document into the models collection:
db.models.insertOne({
    name: 'sentiment_classifier_openai_mql',
    predict: 'sentiment',
    training_options: {
            engine: 'openai',
            prompt_template: 'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral'
           }
})
{ acknowledged: true,
  insertedId: ObjectId("63c19c3fe1d9855caa931df6") }
Before querying for predictions, we should verify the status of the sentiment_classifier_openai_mql model.
db.getCollection('models').find({'name': 'sentiment_classifier_openai_mql'})
{ NAME: 'sentiment_classifier_openai_mql',
  ENGINE: 'openai',
  PROJECT: 'mindsdb',
  VERSION: 1,
  STATUS: 'complete',
  ACCURACY: null,
  PREDICT: 'sentiment',
  UPDATE_STATUS: 'up_to_date',
  MINDSDB_VERSION: '22.12.4.3',
  ERROR: null,
  SELECT_DATA_QUERY: null,
  TRAINING_OPTIONS: '{\'target\': \'sentiment\', \'using\': {\'prompt_template\': \'predict the sentiment of the text:{{review}} exactly as either positive or negative or neutral\'}}',
  TAG: null,
  _id: ObjectId("000000000000002836398080") }
Once the status is complete, we can query for a single prediction.
db.sentiment_classifier_openai_mql.find({review: 'It is ok.'})
{
  sentiment: 'The sentiment of the text is neutral.',
  review: 'It is ok.'
}
You can also query for batch predictions. Here, we use the mongo_test_db database connected earlier in this example.
db.sentiment_classifier_openai_mql.find(
    {'collection': 'mongo_test_db.amazon_reviews'},
    {'sentiment_classifier_openai_mql.sentiment': 'sentiment',
     'amazon_reviews.review': 'review'
    }
)
{
  sentiment: 'positive',
  review: 'Late gift for my grandson. He is very happy with it. Easy for him (9yo ).'
}
{
  sentiment: 'The sentiment of the text is positive.',
  review: "I'm not super thrilled with the proprietary OS on this unit, but it does work okay and does what I need it to do. Appearance is very nice, price is very good and I can't complain too much - just wish it were easier (or at least more obvious) to port new apps onto it. For now, it helps me see things that are too small on my phone while I'm traveling. I'm a happy buyer."
}
{
  sentiment: 'positive',
  review: 'I purchased this Kindle Fire HD 8 was purchased for use by 5 and 8 yer old grandchildren. They basically use it to play Amazon games that you download.'
}
...
Parameter descriptions
MindsDB lets you customize models using parameters provided by OpenAI. Currently, there are eleven parameters to optionally modify:
- model_name: An optional string that identifies the model to use. It defaults to the text-davinci-002 model. For a list of available models and their descriptions, see the OpenAI Model overview.
- max_tokens: The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens cannot exceed the model’s context length.
- temperature: What sampling temperature to use. Higher values mean the model will take more risks.
- top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- n: How many completions to generate for each prompt.
- stop: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
- presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.
- frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
- best_of: Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return – best_of must be greater than n.
- logit_bias: Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens (specified by their token ID in the GPT tokenizer) to an associated bias value from -100 to 100. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
- user: A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse.
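Several of the constraints listed above can be checked before a model is created. The Python sketch below is one way to do that, assuming the parameters are collected in a plain dictionary. Both validate_openai_params and fits_context are hypothetical helpers written for illustration; they are not part of MindsDB or the OpenAI client, and the context length used in the example is a made-up figure.

```python
def validate_openai_params(params: dict) -> list:
    """Return a list of human-readable problems found in `params`."""
    errors = []

    # Both penalties must lie between -2.0 and 2.0.
    for name in ("presence_penalty", "frequency_penalty"):
        value = params.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            errors.append(f"{name} must be between -2.0 and 2.0")

    # stop accepts up to 4 sequences.
    stop = params.get("stop")
    if isinstance(stop, (list, tuple)) and len(stop) > 4:
        errors.append("stop accepts up to 4 sequences")

    # When both are given, best_of must be greater than n.
    n, best_of = params.get("n"), params.get("best_of")
    if n is not None and best_of is not None and best_of <= n:
        errors.append("best_of must be greater than n")

    # Each logit_bias value must lie between -100 and 100.
    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not -100 <= bias <= 100:
            errors.append(f"logit_bias for token {token_id} must be between -100 and 100")

    return errors


def fits_context(prompt_tokens: int, max_tokens: int, context_length: int) -> bool:
    """Prompt tokens plus max_tokens cannot exceed the model's context length."""
    return prompt_tokens + max_tokens <= context_length


problems = validate_openai_params({"presence_penalty": 3, "n": 2, "best_of": 2})
print(problems)
# 4000 is a hypothetical context length used only for this example.
print(fits_context(prompt_tokens=3900, max_tokens=200, context_length=4000))
```

The prompt token count has to come from a tokenizer matching the chosen model; this sketch takes it as an input rather than computing it.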
What’s Next?
Have fun while trying it out yourself!
If this tutorial was helpful, please give us a GitHub star.