Train model on data

Hello,

I have data for a project I am working on, and I want to train a model (tensorflow_embedding pipeline) on it.

So inside the rasa_nlu.py file, I wrote the following code:

from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer 
from rasa_nlu import config 
from rasa_nlu.model import Interpreter 

def train_bacentabot(data_json, config_file, model_dir):
    # Load the examples, train the pipeline from the config file, and save the model.
    training_data = load_data(data_json)
    trainer = Trainer(config.load(config_file))
    trainer.train(training_data)
    model_directory = trainer.persist(model_dir, fixed_model_name='bacentabot')


def predict_intent(text):
    # Load the persisted model and print the parse result for the given text.
    interpreter = Interpreter.load('./models/nlu/default/bacentabot')
    print(interpreter.parse(text))

Afterwards, I started the Python interpreter and called the function as shown below to train the model on the data, but it showed this error:

>>>  train_bacentabot('./data/data.json', 'config.json', './models/nlu')

Error message:

 File "<stdin>", line 1, in <module>
NameError: name 'train_bacentabot' is not defined

Any help please?

Which version of Rasa are you using?

Hi @Rev0kz, welcome to the forum! Did you import that function before running it? Also, is there any reason you’re not using the command-line command to train your model?

Hi @akelad. Yes, I tried to train my model on the command line, but it failed, as I reported earlier.

So do I need to import the train_bacentabot() function inside the Python interpreter to train the model?
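For anyone hitting the same NameError: a function defined in a file is not visible in a fresh Python session until it is imported from that file. Here is a minimal sketch of that behaviour. The module name train_helpers_demo and the stub function body are made up for the illustration and do not call Rasa at all.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name for this demo (not part of Rasa).
MODULE_NAME = "train_helpers_demo"

# Stub standing in for the real training function; it only echoes its input.
MODULE_SOURCE = (
    "def train_bacentabot(data_json, config_file, model_dir):\n"
    "    return 'would train on ' + data_json\n"
)

CALL = "train_bacentabot('./data/data.json', 'config.json', './models/nlu')"


def call_without_import():
    """Evaluate the call in a namespace where the function was never imported."""
    try:
        eval(CALL, {})
    except NameError as err:
        return str(err)
    return None


def call_after_import(module_dir):
    """Import the module from module_dir first, then make the same call."""
    sys.path.insert(0, module_dir)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
        return eval(CALL, {"train_bacentabot": module.train_bacentabot})
    finally:
        sys.path.remove(module_dir)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
    error_message = call_without_import()
    result = call_after_import(tmp)

print(error_message)
print(result)
```

In an interactive session the equivalent is a plain `from <your_file_name> import train_bacentabot` before calling it, run from the directory that contains the file. It may also be worth avoiding the file name rasa_nlu.py, since a local file with that name can shadow the installed rasa_nlu package when it is imported.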

I am using rasa version 0.13.2

@Rev0kz sorry for not getting back to you, I’ve been on vacation. Did you manage to resolve your issue?

Yes. But later I decided to uninstall it and install it from GitHub using the following:

git clone https://github.com/RasaHQ/rasa.git

cd rasa

pip install -r requirements.txt

But I had the following error:

Collecting matplotlib==3.0.3 (from -r requirements.txt (line 4))
  Cache entry deserialization failed, entry ignored
  Could not find a version that satisfies the requirement matplotlib==3.0.3 (from -r requirements.txt (line 4)) (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4)
No matching distribution found for matplotlib==3.0.3 (from -r requirements.txt (line 4))

Which python version are you using?

Python 2.7.16 (default, Apr  6 2019, 01:42:57) 
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Rasa requires Python 3.5, 3.6 or 3.7. You’re running 2.7.

As you’re on Linux, just try running python3 or python3.6 and see if it works.

You can also check which versions are available with ls /usr/bin/ | grep python

Edit: for pip, it’s then simply pip3 instead of pip.
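A quick way to confirm which interpreter is actually running is to check sys.version_info from inside it. The sketch below hard-codes the supported versions quoted in this reply (3.5, 3.6, 3.7); the helper name is_supported is just for illustration. It also fits the pip output above: matplotlib 3.x does not support Python 2, so pip under 2.7 only offers releases up to 2.2.x.

```python
import sys

# Supported versions as quoted in the reply above; adjust for other Rasa releases.
SUPPORTED_VERSIONS = {(3, 5), (3, 6), (3, 7)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
print("Running Python", current, "supported:", is_supported())
```

Running this with both python and python3 shows which command maps to which version on the machine.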

Thank you very much. It is working now. However, an error showed up when I tried to train the tensorflow_embedding model on the data in the data/data.json file.

This is the content of the data/data.json file:

{ 

  "rasa_nlu_data": {   
     "common_examples": [ 
	{  

          "text": "Hello", 
	  "intent": "greeting", 
	  "entities": []

	},  

	{
	 
	  "text": "Hi", 
	  "intent": "greeting", 
	  "entities": []

	 },  

	 {
           
           "text" : "Goodmorning", 
	   "intent": "greeting",
	   "entities": [] 

	 },  

	 {

           "text": : "How much is samosa", 
	  "intent":  "get_samosa_price", 
	 "entities": [] 
	 
	 },  

	 {

	   "text" : "How much is spring rolls", 
	   "intent" : "get_springrolls_price", 
	  "entities": [] 

	 },   

	 {   

           "text":  "How much is a plate of fried rice", 
	   "intent": "get_friedrice_price", 
	   "entities": [] 

	  },  

	 {   

           "text":  "where can i locate your shop",
	   "intent": "locate_shop",
	   "entities": [] 

	  },

	  {  
 
	   "text": "where is it located", 
	   "intent": "locate_shop", 
	   "entities": [] 

	  }    

	  {  

           "text": "where is your shop", 
	   "intent": "locate_shop", 
	   "entities" : [] 

	  } 

       ],

        "regex_features": [],
        "entity_synonyms": []	 

     }   


  }   

So I decided to train the tensorflow_embedding model on the data in data/data.json above from the terminal, as shown below:

from rasa_mod import eaterybot

from rasa_mod import predict_intent

eaterybot('./data/data.json', 'config.json', './models/nlu')

and the following error showed up:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "rasa_mod.py", line 7, in eaterybot
    training_data = load_data(data_json)
  File "/root/botproject/bacentabot/local/lib/python2.7/site-packages/rasa_nlu/training_data/loading.py", line 54, in load_data
    data_sets = [_load(f, language) for f in files]
  File "/root/botproject/bacentabot/local/lib/python2.7/site-packages/rasa_nlu/training_data/loading.py", line 102, in _load
    raise ValueError("Unknown data format for file {}".format(filename))
ValueError: Unknown data format for file ./data/data.json

I need help with the above error, please. @akelad

What is rasa_mod? And you still seem to be running Python 2.7. As for the JSON error, have you checked whether your file is valid JSON?

rasa_mod is the name of the file responsible for training the model on the data in the data/data.json file. Is it wrong to name a file this way?

I have both Python 2.7 and Python 3 on my machine, but I use Python 3 for Rasa development after @tabularasa advised me to do so.

No, I have not checked. I will do so via jsonlint. Thanks.
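Besides jsonlint, the file can be checked locally with Python's built-in json module, which reports the line and column of the first syntax error. The posted data.json contains a doubled colon in the "How much is samosa" entry and a missing comma between the last two examples, either of which makes it invalid JSON. That would be a plausible reason for the "Unknown data format" error, though the thread does not confirm it. A minimal sketch, with the helper name find_json_error and the short sample strings made up for illustration:

```python
import json


def find_json_error(text):
    """Return a short description of the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return None


# Small samples mirroring the two problems visible in the posted file.
doubled_colon = '{"text": : "How much is samosa", "intent": "get_samosa_price"}'
missing_comma = '[{"intent": "locate_shop"} {"intent": "locate_shop"}]'
valid = '{"text": "Hello", "intent": "greeting", "entities": []}'

for sample in (doubled_colon, missing_comma, valid):
    print(find_json_error(sample))
```

To check the real file, read its contents (for example with open('./data/data.json')) and pass the text to find_json_error. A result of None means the file parses as JSON.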

Thank you @akelad . It is working now.