How to extract matrix X and vector y after each component call and tracker iteration?

Hey all! I am new here and I am not sure if this is the right category/section for this question.

After some quick tests in the shell following some of the Rasa docs and reading “Building Chatbots with Python” by Sumit Raj, I am somewhat comfortable with formatting the dataset and creating bots with well-defined entities and intents.

That being said, I now want to customize and also validate some of my covariates/features: if I call a particular component before the intent classifier, I want to check the X matrix to see if the feature was created as I expected. How can I do this?

I also want to run the same kind of checks on the entire dialogue and at specific points of the tracker:

  • How do I access the (X, y) data (tidy format) that is sent to Keras to execute all the policy steps?

  • How can I access the tracker data and check the same data at each point of a dialogue? My intent is to access the features that Keras uses for inference after the model has been trained.

I will start exploring the Python API further, but any starting point that could help me with these questions would be great. I appreciate any answer or link that covers these points!

The X, Y passed to the NLU intent classifier are built here: rasa/embedding_intent_classifier.py at 18590e3eb699965e0826a280d4e658a329032ab6 · RasaHQ/rasa · GitHub

The X, Y passed to Keras are built here: rasa/keras_policy.py at 18590e3eb699965e0826a280d4e658a329032ab6 · RasaHQ/rasa · GitHub

Please check the corresponding methods to see how they are extracted.


EDIT:

Interesting. Thank you for the quick answer. Do you know if there is a list of column names or codes for helping to interpret the columns of the X matrix?

With the Python API, I used the trainer on my “nlu.md” together with my “config.yml” and I inspected the instance of Interpreter that I get after the training.

I then use the attribute ‘pipeline’ to reach my classifier. Suppose I then assign it to a variable called “cl”.

    d = cl._create_intent_dict(training_data)
    X, Y, ints = cl._prepare_data_for_training(training_data, d)

This gives me the model matrix that I was after. The strange thing is that X only contains None. Do you know why? My other question is: can I somehow identify which columns correspond to my regex features or my embedding features? In other words, is there a name/map for the columns that I can use to infer the meaning of each column?

OK, I think I am getting it.

The internal method “_prepare_data_for_training” only initializes X for TensorFlow, and it is populated afterwards. If you check the attribute “a_in” on the classifier component object, you should get the size/dimension of the feature vector.

Given this and the graph execution plan that TensorFlow uses, I preferred to work backward from interpreter.parse, which contains this loop:

    for component in self.pipeline:
        component.process(message, **self.context)

Given that, I tried something like:

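    # Import path assumes Rasa 1.x; adjust it for other versions.
    from rasa.nlu.training_data import Message

    msg = Message("I want product XYZ", interpreter.default_output_attributes())
    for c in interpreter.pipeline:
        c.process(msg, **interpreter.context)
    msg.get('text_features')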

Hooray! This gives me the populated feature vector X for that message! Now I wonder whether the variable/feature names are accessible somewhere. Any ideas?
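
In the meantime, one workaround I am considering is to record how many columns 'text_features' has before and after each component runs, so that each column range can at least be attributed to a component. Below is a minimal sketch of that idea. It does not use Rasa at all: ToyMessage, LengthFeaturizer, KeywordFeaturizer and trace_columns are hypothetical stand-ins I made up for illustration, and it assumes that each featurizer appends its columns to the existing vector rather than replacing it.

```python
class ToyMessage:
    """Hypothetical stand-in for a message object: text plus a dict of attributes."""

    def __init__(self, text):
        self.text = text
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value


def _existing_features(message):
    """Return the current feature list, or an empty list if none is set yet."""
    feats = message.get("text_features")
    return [] if feats is None else list(feats)


class LengthFeaturizer:
    """Toy component: appends one column holding the number of tokens."""

    name = "length_featurizer"

    def process(self, message, **kwargs):
        n_tokens = float(len(message.text.split()))
        message.set("text_features", _existing_features(message) + [n_tokens])


class KeywordFeaturizer:
    """Toy component: appends one 0/1 column per keyword."""

    name = "keyword_featurizer"

    def __init__(self, keywords):
        self.keywords = keywords

    def process(self, message, **kwargs):
        tokens = message.text.lower().split()
        flags = [1.0 if k in tokens else 0.0 for k in self.keywords]
        message.set("text_features", _existing_features(message) + flags)


def trace_columns(pipeline, message, context=None):
    """Run each component in order and record the column range it added."""
    context = context or {}
    spans = []
    for component in pipeline:
        before = len(_existing_features(message))
        component.process(message, **context)
        after = len(_existing_features(message))
        if after > before:
            # Columns [before, after) were added by this component.
            spans.append((component.name, before, after))
    return spans


msg = ToyMessage("I want product XYZ")
pipeline = [LengthFeaturizer(), KeywordFeaturizer(["want", "refund"])]
spans = trace_columns(pipeline, msg)
for name, start, stop in spans:
    print(name, "added columns", start, "to", stop - 1, msg.get("text_features")[start:stop])
```

With the real pipeline, the same bookkeeping would go inside the loop over interpreter.pipeline shown above, comparing the size of msg.get('text_features') before and after each c.process call. That only tells me which component produced a block of columns, not what each individual column means, so I would still like to know if a proper name map exists.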

I don’t understand. What do you mean?