Best method for detecting product name entities?

Product names on e-commerce sites are typically very long, e.g. “EYEBOGLER V-Neck Shawl Collar Stylish Men’s Solid T-Shirt”.

Users might not give the entire product name in a conversation, and typos are to be expected.

Option 1: Use lots of training examples for the model to learn from. The problem is that the model might overfit to the programmatically generated examples.

Option 2: Use lookup tables that list all product names, matched with a regex. The problem here is that once we factor in the ways a user might utter a product name (only parts of the name, typos, etc.), the list can grow very large.
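To make the limitation of Option 2 concrete, here is a minimal sketch of a lookup-table regex in plain Python. The catalogue contents and the `find_product` helper are made up for illustration, and this is not how any particular framework implements lookup tables; it only shows that a verbatim pattern finds the full name but misses a shortened one.

```python
import re
from typing import Optional

# Hypothetical catalogue; a real one would be loaded from the product database.
PRODUCT_NAMES = [
    "EYEBOGLER V-Neck Shawl Collar Stylish Men's Solid T-Shirt",
    "Acme Wireless Bluetooth Earphones",
]

# One alternation over all names, longest first so that a longer name wins
# when two names share a prefix. re.escape protects characters like "-".
pattern = re.compile(
    "|".join(re.escape(name) for name in sorted(PRODUCT_NAMES, key=len, reverse=True)),
    flags=re.IGNORECASE,
)

def find_product(utterance: str) -> Optional[str]:
    """Return the span of the utterance that matches a catalogue name verbatim."""
    match = pattern.search(utterance)
    return match.group(0) if match else None

# The full name is found (case-insensitively)...
full = find_product("price of eyebogler v-neck shawl collar stylish men's solid t-shirt")
# ...but a shortened name, as users tend to type it, is not.
partial = find_product("what is the price for V-Neck Shawl Collar T-Shirt in store xyz")
```

Covering the second utterance with this approach alone would mean adding every shortened or misspelled variant to the table, which is where the list starts to explode.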

Which option is better, and is there any other way of solving this?

Hi @dingusagar

This looks like an interesting problem. May I know the scenarios in which a user might utter a product name?

What does the conversation flow look like?

The user asks: “what is the price for V-Neck Shawl Collar T-Shirt in store xyz”

Another query: “how many reviews for V-Neck Shawl Collar T-Shirt”

Hi @dingusagar

I would suggest breaking the product name down into multiple entities, as below:

  • collar_type
  • shirt_type
  • neck_type

Once you have extracted these entities, perform a keyword search over your product name list to narrow it down to the exact product.

Sometimes you might end up with more than one product, in which case you can confirm with the user by listing each product name as a button.
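As a rough sketch of that keyword search, assuming the entity values have already been extracted by the NLU model: the product list, the `normalize` helper and the `keyword_search` function below are all hypothetical, and the substring matching is deliberately crude.

```python
import re

# Hypothetical product list for illustration.
PRODUCTS = [
    "EYEBOGLER V-Neck Shawl Collar Stylish Men's Solid T-Shirt",
    "EYEBOGLER Round Neck Men's Striped T-Shirt",
    "Acme V-Neck Women's Solid T-Shirt",
]

def normalize(text):
    """Lowercase and drop everything except letters and digits,
    so that 'V-Neck', 'v neck' and 'vneck' all compare equal."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())

def keyword_search(entity_values, products):
    """Return the products whose normalized name contains every extracted value.
    Crude on purpose: matching ignores word boundaries."""
    wanted = [normalize(value) for value in entity_values]
    return [p for p in products if all(w in normalize(p) for w in wanted)]

# Values the model might extract from
# "what is the price for V-Neck Shawl Collar T-Shirt in store xyz"
extracted = {"neck_type": "V-Neck", "collar_type": "Shawl Collar", "shirt_type": "T-Shirt"}
candidates = keyword_search(extracted.values(), PRODUCTS)

# With fewer entities the search is ambiguous and the user has to pick one.
ambiguous = keyword_search(["V-Neck", "T-Shirt"], PRODUCTS)
```

Here `candidates` holds only the EYEBOGLER shawl collar shirt, while `ambiguous` holds two products, which is the case where you would show buttons.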

Hi @siriusraja, thanks for the reply.

Breaking the product name down into different entities works for a particular category like T-shirts. But how do we make it scale to all sorts of product categories, for example smartphones, earphones, clothing, etc.?

Hi @dingusagar

How many categories are there? Can you share the list of product names so that I can think of a more efficient way?

Hi, sorry for the late reply.

I am looking for a general entity recognition solution for e-commerce product names across various categories. For example, consider the Flipkart products available in this public dataset: Flipkart Products | Kaggle

Hi @dingusagar

In that case, I would build a common set of entities and look for them in the user utterance:

  • color
  • gender
  • product
  • product_model
  • brand_name

[EYEBOGLER](brand_name) [V-Neck](product_model) Shawl Collar Stylish [Men’s](gender) Solid [T-Shirt](product)

Based on the extracted values, perform a keyword search on the product table. If there are multiple product names in the resulting search, show them to the user and ask for confirmation.
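If the product table stores these attributes as columns, the lookup and the confirmation step could look roughly like this. The table layout, `filter_products` and `confirmation_prompt` are assumptions for illustration; in a real bot the options would be rendered as buttons rather than as a text list.

```python
# Hypothetical product table; the column names mirror the entity names above.
PRODUCT_TABLE = [
    {"name": "EYEBOGLER V-Neck Shawl Collar Stylish Men's Solid T-Shirt",
     "brand_name": "EYEBOGLER", "product": "T-Shirt",
     "product_model": "V-Neck", "gender": "Men's"},
    {"name": "EYEBOGLER Round Neck Men's Striped T-Shirt",
     "brand_name": "EYEBOGLER", "product": "T-Shirt",
     "product_model": "Round Neck", "gender": "Men's"},
    {"name": "Acme V-Neck Women's Solid T-Shirt",
     "brand_name": "Acme", "product": "T-Shirt",
     "product_model": "V-Neck", "gender": "Women's"},
]

def filter_products(entities, table):
    """Keep the rows that agree (case-insensitively) with every entity the
    user mentioned. Entities the user did not mention are not constrained."""
    def agrees(row):
        return all(row.get(key, "").lower() == value.lower()
                   for key, value in entities.items())
    return [row["name"] for row in table if agrees(row)]

def confirmation_prompt(matches):
    """Build the bot reply: not found, a single hit, or a list to choose from."""
    if not matches:
        return "Sorry, I could not find that product."
    if len(matches) == 1:
        return f"Found: {matches[0]}"
    options = "\n".join(f"{i}. {name}" for i, name in enumerate(matches, 1))
    return "Which product did you mean?\n" + options

# "how many reviews for V-Neck T-Shirt" -> two rows match, so ask the user.
matches = filter_products({"product_model": "V-Neck", "product": "T-Shirt"}, PRODUCT_TABLE)
reply = confirmation_prompt(matches)
```

Because unmentioned entities are simply skipped, the same function works whether the user gives one attribute or all of them.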

I’m also facing a similar issue. Any input from the Rasa team?

Both options have pros and cons, and the choice between them depends on the trade-offs you are willing to make and the constraints of your application. There are also alternative approaches worth considering. Let’s go through each option and then the alternatives:

Option 1: Use Lots of Training Examples

Pros:

  • Can handle a wide variety of user inputs and typos.
  • Can provide more natural and flexible responses.

Cons:

  • Prone to overfitting to the training data.
  • Requires continuous updating as new products are added.

Option 2: Use Lookup Tables with Regex

Pros:

  • Can efficiently handle known product names.
  • Relatively straightforward to implement.
  • Minimizes overfitting.

Cons:

  • Can struggle with variations, typos, and new products not in the lookup table.
  • Might lead to a large lookup table as you consider variations.

Alternative Approaches:

  1. Fuzzy String Matching: Use fuzzy string matching, which scores how similar two strings are and so can match a product name even when it contains typos. Libraries such as FuzzyWuzzy (now maintained as TheFuzz) or RapidFuzz implement this on top of Levenshtein (edit) distance. This lets you handle variations in user input without adding every variant to the lookup table.
  2. Keyword Extraction and Entity Recognition: Implement natural language processing (NLP) techniques to extract keywords or named entities from the user input. This can help identify relevant product terms, even if they’re not an exact match.
  3. Hybrid Approach: Combine the strengths of both options. Start with a lookup table for known product names and use a fuzzy matching algorithm to handle variations and typos. This can strike a balance between accuracy and flexibility.
  4. User Feedback Mechanism: Implement a user feedback mechanism. When the chatbot suggests a product, allow users to confirm whether it’s the product they intended. Use this feedback to improve the system over time.
  5. Machine Learning Models: Consider training machine learning models to recognize product names, especially if the product names change frequently or if you need to handle a large number of products. These models can learn from patterns in the data, but it’s important to manage overfitting.
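The hybrid idea (exact lookup first, fuzzy matching as a fallback) can be sketched with only the Python standard library. Note that this swaps in `difflib`, whose similarity ratio is not Levenshtein distance but serves the same purpose here; the catalogue, the `cutoff` of 0.8 and the `min_hits` threshold are arbitrary assumptions that would need tuning on real data.

```python
import difflib
import re

# Hypothetical catalogue for illustration.
PRODUCT_NAMES = [
    "EYEBOGLER V-Neck Shawl Collar Stylish Men's Solid T-Shirt",
    "Acme Wireless Bluetooth Earphones",
    "Acme Round Neck Women's Striped T-Shirt",
]

def tokenize(text):
    """Split into lowercase alphanumeric tokens ('V-Neck' -> ['v', 'neck'])."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(utterance, product_name, cutoff=0.8):
    """Count the distinct product tokens that approximately match some
    token of the utterance (difflib similarity ratio >= cutoff)."""
    utterance_tokens = tokenize(utterance)
    hits = 0
    for token in set(tokenize(product_name)):
        if difflib.get_close_matches(token, utterance_tokens, n=1, cutoff=cutoff):
            hits += 1
    return hits

def best_match(utterance, names, min_hits=2):
    """Exact lookup first; otherwise the best fuzzy candidate, or None."""
    # Step 1: plain lookup-table behaviour, full name contained verbatim.
    for name in names:
        if name.lower() in utterance.lower():
            return name
    # Step 2: fuzzy fallback for partial names and typos.
    top = max(names, key=lambda name: score(utterance, name))
    return top if score(utterance, top) >= min_hits else None

# Partial name with typos ("nek", "colar", "tshirt").
guess = best_match("how many reviews for v-nek shawl colar tshirt", PRODUCT_NAMES)
# An unrelated utterance should not be forced onto a product.
no_guess = best_match("what is the weather today", PRODUCT_NAMES)
```

Scoring token by token rather than comparing the whole strings is what lets a short, misspelled mention still line up with a very long catalogue name. When several products score similarly, the user feedback step from point 4 is the natural tie-breaker.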

Ultimately, the best approach might involve a combination of these methods. For example, you could start with a lookup table and gradually expand it using user interactions and fuzzy matching algorithms. Regularly collecting user feedback and monitoring system performance can help you fine-tune your approach over time.