I am trying to integrate ai4bharat/IndicNER into my Rasa pipeline, but it throws an error. I am not sure where to set 'from_pt=True'. I am using a custom component in which I load the TF weights from the PyTorch checkpoint, but the option does not work when I put it in the config file.
The error:
My config file looks like this:
pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "ai4bharat/IndicNER"
    from_pt: True
  - name: LanguageModelTokenizer
    intent_tokenization_flag: False
    intent_split_symbol: " "
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
    OOV_token: _oov_
    use_shared_vocab: False
  - name: KeywordIntentClassifier
    case_sensitive: True
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: DIETClassifier
  - name: LanguageModelFeaturizer
  - name: test_ner.BertEntityExtractor
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.37
    core_threshold: 0.3
    fallback_action_name: action_default_fallback
    ambiguity_threshold: 0.1
My custom component file:
import typing
from typing import Any, Dict, List, Text, Optional, Type
from transformers import pipeline
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, TFAutoModelForTokenClassification
from rasa.shared.nlu.constants import ENTITIES
from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.shared.nlu.training_data.message import Message
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.nlu.extractors.extractor import EntityExtractor
from rasa.nlu.utils.hugging_face.hf_transformers import HFTransformersNLP
if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata

tokenizer = AutoTokenizer.from_pretrained("ai4bharat/IndicNER")
model = TFAutoModelForTokenClassification.from_pretrained("ai4bharat/IndicNER", from_pt=True)
nlp = pipeline('ner', model=model, tokenizer=tokenizer)
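The rest of the component is not shown above. As a rough sketch of the step that would typically follow, the predictions returned by the `nlp` pipeline need to be converted into the entity dictionaries Rasa expects. The helper name `hf_to_rasa_entities`, the default extractor name, and the sample predictions below are hypothetical and only for illustration. The sketch assumes each prediction carries `start`, `end`, and `score` fields plus either `entity_group` (aggregated output) or `entity` (per-token output).

```python
from typing import Any, Dict, List


def hf_to_rasa_entities(
    text: str,
    predictions: List[Dict[str, Any]],
    extractor_name: str = "BertEntityExtractor",  # hypothetical default
) -> List[Dict[str, Any]]:
    """Convert Hugging Face NER predictions into Rasa-style entity dicts."""
    entities = []
    for pred in predictions:
        # Aggregated output uses "entity_group"; per-token output uses "entity".
        label = pred.get("entity_group") or pred.get("entity")
        start, end = pred.get("start"), pred.get("end")
        if label is None or start is None or end is None:
            continue  # skip predictions without a label or character offsets
        entities.append(
            {
                "entity": label,
                # Slice the original text so sub-word markers never leak into the value.
                "value": text[int(start):int(end)],
                "start": int(start),
                "end": int(end),
                "confidence": float(pred["score"]),
                "extractor": extractor_name,
            }
        )
    return entities


# Made-up predictions in the shape the pipeline is assumed to return.
sample_text = "Ravi lives in Delhi"
sample_predictions = [
    {"entity_group": "PER", "score": 0.99, "word": "Ravi", "start": 0, "end": 4},
    {"entity_group": "LOC", "score": 0.98, "word": "Delhi", "start": 14, "end": 19},
]
sample_entities = hf_to_rasa_entities(sample_text, sample_predictions)
```

Inside the component's `process` method, a list like this would usually be appended to the message's existing entities under the `ENTITIES` key that the file already imports. The exact call depends on the Rasa version in use, so treat that part as an assumption to check against the installed release.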