Imports and Login
import pandas as pd
from transformers import pipeline
The 🤗 model allenai/multicite-multilabel-scibert is a useful starting point for building custom intent classifiers with more classes than the S2 API provides. (Paper)
It is a model trained on scientific articles and capable of predicting multiple labels for each input text.
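One caveat: by default the text-classification pipeline returns only the top-scoring label for each input. In recent transformers releases, passing top_k=None to the pipeline call returns a score for every label instead. The sketch below shows one way to turn such per-label scores into a multi-label prediction. The helper name labels_above_threshold, the 0.5 threshold, and the example scores are all illustrative assumptions, not values from the model or the paper.

```python
from typing import List


def labels_above_threshold(scores: List[dict], threshold: float = 0.5) -> List[str]:
    """Return every label whose score reaches the threshold, highest score first."""
    kept = [s for s in scores if s["score"] >= threshold]
    kept.sort(key=lambda s: s["score"], reverse=True)
    return [s["label"] for s in kept]


# Hypothetical scores for one citation context, shaped like the list of
# {"label", "score"} dicts the pipeline returns when called with top_k=None.
example_scores = [
    {"label": "uses", "score": 0.62},
    {"label": "background", "score": 0.91},
    {"label": "extends", "score": 0.08},
]

selected = labels_above_threshold(example_scores)
print(selected)  # ['background', 'uses']
```

The threshold is worth tuning on a held-out sample: a lower value keeps more secondary intents at the cost of more false positives.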
if INFER_INTENTS:
    df = pd.read_csv(INPUT_FILE)
    # Keep rows that have a citation context, one row per (paperId, contexts) pair
    df_contexts = df.dropna(subset=["contexts"]).drop_duplicates(subset=["paperId", "contexts"]).copy()
    # Named `classifier` so the imported `pipeline` function is not shadowed on re-runs
    # device=1 selects the second GPU
    classifier = pipeline("text-classification", model="allenai/multicite-multilabel-scibert", device=1)

    def data():
        # Stream contexts one at a time, cut to the first 512 characters
        for _, row in df_contexts.iterrows():
            yield row.contexts[:512]

    outputs = list(classifier(data(), batch_size=128))
    pd.DataFrame(outputs).to_csv(OUTPUT_FILE, index=False)
df = pd.read_csv(OUTPUT_FILE)
df.label.value_counts()
background 163441
uses 28406
similarities 7797
differences 4551
motivation 2312
extends 863
future_work 646
Name: label, dtype: int64
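The counts are easier to compare as percentages. The sketch below recomputes them in plain Python from the numbers printed above; the variable names are illustrative. On the DataFrame itself, df.label.value_counts(normalize=True) gives the same proportions directly.

```python
# Label counts copied from the value_counts() output above
label_counts = {
    "background": 163441,
    "uses": 28406,
    "similarities": 7797,
    "differences": 4551,
    "motivation": 2312,
    "extends": 863,
    "future_work": 646,
}

total = sum(label_counts.values())

# Percentage of all predictions that each label accounts for, to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in label_counts.items()}

for label, pct in shares.items():
    print(f"{label:<13}{pct:5.1f}%")
```

The background label accounts for most of the predictions, which is worth keeping in mind before using these labels as training data for a downstream classifier.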