Imports and Login
import pandas as pd
from transformers import pipeline
The 🤗 model allenai/multicite-multilabel-scibert is useful for building custom intent classifiers with more classes than the S2 API provides. (Paper)
It was trained on scientific articles and can predict multiple labels for each input text.
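Because the model is multi-label, a single citation context can carry several intents at once. By default a text-classification pipeline returns only the top-scoring label; in recent transformers versions, calling it with top_k=None returns a score for every label instead. The sketch below shows one way to turn such scores into a label set. The helper name select_labels, the 0.5 threshold, and the example scores are assumptions made for illustration, not values from the model or the paper.

```python
def select_labels(scores, threshold=0.5):
    """Return every label whose score reaches the threshold, highest first."""
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in ranked if item["score"] >= threshold]


# Made-up scores for one citation context, in the shape
# [{"label": ..., "score": ...}, ...] that a text-classification
# pipeline returns for one input when called with top_k=None.
example_scores = [
    {"label": "uses", "score": 0.64},
    {"label": "background", "score": 0.91},
    {"label": "extends", "score": 0.07},
]

selected = select_labels(example_scores)
print(selected)
```

The threshold trades precision for recall, so it is worth tuning on a few hand-labelled contexts before relying on it.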
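if INFER_INTENTS:
    # Load the citation contexts, dropping empty and duplicated ones.
    df = pd.read_csv(INPUT_FILE)
    df_contexts = df.dropna(subset=["contexts"]).drop_duplicates(subset=["paperId", "contexts"]).copy()

    # Named `classifier` so the imported `pipeline` factory is not shadowed.
    # device=1 selects the second GPU.
    classifier = pipeline("text-classification", model="allenai/multicite-multilabel-scibert", device=1)

    def data():
        # Yield one context at a time, cut to its first 512 characters.
        for _, row in df_contexts.iterrows():
            yield row.contexts[:512]

    # Stream the contexts through the model in batches and save the predictions.
    outputs = [out for out in classifier(data(), batch_size=128)]
    pd.DataFrame(outputs).to_csv(OUTPUT_FILE, index=None)

# Read the saved predictions and count how often each intent label occurs.
df = pd.read_csv(OUTPUT_FILE)
df.label.value_counts()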
background      163441
uses             28406
similarities      7797
differences       4551
motivation        2312
extends            863
future_work        646
Name: label, dtype: int64
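The raw counts are easier to compare as proportions. The sketch below recomputes each label's share from the counts printed above; the variable names are illustrative. With the DataFrame loaded, pandas gives the same proportions directly through value_counts(normalize=True).

```python
# Label counts copied from the value_counts() output above.
label_counts = {
    "background": 163441,
    "uses": 28406,
    "similarities": 7797,
    "differences": 4551,
    "motivation": 2312,
    "extends": 863,
    "future_work": 646,
}

total = sum(label_counts.values())

# Fraction of all predicted labels that fall in each class.
shares = {label: count / total for label, count in label_counts.items()}

for label, share in shares.items():
    print(f"{label:<13}{share:6.1%}")
```

The distribution is heavily skewed towards background, which matters if these predictions are later used as training data for a custom classifier.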