Imports and Login
import pandas as pd
import numpy as np
import json
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
import time
import torch
BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
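As a rough illustration of the c-TF-IDF idea mentioned above, the sketch below scores words per cluster in plain Python. It assumes the commonly cited formulation: a word's normalized frequency within a class, multiplied by log(1 + A / f), where A is the average number of tokens per class and f is the word's total count across all classes. The function name `c_tf_idf` and the toy clusters are hypothetical, and BERTopic's own implementation differs in detail.

```python
import math
from collections import Counter

def c_tf_idf(classes):
    """Score each word per class: tf(word, class) * log(1 + A / f(word)).

    `classes` maps a class id to the tokens of all documents in that class.
    A is the average number of tokens per class; f(word) is the word's
    total count across all classes.
    """
    counts = {c: Counter(tokens) for c, tokens in classes.items()}
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    avg_words = sum(total.values()) / len(classes)

    scores = {}
    for c, counter in counts.items():
        n_tokens = sum(counter.values())
        scores[c] = {
            word: (freq / n_tokens) * math.log(1 + avg_words / total[word])
            for word, freq in counter.items()
        }
    return scores

# Toy clusters (hypothetical data, not taken from the dataset used here).
toy = {
    0: "genome sequences genome mutations the".split(),
    1: "students learning education the students".split(),
}
scores = c_tf_idf(toy)
top_words = {c: max(s, key=s.get) for c, s in scores.items()}
print(top_words)
```

Words that occur in every cluster, such as "the", receive a lower weight than equally frequent words that are specific to one cluster, which is what keeps the topic descriptions readable.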
df = pd.read_csv(INPUT_FILE)
df = df.drop_duplicates(subset=["doi"])\
       .dropna(subset=["doi", "abstract"]).reset_index(drop=True)
sentences = (df["title"] + " " + df["abstract"]).values
embedding_model = SentenceTransformer('allenai-specter')
topic_model = BERTopic(embedding_model=embedding_model)
if COMPUTE_TOPICS:
    topics, probs = topic_model.fit_transform(sentences)
    topic_model.save(OUTPUT_FILE_TOPICS_BERTOPIC, save_embedding_model=False)
else:
    topic_model = BERTopic.load(OUTPUT_FILE_TOPICS_BERTOPIC)
topic_model.get_topic(0)
[('genome', 0.013190915624417291),
('genomes', 0.012999260015038549),
('sequences', 0.012874243742379351),
('mutations', 0.012774218242361796),
('genomic', 0.010285099888750706),
('sars', 0.007669808018225598),
('cov', 0.007650223599073412),
('mutation', 0.007591515958665949),
('phylogenetic', 0.007553624685885209),
('strains', 0.006927440394902038)]
# topic_model.visualize_topics()
topic_model.visualize_barchart(top_n_topics=12)
topic_model.generate_topic_labels()[:25]
['-1_the_and_of',
'0_genome_genomes_sequences',
'1_students_learning_education',
'2_hydroxychloroquine_hcq_trials',
'3_knowledge_attitude_practices',
'4_law_rights_court',
'5_ct_chest_lung',
'6_igg_antibody_antibodies',
'7_china_cases_wuhan',
'8_validation_nomogram_prediction',
'9_anxiety_psychological_stress',
'10_seroprevalence_igg_antibodies',
'11_expression_ace2_cells',
'12_rt_detection_pcr',
'13_patients_clinical_characteristics',
'14_model_epidemic_number',
'15_workers_employment_work',
'16_images_deep_ray',
'17_telemedicine_telehealth_sec',
'18_immune_cell_cells',
'19_de_da_para',
'20_compounds_docking_phytochemicals',
'21_pregnant_women_pregnancy',
'22_compliance_distancing_social',
'23_genetic_variants_susceptibility']
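The labels above follow a `<topic id>_<word>_<word>_<word>` pattern, and topic -1 is the group BERTopic uses for outlier documents. A minimal sketch of how such a label can be assembled from per-topic word scores follows; `make_label` is a hypothetical helper written for illustration, not part of BERTopic's API, and the scores are copied from the `get_topic(0)` output shown earlier.

```python
def make_label(topic_id, word_scores, nr_words=3, separator="_"):
    """Join the topic id with its highest-scoring words."""
    # Sort by score, highest first, and keep the top `nr_words` words.
    top = sorted(word_scores, key=lambda ws: ws[1], reverse=True)[:nr_words]
    return separator.join([str(topic_id)] + [word for word, _ in top])

topic_0 = [
    ("genome", 0.013190915624417291),
    ("genomes", 0.012999260015038549),
    ("sequences", 0.012874243742379351),
    ("mutations", 0.012774218242361796),
]
label = make_label(0, topic_0)
print(label)  # 0_genome_genomes_sequences
```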
topics_over_time = topic_model.topics_over_time(docs=sentences,
                                                timestamps=list(df.date.values),
                                                nr_bins=20,
                                                global_tuning=True,
                                                evolution_tuning=True)
topic_model.visualize_topics_over_time(topics_over_time, top_n_topics=20)
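With `nr_bins=20`, the timestamps are grouped into 20 intervals, and topic representations are then computed per interval. The following is a simplified sketch of equal-width binning in plain Python; `assign_bins` is a hypothetical helper, the dates are made up, and BERTopic's internal binning may handle interval edges differently.

```python
from datetime import date

def assign_bins(timestamps, nr_bins):
    """Map each date to an equal-width bin index in [0, nr_bins - 1]."""
    ordinals = [t.toordinal() for t in timestamps]
    lo, hi = min(ordinals), max(ordinals)
    # Fall back to a width of 1 when all dates are identical.
    width = (hi - lo) / nr_bins or 1
    # The latest date would land on the upper edge, so clamp it into the last bin.
    return [min(int((o - lo) / width), nr_bins - 1) for o in ordinals]

# Made-up example dates spanning one year, split into four bins.
dates = [date(2020, 1, 1), date(2020, 6, 30), date(2020, 12, 31)]
bins = assign_bins(dates, nr_bins=4)
print(bins)
```

Fewer bins give smoother trends at the cost of temporal resolution; more bins do the opposite and leave fewer documents per interval.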
topics_per_class = topic_model.topics_per_class(docs=sentences, classes=df["journal.title"])
topic_model.visualize_topics_per_class(topics_per_class, top_n_topics=20, width=900)
Hierarchical Topic Modelling?
Representative documents for topic 1_students_learning_education_teaching
topic_model.get_representative_docs(topic=1)
["Do Inclusive Education Policies Improve Employment Opportunities? Evidence from a Field Experiment In labor markets where disadvantaged students are discriminated against, meritbased college scholarships targeting these students could convey two opposing signals to employers. There is a positive signal reflecting the candidate's cognitive ability (talented in high-school and able to maintain a high GPA in college) as well as her soft skills (overcoming poverty). There is also a possible negative signal as the targeting of the scholarship indicates that the beneficiary comes from a disadvantaged household. We conduct a correspondence study to analyze the labor market impact of an inclusive education program. Beca 18 provides merit-based scholarships to talented poor students admitted to 3-year and 5-year colleges in Peru. We find that the positive signal dominates. Including information of being a scholarship recipient increases the likelihood of getting a callback for a job interview by 20%. However, the effect is much smaller in jobs and careers where the poor are under-represented, suggesting that the negative signal of the scholarship is not zero.",
"Virtual Teaching as the 'New Norm’: Analyzing Science Teachers’ Attitude toward Online Teaching, Technological Competence and Access The demand for virtual teaching is increasingly being embraced by the educational system in the Philippines due to the COVID-19 pandemic which made the conduct of the traditional classroom instruction an implausible means for the continuous delivery of education. Thus, it becomes a pressing need to determine teachers’ attitude toward the virtual teaching of Science, technological competence and access. The study enlisted a total of 256 purposively selected teachers assigned to teach Science subjects. Moreover, the investigation intended to deterime whether there is a gender divide among variables of the study, and that whether a significant relationship exist among the respondents’ attitude toward online teaching, technological competence and access. The study disclosed interesting results.",
'Phone-based audience response system as an adjunct in orthodontic teaching of undergraduate dental students: a cross-over randomised controlled trial\xa0 <p><strong>Background:</strong> The advent of electronic teaching facilities improves tutor-student communication. This study aims to explore the effectiveness of Phone-Based Audience Response System (PB-ARS), as an adjunctive pedagogy tool to enhance the retention of orthodontic information by dental students; and to explore the students’ perception of PB-ARS. </p><p><strong>Methods:</strong> This cross-over clustered randomised control trial included 34 males who were in the final year of their undergraduate dental training. Participants were allocated to one of two event groups (G1 and G2) using computer-generated randomisation. Both groups simultaneously attended two different traditional lectures (L 1 and L2) a week apart. During L1, PB-ARS was used as an adjunct to conventional presentation to teach G1 participants, (PB-ARS group) while G2’s participants acted as a control group (CG), and were taught using a traditional presentation. In the second week (L2), the interventions were crossed-over. Participants from both groups completed pre- and post-lecture multiple-choice questionnaires (MCQ) to assess their short-term retention of information. Their performance in the final MCQ exam (10 weeks following L2) was tracked to assess the long-term retention of the information. Participants also completed post-lecture questionnaires to evaluate their perceptions. </p><p><strong>Results</strong>: 29 and 31 participants from the CG and PB-ARS group completed this trial, respectively. Although 87.5% of students in the PB-ARS group showed an improvement in their immediate post-lecture scores compared with 79.3% for the CG, it was statistically insignificant (p= 0.465). 
Similarly, the intervention showed an insignificant effect on the long-term retention of the knowledge (p=0.560).</p><p>There was a mildly but favourable attitude of students towards the use of PB-ARS. However, the difference in the overall level of satisfaction between both groups was statistically insignificant (p=0.183).</p><p><strong>Conclusion</strong>: PB-ARS has a minimal and insignificant effect on the short- and long-term retention of orthodontic knowledge by male undergraduate dental students. PB-ARS was the preferred adjunct tool to conventional classroom teaching. Due to the limitations of this trial, a long-term randomised controlled trial with a larger sample size is recommended.\xa0\xa0</p>']
Representative documents for topic 15_mortality_deaths_excess_death
topic_model.get_representative_docs(topic=15)
['Job Search, Job Posting and Unemployment Insurance During the COVID-19 Crisis During the COVID-19 pandemic, many businesses had to close and unemployment skyrocketed. To help the unemployed, the CARES Act increased US unemployment benefits by $600 a week, which increased unemployment benefit replacement rates (benefit/wage) to unprecedentedly high levels, above 100% for many workers. We investigate the state of the labor market during the COVID-19 crisis, using job applications and vacancy listings by occupation, state and industry from the online platform Glassdoor. We document two new facts. First, applications-per-vacancy were higher during the COVID-19 crisis than before. This is because job vacancies decreased by 64% during the crisis, while job applications only decreased by 21%. Job applications decreased before the CARES Act, and remained relatively stable until June 2020. Second, applications and applications-per-vacancy were slightly lower in occupation-states with a larger increase in the replacement rate after the CARES Act, but these differences are not entirely explained by the CARES Act. Overall, our evidence suggests that employers did not experience greater difficulty finding applicants for their vacancies after the CARES Act, despite the large increase in unemployment benefits.',
"Germany's Capacities to Work from Home We propose an index of working from home (WFH) capacity for the German economy, drawing on rich survey and administrative data. We find that 56 percent of jobs are WFH feasible, most of which are located in urban areas and in highly digitized industries. Using individual-level data on tasks and work conditions, we show that heterogeneity in WFH feasibility is largely explained by differences in task content. WFH feasible jobs are typically characterized by cognitive, non-manual tasks, and PC usage. We compare our survey-based measure with popular task-based measures of WFH capacity, which usually rely on determining tasks that are incompatible with WFH, and show that task-based approaches capture variation in WFH capacity across occupations quite accurately. Finally, we demonstrate that our WFH index constitutes a strong predictor of actual WFH outcomes during the Covid-19 crisis and discuss applications in the context of the pandemic and the future of work.",
'A Universal EITC: Making Work Pay in the Age of Automation The universal EITC is a worker subsidy designed to offset wage stagnation. The base proposal would replace existing subsidies for working families with a refundable 100-percent tax credit on individual wages up to $10,000 and a larger, refundable CTC. The maximum credit grows with GDP, guaranteeing that low-wage workers benefit from economic growth. The credits are offset by a broad-based VAT or income surtax. The proposals are progressive: After-tax income for the bottom quintile would increase by about 25 percent. The tax burden on the top one percent would increase by 7 to 14 percent of income, depending on financing.']