Multi-Label Classification of Sustainability Topics in Media Coverage: A Comparative Study of Transformer Models
(2025) DABN01 20251
Department of Economics
Department of Statistics
- Abstract
- This study examines the use of transformer-based language models for multi-label classification of Environmental, Social, and Governance (ESG) topics in media coverage of DAX-listed companies. Using a dataset of approximately 11,500 third-party media texts, thirty ESG topics were assigned to documents through a pipeline combining lemmatised keyword matching and semantic similarity.
Four models were compared: BERT, FinBERT, RoBERTa, and DistilBERT. All were fine-tuned under the same conditions, using the same ESG taxonomy and evaluation framework. FinBERT performed best overall, which may reflect the benefits of domain-specific pretraining on financial texts. DistilBERT also performed well despite its smaller size, suggesting that smaller models can be competitive when properly fine-tuned. RoBERTa and BERT both performed well but scored lower than the other two models.
The study also highlighted the importance of threshold tuning: each model reached its best performance at a decision threshold below the standard value of 0.50. The tuned thresholds significantly improved both micro and macro F1 scores.
These findings show that transformer-based models can be applied successfully to ESG classification of external media coverage. The results contribute to applied NLP research on sustainability and provide a reproducible approach to large-scale ESG text classification.
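The threshold tuning mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single global threshold chosen to maximise micro F1 on held-out data; the abstract does not say whether the thesis tuned one threshold or one per label, nor which metric was optimised. The function names, the candidate grid, and the toy probabilities below are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def binarize(probs: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Turn per-label probabilities into 0/1 predictions at a given threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multi-label 0/1 matrices."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # [tp, fp, fn] per label
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                counts[j][0] += 1
            elif p:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1
    # Macro: average the per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute one F1.
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1_from_counts(tp, fp, fn), macro


def tune_threshold(
    y_true: Sequence[Sequence[int]],
    probs: Sequence[Sequence[float]],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Pick the candidate threshold with the highest micro F1 (first one wins ties)."""
    best_t, best_micro = candidates[0], -1.0
    for t in candidates:
        micro, _ = micro_macro_f1(y_true, binarize(probs, t))
        if micro > best_micro:
            best_t, best_micro = t, micro
    return best_t, best_micro


# Toy validation data (invented for illustration): 4 documents, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
probs = [
    [0.45, 0.10, 0.60],
    [0.20, 0.40, 0.05],
    [0.70, 0.35, 0.15],
    [0.10, 0.20, 0.42],
]

default_micro, default_macro = micro_macro_f1(y_true, binarize(probs, 0.5))
best_t, best_micro = tune_threshold(y_true, probs, [0.2, 0.3, 0.4, 0.5, 0.6])
print(f"micro F1 at 0.50: {default_micro:.2f}; best threshold: {best_t}")
```

The toy data is constructed so that several true labels receive probabilities just under 0.50, which is the situation in which lowering the threshold recovers missed labels. In practice the threshold would be selected on a validation split and reported on a separate test split.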
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9194458
- author
- McGoldrick, James Joseph LU
- supervisor
- organization
- course
- DABN01 20251
- year
- 2025
- type
- H1 - Master's Degree (One Year)
- subject
- keywords
- Transformer Models, Multi-Label Classification, Semantic Text Labelling, Media Coverage Analysis, Sustainability Topics
- language
- English
- id
- 9194458
- date added to LUP
- 2025-09-12 09:04:50
- date last changed
- 2025-09-12 09:04:50
@misc{9194458,
  abstract     = {{This study examines the use of transformer-based language models for multi-label classification of Environmental, Social, and Governance (ESG) topics in media coverage of DAX-listed companies. Using a dataset of approximately 11,500 third-party media texts, thirty ESG topics were assigned to documents through a pipeline combining lemmatised keyword matching and semantic similarity. Four models were compared: BERT, FinBERT, RoBERTa, and DistilBERT. All were fine-tuned under the same conditions, using the same ESG taxonomy and evaluation framework. FinBERT performed best overall, which may reflect the benefits of domain-specific pretraining on financial texts. DistilBERT also performed well despite its smaller size, suggesting that smaller models can be competitive when properly fine-tuned. RoBERTa and BERT both performed well but scored lower than the other two models. The study also highlighted the importance of threshold tuning: each model reached its best performance at a decision threshold below the standard value of 0.50. The tuned thresholds significantly improved both micro and macro F1 scores. These findings show that transformer-based models can be applied successfully to ESG classification of external media coverage. The results contribute to applied NLP research on sustainability and provide a reproducible approach to large-scale ESG text classification.}},
  author       = {{McGoldrick, James Joseph}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Multi-Label Classification of Sustainability Topics in Media Coverage: A Comparative Study of Transformer Models}},
  year         = {{2025}},
}