LUP Student Papers

LUND UNIVERSITY LIBRARIES

Multi-Label Classification of Sustainability Topics in Media Coverage: A Comparative Study of Transformer Models

McGoldrick, James Joseph LU (2025) DABN01 20251
Department of Economics
Department of Statistics
Abstract
This study examines the use of transformer-based language models for multi-label classification of Environmental, Social, and Governance (ESG) topics in media coverage of DAX-listed companies. Using a dataset of approximately 11,500 third-party media texts, thirty ESG topics were assigned to documents through a pipeline combining lemmatised keyword matching and semantic similarity.

Four models were compared: BERT, FinBERT, RoBERTa, and DistilBERT. All were fine-tuned under the same conditions, using the same ESG taxonomy and evaluation framework. FinBERT performed best overall, which may reflect the benefits of domain-specific pretraining on financial texts. DistilBERT also performed well despite its smaller size, which suggests that smaller models can be competitive when properly fine-tuned. RoBERTa and BERT both performed well but scored lower than the other two models.

The study also highlighted the importance of threshold tuning, since each model reached its best performance at a decision threshold below the standard value of 0.50. The tuned thresholds significantly improved both micro and macro F1 scores.

These findings show the successful application of transformer-based models to ESG classification within external media outlets. The results are a contribution to applied NLP research on sustainability and provide a reproducible approach to large-scale ESG text classification.
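
The threshold tuning described in the abstract can be illustrated with a minimal sketch. It assumes a single global decision threshold applied to per-label sigmoid probabilities and selects the candidate with the highest micro F1 on validation data. The selection criterion, the function names (`micro_macro_f1`, `tune_threshold`), and the toy data are illustrative assumptions, not the thesis's actual implementation.

```python
from typing import List, Sequence, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    y_true: Sequence[Sequence[int]],
    probs: Sequence[Sequence[float]],
    threshold: float,
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multi-label predictions at one threshold."""
    n_labels = len(y_true[0])
    tp: List[int] = [0] * n_labels
    fp: List[int] = [0] * n_labels
    fn: List[int] = [0] * n_labels
    for truth_row, prob_row in zip(y_true, probs):
        for j, (truth, prob) in enumerate(zip(truth_row, prob_row)):
            predicted = prob >= threshold
            if predicted and truth:
                tp[j] += 1
            elif predicted and not truth:
                fp[j] += 1
            elif truth and not predicted:
                fn[j] += 1
    # Micro F1 pools counts over all labels; macro F1 averages per-label F1.
    micro = _f1(sum(tp), sum(fp), sum(fn))
    macro = sum(_f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / n_labels
    return micro, macro


def tune_threshold(
    y_true: Sequence[Sequence[int]],
    probs: Sequence[Sequence[float]],
    candidates: Sequence[float],
) -> Tuple[float, float, float]:
    """Pick the candidate threshold with the highest micro F1 (assumed criterion)."""
    best = max(candidates, key=lambda t: micro_macro_f1(y_true, probs, t)[0])
    micro, macro = micro_macro_f1(y_true, probs, best)
    return best, micro, macro


if __name__ == "__main__":
    # Toy validation set: 4 documents, 3 labels (hypothetical values).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    probs = [
        [0.45, 0.10, 0.41],
        [0.20, 0.35, 0.05],
        [0.60, 0.42, 0.15],
        [0.10, 0.08, 0.38],
    ]
    print("default 0.50:", micro_macro_f1(y_true, probs, 0.5))
    print("tuned:", tune_threshold(y_true, probs, [0.5, 0.4, 0.3]))
```

In this toy data most true labels receive probabilities below 0.50, so lowering the threshold recovers them without adding false positives. On real data the trade-off is less clean, and the threshold should be chosen on a validation split rather than on the test set.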
author
McGoldrick, James Joseph LU
supervisor
organization
course
DABN01 20251
year
2025
type
H1 - Master's Degree (One Year)
subject
keywords
Transformer Models, Multi-Label Classification, Semantic Text Labelling, Media Coverage Analysis, Sustainability Topics
language
English
id
9194458
date added to LUP
2025-09-12 09:04:50
date last changed
2025-09-12 09:04:50
@misc{9194458,
  abstract     = {{This study examines the use of transformer-based language models for multi-label classification of Environmental, Social, and Governance (ESG) topics in media coverage of DAX-listed companies. Using a dataset of approximately 11,500 third-party media texts, thirty ESG topics were assigned to documents through a pipeline combining lemmatised keyword matching and semantic similarity.

Four models were compared: BERT, FinBERT, RoBERTa, and DistilBERT. All were fine-tuned under the same conditions, using the same ESG taxonomy and evaluation framework. FinBERT performed best overall, which may reflect the benefits of domain-specific pretraining on financial texts. DistilBERT also performed well despite its smaller size, which suggests that smaller models can be competitive when properly fine-tuned. RoBERTa and BERT both performed well but scored lower than the other two models.

The study also highlighted the importance of threshold tuning, since each model reached its best performance at a decision threshold below the standard value of 0.50. The tuned thresholds significantly improved both micro and macro F1 scores.

These findings show the successful application of transformer-based models to ESG classification within external media outlets. The results are a contribution to applied NLP research on sustainability and provide a reproducible approach to large-scale ESG text classification.}},
  author       = {{McGoldrick, James Joseph}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Multi-Label Classification of Sustainability Topics in Media Coverage: A Comparative Study of Transformer Models}},
  year         = {{2025}},
}