Exploring the data behind students’ published theses - Analyzing the pride of Lund University

Granberg, Per

Granberg, Per ^LU (2020) STAH11 20192
Department of Statistics

Abstract: For over ten years, Lund University has used a website called LUP Student Papers to publish theses from bachelor's and master's courses. The aim of this thesis is to use visualizations and data mining techniques to explore and show interesting aspects of the data. The wide variety of variables in the dataset can be used to gain insight into several questions: Is the number of theses increasing each year? Are theses in English becoming more common? How many times has a thesis been downloaded on average?

The second purpose of this thesis is to use a Random Forest model to classify the abstracts into three faculties: LUSEM, Social Science, and Engineering. The aim is to see whether the three faculties can be easily classified, which would suggest that there is a noticeable difference in the text between the faculties. The abstracts had to be preprocessed with natural language processing techniques such as tokenization and stemming. The classification model achieved a relatively good accuracy of around 0.80, which suggests that the abstracts can be classified. Further research can focus on different models for text classification.
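The preprocessing steps named in the abstract (tokenization and stemming) can be illustrated with a minimal sketch. It is not the thesis's code: the suffix list, the function names `tokenize`, `stem`, and `bag_of_stems`, and the sample sentence are all assumptions made for illustration. The stemmer is a deliberately naive suffix stripper rather than an established algorithm such as Porter's. The Random Forest stage is left out; the resulting counts are only the kind of feature a classifier could consume.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny suffix list. A real pipeline would more
# likely use an established stemmer (for example the Porter algorithm).
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_stems(text):
    """Count stemmed tokens; such counts can serve as classifier features."""
    return Counter(stem(token) for token in tokenize(text))


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the LUP dataset.
    sample = "The models were trained on abstracts from three faculties."
    for word, count in sorted(bag_of_stems(sample).items()):
        print(word, count)
```

A naive stripper like this maps related word forms to different stems in some cases (for example "classifying" and "classified"), which is one reason an established stemming algorithm would normally be preferred.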

Please use this url to cite or link to this publication: https://lup.lub.lu.se/student-papers/record/9033084

author

Granberg, Per ^LU

supervisor

Björn Holmquist ^LU

organization

Department of Statistics

course

STAH11 20192

year

2020

type

M2 - Bachelor Degree

subject

Business and Economics

keywords

data visualization, Lund University, classification model, NLP

language

English

id

9033084

date added to LUP

2021-01-07 12:19:11

date last changed

2021-01-07 12:19:11

@misc{9033084,
  abstract     = {{For over ten years, Lund University has used a website called LUP Student Papers to publish theses from bachelor's and master's courses. The aim of this thesis is to use visualizations and data mining techniques to explore and show interesting aspects of the data. The wide variety of variables in the dataset can be used to gain insight into several questions: Is the number of theses increasing each year? Are theses in English becoming more common? How many times has a thesis been downloaded on average?

The second purpose of this thesis is to use a Random Forest model to classify the abstracts into three faculties: LUSEM, Social Science, and Engineering. The aim is to see whether the three faculties can be easily classified, which would suggest that there is a noticeable difference in the text between the faculties. The abstracts had to be preprocessed with natural language processing techniques such as tokenization and stemming. The classification model achieved a relatively good accuracy of around 0.80, which suggests that the abstracts can be classified. Further research can focus on different models for text classification.}},
  author       = {{Granberg, Per}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Exploring the data behind students’ published theses - Analyzing the pride of Lund University}},
  year         = {{2020}},
}