Product Classification - A Hierarchical Approach

Karlsson, Mikael LU and Karlstedt, Anton LU (2016) In LU-CS-EX 2016-31 EDA920 20161
Department of Computer Science
Abstract
The social and environmental impact associated with consuming a product is something that is becoming increasingly important to consumers and businesses alike. This impact can in theory be computed simply by classifying a product and by mapping the classified product to the corresponding life-cycle assessments research. However, this type of mapping requires an extensive product taxonomy with a large plurality of categories, which makes classification through machine learning a non-trivial task.
This thesis describes the implementation of a hierarchical product classifier using the Python library scikit-learn, that makes it possible to automatically classify products based primarily on their brands and titles into a large taxonomy.
Using a data set of 3.1 million products spread over 800 categories, we trained a set of hierarchical classifiers using different learning algorithms. Evaluations of these algorithms showed that the best hierarchical classifier reached a hierarchical F1-score (hF1) of 0.85.
author
Karlsson, Mikael LU and Karlstedt, Anton LU
supervisor
organization
alternative title
Classifying products to compute their socio-ecological impact
course
EDA920 20161
year
2016
type
H3 - Professional qualifications (4 Years - )
subject
keywords
product classification, hierarchical classification, text classification, UNSPSC, GTIN, scikit-learn
publication/series
LU-CS-EX 2016-31
report number
LU-CS-EX 2016-31
ISSN
1650-2884
language
English
id
8889613
date added to LUP
2016-08-29 13:30:10
date last changed
2016-08-29 13:30:10
@misc{8889613,
  abstract     = {The social and environmental impact associated with consuming a product is something that is becoming increasingly important to consumers and businesses alike. This impact can in theory be computed simply by classifying a product and by mapping the classified product to the corresponding life-cycle assessments research. However, this type of mapping requires an extensive product taxonomy with a large plurality of categories, which makes classification through machine learning a non-trivial task.
This thesis describes the implementation of a hierarchical product classifier using the Python library scikit-learn, that makes it possible to automatically classify products based primarily on their brands and titles into a large taxonomy.
Using a data set of 3.1 million products spread over 800 categories, we trained a set of hierarchical classifiers using different learning algorithms. Evaluations of these algorithms showed that the best hierarchical classifier reached a hierarchical F1-score (hF1) of 0.85.},
  author       = {Karlsson, Mikael and Karlstedt, Anton},
  issn         = {1650-2884},
  keyword      = {product classification,hierarchical classification,text classification,UNSPSC,GTIN,scikit-learn},
  language     = {eng},
  note         = {Student Paper},
  series       = {LU-CS-EX 2016-31},
  title        = {Product Classification - A Hierarchical Approach},
  year         = {2016},
}