Skip to main content

LUP Student Papers

LUND UNIVERSITY LIBRARIES

Text-aided object segmentation and classification in images

Tegen, Agnes LU (2014) In Master's Theses in Mathematical Sciences FMA820 20132
Mathematics (Faculty of Engineering)
Abstract
Object recognition in images is a popular research field with many applications including medicine, robotics and face recognition. The task of automatically finding and identifying objects in an image is challenging in the extreme. By looking at the problem from a new angle and including additional information beside the visual, the problem becomes less ill posed.

In this thesis we investigate how the addition of text annotations to images affects the classification process. Classifications of different sets of labels as well as clusters of labels were carried out. A comparison between the results from using only visual information and from also including information from an image description is given. In most cases the additional... (More)
Object recognition in images is a popular research field with many applications including medicine, robotics and face recognition. The task of automatically finding and identifying objects in an image is challenging in the extreme. By looking at the problem from a new angle and including additional information beside the visual, the problem becomes less ill posed.

In this thesis we investigate how the addition of text annotations to images affects the classification process. Classifications of different sets of labels as well as clusters of labels were carried out. A comparison between the results from using only visual information and from also including information from an image description is given. In most cases the additional information improved the accuracy of the classification.

The obtained results were then used to design an algorithm that could, given an image with a description, find relevant words from the text and mark their presence in the image. A large set of overlapping segments is generated and each segment is classified into a set of categories. The image descriptions are parsed by an algorithm (a so called chunker) and visually relevant words (key-nouns) are extracted from the text. These key-nouns are then connected to the categories by metrics from WordNet. To create an optimal assignment of the visual segments to the key-nouns combinatorial optimization was used. The resulting system was compared to manually segmented and classified images.

The results are promising and have given rise to several new ideas for continued research. (Less)
Please use this url to cite or link to this publication:
author
Tegen, Agnes LU
supervisor
organization
course
FMA820 20132
year
type
H2 - Master's Degree (Two Years)
subject
publication/series
Master's Theses in Mathematical Sciences
report number
LUTFMA-3258-2014
ISSN
1404-6342
other publication id
2014:E5
language
English
id
4317091
date added to LUP
2014-05-16 17:05:01
date last changed
2014-05-27 11:18:40
@misc{4317091,
  abstract     = {{Object recognition in images is a popular research field with many applications including medicine, robotics and face recognition. The task of automatically finding and identifying objects in an image is challenging in the extreme. By looking at the problem from a new angle and including additional information beside the visual, the problem becomes less ill posed.

In this thesis we investigate how the addition of text annotations to images affects the classification process. Classifications of different sets of labels as well as clusters of labels were carried out. A comparison between the results from using only visual information and from also including information from an image description is given. In most cases the additional information improved the accuracy of the classification.

The obtained results were then used to design an algorithm that could, given an image with a description, find relevant words from the text and mark their presence in the image. A large set of overlapping segments is generated and each segment is classified into a set of categories. The image descriptions are parsed by an algorithm (a so called chunker) and visually relevant words (key-nouns) are extracted from the text. These key-nouns are then connected to the categories by metrics from WordNet. To create an optimal assignment of the visual segments to the key-nouns combinatorial optimization was used. The resulting system was compared to manually segmented and classified images.

The results are promising and have given rise to several new ideas for continued research.}},
  author       = {{Tegen, Agnes}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Text-aided object segmentation and classification in images}},
  year         = {{2014}},
}