Text-aided object segmentation and classification in images
(2014) In Master's Theses in Mathematical Sciences, FMA820 20132, Mathematics (Faculty of Engineering)
- Abstract
- Object recognition in images is a popular research field with many applications, including medicine, robotics and face recognition. The task of automatically finding and identifying objects in an image is extremely challenging. Approaching the problem from a new angle and including additional information besides the visual makes it less ill-posed.
In this thesis we investigate how the addition of text annotations to images affects the classification process. Classifications of different sets of labels as well as clusters of labels were carried out. A comparison between the results from using only visual information and from also including information from an image description is given. In most cases the additional information improved the accuracy of the classification.
The obtained results were then used to design an algorithm that could, given an image with a description, find relevant words in the text and mark their presence in the image. A large set of overlapping segments is generated and each segment is classified into a set of categories. The image descriptions are parsed by an algorithm (a so-called chunker) and visually relevant words (key-nouns) are extracted from the text. These key-nouns are then connected to the categories by metrics from WordNet. To create an optimal assignment of the visual segments to the key-nouns, combinatorial optimization was used. The resulting system was compared to manually segmented and classified images.
The results are promising and have given rise to several new ideas for continued research.
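The assignment step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the category labels and all numeric scores are hypothetical, the noun-to-category similarities stand in for WordNet-based metrics, and the exhaustive search stands in for whatever combinatorial optimization method the thesis actually uses.

```python
from itertools import permutations


def compatibility(noun_sims, segment_probs):
    """Build a score matrix: score[i][j] rates key-noun i against segment j.

    noun_sims[i] maps category -> similarity of key-noun i to that category
    (hypothetical numbers standing in for a WordNet-based metric).
    segment_probs[j] maps category -> classifier score of segment j.
    """
    return [
        [sum(sim * seg.get(cat, 0.0) for cat, sim in noun.items())
         for seg in segment_probs]
        for noun in noun_sims
    ]


def best_assignment(score):
    """Exhaustively pick a distinct segment for every key-noun.

    Returns (total_score, choice) where choice[i] is the segment index
    given to key-noun i. Only practical for small inputs; returns
    (-inf, None) when there are fewer segments than key-nouns.
    """
    n_nouns = len(score)
    n_segments = len(score[0]) if score else 0
    best_total, best_choice = float("-inf"), None
    for choice in permutations(range(n_segments), n_nouns):
        total = sum(score[i][j] for i, j in enumerate(choice))
        if total > best_total:
            best_total, best_choice = total, choice
    return best_total, best_choice


# Illustrative, made-up data: two key-nouns ("dog", "lawn"), three segments.
noun_sims = [
    {"animal": 0.9, "grass": 0.1},  # "dog"
    {"animal": 0.1, "grass": 0.8},  # "lawn"
]
segment_probs = [
    {"animal": 0.2, "grass": 0.7},
    {"animal": 0.8, "grass": 0.1},
    {"animal": 0.5, "grass": 0.5},
]
total, choice = best_assignment(compatibility(noun_sims, segment_probs))
```

With these made-up scores, "dog" is paired with the second segment and "lawn" with the first. For realistic numbers of overlapping segments, exhaustive search is too slow and a dedicated assignment solver would be needed.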
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/4317091
- author
- Tegen, Agnes LU
- supervisor
- Karl Åström LU
- Magnus Oskarsson LU
- Pierre Nugues LU
- organization
- course
- FMA820 20132
- year
- 2014
- type
- H2 - Master's Degree (Two Years)
- subject
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMA-3258-2014
- ISSN
- 1404-6342
- other publication id
- 2014:E5
- language
- English
- id
- 4317091
- date added to LUP
- 2014-05-16 17:05:01
- date last changed
- 2014-05-27 11:18:40
@misc{4317091,
  abstract     = {{Object recognition in images is a popular research field with many applications including medicine, robotics and face recognition. The task of automatically finding and identifying objects in an image is challenging in the extreme. By looking at the problem from a new angle and including additional information beside the visual, the problem becomes less ill posed. In this thesis we investigate how the addition of text annotations to images affects the classification process. Classifications of different sets of labels as well as clusters of labels were carried out. A comparison between the results from using only visual information and from also including information from an image description is given. In most cases the additional information improved the accuracy of the classification. The obtained results were then used to design an algorithm that could, given an image with a description, find relevant words from the text and mark their presence in the image. A large set of overlapping segments is generated and each segment is classified into a set of categories. The image descriptions are parsed by an algorithm (a so called chunker) and visually relevant words (key-nouns) are extracted from the text. These key-nouns are then connected to the categories by metrics from WordNet. To create an optimal assignment of the visual segments to the key-nouns combinatorial optimization was used. The resulting system was compared to manually segmented and classified images. The results are promising and have given rise to several new ideas for continued research.}},
  author       = {{Tegen, Agnes}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Text-aided object segmentation and classification in images}},
  year         = {{2014}},
}