LUP Student Papers

LUND UNIVERSITY LIBRARIES

Integration of Image and Word Embeddings for Descriptive Image Similarity

Gustafsson, David LU and Lindberg, Tobias LU (2017) In Master's Theses in Mathematical Sciences FMA820 20171
Mathematics (Faculty of Engineering)
Abstract
Many people today possess a private digital photo collection. Such collections are often just chronologically sorted. One way to make photo browsing more interesting is to suggest photos that are semantically related to the one currently being viewed. If the relationship could also be justified in words, that would create extra value for the user.

To address this problem, this thesis bridges the domains of images and language by creating vector embeddings for images in an existing semantic vector space for words. Transfer learning is applied to a convolutional neural network originally trained for object detection, with an image as input and the vector representation of a corresponding human-written caption as the target output. The transfer and training are carried out in the machine learning framework TensorFlow.

The described approach shows promising performance in general, and different layouts are compared thoroughly. The best model is tested qualitatively as well as quantitatively, both through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited to the given tasks and that it opens the way to a number of interesting extensions.
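
The idea of placing images and words in one shared vector space can be illustrated with a small sketch. It is not the thesis implementation: the three-dimensional word vectors are made up, averaging word vectors to obtain a caption vector is an assumed pooling rule that the abstract does not specify, and the names `WORD_VECTORS`, `caption_vector`, `cosine_similarity` and `nearest_words` are hypothetical. Caption vectors stand in for the embeddings a trained network would predict from the images themselves.

```python
import math

# Toy 3-dimensional "word vectors" with made-up values. A real system would
# load pretrained word2vec embeddings instead (assumption for illustration).
WORD_VECTORS = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "beach": [0.1, 0.9, 0.2],
    "sea":   [0.0, 0.8, 0.3],
    "car":   [0.1, 0.0, 0.9],
}


def caption_vector(caption):
    """Average the vectors of the known words in a caption (assumed pooling rule)."""
    vectors = [WORD_VECTORS[w] for w in caption.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("caption contains no known words")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_words(embedding, k=2):
    """Return the k vocabulary words closest to an embedding in the shared space."""
    ranked = sorted(
        WORD_VECTORS,
        key=lambda w: cosine_similarity(embedding, WORD_VECTORS[w]),
        reverse=True,
    )
    return ranked[:k]


# Stand-ins for image embeddings: in the described approach these would be
# predicted by the network from pixels, not computed from captions.
image_a = caption_vector("dog on the beach")
image_b = caption_vector("puppy by the sea")
image_c = caption_vector("car")

print(cosine_similarity(image_a, image_b))  # related photos score higher
print(cosine_similarity(image_a, image_c))  # unrelated photos score lower
print(nearest_words(image_a))               # words that could describe image_a
```

Because images and words live in the same space, one similarity measure serves both purposes: ranking related photos and looking up nearby words that could justify the relationship.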
Popular Abstract (translated from Swedish)
Many people today have a private digital photo collection, in the cloud or on a computer. Such collections are often just sorted chronologically, although more intelligent solutions such as face recognition are increasingly used to create smart structures. Another way to make a photo collection more dynamic and interesting could be to suggest photos that are semantically related to the one the user is currently viewing. Moreover, if the semantic relationship can be described in words, that would make the system more transparent and create additional value for the user. This thesis explores the possibilities of integrating image and word embeddings in a common vector space for the purpose described above.
author
Gustafsson, David LU and Lindberg, Tobias LU
supervisor
organization
course
FMA820 20171
year
2017
type
H2 - Master's Degree (Two Years)
subject
keywords
Image Similarity Description, Convolutional Neural Network, Image Retrieval, Word2vec, InceptionV3, Vector Space
publication/series
Master's Theses in Mathematical Sciences
report number
LUTFMA-3322-2017
ISSN
1404-6342
other publication id
2017:E31
language
English
id
8917105
date added to LUP
2017-06-20 15:02:00
date last changed
2017-06-20 15:02:00
@misc{8917105,
  abstract     = {{Many people today possess a private digital photo collection. Such collections are often just chronologically sorted. One way to make photo browsing more interesting would be to suggest semantically related photos to a currently viewed photo, and if, in addition, such a relationship could be justified in words, that would create extra value for the user.

To find a solution to this problem, the approach of this thesis is to bridge the domains of images and language by creating vector embeddings for images in an already existing semantic vector space for words. Transfer learning with an image as input and the vector representation of a corresponding human-created caption as sought output is applied to a convolutional neural network originally trained for object detection. The transfer and training is carried out in the machine learning framework TensorFlow.

The described approach shows promising performance in general and a thorough comparison of different layouts is carried out. The best model is tested, qualitatively as well as quantitatively through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited for the given tasks, and that it opens up for a number of interesting extensions.}},
  author       = {{Gustafsson, David and Lindberg, Tobias}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Integration of Image and Word Embeddings for Descriptive Image Similarity}},
  year         = {{2017}},
}