Integration of Image and Word Embeddings for Descriptive Image Similarity

Gustafsson, David; Lindberg, Tobias

Gustafsson, David and Lindberg, Tobias (2017). In Master's Theses in Mathematical Sciences, FMA820 20171
Mathematics (Faculty of Engineering)

Abstract: Many people today possess a private digital photo collection. Such collections are often just chronologically sorted. One way to make photo browsing more interesting would be to suggest semantically related photos to a currently viewed photo, and if, in addition, such a relationship could be justified in words, that would create extra value for the user.

To find a solution to this problem, the approach of this thesis is to bridge the domains of images and language by creating vector embeddings for images in an already existing semantic vector space for words. Transfer learning, with an image as input and the vector representation of a corresponding human-created caption as the sought output, is applied to a convolutional neural network originally trained for object detection. The transfer and training are carried out in the machine learning framework TensorFlow.

The described approach shows promising performance in general, and a thorough comparison of different layouts is carried out. The best model is tested qualitatively as well as quantitatively, through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited for the given tasks, and that it opens up a number of interesting extensions.
Popular Abstract (translated from Swedish): Many people today have a private digital photo collection, in the cloud or on a computer. Such collections are often just chronologically sorted, although more intelligent solutions such as face recognition are increasingly used to create smart structures. Another way to make a photo collection more dynamic and interesting could be to suggest photos that are semantically related to the one the user is currently viewing. Moreover, if the semantic relationship can be described in words, that would make the system more transparent and create additional value for the user. This thesis explores the possibilities of integrating image and word embeddings in a common vector space for the above-mentioned purpose.
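
The idea in the abstract, placing images in an existing word vector space so that related photos can be found and the relation described in words, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the toy three-dimensional word vectors, the choice to embed a caption as the mean of its word vectors, the use of caption vectors as stand-ins for the network's image embeddings, and the helper names `caption_vector`, `most_similar_image` and `shared_words`.

```python
import numpy as np

# Toy 3-dimensional word vectors (made-up values, for illustration only).
WORD_VECTORS = {
    "dog": np.array([1.0, 0.0, 0.0]),
    "beach": np.array([0.0, 1.0, 0.0]),
    "city": np.array([0.0, 0.0, 1.0]),
}


def caption_vector(caption):
    """Embed a caption as the mean of its known word vectors (an assumption)."""
    vecs = [WORD_VECTORS[w] for w in caption.lower().split() if w in WORD_VECTORS]
    if not vecs:
        raise ValueError("caption contains no known words")
    return np.mean(vecs, axis=0)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def most_similar_image(query_id, image_embeddings):
    """Return the id of the other image with the highest cosine similarity."""
    query = image_embeddings[query_id]
    scored = [
        (cosine(query, vec), image_id)
        for image_id, vec in image_embeddings.items()
        if image_id != query_id
    ]
    return max(scored)[1]


def shared_words(vec_a, vec_b, top_k=1):
    """Rank words by how close they are to BOTH image embeddings."""
    scores = {
        word: min(cosine(vec_a, wv), cosine(vec_b, wv))
        for word, wv in WORD_VECTORS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Stand-ins for network outputs: a trained model would map each image close
# to the vector of its caption, so the caption vectors are used directly here.
IMAGE_EMBEDDINGS = {
    "a": caption_vector("dog beach"),
    "b": caption_vector("beach"),
    "c": caption_vector("city"),
}
```

With these toy values, `most_similar_image("a", IMAGE_EMBEDDINGS)` selects image `"b"`, and `shared_words(IMAGE_EMBEDDINGS["a"], IMAGE_EMBEDDINGS["b"])` offers `"beach"` as the word that justifies the match. In a real system the stand-in vectors would be replaced by the outputs of the fine-tuned convolutional network.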

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/8917105

author

Gustafsson, David and Lindberg, Tobias

supervisor

Karl Åström

organization

Mathematics (Faculty of Engineering)

course

FMA820 20171

year

2017

type

H2 - Master's Degree (Two Years)

subject

Mathematics and Statistics

publication/series

Master's Theses in Mathematical Sciences

report number

LUTFMA-3322-2017

ISSN

1404-6342

other publication id

2017:E31

language

English

id

8917105

date added to LUP

2017-06-20 15:02:00

date last changed

2017-06-20 15:02:00

@misc{8917105,
  abstract     = {{Many people today possess a private digital photo collection. Such collections are often just chronologically sorted. One way to make photo browsing more interesting would be to suggest semantically related photos to a currently viewed photo, and if, in addition, such a relationship could be justified in words, that would create extra value for the user.

To find a solution to this problem, the approach of this thesis is to bridge the domains of images and language by creating vector embeddings for images in an already existing semantic vector space for words. Transfer learning with an image as input and the vector representation of a corresponding human-created caption as sought output is applied to a convolutional neural network originally trained for object detection. The transfer and training is carried out in the machine learning framework TensorFlow.

The described approach shows promising performance in general and a thorough comparison of different layouts is carried out. The best model is tested, qualitatively as well as quantitatively through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited for the given tasks, and that it opens up for a number of interesting extensions.}},
  author       = {{Gustafsson, David and Lindberg, Tobias}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Integration of Image and Word Embeddings for Descriptive Image Similarity}},
  year         = {{2017}},
}
