Integration of Image and Word Embeddings for Descriptive Image Similarity
(2017) In Master's Theses in Mathematical Sciences, FMA820 20171, Mathematics (Faculty of Engineering)
- Abstract
- Many people today possess a private digital photo collection. Such collections are often just chronologically sorted. One way to make photo browsing more interesting would be to suggest semantically related photos to a currently viewed photo, and if, in addition, such a relationship could be justified in words, that would create extra value for the user.
To find a solution to this problem, the approach of this thesis is to bridge the domains of images and language by creating vector embeddings for images in an already existing semantic vector space for words. Transfer learning with an image as input and the vector representation of a corresponding human-created caption as sought output is applied to a convolutional neural network originally trained for object detection. The transfer and training are carried out in the machine learning framework TensorFlow.
The described approach shows promising performance in general, and a thorough comparison of different layouts is carried out. The best model is tested both qualitatively and quantitatively, through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited for the given tasks, and that it opens up a number of interesting extensions.
- Popular Abstract (Swedish)
- Many people today have a private digital photo collection, in the cloud or on a computer. Such collections are often only chronologically sorted, although more intelligent solutions such as face recognition are increasingly used to create smart structures. Another way to make a photo collection more dynamic and interesting could be to suggest photos that are semantically related to the photo the user is currently viewing. Moreover, if the semantic relationship can be described in words, that would make the system more transparent and create additional value for the user. This thesis explores the possibilities of integrating image and word embeddings in a common vector space for the above-mentioned purpose.
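The idea in the abstract, placing images and words in one vector space so that related photos can be found and the relation named in words, can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the toy 3-dimensional word vectors, the choice of the mean word vector as the caption target, cosine similarity as the closeness measure, and the names `caption_target`, `related_images` and `shared_words`. In the thesis setting the image embeddings would come from the retrained convolutional network; here they are faked from captions.

```python
import math

# Toy word vectors (hypothetical); stand-ins for a pretrained
# word embedding space such as word2vec.
WORD_VECTORS = {
    "dog":   [0.9, 0.1, 0.0],
    "beach": [0.1, 0.9, 0.1],
    "sand":  [0.0, 0.8, 0.3],
    "car":   [0.1, 0.0, 0.9],
}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equally long vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def caption_target(caption):
    """Assumed training target: mean vector of the known caption words."""
    known = [WORD_VECTORS[w] for w in caption.lower().split() if w in WORD_VECTORS]
    return mean_vector(known)


def rank_by_similarity(query, candidates):
    """Candidate keys sorted from most to least similar to the query vector."""
    return sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)


# Hypothetical image embeddings, faked from captions instead of a CNN.
IMAGE_EMBEDDINGS = {
    "img_dog_beach": caption_target("dog beach"),
    "img_sand": caption_target("beach sand"),
    "img_car": caption_target("car"),
}


def related_images(name):
    """Other images ranked by similarity to the named image."""
    others = {k: v for k, v in IMAGE_EMBEDDINGS.items() if k != name}
    return rank_by_similarity(IMAGE_EMBEDDINGS[name], others)


def shared_words(name_a, name_b, top_k=1):
    """Words closest to the midpoint of two image embeddings."""
    midpoint = mean_vector([IMAGE_EMBEDDINGS[name_a], IMAGE_EMBEDDINGS[name_b]])
    return rank_by_similarity(midpoint, WORD_VECTORS)[:top_k]


if __name__ == "__main__":
    print(related_images("img_dog_beach"))
    print(shared_words("img_dog_beach", "img_sand"))
```

Because images and words live in the same space, the same ranking function serves both steps: suggesting a related photo and picking words that describe what the two photos have in common.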
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/8917105
- author
- Gustafsson, David LU and Lindberg, Tobias LU
- supervisor
- Karl Åström LU
- organization
- course
- FMA820 20171
- year
- 2017
- type
- H2 - Master's Degree (Two Years)
- subject
- keywords
- Image Similarity Description, Convolutional Neural Network, Image Retrieval, Word2vec, InceptionV3, Vector Space
- publication/series
- Master's Theses in Mathematical Sciences
- report number
- LUTFMA-3322-2017
- ISSN
- 1404-6342
- other publication id
- 2017:E31
- language
- English
- id
- 8917105
- date added to LUP
- 2017-06-20 15:02:00
- date last changed
- 2017-06-20 15:02:00
@misc{8917105,
  abstract     = {{Many people today possess a private digital photo collection. Such collections are often just chronologically sorted. One way to make photo browsing more interesting would be to suggest semantically related photos to a currently viewed photo, and if, in addition, such a relationship could be justified in words, that would create extra value for the user. To find a solution to this problem, the approach of this thesis is to bridge the domains of images and language by creating vector embeddings for images in an already existing semantic vector space for words. Transfer learning with an image as input and the vector representation of a corresponding human-created caption as sought output is applied to a convolutional neural network originally trained for object detection. The transfer and training are carried out in the machine learning framework TensorFlow. The described approach shows promising performance in general, and a thorough comparison of different layouts is carried out. The best model is tested both qualitatively and quantitatively, through a task-specific custom evaluation scheme and on common benchmark datasets. The conclusion based on these results is that the proposed system is well suited for the given tasks, and that it opens up a number of interesting extensions.}},
  author       = {{Gustafsson, David and Lindberg, Tobias}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Integration of Image and Word Embeddings for Descriptive Image Similarity}},
  year         = {{2017}},
}