
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Scalable Optimization of Product Category Embeddings Using Multi-Dimensional Scaling and LLM Embeddings

Stenström, Adam and Cederberg, Nils (2025)
Department of Automatic Control
Abstract
Semantic search using large language models (LLMs) combined with an approximate nearest neighbor (ANN) index is the current state of the art in search technology. Semantic search aims to match a user’s search query based on its contextual meaning and intent with a highly relevant search result, rather than relying solely on matching keywords. While semantic search is superior to keyword-based search systems in capturing user intent, it falls short in its ability to perform advanced filtering.
This paper explores the encoding of product category embeddings onto an n-dimensional hypersphere to reduce the number of dimensions used to represent each product category embedding. This aims to prevent embedding vectors from growing to an unreasonable size, which can occur when multiple internal representations of data are concatenated during the querying process.
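The abstract does not include code; a minimal sketch of the basic idea — reduce high-dimensional embeddings to n dimensions (here with PCA via SVD, which the abstract names as the initialization method) and renormalize each point onto the unit hypersphere — assuming numpy, could look like:

```python
import numpy as np

def pca_to_hypersphere(X, n):
    """Project rows of X onto their top-n principal directions,
    then normalize each row onto the unit (n-1)-sphere."""
    Xc = X - X.mean(axis=0)                 # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n].T                       # PCA projection to n dims
    # Renormalize so every reduced embedding lies on the hypersphere.
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Hypothetical example: 100 synthetic 768-dim "category embeddings"
# reduced to 8 dimensions on the unit 7-sphere.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 768))
Y = pca_to_hypersphere(X, 8)
```

The data, sizes, and function name here are illustrative, not taken from the paper.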
The goal is to preserve the spatial similarity property of category embeddings, allowing for accurate ranking based on these similarities.
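One reason the hypersphere is a convenient target space for similarity ranking: for unit vectors, cosine similarity reduces to a dot product, so ranking all categories against a query is a single matrix-vector product. A small illustrative sketch (the data is synthetic, not from the paper):

```python
import numpy as np

# Five hypothetical category points on the unit sphere in R^3.
rng = np.random.default_rng(1)
C = rng.normal(size=(5, 3))
C /= np.linalg.norm(C, axis=1, keepdims=True)

# Use one category as the query; it must rank itself first,
# since its self-similarity is exactly 1 (the maximum on the sphere).
q = C[2]
ranking = np.argsort(-(C @ q))  # indices sorted by descending cosine similarity
```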
This paper presents dimensionality-reduction and optimization methods for placing category embeddings on an n-dimensional hypersphere while retaining neighbors based on semantic similarities or hierarchical distances. It is demonstrated that the dimensionality of the embedding vectors can be reduced while optimizing them on the n-dimensional hypersphere, thereby retaining semantically similar neighbors. It is also demonstrated that hierarchical data from category trees can be used to optimize category embeddings on the n-dimensional hypersphere, placing hierarchically close categories as neighbors. The presented methods are discussed and compared in terms of performance and scalability, examining how the size of the input dataset and the dimension n to which the embeddings are reduced affect performance and time complexity. It is found that Riemannian optimization methods based on Adam and SGD, initialized with a guess computed via the dimensionality-reduction method PCA, perform the optimization task most effectively.
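The combination described above — a PCA initial guess refined by Riemannian optimization on the sphere — can be sketched generically. This is not the authors' exact algorithm: the objective below (an MDS-style squared stress between target cosine similarities and reduced-point similarities), the plain-gradient update, and all names and step sizes are illustrative assumptions; the paper uses Riemannian variants of Adam and SGD.

```python
import numpy as np

def sphere_mds(X, n, steps=150, lr=0.05):
    """Riemannian gradient descent on the unit sphere, starting from a
    PCA projection, minimizing the Frobenius stress between the target
    similarity matrix S and the similarities of the reduced points."""
    # Target similarities from the unit-normalized original embeddings.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    # Initial guess: PCA projection, renormalized onto the sphere.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:n].T
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    for _ in range(steps):
        # Euclidean gradient of 0.5/N * ||Y Y^T - S||_F^2 w.r.t. Y.
        G = (2.0 / len(X)) * (Y @ Y.T - S) @ Y
        # Riemannian step: project the gradient onto each point's
        # tangent space, take a step, then retract onto the sphere.
        G -= np.sum(G * Y, axis=1, keepdims=True) * Y
        Y = Y - lr * G
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    return Y

# Hypothetical usage: reduce 40 synthetic 64-dim embeddings to 6 dims.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 64))
Y = sphere_mds(X, n=6)
```

Swapping the plain gradient step for Riemannian Adam or SGD (e.g. via a manifold-optimization library such as Pymanopt) would match the setup the abstract describes more closely.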
author: Stenström, Adam and Cederberg, Nils
supervisor:
organization: Department of Automatic Control
year: 2025
type: H3 - Professional qualifications (4 Years - )
subject:
report number: TFRT-6280
other publication id: 0280-5316
language: English
id: 9207872
date added to LUP: 2025-08-08 15:09:51
date last changed: 2025-08-08 15:09:51
@misc{9207872,
  abstract     = {{Semantic search using large language models (LLMs) combined with an approximate nearest neighbor (ANN) index is the current state of the art in search technology. Semantic search aims to match a user’s search query based on its contextual meaning and intent with a highly relevant search result, rather than relying solely on matching keywords. While semantic search is superior to keyword-based search systems in capturing user intent, it falls short in its ability to perform advanced filtering.
 This paper explores the encoding of product category embeddings onto an n-dimensional hypersphere to reduce the number of dimensions used to represent each product category embedding. This aims to prevent embedding vectors from growing to an unreasonable size, which can occur when multiple internal representations of data are concatenated during the querying process.
 The goal is to preserve the spatial similarity property of category embeddings, allowing for accurate ranking based on these similarities.
 This paper presents dimensionality-reduction and optimization methods for placing category embeddings on an n-dimensional hypersphere while retaining neighbors based on semantic similarities or hierarchical distances. It is demonstrated that the dimensionality of the embedding vectors can be reduced while optimizing them on the n-dimensional hypersphere, thereby retaining semantically similar neighbors. It is also demonstrated that hierarchical data from category trees can be used to optimize category embeddings on the n-dimensional hypersphere, placing hierarchically close categories as neighbors. The presented methods are discussed and compared in terms of performance and scalability, examining how the size of the input dataset and the dimension n to which the embeddings are reduced affect performance and time complexity. It is found that Riemannian optimization methods based on Adam and SGD, initialized with a guess computed via the dimensionality-reduction method PCA, perform the optimization task most effectively.}},
  author       = {{Stenström, Adam and Cederberg, Nils}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Scalable Optimization of Product Category Embeddings Using Multi-Dimensional Scaling and LLM Embeddings}},
  year         = {{2025}},
}