Scalable Optimization of Product Category Embeddings Using Multi-Dimensional Scaling and LLM Embeddings
(2025) Department of Automatic Control
- Abstract
- Semantic search using large language models (LLMs) combined with an approximate nearest neighbor (ANN) index is the current state of the art in search technology. Semantic search aims to match a user’s search query with highly relevant results based on the query’s contextual meaning and intent, rather than relying solely on matching keywords. While semantic search is superior to keyword-based search systems in capturing user intent, it falls short in its ability to perform advanced filtering.
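The ranking idea behind semantic search can be sketched in a few lines: queries and documents are embedded as vectors, and results are ranked by cosine similarity. The product names and embedding vectors below are invented for illustration; a production system would obtain embeddings from an LLM embedding model and serve them through an ANN index rather than the linear scan shown here.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 3-dimensional embeddings (real LLM embeddings have
# hundreds to thousands of dimensions).
doc_embeddings = {
    "running shoes":  np.array([0.90, 0.10, 0.20]),
    "office chair":   np.array([0.10, 0.80, 0.30]),
    "trail sneakers": np.array([0.85, 0.15, 0.25]),
}
query = np.array([0.88, 0.12, 0.22])  # e.g. an embedding of "jogging footwear"

# Rank documents by descending similarity to the query.
ranked = sorted(doc_embeddings,
                key=lambda d: -cosine_sim(query, doc_embeddings[d]))
print(ranked)  # → ['running shoes', 'trail sneakers', 'office chair']
```

Note how "trail sneakers" ranks above "office chair" despite sharing no keywords with the query — this is the contextual matching that keyword search misses.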
This paper explores the encoding of product category embeddings onto an n-dimensional hypersphere to reduce the number of dimensions used to represent each product category embedding. This aims to prevent embedding vectors from growing to an unreasonable size, which can occur when multiple internal representations of data are concatenated during the querying process.
The goal is to preserve the spatial similarity property of category embeddings, allowing for accurate ranking based on these similarities.
This paper presents dimensionality reduction and optimization methods for placing category embeddings on an n-dimensional hypersphere while retaining neighbors based on semantic similarities or hierarchical distances. It is demonstrated that the dimensions of the embedding vectors can be reduced while optimizing them on the n-dimensional hypersphere, thereby retaining semantically similar neighbors. It is also demonstrated that hierarchical data from category trees can be utilized to optimize category embeddings on the n-dimensional hypersphere, thereby placing categories that are hierarchically close as neighbors. The methods presented are discussed and compared in terms of their performance and scalability, examining how the size of the input dataset and the dimension n, to which the embeddings are reduced, affect performance and time complexity. It is found that Riemannian optimization methods based on ADAM and SGD, initialized with a guess computed via the dimensionality reduction method PCA, perform the optimization task most effectively.
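The pipeline the abstract describes — a PCA initial guess, projection onto the unit hypersphere, and Riemannian gradient descent on an MDS-style stress — can be sketched as follows. This is a minimal NumPy illustration under assumed choices: the input data is synthetic in place of real LLM category embeddings, the target dissimilarities are cosine distances, the stress is the classical squared mismatch between target dissimilarities and pairwise distances, and the optimizer is plain Riemannian SGD with a fixed step size rather than the paper's ADAM variant.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 128))  # stand-in for 128-d category embeddings
n = 8                           # target dimension of the hypersphere

# --- Step 1: PCA initial guess (top-n principal directions via SVD) ---
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:n].T

# --- Step 2: project the initial guess onto the unit hypersphere ------
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

# Target dissimilarities: cosine distances in the original space.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
D = 1.0 - Xn @ Xn.T

def stress(Y, D):
    """MDS-style stress: squared mismatch between target
    dissimilarities and current pairwise distances."""
    G = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    return np.sum((G - D) ** 2)

stress0 = stress(Y, D)

# --- Step 3: Riemannian gradient descent on the hypersphere -----------
lr = 1e-4
for _ in range(300):
    G = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    np.fill_diagonal(G, 1.0)          # avoid division by zero on the diagonal
    W = (G - D) / G
    np.fill_diagonal(W, 0.0)
    # Euclidean gradient of the stress with respect to each row of Y.
    grad = 4 * (W.sum(axis=1, keepdims=True) * Y - W @ Y)
    # Project onto the tangent space of the sphere at each point,
    # take a step, then retract back onto the sphere by normalizing.
    rgrad = grad - np.sum(grad * Y, axis=1, keepdims=True) * Y
    Y = Y - lr * rgrad
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)

print(f"stress: {stress0:.2f} -> {stress(Y, D):.2f}")
```

The tangent-space projection followed by renormalization is the defining pattern of Riemannian optimization on the sphere: the iterates never leave the manifold, so every intermediate embedding remains a valid point on the hypersphere.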
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9207872
- author
- Stenström, Adam and Cederberg, Nils
- supervisor
- organization
- year
- 2025
- type
- H3 - Professional qualifications (4 Years - )
- subject
- report number
- TFRT-6280
- other publication id
- 0280-5316
- language
- English
- id
- 9207872
- date added to LUP
- 2025-08-08 15:09:51
- date last changed
- 2025-08-08 15:09:51
@misc{9207872,
  author   = {{Stenström, Adam and Cederberg, Nils}},
  title    = {{Scalable Optimization of Product Category Embeddings Using Multi-Dimensional Scaling and LLM Embeddings}},
  year     = {{2025}},
  language = {{eng}},
  note     = {{Student Paper}},
}