
Lund University Publications


Cycle Conditioning for Robust Representation Learning from Categorical Data

Tabejamaat, Mohsen; Etminani, Farzaneh and Ohlsson, Mattias (2025) In Transactions on Machine Learning Research 2025-March. p. 1-30
Abstract

This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, multi-task information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient. Diffusion models may ignore conditioning samples that vary across training iterations, an issue that occurs within cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations for downstream tasks or as effective guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even with sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.

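For readers who want a concrete picture of the pipeline the abstract describes, below is a minimal PyTorch sketch of cycle conditioning with auxiliary "spelling" supervision and linear latent navigation. It is not the authors' implementation: the architectures, the sample pairing, the spelling targets, the loss weight lam, and every name here (CondEncoder, Denoiser, spell_head, navigate) are hypothetical stand-ins inferred from the abstract.

# Minimal, illustrative sketch only -- not the authors' code. All names and
# architectural choices below are assumptions inferred from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondEncoder(nn.Module):
    # Maps a categorical sample (one-hot encoded) to a latent condition z.
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                 nn.Linear(128, dim_z))

    def forward(self, x):
        return self.net(x)

class Denoiser(nn.Module):
    # Predicts the noise in x_t given a timestep t and a condition z.
    def __init__(self, dim_x, dim_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x + dim_z + 1, 256), nn.ReLU(),
                                 nn.Linear(256, dim_x))

    def forward(self, x_t, t, z):
        t_feat = t.float().unsqueeze(-1) / 1000.0  # crude timestep feature
        return self.net(torch.cat([x_t, z, t_feat], dim=-1))

def navigate(z, direction, step):
    # Linear navigation in the condition space: shift the latent along a
    # (learned or estimated) direction to densify a sparse representation.
    return z + step * direction

def cycle_conditioning_step(x_a, x_b, spell_b, encoder, denoiser, spell_head,
                            alphas_cumprod, lam=0.1):
    # One training step: denoise x_a conditioned on a latent extracted from a
    # *different* sample x_b (the pairing changes across iterations), while a
    # "spelling" head supervises z_b so the denoiser cannot ignore it.
    b = x_a.size(0)
    t = torch.randint(0, len(alphas_cumprod), (b,))
    noise = torch.randn_like(x_a)
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x_a + (1.0 - a_bar).sqrt() * noise  # forward process

    z_b = encoder(x_b)                  # condition extracted from the pair
    loss_diff = F.mse_loss(denoiser(x_t, t, z_b), noise)
    loss_spell = F.binary_cross_entropy_with_logits(spell_head(z_b), spell_b)
    return loss_diff + lam * loss_spell

# Toy usage: one categorical variable with 20 levels, 5 sparse spelling bits.
dim_x, dim_z, n_spell, batch = 20, 16, 5, 8
encoder, denoiser = CondEncoder(dim_x, dim_z), Denoiser(dim_x, dim_z)
spell_head = nn.Linear(dim_z, n_spell)
alphas_cumprod = torch.cumprod(torch.linspace(0.9999, 0.98, 1000), dim=0)
x_a = F.one_hot(torch.randint(0, dim_x, (batch,)), dim_x).float()
x_b = F.one_hot(torch.randint(0, dim_x, (batch,)), dim_x).float()
spell_b = torch.randint(0, 2, (batch, n_spell)).float()
loss = cycle_conditioning_step(x_a, x_b, spell_b, encoder, denoiser,
                               spell_head, alphas_cumprod)
loss.backward()

In this reading, the encoder's latent doubles as the downstream representation: the spelling loss keeps it informative even though the conditioning sample changes every iteration, and navigate() illustrates the linear-navigation idea of moving a sparse latent along a direction in condition space.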
author
Tabejamaat, Mohsen; Etminani, Farzaneh and Ohlsson, Mattias
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
in
Transactions on Machine Learning Research
volume
2025-March
pages
30 pages
external identifiers
  • scopus:105007972529
ISSN
2835-8856
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2025, Transactions on Machine Learning Research. All rights reserved.
id
478e29fd-1197-40bc-a0a3-f552c478687f
date added to LUP
2026-01-20 16:01:46
date last changed
2026-01-21 08:55:01
@article{478e29fd-1197-40bc-a0a3-f552c478687f,
  abstract     = {{This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, multi-task information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient. Diffusion models may ignore conditioning samples that vary across training iterations, an issue that occurs within cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations for downstream tasks or as effective guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even with sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.}},
  author       = {{Tabejamaat, Mohsen and Etminani, Farzaneh and Ohlsson, Mattias}},
  issn         = {{2835-8856}},
  language     = {{eng}},
  pages        = {{1--30}},
  series       = {{Transactions on Machine Learning Research}},
  title        = {{Cycle Conditioning for Robust Representation Learning from Categorical Data}},
  volume       = {{2025-March}},
  year         = {{2025}},
}