
Lund University Publications


Cycle Conditioning for Robust Representation Learning from Categorical Data

Tabejamaat, Mohsen; Etminani, Farzaneh and Ohlsson, Mattias (2025) In Transactions on Machine Learning Research 2025-March. p. 1-30
Abstract

This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, multi-task information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient. Diffusion models may ignore conditioning samples that vary across training iterations, an issue that occurs within cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations for downstream tasks or as effective guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even with sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.

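For readers who want a concrete picture of the pipeline the abstract describes, below is a minimal PyTorch sketch of cycle conditioning with auxiliary "spelling" supervision and linear latent navigation. It is not the authors' implementation: the architectures, the sample pairing, the spelling targets, the loss weight lam, and every name here (CondEncoder, Denoiser, spell_head, navigate) are hypothetical stand-ins inferred from the abstract.

# Minimal, illustrative sketch only -- not the authors' code. All names and
# architectural choices below are assumptions inferred from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondEncoder(nn.Module):
    # Maps a categorical sample (one-hot encoded) to a latent condition z.
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                 nn.Linear(128, dim_z))

    def forward(self, x):
        return self.net(x)

class Denoiser(nn.Module):
    # Predicts the noise in x_t given a timestep t and a condition z.
    def __init__(self, dim_x, dim_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_x + dim_z + 1, 256), nn.ReLU(),
                                 nn.Linear(256, dim_x))

    def forward(self, x_t, t, z):
        t_feat = t.float().unsqueeze(-1) / 1000.0  # crude timestep feature
        return self.net(torch.cat([x_t, z, t_feat], dim=-1))

def navigate(z, direction, step):
    # Linear navigation in the condition space: shift the latent along a
    # (learned or estimated) direction to densify a sparse representation.
    return z + step * direction

def cycle_conditioning_step(x_a, x_b, spell_b, encoder, denoiser, spell_head,
                            alphas_cumprod, lam=0.1):
    # One training step: denoise x_a conditioned on a latent extracted from a
    # *different* sample x_b (the pairing changes across iterations), while a
    # "spelling" head supervises z_b so the denoiser cannot ignore it.
    b = x_a.size(0)
    t = torch.randint(0, len(alphas_cumprod), (b,))
    noise = torch.randn_like(x_a)
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x_a + (1.0 - a_bar).sqrt() * noise  # forward process

    z_b = encoder(x_b)                  # condition extracted from the pair
    loss_diff = F.mse_loss(denoiser(x_t, t, z_b), noise)
    loss_spell = F.binary_cross_entropy_with_logits(spell_head(z_b), spell_b)
    return loss_diff + lam * loss_spell

# Toy usage: one categorical variable with 20 levels, 5 sparse spelling bits.
dim_x, dim_z, n_spell, batch = 20, 16, 5, 8
encoder, denoiser = CondEncoder(dim_x, dim_z), Denoiser(dim_x, dim_z)
spell_head = nn.Linear(dim_z, n_spell)
alphas_cumprod = torch.cumprod(torch.linspace(0.9999, 0.98, 1000), dim=0)
x_a = F.one_hot(torch.randint(0, dim_x, (batch,)), dim_x).float()
x_b = F.one_hot(torch.randint(0, dim_x, (batch,)), dim_x).float()
spell_b = torch.randint(0, 2, (batch, n_spell)).float()
loss = cycle_conditioning_step(x_a, x_b, spell_b, encoder, denoiser,
                               spell_head, alphas_cumprod)
loss.backward()

In this reading, the encoder's latent doubles as the downstream representation: the spelling loss keeps it informative even though the conditioning sample changes every iteration, and navigate() illustrates the linear-navigation idea of moving a sparse latent along a direction in condition space.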
author
Tabejamaat, Mohsen; Etminani, Farzaneh and Ohlsson, Mattias
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
in
Transactions on Machine Learning Research
volume
2025-March
pages
30 pages
external identifiers
  • scopus:105007972529
ISSN
2835-8856
language
English
LU publication?
yes
additional info
Publisher Copyright: © 2025, Transactions on Machine Learning Research. All rights reserved.
id
478e29fd-1197-40bc-a0a3-f552c478687f
date added to LUP
2026-01-20 16:01:46
date last changed
2026-01-21 08:55:01
@article{478e29fd-1197-40bc-a0a3-f552c478687f,
  abstract     = {{This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, multi-task information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient. Diffusion models may ignore conditioning samples that vary across training iterations, an issue that occurs within cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations for downstream tasks or as effective guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even with sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.}},
  author       = {{Tabejamaat, Mohsen and Etminani, Farzaneh and Ohlsson, Mattias}},
  issn         = {{2835-8856}},
  language     = {{eng}},
  pages        = {{1--30}},
  series       = {{Transactions on Machine Learning Research}},
  title        = {{Cycle Conditioning for Robust Representation Learning from Categorical Data}},
  volume       = {{2025-March}},
  year         = {{2025}},
}