Cycle Conditioning for Robust Representation Learning from Categorical Data
(2025) In Transactions on Machine Learning Research, 2025-March, p. 1-30.

Abstract
This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, multi-task information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient: diffusion models may ignore conditioning samples that vary across training iterations, an issue inherent to cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations, both for downstream tasks and as guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even under sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.
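The abstract describes the training signal only at a high level. The sketch below is one plausible reading of the cycle-conditioning loop, written in PyTorch purely for illustration, not the authors' implementation: the networks (CondEncoder, Denoiser), the assumption that two related samples x and y condition each other's denoising, and the treatment of categorical inputs as real vectors are all hypothetical. The "spelling" supervision and the linear navigation strategy mentioned in the abstract are omitted.

```python
# Hypothetical sketch of cycle conditioning; NOT the paper's released code.
# Assumptions (not stated in the abstract): samples are categorical records
# already embedded in R^dim, and "cycle" means two related samples x and y
# each serve as the condition for denoising the other in one training step.
import torch
import torch.nn as nn

class CondEncoder(nn.Module):
    """Maps a conditioning sample to a latent condition vector."""
    def __init__(self, dim, latent):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.SiLU(),
                                 nn.Linear(128, latent))
    def forward(self, x):
        return self.net(x)

class Denoiser(nn.Module):
    """Predicts the noise added to x_t, given the timestep and condition."""
    def __init__(self, dim, latent):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + latent + 1, 256), nn.SiLU(),
                                 nn.Linear(256, dim))
    def forward(self, x_t, t, cond):
        t = t.float().unsqueeze(-1) / 1000.0  # crude timestep embedding
        return self.net(torch.cat([x_t, cond, t], dim=-1))

def cycle_conditioning_step(enc, den, x, y, alphas_bar):
    """One training step: x conditions the denoising of y, and vice versa."""
    B, T = x.size(0), alphas_bar.size(0)
    t = torch.randint(0, T, (B,))
    ab = alphas_bar[t].unsqueeze(-1)           # cumulative noise schedule
    loss = 0.0
    for target, source in ((y, x), (x, y)):    # the "cycle"
        eps = torch.randn_like(target)
        x_t = ab.sqrt() * target + (1 - ab).sqrt() * eps  # forward diffusion
        loss = loss + (den(x_t, t, enc(source)) - eps).pow(2).mean()
    return loss

# Purely illustrative usage with random data:
enc, den = CondEncoder(16, 8), Denoiser(16, 8)
x, y = torch.randn(4, 16), torch.randn(4, 16)
alphas_bar = torch.linspace(0.999, 0.01, 1000)
loss = cycle_conditioning_step(enc, den, x, y, alphas_bar)
```

Under this reading, enc(x) would be the representation reused for downstream tasks, and the paper's spelling supervision and linear navigation strategy would act as additional losses or operations on these latents.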
- author
- Tabejamaat, Mohsen
- Etminani, Farzaneh
- Ohlsson, Mattias (LU)
- organization
- publishing date
- 2025
- type
- Contribution to journal
- publication status
- published
- subject
- in
- Transactions on Machine Learning Research
- volume
- 2025-March
- pages
- 30 pages
- external identifiers
- scopus:105007972529
- ISSN
- 2835-8856
- language
- English
- LU publication?
- yes
- additional info
- Publisher Copyright: © 2025, Transactions on Machine Learning Research. All rights reserved.
- id
- 478e29fd-1197-40bc-a0a3-f552c478687f
- date added to LUP
- 2026-01-20 16:01:46
- date last changed
- 2026-01-21 08:55:01
@article{478e29fd-1197-40bc-a0a3-f552c478687f,
abstract = {{This paper introduces a novel diffusion-based method for learning representations from categorical data. Conditional diffusion models have demonstrated their potential to extract meaningful representations from input samples. However, they often struggle to yield versatile, multi-task information, limiting their adaptability to unforeseen tasks. To address this, we propose a cycle conditioning approach for diffusion models, designed to capture expressive information from conditioning samples. However, cycle conditioning alone can be insufficient: diffusion models may ignore conditioning samples that vary across training iterations, an issue inherent to cycle conditioning. To counter this limitation, we introduce additional "spelling" information to guide the conditioning process, ensuring that the conditioning sample remains influential during denoising. While this supervision enhances the generalizability of extracted representations, it is constrained by the sparse nature of spelling information in categorical data, leading to sparse latent conditions. This sparsity reduces the robustness of the extracted representations, both for downstream tasks and as guidance in the diffusion process. To overcome this challenge, we propose a linear navigation strategy within the latent space of conditioning samples, allowing dense representations to be extracted even under sparse supervision. Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods.}},
author = {{Tabejamaat, Mohsen and Etminani, Farzaneh and Ohlsson, Mattias}},
issn = {{2835-8856}},
language = {{eng}},
pages = {{1--30}},
series = {{Transactions on Machine Learning Research}},
title = {{Cycle Conditioning for Robust Representation Learning from Categorical Data}},
volume = {{2025-March}},
year = {{2025}},
}