Faculty, Staff and Student Publications

Publication Date

1-1-2026

Journal

SN Computer Science

DOI

10.1007/s42979-025-04540-x

PMID

41523798

PMCID

PMC12779700

PubMedCentral® Posted Date

1-7-2026

PubMedCentral® Full Text Version

Post-print

Abstract

Datasets used in machine learning often contain sensitive information, including personally identifiable health and financial details. A common challenge faced by organizations and researchers is the risk of privacy breaches when using real-world data. Synthetic data can be used as an alternative to real-world data. In existing synthetic data generation techniques, an encoder maps the real-world data into a lower-dimensional latent space, random sampling is performed in this latent space, and a decoder network then generates synthetic data from the sampled points. Such approaches typically require a large number of synthetic samples to approximate the performance obtained with real-world data, which in turn slows down downstream machine learning tasks. To address this, we introduce a combinatorial approach to sampling the latent space, motivated by our empirical finding in this study that most model predictions are largely influenced by interactions among a few features. In some cases, using only a small number of features yields higher accuracy than using all features. With this approach, we generate samples that cover t-way interactions among t of the n latent dimensions. Our experimental results indicate that our approach requires fewer samples than traditional random sampling to achieve comparable model performance on real-world datasets. We also show that, when integrated with a differentially private mechanism, our approach incurs a smaller decline in model performance than the existing random sampling approach.
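The abstract does not say how the t-way latent samples are constructed. Purely as an illustration, the Python sketch below assumes a variational autoencoder whose latent prior is a standard normal per dimension, discretizes each of the n latent dimensions into k levels, and uses a simple greedy covering-array heuristic (a swapped-in technique, not necessarily the authors' method) so that every combination of levels across every choice of t dimensions appears in at least one latent point. The function names `t_way_covering_rows` and `levels_to_latent` are hypothetical, and the trained decoder that would turn these points into synthetic records is not shown.

```python
import itertools
from statistics import NormalDist

# Illustrative sketch only: hypothetical helpers, not the paper's implementation.
# Assumption: each latent dimension is discretized into n_levels levels.


def _newly_covered(row, d, v, uncovered):
    """Count uncovered requirements completed by setting row[d] = v."""
    row[d] = v
    score = sum(
        1
        for dims, levels in uncovered
        if d in dims and all(row[x] == l for x, l in zip(dims, levels))
    )
    row[d] = None  # undo the tentative assignment
    return score


def t_way_covering_rows(n_dims, n_levels, t):
    """Greedily build rows (one level index per latent dimension) so that every
    t-way combination of levels, over every choice of t dimensions, appears in
    at least one row."""
    # Each requirement is (dims_tuple, levels_tuple) that some row must match.
    uncovered = {
        (dims, levels)
        for dims in itertools.combinations(range(n_dims), t)
        for levels in itertools.product(range(n_levels), repeat=t)
    }
    rows = []
    while uncovered:
        # Seed the row with the smallest uncovered requirement so that each
        # new row is guaranteed to cover at least one new requirement.
        seed_dims, seed_levels = min(uncovered)
        row = [None] * n_dims
        for d, v in zip(seed_dims, seed_levels):
            row[d] = v
        # Fill the remaining dimensions one at a time with the level that
        # completes the most still-uncovered requirements.
        for d in range(n_dims):
            if row[d] is None:
                row[d] = max(
                    range(n_levels),
                    key=lambda v: _newly_covered(row, d, v, uncovered),
                )
        # Drop every requirement that the finished row satisfies.
        uncovered = {
            (dims, levels)
            for dims, levels in uncovered
            if any(row[x] != l for x, l in zip(dims, levels))
        }
        rows.append(tuple(row))
    return rows


def levels_to_latent(rows, n_levels):
    """Map level indices to latent coordinates at midpoint quantiles of the
    assumed N(0, 1) prior of each latent dimension."""
    std_normal = NormalDist()
    grid = [std_normal.inv_cdf((i + 0.5) / n_levels) for i in range(n_levels)]
    return [[grid[i] for i in row] for row in rows]


if __name__ == "__main__":
    n, k, t = 8, 3, 2
    rows = t_way_covering_rows(n, k, t)
    z = levels_to_latent(rows, k)  # these points would be fed to a decoder
    print(len(rows), "latent points vs", k ** n, "for the full grid")
```

The comparison of interest is sample count: covering every t-way combination needs at least k^t points, and the minimum size of such a set grows only logarithmically in n for fixed t and k, whereas enumerating the full grid needs k^n points. Drawing z from N(0, I) at random offers no comparable coverage guarantee for a fixed budget.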

Keywords

Synthetic data generation, Combinatorial testing, Variational autoencoder, Differential privacy

Published Open-Access

yes
