Zheng Zhou¹, Hongbo Zhao¹, Guangliang Cheng², Xiangtai Li³, Shuchang Lyu¹*, Wenquan Feng¹, Qi Zhao¹
¹Beihang University, ²University of Liverpool, ³Nanyang Technological University
*Corresponding author
Abstract: Dataset Distillation (DD) aims to condense large datasets into compact synthetic sets that preserve performance on unseen data, thereby reducing storage and training costs. However, most existing methods emphasize empirical performance without solid theoretical grounding, leaving issues such as optimization inefficiency and the lack of theoretical guarantees against suboptimal solutions unresolved. To bridge this gap, we propose the BAyesian optimal CONdensation framework (BACON), the first to incorporate a Bayesian perspective into dataset distillation. BACON offers a principled probabilistic formulation by casting DD as a Bayesian optimization problem, addressing the lack of Bayesian theoretical analysis in prior methods. To characterize the theoretical limit of DD, we derive a numerically tractable lower bound on the expected risk over the joint distribution of latent variables. Under mild assumptions, we obtain an approximate solution for data synthesis, in which prior knowledge improves optimization efficiency by guiding posterior estimation. We evaluate BACON against 18 state-of-the-art methods on four standard image classification datasets under various images-per-class (IPC) settings, where it consistently demonstrates superior performance. For example, under the IPC-10 setting on CIFAR-10, BACON achieves the largest accuracy gain of 17.16% among all methods, outperforming the second-best approach, IDM, by 3.46%, while also reducing both synthesis and training costs. These results underscore the theoretical soundness and practical effectiveness of BACON for dataset distillation. Code and distilled datasets are available at https://github.com/zhouzhengqd/BACON.
Dataset Distillation (DD) seeks to compress large datasets into compact synthetic subsets without sacrificing model performance. While recent methods have made notable empirical strides, most lack a solid theoretical foundation, often resulting in inefficient optimization and vulnerability to suboptimal solutions. Fundamentally, DD can be viewed as an optimization problem over probability distributions; however, existing techniques rarely adopt a principled probabilistic approach, as illustrated in Figure 1(a). This raises three central research questions:

1. Can DD be formulated as a principled probabilistic (Bayesian) optimization problem?
2. What theoretical limits constrain the expected risk of an optimally condensed dataset?
3. How can the resulting objective be approximated efficiently enough for practical data synthesis?
To address these challenges, we propose the Bayesian Optimal Condensation Framework (BACON), the first framework to adopt a Bayesian perspective on dataset distillation. BACON formulates condensation as a Bayesian optimization problem that minimizes expected risk. A theoretical lower bound on this risk is derived over the joint distribution of latent variables, revealing fundamental constraints on optimal condensation. For practical deployment, the risk function is approximated under specific assumptions, enabling efficient data synthesis that combines likelihood estimation with prior knowledge from the original dataset.
As shown in Figure 1(b), BACON not only offers theoretical rigor but also demonstrates strong empirical performance. We benchmark BACON against 18 state-of-the-art methods on four standard image classification datasets (SVHN, CIFAR-10, CIFAR-100, TinyImageNet) under various IPC settings. BACON consistently outperforms gradient-based methods (e.g., DC, DSA, DCC), distribution-based methods (e.g., IDM, DataDAM, IID), and recent advanced methods such as G-VBSM and Teddy. Notably, under the IPC-10 setting on CIFAR-10, BACON achieves a 17.16% accuracy improvement, surpassing the next-best method IDM by 3.46%, while also reducing synthesis and training costs. These results underscore BACON's theoretical soundness and practical superiority in dataset distillation.
Key Benefits of BACON:

- Theoretical grounding: the first Bayesian formulation of dataset distillation, with a numerically tractable lower bound on the expected risk.
- Efficient synthesis: prior knowledge guides posterior estimation, improving optimization efficiency and reducing both synthesis and training costs.
- Strong performance: consistent gains over 18 state-of-the-art methods on four benchmarks across multiple IPC settings.
As shown in Figure 2, BACON uses a Bayesian joint probability model to optimize dataset distillation. It derives a condensation risk function whose minimizer is the optimal synthetic dataset, and applies approximations to solve it efficiently. The Bayesian optimal condensation risk function is defined in Theorem 3.4, and its theoretical lower bound is established in Theorem 3.6. Assumptions on the log-likelihood and the prior distribution are then introduced to approximate the solution and define the training strategy. BACON represents the first theoretical analysis of optimal condensation via Bayesian principles, providing a strong foundation for improved distillation performance.
The Bayesian optimal condensation risk (Theorem 3.4) measures the expected posterior mass that a synthetic latent \( z_{\tilde{x}} \) fails to capture within an \( \epsilon \)-ball around it:

\[ R(\phi_\theta) = 1 - \mathbb{E}_{z_{\tilde{x}} \sim p(z_{\tilde{x}})}\left[\int_{\mathcal{B}(z_{\tilde{x}},\epsilon)} p(z_x \mid z_{\tilde{x}}) \, dz_x\right]. \]
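To make the risk concrete, the sketch below estimates it empirically. It assumes the prior \( p(z_x) \) is approximated by the empirical distribution of real latents and the conditional by a Gaussian with a shared variance `sigma2`; both are labeled assumptions for illustration, not the released implementation.

```python
import torch

def estimate_risk(z_syn, z_real, sigma2=1.0, eps=1.0):
    """Empirical estimate of R = 1 - E[ mass of p(z_x | z_syn) in B(z_syn, eps) ].

    Illustrative assumptions: p(z_x) is uniform over the real latents, and
    p(z_syn | z_x) = N(z_x, sigma2 * I), so the posterior over real latents
    is a softmax over scaled squared distances.
    """
    d2 = torch.cdist(z_syn, z_real).pow(2)            # (S, N) squared distances
    w = torch.softmax(-d2 / (2.0 * sigma2), dim=1)    # posterior over real latents
    inside = (d2 <= eps ** 2).float()                 # latents inside the eps-ball
    captured = (w * inside).sum(dim=1)                # mass captured per synthetic z
    return 1.0 - captured.mean()
```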
Minimizing this risk yields the optimal synthetic latent:

\[ z_{\tilde{x}}^* = \arg\max_{z_{\tilde{x}} \in \mathcal{D}_S} \int_{\mathcal{B}(z_{\tilde{x}}, \epsilon)} \left[\log p(z_{\tilde{x}} \mid z_x) + \log p(z_x)\right] dz_x. \]
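The step from the risk to this objective can be sketched with Bayes' rule (a minimal outline of the reasoning, not the paper's full derivation): the evidence term does not depend on \( z_x \) and can be dropped, and the integrand is replaced by its log-domain surrogate,

\[ \log p(z_x \mid z_{\tilde{x}}) = \log p(z_{\tilde{x}} \mid z_x) + \log p(z_x) - \log p(z_{\tilde{x}}), \]

so maximizing the captured posterior mass over \( \mathcal{B}(z_{\tilde{x}}, \epsilon) \) reduces to maximizing the integrated log-likelihood plus log-prior.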
To estimate the log-likelihood \( \log p(z_{\tilde{x}} \mid z_{x_i}) \), we assume that \( p(z_{\tilde{x}} \mid z_{x_i}) \) is Gaussian with mean \( z_{x_i} \) and variance \( \sigma_{x_i}^2 \):
\[ z_{\tilde{x}} \mid z_{x_i} \sim \mathcal{N}(z_{x_i}, \sigma_{x_i}^2 I). \]
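Under this assumption the log-likelihood reduces to a scaled squared distance plus a constant. The sketch below (with an assumed shared variance `sigma2` in place of per-sample variances) shows the term that would enter the synthesis objective:

```python
import math
import torch

def gaussian_log_likelihood(z_syn, z_real, sigma2=1.0):
    """log N(z_syn; z_real, sigma2 * I) for batched latent pairs.

    Expands to -||z_syn - z_real||^2 / (2 * sigma2) minus a normalizing
    constant, so maximizing it pulls synthetic latents toward real ones.
    """
    d = z_syn.shape[-1]
    sq_dist = (z_syn - z_real).pow(2).sum(dim=-1)
    log_norm = 0.5 * d * math.log(2.0 * math.pi * sigma2)
    return -sq_dist / (2.0 * sigma2) - log_norm
```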
The Total Variation (TV) and clip operations are incorporated as distribution priors to represent \( \log p(z_{x_i}) \); the clip operation constrains values to the range \( [0, 1] \). In contrast to prior pixel-wise formulations, we extend TV from a pixel-wise to a distribution-wise measure, also referred to as the total variation of probability distribution measures.
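Putting the pieces together, a hedged sketch of one synthesis step might combine the Gaussian likelihood term with these priors. The pixel-wise TV below stands in for the distribution-wise variant described above, and the encoder, the one-to-one matching of real latents, and the weight `lam_tv` are illustrative placeholders rather than the released training code:

```python
import torch

def tv_prior(images):
    """Pixel-wise total variation (a stand-in for the distribution-wise TV):
    penalizes high-frequency noise in (B, C, H, W) synthetic images."""
    dh = (images[:, :, 1:, :] - images[:, :, :-1, :]).abs().mean()
    dw = (images[:, :, :, 1:] - images[:, :, :, :-1]).abs().mean()
    return dh + dw

def synthesis_step(syn_images, real_latents, encoder, optimizer,
                   sigma2=1.0, lam_tv=1e-3):
    """One illustrative update of the synthetic images; real_latents is
    assumed to be matched one-to-one with the synthetic batch."""
    optimizer.zero_grad()
    z_syn = encoder(syn_images)
    # negative Gaussian log-likelihood (normalizing constant dropped)
    nll = (z_syn - real_latents).pow(2).sum(dim=-1).mean() / (2.0 * sigma2)
    loss = nll + lam_tv * tv_prior(syn_images)
    loss.backward()
    optimizer.step()
    # clip operation: project pixels back into [0, 1]
    with torch.no_grad():
        syn_images.clamp_(0.0, 1.0)
    return loss.item()
```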
Zheng Zhou, Hongbo Zhao, Guangliang Cheng, Xiangtai Li, Shuchang Lyu, Wenquan Feng, and Qi Zhao. BACON: Bayesian Optimal Condensation Framework for Dataset Distillation. In submission, 2024. (hosted on arXiv)
We gratefully acknowledge the contributions of the DC-bench and IDM teams, as our code builds upon their work. You can find their repositories here: DC-bench and IDM.
This template was originally created by Phillip Isola and Richard Zhang for their colorful ECCV project. The code can be found here.