Dataset Distillation (DD) compresses large datasets into smaller, synthetic subsets, enabling models trained on them to achieve performance comparable to those trained on the full data. However, these models remain vulnerable to adversarial attacks, limiting their use in safety-critical applications. While adversarial robustness has been extensively studied in related fields, research on improving DD robustness is still limited. To address this, we propose ROME, a novel method that enhances the adversarial RObustness of DD by leveraging the InforMation BottlenEck (IB) principle. ROME includes two components: a performance-aligned term to preserve accuracy and a robustness-aligned term to improve robustness by aligning feature distributions between synthetic and perturbed images. Furthermore, we introduce the Improved Robustness Ratio (I-RR), a refined metric to better evaluate DD robustness. Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate that ROME outperforms existing DD methods in adversarial robustness, achieving maximum I-RR improvements of nearly 40% under white-box attacks and nearly 35% under black-box attacks. Our code is available at https://github.com/zhouzhengqd/ROME.
1. What is Dataset Distillation?
Dataset distillation compresses large datasets into compact synthetic subsets, significantly reducing training time and computation while maintaining model performance. However, while efficient, models trained on most distilled datasets remain vulnerable to adversarial attacks, limiting their reliability in safety-critical areas such as face recognition, autonomous driving, and object detection.
2. How to enhance the robustness of models?
Adversarial robustness is a key research focus, and adversarial training is the most common way to improve it; however, adversarial training is computationally expensive and difficult to apply in data-efficient settings such as dataset distillation.
3. Existing Challenges
4. Contributions
Figure: ROME framework overview. The method consists of a performance-aligned term and a robustness-aligned term.
We propose ROME, a robust dataset distillation framework that leverages the Information Bottleneck (IB) principle to enhance adversarial robustness. Our method consists of two key components: a performance-aligned term to preserve accuracy and a robustness-aligned term to improve resistance to adversarial attacks.
1. Information Bottleneck Objective
IB aims to find a representation $\mathcal{Z}$ that preserves as much information as possible about the target labels $\mathcal{Y}$, while reducing its dependence on the input $\mathcal{X}$. The IB objective is formulated as:
\[ R_{IB} \equiv \max_{\mathcal{Z}} I(\mathcal{Y};\mathcal{Z})-\beta I(\mathcal{X};\mathcal{Z}), \]
where $I$ denotes mutual information and $\beta$ controls the trade-off between $I(\mathcal{Y};\mathcal{Z})$ and $I(\mathcal{X};\mathcal{Z})$.
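For intuition, the IB objective is typically optimized through a variational surrogate: $I(\mathcal{Y};\mathcal{Z})$ becomes a cross-entropy through a stochastic encoder, and $I(\mathcal{X};\mathcal{Z})$ is upper-bounded by a KL term against a fixed prior. The sketch below is a generic illustration of this recipe, not the ROME implementation; all names are ours.

```python
import torch
import torch.nn.functional as F

def vib_loss(mu: torch.Tensor, logvar: torch.Tensor,
             logits: torch.Tensor, y: torch.Tensor, beta: float = 1e-3):
    """Generic variational IB surrogate (illustrative, not ROME itself).

    mu/logvar parameterize the Gaussian encoder p(z|x); logits are the
    decoder q(y|z) outputs for a reparameterized sample z ~ p(z|x).
    """
    # -E[log q(y|z)] lower-bounds I(Y;Z) up to a constant.
    ce = F.cross_entropy(logits, y)
    # KL(p(z|x) || N(0, I)) upper-bounds I(X;Z).
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
    return ce + beta * kl  # minimizing this maximizes the IB bound
```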
2. Formulating ROME via the Information Bottleneck
ROME applies the IB principle to dataset distillation: the distilled dataset is generated by optimizing an IB-based objective that jointly targets high accuracy and adversarial robustness, via the following variational lower bound:
\[ \begin{aligned} \mathcal{L}_{\mathrm{ROME}} &= I(\mathcal{Y};\mathcal{Z}) - \beta\, I(\mathcal{X};\mathcal{Z} \mid \hat{\mathcal{X}}) \\ &\geq \mathbb{E}_{p(x, \hat{x}, y)\,p(z \mid x,\hat{x},y)} \left[ \log q(y \mid z) - \beta \log \frac{p(z \mid x)}{q(z \mid \hat{x})} \right] \end{aligned} \]
3. Performance-Aligned Term
The performance-aligned term can also be expressed as follows:
\[ \begin{aligned} \mathcal{L}_{\text{Perf\_Alig}} &= \mathbb{E}_{p(x, \hat{x}, y)\,p(z \mid x,\hat{x},y)} \left[ \log q(y \mid z) \right] \\ &= \mathbb{E}_{p(x, \hat{x}, y)} \left[ \mathrm{CE}\left[ y^t, f(x) \right] \right] \end{aligned} \]
where $f(\cdot)$ is a pretrained model that is robust to adversarial attacks, $f(x)$ denotes its logit output for input $x$, $y^t$ is the one-hot true-label vector, and $\mathrm{CE}$ denotes cross-entropy.
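In code, this term is simply a cross-entropy between the robust teacher's logits on the synthetic images and their assigned labels. A minimal sketch, assuming a pretrained robust model `f_robust` and a labeled batch of synthetic images (names are illustrative):

```python
import torch
import torch.nn.functional as F

def perf_aligned_loss(f_robust: torch.nn.Module,
                      x_syn: torch.Tensor, y_syn: torch.Tensor) -> torch.Tensor:
    """CE[y^t, f(x)]: cross-entropy between the robust teacher's logits
    on synthetic images and their hard labels (integer class indices)."""
    logits = f_robust(x_syn)               # f(x): teacher logits
    return F.cross_entropy(logits, y_syn)  # averaged over the batch
```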
4. Robustness-Aligned Term
The robustness-aligned term can likewise be lower-bounded by applying Pinsker's inequality:
\[ \begin{aligned} \mathcal{L}_{\text{Rob\_Alig}} &= \mathbb{E}_{p(x, \hat{x}, y)\,p(z \mid x,\hat{x},y)} \left[ \beta \log \frac{p(z \mid x)}{q(z \mid \hat{x})} \right] \\ &\geq \mathbb{E}_{p(x,\hat{x},y)} \left\Vert \mathbb{E}_{x \sim \mathcal{X}} \left[ e(x) \right] - \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}}} \left[ e(\hat{x}) \right] \right\Vert^2 \end{aligned} \]
where $\mathcal{X}$ and $\hat{\mathcal{X}}$ are class-aligned sample sets (i.e., $\mathcal{X}$ contains synthetic samples and $\hat{\mathcal{X}}$ contains perturbed original samples, both partitioned by the label $y$), $p(x,\hat{x},y)$ is their joint distribution, $e(\cdot)$ is the output of the embedding layer, and $\Vert \cdot \Vert^2$ denotes the squared total variation distance.
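Concretely, for a single class this reduces to matching the mean embedding of the synthetic samples against the mean embedding of the adversarially perturbed originals. A sketch assuming `embed` returns the embedding-layer output $e(\cdot)$ and the perturbed batch has been produced beforehand (e.g., by PGD); both names are hypothetical:

```python
import torch

def rob_aligned_loss(embed: torch.nn.Module,
                     x_syn_c: torch.Tensor, x_adv_c: torch.Tensor) -> torch.Tensor:
    """|| E[e(x)] - E[e(x_hat)] ||^2 for one class: squared distance between
    the mean embeddings of synthetic and perturbed original samples."""
    mu_syn = embed(x_syn_c).mean(dim=0)  # E_{x ~ X_c}[e(x)]
    mu_adv = embed(x_adv_c).mean(dim=0)  # E_{x_hat ~ X_hat_c}[e(x_hat)]
    return (mu_syn - mu_adv).pow(2).sum()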
5. Monte Carlo Approximation
To approximate the expectations in \(\mathcal{L}_{\text{Perf\_Alig}}\) and \(\mathcal{L}_{\text{Rob\_Alig}}\), we apply Monte Carlo sampling. Specifically, for each class \(c \in \mathcal{C} = \{0, 1, \dots, C-1\}\), where \(C\) is the number of classes, we draw synthetic samples \(x\) and corresponding perturbed original samples \(\hat{x}\) from class \(c\). We then aggregate the sampled pairs across all classes with equal weighting to construct the empirical estimates. The performance-aligned term is approximated as:
\[
\mathcal{L}_{\text{Perf\_Alig}} = \sum_{c=0}^{C-1} \frac{1}{\vert \mathcal{X}_c \vert} \sum_{x \in \mathcal{X}_c} \mathrm{CE}\left[ y^t_c, f(x) \right]
\]
while the robustness-aligned term is estimated by
\[
\mathcal{L}_{\text{Rob\_Alig}} = \sum_{c=0}^{C-1} \left\Vert \frac{1}{\vert \mathcal{X}_c \vert} \sum_{x \in \mathcal{X}_c} e(x) - \frac{1}{\vert \hat{\mathcal{X}}_c \vert} \sum_{\hat{x} \in \hat{\mathcal{X}}_c} e(\hat{x}) \right\Vert^2
\]
where $\mathcal{X}_c$ and $\hat{\mathcal{X}}_c$ are the synthetic and perturbed sample subsets of class $c$, with sizes $\vert \mathcal{X}_c \vert$ and $\vert \hat{\mathcal{X}}_c \vert$, respectively.
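Put together, the two empirical estimates amount to a simple loop over classes. A sketch reusing the hypothetical `rob_aligned_loss` helper from above:

```python
import torch
import torch.nn.functional as F

def monte_carlo_losses(f_robust, embed, syn_by_class, adv_by_class):
    """Empirical L_Perf_Alig and L_Rob_Alig, summed over classes with equal
    weighting; syn_by_class[c] / adv_by_class[c] hold X_c / X_hat_c."""
    perf, rob = 0.0, 0.0
    for c, (x_syn_c, x_adv_c) in enumerate(zip(syn_by_class, adv_by_class)):
        y_c = torch.full((x_syn_c.size(0),), c,
                         dtype=torch.long, device=x_syn_c.device)
        perf = perf + F.cross_entropy(f_robust(x_syn_c), y_c)  # mean over |X_c|
        rob = rob + rob_aligned_loss(embed, x_syn_c, x_adv_c)
    return perf, rob
```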
6. Overall Objective
The final objective combines both terms:
\[ \mathcal{L}_{\text{TOTAL}} = (1-\alpha)\,\mathcal{L}_{\text{Perf\_Alig}} + \alpha\,\mathcal{L}_{\text{Rob\_Alig}} \]
where the hyperparameter $\alpha \in [0,1]$ weights the two terms: larger values of $\alpha$ emphasize adversarial robustness, while smaller values favor clean accuracy, allowing the trade-off to be tuned per application.
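To show how the pieces fit together, the sketch below runs one optimization loop that treats the synthetic images themselves as the learnable parameters and minimizes $\mathcal{L}_{\text{TOTAL}}$. The models, shapes, learning rate, and $\alpha$ value are all illustrative placeholders, not the paper's settings; `monte_carlo_losses` is the helper sketched above.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: in ROME, f_robust would be an adversarially
# pretrained teacher and embed its embedding-layer feature extractor.
f_robust = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
for p in list(f_robust.parameters()) + list(embed.parameters()):
    p.requires_grad_(False)  # teacher and embedder stay frozen

alpha, ipc, num_classes = 0.5, 10, 10  # placeholder settings
x_syn = torch.randn(num_classes * ipc, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.SGD([x_syn], lr=0.1)

# adv_by_class[c]: perturbed originals of class c (e.g., from PGD);
# random tensors here are placeholders for demonstration only.
adv_by_class = [torch.randn(ipc, 3, 32, 32) for _ in range(num_classes)]

for step in range(100):
    syn_by_class = x_syn.chunk(num_classes)  # assumes class-contiguous layout
    perf, rob = monte_carlo_losses(f_robust, embed, syn_by_class, adv_by_class)
    loss = (1 - alpha) * perf + alpha * rob  # L_TOTAL
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # update the synthetic images directly
```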
5. Experiments
We conduct extensive experiments on CIFAR-10 and CIFAR-100 to evaluate the robustness of models trained with ROME against various adversarial attacks. The results demonstrate that ROME significantly enhances model robustness compared to existing dataset distillation methods, with substantial improvements in both white-box and black-box attack scenarios.
6. Visualizations
The visualizations below show synthetic datasets generated by ROME under different robust prior configurations, illustrating how these settings shape the distribution of the synthetic data.
7. Citation
@inproceedings{zhou2025rome,
title = {ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck},
author = {Zheng Zhou and Wenquan Feng and Qiaosheng Zhang and Shuchang Lyu and Qi Zhao and Guangliang Cheng},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2025},
}