Dataset Distillation (DD) compresses large datasets into smaller, synthetic subsets, enabling models trained on them to achieve performance comparable to those trained on the full data. However, these models remain vulnerable to adversarial attacks, limiting their use in safety-critical applications. While adversarial robustness has been extensively studied in related fields, research on improving DD robustness is still limited. To address this, we propose ROME, a novel method that enhances the adversarial RObustness of DD by leveraging the InforMation BottlenEck (IB) principle. ROME includes two components: a performance-aligned term to preserve accuracy and a robustness-aligned term to improve robustness by aligning feature distributions between synthetic and perturbed images. Furthermore, we introduce the Improved Robustness Ratio (I-RR), a refined metric to better evaluate DD robustness. Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate that ROME outperforms existing DD methods in adversarial robustness, achieving maximum I-RR improvements of nearly 40% under white-box attacks and nearly 35% under black-box attacks. Our code is available at https://github.com/zhouzhengqd/ROME.
1. What is Dataset Distillation?
Dataset distillation compresses large datasets into compact synthetic subsets, significantly reducing training time and computation while maintaining model performance. However, while efficient, models trained on most distilled datasets remain vulnerable to adversarial attacks, limiting their reliability in safety-critical areas such as face recognition, autonomous driving, and object detection.
2. How to enhance the robustness of models?
Adversarial robustness is a key research focus, and adversarial training is the most common way to improve it; however, adversarial training is computationally expensive and difficult to apply in data-efficient settings such as dataset distillation.
3. Existing Challenges
4. Contributions
Figure: ROME framework overview. The method consists of a performance-aligned term and a robustness-aligned term.
We propose ROME, a robust dataset distillation framework that leverages the Information Bottleneck (IB) principle to enhance adversarial robustness. Our method consists of two key components: a performance-aligned term to preserve accuracy and a robustness-aligned term to improve resistance to adversarial attacks.
1. Information Bottleneck Objective
IB aims to find a representation $\mathcal{Z}$ that preserves as much information as possible about the target labels $\mathcal{Y}$, while reducing its dependence on the input $\mathcal{X}$. The IB objective is formulated as:
\[ R_{IB} \equiv \max_{\mathcal{Z}} I(\mathcal{Y};\mathcal{Z})-\beta I(\mathcal{X};\mathcal{Z}), \]
where $I$ denotes mutual information and $\beta$ controls the trade-off between $I(\mathcal{Y};\mathcal{Z})$ and $I(\mathcal{X};\mathcal{Z})$.
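For intuition, the IB objective is typically optimized through a variational surrogate: $I(\mathcal{Y};\mathcal{Z})$ becomes a cross-entropy through a stochastic encoder, and $I(\mathcal{X};\mathcal{Z})$ is upper-bounded by a KL term against a fixed prior. The sketch below is a generic illustration of this recipe, not the ROME implementation; all names are ours.

```python
import torch
import torch.nn.functional as F

def vib_loss(mu: torch.Tensor, logvar: torch.Tensor,
             logits: torch.Tensor, y: torch.Tensor, beta: float = 1e-3):
    """Generic variational IB surrogate (illustrative, not ROME itself).

    mu/logvar parameterize the Gaussian encoder p(z|x); logits are the
    decoder q(y|z) outputs for a reparameterized sample z ~ p(z|x).
    """
    # -E[log q(y|z)] lower-bounds I(Y;Z) up to a constant.
    ce = F.cross_entropy(logits, y)
    # KL(p(z|x) || N(0, I)) upper-bounds I(X;Z).
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=1).mean()
    return ce + beta * kl  # minimizing this maximizes the IB bound
```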
2. Formulating ROME via the Information Bottleneck
ROME applies the IB principle to dataset distillation: the distilled dataset is generated by optimizing an IB-based objective that jointly targets high accuracy and adversarial robustness, via the following variational lower bound:
\[ \begin{aligned} \mathcal{L}_{\mathrm{ROME}} &= I(\mathcal{Y};\mathcal{Z}) - \beta\, I(\mathcal{X};\mathcal{Z} \mid \hat{\mathcal{X}}) \\ &\geq \mathbb{E}_{p(x, \hat{x}, y)\,p(z \mid x,\hat{x},y)} \left[ \log q(y \mid z) - \beta \log \frac{p(z \mid x)}{q(z \mid \hat{x})} \right] \end{aligned} \]
3. Performance-Aligned Term
The performance-aligned term can also be expressed as follows:
\[ \begin{aligned} \mathcal{L}_{\text{Perf\_Alig}} &= \mathbb{E}_{p(x, \hat{x}, y)\,p(z \mid x,\hat{x},y)} \left[ \log q(y \mid z) \right] \\ &= \mathbb{E}_{p(x, \hat{x}, y)} \left[ \mathrm{CE}\left[ y^t, f(x) \right] \right] \end{aligned} \]
where $f(\cdot)$ is a pretrained model that is robust to adversarial attacks, $f(x)$ denotes its logit output for input $x$, $y^t$ is the one-hot true-label vector, and $\mathrm{CE}$ denotes cross-entropy.
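In code, this term is simply a cross-entropy between the robust teacher's logits on the synthetic images and their assigned labels. A minimal sketch, assuming a pretrained robust model `f_robust` and a labeled batch of synthetic images (names are illustrative):

```python
import torch
import torch.nn.functional as F

def perf_aligned_loss(f_robust: torch.nn.Module,
                      x_syn: torch.Tensor, y_syn: torch.Tensor) -> torch.Tensor:
    """CE[y^t, f(x)]: cross-entropy between the robust teacher's logits
    on synthetic images and their hard labels (integer class indices)."""
    logits = f_robust(x_syn)               # f(x): teacher logits
    return F.cross_entropy(logits, y_syn)  # averaged over the batch
```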
4. Robustness-Aligned Term
The robustness-aligned term can likewise be lower-bounded by applying Pinsker's inequality:
\[ \begin{aligned} \mathcal{L}_{\text{Rob\_Alig}} &= \mathbb{E}_{p(x, \hat{x}, y)\,p(z \mid x,\hat{x},y)} \left[ \beta \log \frac{p(z \mid x)}{q(z \mid \hat{x})} \right] \\ &\geq \mathbb{E}_{p(x,\hat{x},y)} \left\Vert \mathbb{E}_{x \sim \mathcal{X}} \left[ e(x) \right] - \mathbb{E}_{\hat{x} \sim \hat{\mathcal{X}}} \left[ e(\hat{x}) \right] \right\Vert^2 \end{aligned} \]
where $\mathcal{X}$ and $\hat{\mathcal{X}}$ are class-aligned sample sets (i.e., $\mathcal{X}$ contains synthetic samples and $\hat{\mathcal{X}}$ contains perturbed original samples, both partitioned by the label $y$), $p(x,\hat{x},y)$ is their joint distribution, $e(\cdot)$ is the output of the embedding layer, and $\Vert \cdot \Vert^2$ denotes the squared total variation distance.
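Concretely, for a single class this reduces to matching the mean embedding of the synthetic samples against the mean embedding of the adversarially perturbed originals. A sketch assuming `embed` returns the embedding-layer output $e(\cdot)$ and the perturbed batch has been produced beforehand (e.g., by PGD); both names are hypothetical:

```python
import torch

def rob_aligned_loss(embed: torch.nn.Module,
                     x_syn_c: torch.Tensor, x_adv_c: torch.Tensor) -> torch.Tensor:
    """|| E[e(x)] - E[e(x_hat)] ||^2 for one class: squared distance between
    the mean embeddings of synthetic and perturbed original samples."""
    mu_syn = embed(x_syn_c).mean(dim=0)  # E_{x ~ X_c}[e(x)]
    mu_adv = embed(x_adv_c).mean(dim=0)  # E_{x_hat ~ X_hat_c}[e(x_hat)]
    return (mu_syn - mu_adv).pow(2).sum()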
5. Monte Carlo Approximation
To approximate the expectations in \(\mathcal{L}_{\text{Perf\_Alig}}\) and \(\mathcal{L}_{\text{Rob\_Alig}}\), we apply Monte Carlo sampling. Specifically, for each class \(c \in \mathcal{C} = \{0, 1, \dots, C-1\}\), where \(C\) is the number of classes, we draw synthetic samples \(x\) and corresponding perturbed original samples \(\hat{x}\) from class \(c\). We then aggregate the sampled pairs across all classes with equal weighting to construct the empirical estimates. The performance-aligned term is approximated as:
\[
\mathcal{L}_{\text{Perf\_Alig}} = \sum_{c=0}^{C-1} \frac{1}{\vert \mathcal{X}_c \vert} \sum_{x \in \mathcal{X}_c} \mathrm{CE}\left[ y^t_c, f(x) \right]
\]
while the robustness-aligned term is estimated by
\[
\mathcal{L}_{\text{Rob\_Alig}} = \sum_{c=0}^{C-1} \left\Vert \frac{1}{\vert \mathcal{X}_c \vert} \sum_{x \in \mathcal{X}_c} e(x) - \frac{1}{\vert \hat{\mathcal{X}}_c \vert} \sum_{\hat{x} \in \hat{\mathcal{X}}_c} e(\hat{x}) \right\Vert^2
\]
where $\mathcal{X}_c$ and $\hat{\mathcal{X}}_c$ are the synthetic and perturbed sample subsets of class $c$, with sizes $\vert \mathcal{X}_c \vert$ and $\vert \hat{\mathcal{X}}_c \vert$, respectively.
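Put together, the two empirical estimates amount to a simple loop over classes. A sketch reusing the hypothetical `rob_aligned_loss` helper from above:

```python
import torch
import torch.nn.functional as F

def monte_carlo_losses(f_robust, embed, syn_by_class, adv_by_class):
    """Empirical L_Perf_Alig and L_Rob_Alig, summed over classes with equal
    weighting; syn_by_class[c] / adv_by_class[c] hold X_c / X_hat_c."""
    perf, rob = 0.0, 0.0
    for c, (x_syn_c, x_adv_c) in enumerate(zip(syn_by_class, adv_by_class)):
        y_c = torch.full((x_syn_c.size(0),), c,
                         dtype=torch.long, device=x_syn_c.device)
        perf = perf + F.cross_entropy(f_robust(x_syn_c), y_c)  # mean over |X_c|
        rob = rob + rob_aligned_loss(embed, x_syn_c, x_adv_c)
    return perf, rob
```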
6. Overall Objective
The final objective combines both terms:
\[ \mathcal{L}_{\text{TOTAL}} = (1-\alpha)\,\mathcal{L}_{\text{Perf\_Alig}} + \alpha\,\mathcal{L}_{\text{Rob\_Alig}} \]
where the hyperparameter $\alpha \in [0,1]$ weights the two terms: larger values of $\alpha$ emphasize adversarial robustness, while smaller values favor clean accuracy, allowing the trade-off to be tuned per application.
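To show how the pieces fit together, the sketch below runs one optimization loop that treats the synthetic images themselves as the learnable parameters and minimizes $\mathcal{L}_{\text{TOTAL}}$. The models, shapes, learning rate, and $\alpha$ value are all illustrative placeholders, not the paper's settings; `monte_carlo_losses` is the helper sketched above.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: in ROME, f_robust would be an adversarially
# pretrained teacher and embed its embedding-layer feature extractor.
f_robust = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
for p in list(f_robust.parameters()) + list(embed.parameters()):
    p.requires_grad_(False)  # teacher and embedder stay frozen

alpha, ipc, num_classes = 0.5, 10, 10  # placeholder settings
x_syn = torch.randn(num_classes * ipc, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.SGD([x_syn], lr=0.1)

# adv_by_class[c]: perturbed originals of class c (e.g., from PGD);
# random tensors here are placeholders for demonstration only.
adv_by_class = [torch.randn(ipc, 3, 32, 32) for _ in range(num_classes)]

for step in range(100):
    syn_by_class = x_syn.chunk(num_classes)  # assumes class-contiguous layout
    perf, rob = monte_carlo_losses(f_robust, embed, syn_by_class, adv_by_class)
    loss = (1 - alpha) * perf + alpha * rob  # L_TOTAL
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # update the synthetic images directly
```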
5. Experiments
We conduct extensive experiments on CIFAR-10 and CIFAR-100 to evaluate the robustness of models trained with ROME against various adversarial attacks. The results demonstrate that ROME significantly enhances model robustness compared to existing dataset distillation methods, with substantial improvements in both white-box and black-box attack scenarios.
6. Visualizations
The visualizations below show synthetic datasets generated by ROME under different robust prior configurations, illustrating how these settings shape the distribution of the synthetic data.
7. Citation
@inproceedings{zhou2025rome,
title = {ROME is Forged in Adversity: Robust Distilled Datasets via Information Bottleneck},
author = {Zheng Zhou and Wenquan Feng and Qiaosheng Zhang and Shuchang Lyu and Qi Zhao and Guangliang Cheng},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2025},
}