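| Abstract: |
| Objective: Existing data-free black-box attack methods based on generative adversarial networks tend to converge
slowly and incur high cost. To address this, a novel black-box transfer attack method is proposed. Methods: The method
has two stages: training data synthesis and substitute model distillation. In the training data synthesis stage, the
generator is optimized to maximize the consistency between the outputs of the substitute model and the target model,
and two loss functions are introduced to constrain the distribution of the data the generator produces. In the
substitute model distillation stage, the substitute model is designed with residual blocks that have learnable
parameters, and the data synthesized by the generator is used to fit the decision boundary of the target model.
Training the two stages alternately lets the substitute model fit the target model's decision boundary more closely,
which improves the attack. Results: In a series of experiments, the success rate of non-targeted black-box attacks
against the target model exceeded 70%. On the CIFAR100 dataset, the method raised the targeted attack success rate by
more than 2% over other black-box attack methods and needed a lower query budget to reach the same attack effect.
Conclusion: The proposed method fits the decision boundary of the target model efficiently and attacks effectively. |
| Keywords: adversarial examples; black-box attack; transfer attack; substitute model distillation |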
| DOI: |
| Classification number: |
| Funding: |
|
| Black-box Transfer Attack Method Based on Substitute Models |
|
ZENG Fanmao, FANG Xianjin
|
|
School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, Anhui 232001,
China
|
| Abstract: |
| Objective: Existing data-free black-box attack methods based on generative adversarial networks suffer from slow
convergence and high cost. To address these problems, a novel black-box transfer attack method is proposed. Methods: The
method consists of two stages: training data synthesis and substitute model distillation. In the training data synthesis
stage, the generator is optimized to maximize the consistency between the outputs of the substitute model and the target
model, and two loss functions are introduced to constrain the distribution of the generated data. In the substitute model
distillation stage, the substitute model is built from residual blocks with learnable parameters, and the data
synthesized by the generator is used to fit the decision boundary of the target model. Alternating between the two
stages lets the substitute model fit the target model's decision boundary more closely, which improves attack
effectiveness. Results: In a series of experiments, the success rate of non-targeted black-box attacks against the target
model exceeded 70%. On the CIFAR100 dataset, the success rate of targeted attacks was more than 2% higher than that of
other black-box attack methods, and a lower query budget was needed to reach the same attack effect. Conclusion: The
proposed method fits the decision boundary of the target model efficiently and achieves good attack effectiveness. |
| Key words: adversarial examples; black-box attack; transfer attack; substitute model distillation |
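
The alternating two-stage loop described in the abstract can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the target is a hypothetical linear classifier that exposes only output probabilities (`query_target`), the substitute is a plain logistic model rather than a residual network, the generator is replaced by a fixed Gaussian sampler (`synthesize`) because the abstract does not specify the generator objective or its two constraint losses, and FGSM is swapped in as the transfer attack because the abstract does not name one.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


# Hypothetical black-box target: the attacker sees probabilities only,
# never these hidden parameters.
_HIDDEN_W, _HIDDEN_B = (2.0, -1.0), 0.5


def query_target(x):
    return sigmoid(_HIDDEN_W[0] * x[0] + _HIDDEN_W[1] * x[1] + _HIDDEN_B)


def substitute_predict(w, b, x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def synthesize(rng, n):
    # Stage 1 placeholder (assumption): a fixed Gaussian sampler stands in
    # for the trained generator, whose losses the abstract does not give.
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def distill_step(w, b, batch, lr):
    # Stage 2: one gradient step on the cross-entropy between the substitute
    # output and the target's soft output; d(loss)/d(logit) = p_sub - p_target.
    gw0 = gw1 = gb = 0.0
    for x in batch:
        err = substitute_predict(w, b, x) - query_target(x)
        gw0 += err * x[0]
        gw1 += err * x[1]
        gb += err
    n = len(batch)
    return (w[0] - lr * gw0 / n, w[1] - lr * gw1 / n), b - lr * gb / n


def train_substitute(rounds=300, batch_size=32, lr=0.5, seed=0):
    rng = random.Random(seed)
    w, b = (0.0, 0.0), 0.0
    for _ in range(rounds):
        batch = synthesize(rng, batch_size)    # stage 1: synthesize data
        w, b = distill_step(w, b, batch, lr)   # stage 2: distill the target
    return w, b


def agreement(w, b, points):
    # Fraction of points on which substitute and target predict the same label.
    same = sum((substitute_predict(w, b, x) >= 0.5) == (query_target(x) >= 0.5)
               for x in points)
    return same / len(points)


def fgsm_on_substitute(w, b, x, eps):
    # FGSM computed on the substitute only: for a logistic model the input
    # gradient of the cross-entropy loss is (p - y) * w.
    p = substitute_predict(w, b, x)
    y = 1.0 if p >= 0.5 else 0.0
    grad = ((p - y) * w[0], (p - y) * w[1])
    sign = lambda v: (v > 0) - (v < 0)
    return (x[0] + eps * sign(grad[0]), x[1] + eps * sign(grad[1]))


def transfer_success_rate(w, b, points, eps):
    # Non-targeted success: the target's label changes on the perturbed input.
    flipped = sum((query_target(fgsm_on_substitute(w, b, x, eps)) >= 0.5)
                  != (query_target(x) >= 0.5) for x in points)
    return flipped / len(points)


if __name__ == "__main__":
    w, b = train_substitute()
    test_points = synthesize(random.Random(123), 500)
    print("label agreement:", agreement(w, b, test_points))
    print("transfer success:", transfer_success_rate(w, b, test_points, eps=0.5))
```

In this toy setting the query budget is `rounds * batch_size` calls to `query_target` during training. The sketch only shows how a substitute fitted from queries can stand in for the target when crafting adversarial examples; it says nothing about the success rates reported in the abstract.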