WO2021164334A1 - Training method and apparatus for adversarial attack model, method and apparatus for generating adversarial image, electronic device, and storage medium - Google Patents

Training method and apparatus for adversarial attack model, method and apparatus for generating adversarial image, electronic device, and storage medium

Info

Publication number
WO2021164334A1
WO2021164334A1 (application PCT/CN2020/128009, CN2020128009W)
Authority
WO
WIPO (PCT)
Prior art keywords
attack
image
training
adversarial
model
Prior art date
Application number
PCT/CN2020/128009
Other languages
English (en)
French (fr)
Inventor
李家琛
吴保元
张勇
樊艳波
李志锋
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021164334A1 publication Critical patent/WO2021164334A1/zh
Priority to US17/690,797 (published as US20220198790A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis

Definitions

  • The present application relates to the field of artificial intelligence technology, and in particular to a training method and apparatus for an adversarial attack model, a method and apparatus for generating an adversarial image, an electronic device, and a storage medium.
  • Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • Artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
  • Artificial intelligence technology is being applied in various fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned and autonomous driving, drones, robots, smart medical care, and smart customer service.
  • Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
  • Machine learning studies how computers can simulate or implement human learning behaviors to acquire new knowledge or skills, and how to reorganize existing knowledge structures to continuously improve performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence.
  • Various forms of machine learning models have completely changed many areas of artificial intelligence; machine learning models such as Deep Neural Networks (DNNs) are now used for many machine vision tasks.
  • Although deep neural networks perform well, they are extremely vulnerable to adversarial attacks. An adversarial attack applies small, artificially computed perturbations to the input of a deep neural network so that the network produces a wrong output, that is, the network is deceived. Because deep neural networks are vulnerable to adversarial sample attacks, their defense capability must be improved to reduce the possibility of adversarial samples deceiving them.
  • Embodiments of the present application provide a training method for an adversarial attack model, executed by an electronic device, where the adversarial attack model includes a generator network. The training method includes: using the generator network to generate an adversarial attack image based on a training digital image; conducting an adversarial attack on a target model based on the adversarial attack image and obtaining an adversarial attack result; obtaining a physical image corresponding to the training digital image; and training the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • Embodiments of the present application also provide a method for generating an adversarial image, executed by an electronic device, including: training an adversarial attack model that includes a generator network to obtain a trained adversarial attack model; and using the trained adversarial attack model to generate an adversarial image based on an input digital image. When training the adversarial attack model, the generator network is used to generate an adversarial attack image based on a training digital image; an adversarial attack is conducted on a target model based on the adversarial attack image and an adversarial attack result is obtained; a physical image corresponding to the training digital image is acquired; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • Embodiments of the present application also provide a training apparatus for an adversarial attack model, where the adversarial attack model includes a generator network. The training apparatus includes: a generation module for using the generator network to generate an adversarial attack image based on a training digital image; an attack module for conducting an adversarial attack on a target model based on the adversarial attack image and obtaining an adversarial attack result; an acquisition module for obtaining a physical image corresponding to the training digital image; and a training module for training the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • Embodiments of the present application also provide an apparatus for generating an adversarial image, including: a first training module for training an adversarial attack model that includes a generator network to obtain a trained adversarial attack model; and a generation module for using the trained adversarial attack model to generate an adversarial image based on an input digital image. When training the adversarial attack model, the generator network is used to generate an adversarial attack image based on a training digital image; an adversarial attack is conducted on the target model based on the adversarial attack image and an adversarial attack result is obtained; the physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • Embodiments of the present application also provide an electronic device, including a processor and a memory storing computer-readable instructions that, when executed by the processor, implement the methods described above.
  • Embodiments of the present application also provide a computer-readable storage medium having one or more computer programs stored thereon, where the one or more computer programs, when executed by a processor, implement the training method of the adversarial attack model described above.
  • Embodiments of the present application also provide a computer program product or computer program, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods described above.
  • FIG. 1 shows a block diagram of an example system to which the training of an adversarial attack model according to an embodiment of the present application can be applied;
  • FIG. 2A and FIG. 2B show block diagrams of an adversarial attack model according to some embodiments of the present application;
  • FIG. 3A and FIG. 3B show block diagrams of an adversarial attack model according to some embodiments of the present application;
  • FIG. 4 shows a method for training an adversarial attack model according to some embodiments of the present application, where the adversarial attack model includes a generator network and a discriminator network;
  • FIG. 5 shows a method for training an adversarial attack model according to some embodiments of the present application, where the adversarial attack model includes a generator network, a discriminator network, and a geometric transformation module;
  • FIG. 6 shows a method for generating an adversarial image according to an embodiment of the present application;
  • FIG. 7A shows a block diagram of an apparatus for training an adversarial attack model according to some embodiments of the present application;
  • FIG. 7B shows a block diagram of an apparatus for generating an adversarial image according to some embodiments of the present application;
  • FIGS. 8A to 8C show an original digital image and examples of adversarial images generated by the EOT method, the RP2 method, the D2P method, and the method of the present application;
  • FIG. 9 shows a schematic diagram of the distribution of users' answers in experiments using the EOT method, the RP2 method, the D2P method, and the method of the present application;
  • FIG. 10 shows a block diagram of an electronic device according to an embodiment of the present application.
  • Adversarial attacks are generally divided into two types according to the domain in which they act: digital adversarial attacks and physical adversarial attacks.
  • A digital adversarial attack directly feeds digital adversarial samples, such as digital images in the digital world (also called the digital domain or digital space), into a deep neural network.
  • A physical adversarial attack attacks a deep neural network through physical adversarial samples in the physical world (also called the physical domain or physical space).
  • The difficulty of physical adversarial attacks is that adversarial samples that are effective in the digital domain (for example, adversarial images) usually lose their attack effect because of the image distortion introduced by the conversion from the digital domain to the physical domain. This conversion has high uncertainty and is difficult to model accurately.
  • Embodiments of the present application therefore provide an adversarial attack model, a training method for the adversarial attack model, generation of adversarial samples (for example, adversarial images) by the adversarial attack model, and a method of training the target model using the adversarial samples.
  • FIG. 1 shows a block diagram of an example system 10 to which the training of an adversarial attack model according to an embodiment of the present application can be applied.
  • the system 10 may include a user equipment 110, a server 120 and a training device 130.
  • the user equipment 110, the server 120, and the training device 130 may be communicatively coupled to each other through the network 140.
  • The user equipment 110 may be any type of electronic device, such as a personal computer (for example, a laptop or desktop computer), a mobile device (for example, a smartphone or tablet computer), a game console, a wearable device, or any other type of electronic device.
  • the user equipment 110 may include one or more processors 111 and a memory 112.
  • The one or more processors 111 may be any suitable processing devices (for example, processor cores, microprocessors, ASICs, FPGAs, controllers, microcontrollers, etc.), and may be one processor or multiple operably connected processors.
  • the memory 112 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • the memory 112 may store data and instructions executed by the processor 111 to make the user equipment 110 perform operations.
  • The user equipment 110 may store or include one or more adversarial attack models, and may also store or otherwise include one or more target models.
  • The target model refers to the model to be attacked. The target model may be, or may otherwise include, various machine learning models, such as neural networks (for example, deep neural networks) or other types of machine learning models (including non-linear models and/or linear models). Neural networks may include feedforward neural networks, recurrent neural networks (for example, long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.
  • In some embodiments, one or more adversarial attack models may be received from the server 120 through the network 140, stored in the memory 112 of the user equipment 110, and then used or otherwise implemented by the one or more processors 111.
  • The server 120 may include one or more adversarial attack models.
  • the server 120 communicates with the user equipment 110 according to the client-server relationship.
  • In some embodiments, the adversarial attack model can be implemented by the server 120 as part of a web service. Therefore, one or more adversarial attack models may be stored and implemented at the user equipment 110, and/or one or more adversarial attack models may be stored and implemented at the server 120.
  • the server 120 includes one or more processors 121 and a memory 122.
  • The one or more processors 121 may be any suitable processing devices (for example, processor cores, microprocessors, ASICs, FPGAs, controllers, microcontrollers, etc.), and may be one processor or multiple operably connected processors.
  • the memory 122 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • the memory 122 may store data and instructions executed by the processor 121 to cause the server 120 to perform operations.
  • the server 120 may also store or otherwise include one or more target models.
  • the target model may refer to the model to be attacked.
  • The target model may be, or may otherwise include, various machine learning models, such as neural networks (for example, deep neural networks) or other types of machine learning models (including non-linear models and/or linear models). Neural networks may include feedforward neural networks, recurrent neural networks (for example, long short-term memory recurrent neural networks), convolutional neural networks, or other forms of neural networks.
  • The user equipment 110 and/or the server 120 may interact with the training device 130, communicatively coupled through the network 140, to train the adversarial attack model and/or the target model.
  • the training device 130 may be separate from the server 120 or may be a part of the server 120.
  • the training device 130 includes one or more processors 131 and a memory 132.
  • The one or more processors 131 may be any suitable processing devices (for example, processor cores, microprocessors, ASICs, FPGAs, controllers, microcontrollers, etc.), and may be one processor or multiple operably connected processors.
  • the memory 132 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof.
  • the memory 132 may store data and instructions executed by the processor 131 to cause the training device 130 to perform operations.
  • the training device 130 may include a machine learning engine 133.
  • The machine learning engine 133 may use various training or learning techniques to train the adversarial attack model and/or the target model stored at the user equipment 110 and/or the server 120, and may apply various techniques (for example, weight decay and other regularization techniques) to improve the generalization ability of the model being trained.
  • The machine learning engine 133 may include one or more machine learning platforms, frameworks, and/or libraries, such as TensorFlow, Caffe/Caffe2, Theano, Torch/PyTorch, MXNet, CNTK, etc., and can implement the training of the adversarial attack model and/or the target model.
  • Figure 1 shows an example system that can be used to implement the present application.
  • In some embodiments, the user equipment 110 may include a machine learning engine and a training data set. In such embodiments, the adversarial attack model and/or the target model may be trained and used locally at the user equipment 110, or adversarial samples may be generated locally by the trained adversarial attack model.
  • FIG. 2A shows an example of an adversarial attack model 20 according to some embodiments of the present application, and FIG. 2B shows an example of the adversarial attack model 20 with a specific digital image sample.
  • The adversarial attack model 20 may include a generator network 201 and a discriminator network 202, and is trained using training samples. A training sample may be a digital image sample, referred to as a training digital image.
  • The generator network 201 and the discriminator network 202 may include various types of machine learning models, including linear and non-linear models, for example regression models, support vector machines, decision tree-based models, Bayesian models, and/or neural networks (for example, deep neural networks).
  • A neural network may be a feedforward neural network, a recurrent neural network (for example, a long short-term memory recurrent neural network), a convolutional neural network, or another form of neural network.
  • For convenience, the generator network and the discriminator network are referred to as "networks", but they are not limited to neural networks and can also include other forms of machine learning models.
  • In some embodiments, the generator network 201 and the discriminator network 202 form a Generative Adversarial Network (GAN).
  • The generator network 201 may generate an adversarial attack image based on the training digital image, and the generated adversarial attack image may be output to the discriminator network 202 and the target model 21.
  • The target model 21 refers to the model to be attacked.
  • The discriminator network 202 may generate a discrimination result based on the physical image and the adversarial attack image generated by the generator network 201.
  • The physical image can be obtained by performing the conversion from the digital domain to the physical domain on the training digital image; FIG. 2B shows an example of this conversion.
  • Performing the conversion may include one of the following: printing the training digital image and scanning the printout to obtain the physical image; or printing the training digital image and photographing the printout to obtain the physical image. For example, the training digital image can be printed by a printer and the printed image scanned by a scanner, or the printed image can be photographed by a camera.
  • In some embodiments, the training digital image is mapped to the physical domain at a 1:1 scale.
  • In some embodiments, the adversarial attack image generated by the generator network 201 needs to deceive the target model 21. The first objective function, used to train the generator network 201, can be expressed as

    $\mathcal{L}_{atk} = \mathcal{L}\big(f(G(x)),\ y\big)$

    where $\mathcal{L}_{atk}$ is the adversarial attack loss of the adversarial attack on the target model, $f(\cdot)$ is the target model, $G(\cdot)$ is the generator network, $x$ is the training digital image input to the generator network, and $y$ is the target label set relative to the label of the training digital image.
  • The adversarial attack image generated by the generator network 201 also needs to be close enough to the noise-free physical image to deceive the discriminator network 202, in line with the requirements of a GAN. The second objective function, used to train the discriminator network 202, can be expressed as

    $\mathcal{L}_{GAN} = \mathbb{E}\big[\log D(x_p)\big] + \mathbb{E}\big[\log\big(1 - D(G(x))\big)\big]$

    where $G(\cdot)$ is the generator network, $D(\cdot)$ is the discriminator network, $x$ is the training digital image input to the generator network, and $x_p$ is the physical image input to the discriminator network. This decision loss is maximized when updating $D$ and minimized when updating $G$.
  • The decision loss can follow the standard GAN formulation, but the present application is not limited to it; various decision losses can be used.
  • The adversarial attack model 20 can be trained based on the above adversarial attack loss and decision loss to obtain the parameters of the generator network 201 and the discriminator network 202.
  • The image quality of the adversarial images generated by the trained adversarial attack model is significantly improved, so that the adversarial images can be used for effective attacks or for effective training of the target model.
  • In the following, the image generated by the generator network during training of the adversarial attack model is referred to as the "adversarial attack image", and the image generated by the trained adversarial attack model is referred to as the "adversarial image".
  • The influence of noise on the physical image can be limited through the discriminator network, and the adversarial attack model 20 jointly optimizes the digital-to-physical conversion process and the adversarial attack image generation process (a training-step sketch is given below).
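  • To make the joint optimization concrete, the following PyTorch sketch shows one training step under the losses above. It is a minimal illustration under stated assumptions, not the patented implementation: `G`, `D` (assumed to output probabilities in (0, 1)), the target classifier `f`, the optimizers, and the weight `lam` are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, f, opt_G, opt_D, x, x_p, y_target, lam=10.0):
    """One joint update of generator G and discriminator D.

    x        -- batch of training digital images
    x_p      -- corresponding physical (print-scan) images
    y_target -- target labels the attack should force f to output
    lam      -- attack weight (the application suggests roughly 5 to 20)
    """
    # Discriminator update: tell physical images apart from generated ones,
    # i.e. maximize the decision loss (minimize its negative).
    x_adv = G(x)
    d_loss = -(torch.log(D(x_p) + 1e-8).mean()
               + torch.log(1.0 - D(x_adv.detach()) + 1e-8).mean())
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator update: fool the discriminator AND push the target model
    # toward the target label (the adversarial attack loss).
    x_adv = G(x)
    gan_loss = torch.log(1.0 - D(x_adv) + 1e-8).mean()
    atk_loss = F.cross_entropy(f(x_adv), y_target)
    g_loss = gan_loss + lam * atk_loss
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```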
  • In some embodiments, the adversarial attack model 20 can be used for universal physical attacks. The training digital images may include multiple different digital images obtained by applying different random crops to an original image, and the corresponding physical images can be obtained by performing the conversion from the digital domain to the physical domain on these digital images.
  • The multiple digital images and multiple physical images form multiple digital/physical image pairs. Each pair is used as training input to the adversarial attack model 20, where the digital image of the pair serves as the training digital image and the physical image serves as the physical image corresponding to that training digital image.
  • After training, the adversarial attack model 20 can be used to attack other, different input images; in this way, the adversarial attack model can learn a more widely applicable noise-resistant model.
  • FIG. 3A shows an example of an adversarial attack model 30 according to some embodiments of the present application, and FIG. 3B shows an example of the adversarial attack model 30 with a specific digital image sample.
  • The adversarial attack model 30 may include a generator network 301, a discriminator network 302, and a geometric transformation module 303.
  • The generator network 301 may generate an adversarial attack image based on the training digital image, and the generated adversarial attack image may be output to the discriminator network 302 and the geometric transformation module 303.
  • The target model 31 refers to the model to be attacked.
  • FIG. 3B shows an example of the geometric transformation of the adversarial attack image. The geometric transformation module 303 may be configured to perform a geometric transformation on the adversarial attack image generated by the generator network 301.
  • The geometric transformation may include an affine transformation, for example at least one of translation, scaling, flipping, rotation, and shearing. In this way, the target model 31 can be attacked using the geometrically transformed adversarial attack image.
  • In some embodiments, the adversarial attack image generated by the generator network 301 needs to deceive the target model 31, and an EOT (Expectation Over Transformation) method can be used to conduct the adversarial attack. The first objective function, used to train the generator network 301, can be expressed as

    $\mathcal{L}_{atk} = \mathbb{E}_{A}\Big[\mathcal{L}\big(f(T_A(G(x))),\ y\big)\Big]$

    where $T_A(\cdot)$ denotes the geometric transformation defined by the transformation matrix $A$ (described below), and the expectation is taken over the distribution of transformations.
  • The adversarial attack image generated by the generator network 301 also needs to be close enough to the noise-free physical image to deceive the discriminator network 302. The second objective function, used to train the discriminator network 302, can be expressed as

    $\mathcal{L}_{GAN} = \mathbb{E}\big[\log D(x_p)\big] + \mathbb{E}\big[\log\big(1 - D(G(x))\big)\big]$

    where $G(\cdot)$ is the generator network, $D(\cdot)$ is the discriminator network, $x$ is the training digital image input to the generator network 301, and $x_p$ is the physical image input to the discriminator network. The decision loss is maximized when updating $D$ and minimized when updating $G$.
  • The final objective function can then be obtained as

    $\min_G \max_D\ \big(\mathcal{L}_{GAN} + \lambda\,\mathcal{L}_{atk}\big)$

    where $\lambda$ is a weight coefficient (called the attack weight). The attack weight may be a predefined hyperparameter; its value can range from 5 to 20.
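  • The expectation in the first objective function can be approximated by sampling a few random transformations per step and averaging the attack loss. The sketch below shows one way to do this in PyTorch; the sampling strategy and the `random_affine` helper are illustrative assumptions, with ranges mirroring the scale (0.7 to 1.3) and rotation (±30°) used in the experiments described later.

```python
import math
import torch
import torch.nn.functional as F

def random_affine(x, scale=(0.7, 1.3), max_deg=30.0):
    """Apply one randomly sampled rotation-plus-scale to a batch x of shape (N, C, H, W)."""
    n = x.size(0)
    s = torch.empty(n, device=x.device).uniform_(*scale)
    a = torch.empty(n, device=x.device).uniform_(-max_deg, max_deg) * math.pi / 180.0
    # theta maps output coordinates to input coordinates, so s > 1 shrinks the content.
    theta = torch.zeros(n, 2, 3, device=x.device)
    theta[:, 0, 0] = s * torch.cos(a)
    theta[:, 0, 1] = -s * torch.sin(a)
    theta[:, 1, 0] = s * torch.sin(a)
    theta[:, 1, 1] = s * torch.cos(a)
    grid = F.affine_grid(theta, list(x.shape), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)

def eot_attack_loss(f, x_adv, y_target, samples=4):
    """Monte-Carlo estimate of the expectation over transformations."""
    losses = [F.cross_entropy(f(random_affine(x_adv)), y_target)
              for _ in range(samples)]
    return torch.stack(losses).mean()
```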
  • The adversarial attack model 30, including the generator network 301 and the discriminator network 302, can be trained based on the above objective function to obtain the parameters of the generator network 301 and the discriminator network 302.
  • The image quality of the adversarial images generated by the trained adversarial attack model is significantly improved, so that the adversarial images can be used for effective attacks or for effective training of the target model.
  • The influence of noise on the physical image can be limited through the discriminator network, and the digital-to-physical conversion process and the adversarial attack image generation process can be jointly optimized.
  • The attack effect remains stable under geometric transformation, which improves the robustness of the adversarial attack.
  • In some embodiments, the adversarial attack model 30 can be used for universal physical attacks. The training digital images may include multiple different digital images obtained by applying different random crops to an original image, and the corresponding physical images can be obtained by performing the conversion from the digital domain to the physical domain on these digital images.
  • The multiple digital images and multiple physical images form multiple digital/physical image pairs. Each pair is used as training input to the adversarial attack model 30, where the digital image of the pair serves as the training digital image and the physical image serves as the physical image corresponding to that training digital image.
  • After training, the adversarial attack model 30 can be used to attack other, different input images; in this way, the adversarial attack model can learn a more widely applicable noise-resistant model.
  • Some examples of adversarial attack models according to some embodiments of the present application are described above in conjunction with FIGS. 2A-2B and FIGS. 3A-3B.
  • Next, methods for training an adversarial attack model according to some embodiments of the present application are described in conjunction with FIG. 4 and FIG. 5.
  • FIG. 4 shows a method 40 for training an adversarial attack model according to some embodiments of the present application, where the adversarial attack model includes a generator network and a discriminator network. This method can be used to train the adversarial attack model 20 shown in FIG. 2A or FIG. 2B.
  • In step S41, the generator network is used to generate an adversarial attack image based on the training digital image; the adversarial attack image is generated by the machine learning model in the generator network.
  • In step S43, the adversarial attack image is used to conduct an adversarial attack on the target model, and the adversarial attack result is obtained. The adversarial attack result may be the recognition result or classification result output by the target model.
  • In step S45, the physical image corresponding to the training digital image is obtained.
  • Obtaining the physical image may include one of the following: printing and scanning the training digital image (print-scan) to obtain the physical image; or printing and photographing the training digital image (print-photograph) to obtain the physical image.
  • Alternatively, step S45 may include directly receiving or reading a physical image corresponding to the training digital image, where the physical image was determined in advance using the example method described above.
  • Although step S45 is shown after steps S41 and S43, the present application is not limited to this order; step S45 may be executed before step S41 or S43, or in parallel with them.
  • In step S47, the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • In some embodiments, the adversarial attack model further includes a discriminator network, and step S47 may include: obtaining a target label corresponding to the training digital image; determining the adversarial attack loss based on the target label and the adversarial attack result, and training the generator network based on the adversarial attack loss; using the discriminator network to perform image discrimination based on the adversarial attack image and the physical image to determine the decision loss; and jointly training the generator network and the discriminator network based on the adversarial attack loss and the decision loss.
  • In some embodiments, jointly training the generator network and the discriminator network includes: constructing a target loss using the adversarial attack loss and the decision loss; and jointly training the generator network and the discriminator network based on the target loss.
  • Constructing the target loss may include: constructing a first objective function from the adversarial attack loss; constructing a second objective function from the decision loss; and determining the final objective function from the first objective function and the second objective function. Jointly training the generator network and the discriminator network then includes training both networks based on the final objective function.
  • The adversarial attack loss may be determined as

    $\mathcal{L}_{atk} = \mathcal{L}\big(f(G(x)),\ y\big)$

    where $\mathcal{L}_{atk}$ represents the adversarial attack loss of the adversarial attack on the target model, $f(\cdot)$ represents the target model, $G(\cdot)$ represents the generator network, $x$ represents the input training digital image, and $y$ represents the target label set relative to the label of the training digital image.
  • The decision loss may be determined as

    $\mathcal{L}_{GAN} = \mathbb{E}\big[\log D(x_p)\big] + \mathbb{E}\big[\log\big(1 - D(G(x))\big)\big]$

    where $\mathcal{L}_{GAN}$ represents the decision loss of the discriminator network, $G(\cdot)$ represents the generator network, $D(\cdot)$ represents the discriminator network, $x$ represents the training digital image input to the generator network, and $x_p$ represents the physical image input to the discriminator network.
  • The first objective function can be determined as $\mathcal{L}_{atk}$, and the second objective function can be determined as $\mathcal{L}_{GAN}$. The final objective function may then be determined based on the first objective function and the second objective function, for example as

    $\min_G \max_D\ \big(\mathcal{L}_{GAN} + \lambda\,\mathcal{L}_{atk}\big)$

    where $\lambda$ is the predefined attack weight.
  • The joint training of the generator network and the discriminator network may include training both networks based on the first objective function and the second objective function. For example, the two networks may be trained in parallel, where the generator network is trained based on the first and second objective functions, and the discriminator network is trained based on the second objective function.
  • the method for training an adversarial attack model described with reference to FIG. 4 may be implemented in at least one of the user equipment 110, the server 120, the training device 130, and the machine learning engine 133 in FIG. 1, for example.
  • FIG. 5 shows a method 50 for training an adversarial attack model according to some embodiments of the present application, where the adversarial attack model includes a generator network, a discriminator network, and a geometric transformation module. This method can be used to train the adversarial attack model 30 shown in FIG. 3A or FIG. 3B.
  • In step S51, the generator network is used to generate an adversarial attack image based on the training digital image; that is, the training digital image is input to the generator network to generate the adversarial attack image.
  • In step S53, a geometric transformation is performed on the adversarial attack image, for example by the geometric transformation module, to obtain a geometrically transformed adversarial attack image.
  • The geometric transformation may be an affine transformation, which may include at least one of translation, scaling, flipping, rotation, and shearing. The geometrically transformed adversarial attack image can then be used to attack the target model.
  • An example of the geometric transformation is described below.
  • For a point $p(p_x, p_y)$ on the adversarial attack image, its homogeneous coordinate form is $p(p_x, p_y, 1)$. If the geometric transformation is represented by the homogeneous geometric transformation matrix $A$, the coordinates $(p_x', p_y')$ of the point after the geometric transformation satisfy

    $\begin{pmatrix} p_x' \\ p_y' \\ 1 \end{pmatrix} = A \begin{pmatrix} p_x \\ p_y \\ 1 \end{pmatrix}, \qquad A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ 0 & 0 & 1 \end{pmatrix}$

    where $a_1$ to $a_6$ are the parameters of the geometric transformation, reflecting transformations such as rotation and scaling of the adversarial attack image.
  • The parameters of the geometric transformation can be predefined values, set according to different transformation requirements.
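  • In code, applying the homogeneous transformation is a single matrix-vector product. A small NumPy illustration follows; the parameter values are arbitrary examples, not values from the present application.

```python
import numpy as np

# Example parameters: rotation by 30 degrees, uniform scale 1.2, translation (5, -3).
t = np.deg2rad(30.0)
a1, a2, a3 = 1.2 * np.cos(t), -1.2 * np.sin(t), 5.0
a4, a5, a6 = 1.2 * np.sin(t), 1.2 * np.cos(t), -3.0
A = np.array([[a1, a2, a3],
              [a4, a5, a6],
              [0.0, 0.0, 1.0]])

p = np.array([10.0, 20.0, 1.0])  # point (p_x, p_y) in homogeneous form
p_x_t, p_y_t, _ = A @ p          # transformed coordinates (p_x', p_y')
print(p_x_t, p_y_t)
```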
  • In step S55, the geometrically transformed adversarial attack image is used to conduct an adversarial attack on the target model to obtain an adversarial attack result, which may be the recognition result or classification result output by the target model.
  • In step S57, the physical image corresponding to the training digital image is obtained, for example by printing and scanning the training digital image, or by printing and photographing it. Alternatively, step S57 may include directly receiving or reading a physical image determined in advance using the example method described above.
  • Although step S57 is shown after steps S51, S53, and S55, the present application is not limited to this order; step S57 may be executed before any of steps S51, S53, and S55, or in parallel with them.
  • In step S59, the generator network and the discriminator network are trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • The method for training an adversarial attack model described with reference to FIG. 5 may be implemented in at least one of the user equipment 110, the server 120, and the machine learning engine 133 in FIG. 1, for example.
  • The adversarial attack model and its training method according to embodiments of the present application are described above. The method of generating an adversarial image is described below, with reference to FIG. 6, which shows a method for generating an adversarial image according to an embodiment of the present application.
  • As before, the image generated by the generator network during training of the adversarial attack model is referred to as the "adversarial attack image", and the image generated by the trained adversarial attack model is referred to as the "adversarial image".
  • In step S61, an adversarial attack model including a generator network is trained to obtain a trained adversarial attack model.
  • In step S63, the trained adversarial attack model is used to generate an adversarial image based on an input digital image. The input digital image may be the same as or different from the training digital image.
  • In some embodiments, the adversarial attack model may be the adversarial attack model 20 described with reference to FIG. 2A or FIG. 2B, in which case step S61 may include training it by the method described with reference to FIG. 4. In other embodiments, it may be the adversarial attack model 30 described with reference to FIG. 3A or FIG. 3B, in which case step S61 may include training it by the method described with reference to FIG. 5.
  • Step S63 may include using the generator network to generate the adversarial image based on the input digital image. Alternatively, step S63 may include: using the generator network to generate a first adversarial image based on the input digital image; and performing a geometric transformation on the first adversarial image to obtain a geometrically transformed second adversarial image, which then serves as the adversarial image (a generation sketch is given below).
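  • Once the model is trained, step S63 reduces to a forward pass through the generator, optionally followed by the geometric transformation. A minimal sketch under stated assumptions: the [0, 1] value range and the `transform` callable are illustrative, not mandated by the present application.

```python
import torch

@torch.no_grad()
def generate_adversarial_image(G, x_digital, transform=None):
    """Generate an adversarial image from a trained generator G.

    x_digital -- input digital image batch (N, C, H, W) with values in [0, 1]
    transform -- optional geometric transformation (second variant of step S63)
    """
    G.eval()
    x_adv = G(x_digital).clamp(0.0, 1.0)  # first adversarial image
    if transform is not None:
        x_adv = transform(x_adv)          # geometrically transformed second adversarial image
    return x_adv
```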
  • The generated adversarial image can be used to conduct an adversarial attack on the target model to deceive it; for example, the target model can be attacked to determine its stability.
  • The generated adversarial image can also be used to train the target model, so that the target model can defend against adversarial attacks that use such adversarial images, improving its defense capability.
  • Each block in the flowcharts or block diagrams may represent a module, segment, or portion of code including at least one executable instruction for realizing a specified logical function.
  • The functions noted in the blocks may occur out of the order shown in the drawings; for example, two blocks shown in succession may actually be executed substantially simultaneously, or sometimes in the reverse order, depending on the functions involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • FIG. 7A shows a block diagram of a training apparatus 70 for an adversarial attack model according to an embodiment of the present application, where the adversarial attack model includes a generator network.
  • The training apparatus 70 can be used to train the various adversarial attack models described above, and may include:
  • a generation module 701, configured to use the generator network to generate an adversarial attack image based on the training digital image;
  • an attack module 702, configured to conduct an adversarial attack on the target model based on the adversarial attack image and obtain an adversarial attack result;
  • an acquisition module 703, configured to obtain the physical image corresponding to the training digital image; and
  • a training module 704, configured to train the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • In some embodiments, the attack module 702 is configured to perform a geometric transformation on the adversarial attack image to obtain a geometrically transformed adversarial attack image, and to use the geometrically transformed adversarial attack image to conduct the adversarial attack on the target model and obtain the adversarial attack result.
  • In some embodiments, the adversarial attack model further includes a discriminator network, and the training module 704 is configured to: obtain a target label corresponding to the training digital image; determine the adversarial attack loss based on the target label and the adversarial attack result, and train the generator network based on the adversarial attack loss; use the discriminator network to perform image discrimination based on the adversarial attack image and the physical image to determine the decision loss; and jointly train the generator network and the discriminator network based on the adversarial attack loss and the decision loss.
  • The training apparatus 70 can be implemented as at least one of the user equipment 110, the server 120, the training device 130, and the machine learning engine 133 in FIG. 1.
  • FIG. 7B shows an apparatus 71 for generating an adversarial image according to an embodiment of the present application. The generating apparatus 71 includes:
  • a first training module 711, configured to train an adversarial attack model including a generator network to obtain a trained adversarial attack model; and
  • a generation module 712, configured to use the trained adversarial attack model to generate an adversarial image based on an input digital image, where, when training the adversarial attack model, the generator network is used to generate an adversarial attack image based on the training digital image; an adversarial attack is conducted on the target model based on the adversarial attack image and an adversarial attack result is obtained; the physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  • In some embodiments, the apparatus 71 further includes a second training module 713, configured to train the target model using the adversarial image, so that the target model can defend against adversarial attacks that use such adversarial images.
  • The adversarial image generating apparatus 71 may be implemented in at least one of the user equipment 110, the server 120, the training device 130, and the machine learning engine 133 in FIG. 1.
  • The following describes experiments based on the adversarial attack model and its training method according to some embodiments of the present application, to illustrate some of the effects of the adversarial attack model.
  • In the experiments, the adversarial attack model described with reference to FIG. 3A or FIG. 3B is used, and the model is trained using the training method described with reference to FIG. 5. Although this particular model and training method are used, the same or similar effects can also be obtained with other embodiments of the present application.
  • The target model is a VGG-16 model pre-trained on ImageNet, and the data set is 100 digital images of different categories randomly selected from ImageNet.
  • Each digital image is attacked with two different target labels, determined as the image's original label +100 and original label -100, respectively, wrapping around ImageNet's 1000 classes (for example, an image with label 14 is attacked with target labels 114 and 914). A total of 200 attacks on the target model are carried out.
  • This experiment uses the adversarial attack model described with reference to FIGS. 3A-3B for training and generates adversarial images (also known as adversarial samples) for the adversarial attacks.
  • The generator network of the adversarial attack model includes 3 convolutional layers, 6 residual blocks, and 2 deconvolutional layers, and the discriminator network includes 5 convolutional layers (one possible instantiation is sketched below).
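  • The description fixes only the layer counts, so the following PyTorch sketch fills in kernel sizes, strides, channel widths, and normalization as assumptions borrowed from common image-to-image GAN designs; it is one possible instantiation, not the exact architecture of the present application.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

# Generator: 3 convolutional layers, 6 residual blocks, 2 deconvolutional layers.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(True),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(True),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(True),
    *[ResBlock(256) for _ in range(6)],
    nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1), nn.ReLU(True),
    nn.ConvTranspose2d(128, 3, 3, stride=2, padding=1, output_padding=1), nn.Tanh())

# Discriminator: 5 convolutional layers ending in a probability map.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
    nn.Conv2d(512, 1, 4, padding=1), nn.Sigmoid())
```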
  • The scale of the geometric transformation module in the adversarial attack model ranges from 0.7 to 1.3, and the rotation angle ranges from -30° to 30°. Random noise is also added to the transformation parameters; the geometric transformation matrix $A'$ after adding random noise can be expressed as $A' = A + \Delta$, where $\Delta$ is a small random perturbation of the transformation parameters.
  • In addition, Gaussian random noise, for example with an intensity of 0.1, is added to the adversarial attack image generated by the generator network before the geometric transformation, to improve the stability of the adversarial attack model against color changes.
  • Training the adversarial attack model mainly includes: for each original digital image, printing the original digital image and scanning the printout to obtain the corresponding physical image, and normalizing the physical image to a pixel size of 288*288; randomly cropping the original digital image and the physical image to generate 50 pairs of digital and physical images, where each pair has a pixel size of 256*256 and both images of a pair are cropped identically; and using these 50 pairs for training (the paired cropping is sketched below).
  • In each training step, one pair of digital and physical images is input into the generator network and the discriminator network of the adversarial attack model; the image generated by the generator network is converted by the geometric transformation module and then attacks the target model. Training is completed after a fixed number of epochs. After training, the original digital image is input to the generator network, and the output of the generator network is the final adversarial image used for the attack.
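  • The paired cropping described above is easy to reproduce; the key detail is that each digital/physical pair must be cropped at identical coordinates so the pair stays aligned. A sketch, assuming both images are already 288*288 tensors:

```python
import torch

def paired_random_crops(x_digital, x_physical, n_pairs=50, size=256):
    """Cut aligned random crops from a digital image and its physical counterpart.

    Both inputs are (C, 288, 288) tensors; each output pair is (C, 256, 256),
    cropped at identical coordinates so the digital/physical pair stays aligned.
    """
    _, h, w = x_digital.shape
    pairs = []
    for _ in range(n_pairs):
        top = torch.randint(0, h - size + 1, (1,)).item()
        left = torch.randint(0, w - size + 1, (1,)).item()
        pairs.append((x_digital[:, top:top + size, left:left + size],
                      x_physical[:, top:top + size, left:left + size]))
    return pairs
```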
  • In the experiments, the EOT method, the RP2 (Robust Physical Perturbations) method, and the D2P (digital-to-physical domain mapping) method are compared with the method of the present application.
  • The evaluation metric is the attack success rate (ASR). The attack success rates and corresponding confidences of the various methods in the digital domain and the physical domain are shown in Table 1.
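  • For targeted attacks like those in this experiment, the ASR is simply the fraction of adversarial samples that the target model classifies as the intended target label. A minimal sketch of the metric (the exact evaluation protocol of the experiments is not spelled out here):

```python
import torch

@torch.no_grad()
def targeted_asr(f, x_adv, y_target):
    """Targeted attack success rate: the fraction of adversarial images
    that the target model f classifies as the intended target label."""
    pred = f(x_adv).argmax(dim=1)
    return (pred == y_target).float().mean().item()
```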
  • The PGD (Projected Gradient Descent) method, a digital-domain attack, is used as a reference, and the three physical-domain attack methods (EOT, RP2, and D2P) are also optimized using PGD, with noise intensity limited to 30 (for RGB images with intensity values ranging from 0 to 255).
  • Here, a digital-domain attack means conducting the adversarial attack with the generated adversarial samples directly, while a physical-domain attack means conducting the attack with images obtained by printing the adversarial samples and scanning the printouts. The attack success rate and confidence of the method of the present application are significantly higher than those of the other methods in both the digital domain and the physical domain.
  • Table 2 shows the stability of adversarial samples generated by different methods under geometric transformations in the physical domain. The attack effect is measured by printing and scanning the adversarial samples and then applying scale, rotation, and affine transformations.
  • The results show that the attack effect of the adversarial samples generated by the method of the present application is the most stable; its attack success rate (66.0%) is 11.2 percentage points higher than the best of the other methods (54.8%). Notably, the average attack success rate of the adversarial samples generated by the method of the present application after the various geometric transformations in Table 2 is higher than the success rate of the same samples without any transformation.
  • As noted above, a physical image can be acquired by print-scanning or by print-photographing a digital image. There are obvious differences between images obtained by scanning and by photographing; for example, photographing is more susceptible to complex external conditions such as lighting and lens distortion.
  • When the method of acquiring physical images was changed from print-scan to print-photograph, the attack success rate of the method of the present application was still more than 10 percentage points higher than that of the other comparison methods.
  • FIGS. 8A to 8C show examples of the original digital images and some adversarial samples generated using the EOT method, the RP2 method, the D2P method, and the method of the present application.
  • In a user study, each user was asked to select the image with the least distortion and the most natural look. A total of 106 users participated in the test; since users were not required to make a choice in every question, a total of 10,237 answers were received. The distribution of the final answers is shown in Table 5 and FIG. 9.
  • FIG. 10 shows a block diagram of an electronic device according to an embodiment of the present application.
  • The electronic device 100 may include one or more processors 1001 and a memory 1002, where the memory 1002 may be used to store one or more computer programs.
  • The processor 1001 may include various processing circuits, such as, but not limited to, one or more of a dedicated processor, a central processing unit, an application processor, or a communication processor. The processor 1001 may control at least one other component of the electronic device 100 and/or perform communication-related operations or data processing.
  • The memory 1002 may include volatile and/or non-volatile memory.
  • When the one or more computer programs are executed by the one or more processors 1001, the one or more processors 1001 are caused to implement the methods of the present application described above.
  • The electronic device 100 may be implemented as at least one of the user equipment 110, the server 120, the training device 130, and the machine learning engine 133 in FIG. 1.
  • the electronic device 100 in the embodiments of the present application may include devices such as smart phones, tablet personal computers (PCs), servers, mobile phones, video phones, e-book readers, desktop PCs, laptop computers, netbook computers, personal digital assistants (PDAs), portable multimedia players (PMPs), MP3 players, mobile medical devices, cameras, or wearable devices (such as head-mounted devices (HMDs), electronic clothes, electronic bracelets, electronic necklaces, electronic accessories, electronic tattoos, or smart watches), etc.
  • module may include a unit configured in hardware, software, or firmware, and/or any combination thereof, and may be used interchangeably with other terms (for example, logic, logic block, component, or circuit).
  • a module may be a single integral component or the smallest unit or component that performs one or more functions.
  • the module may be implemented mechanically or electronically, and may include, but is not limited to, known or to-be-developed dedicated processors, CPUs, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), or programmable logic devices that perform certain operations.
  • at least a part of a device (e.g., a module or its functions) or a method (e.g., an operation or step) may be implemented as instructions stored, in the form of a program module, in a computer-readable storage medium (e.g., the memory 112, the memory 114, the memory 122, the memory 132, or the memory 1002).
  • when the instructions are executed by a processor (for example, the processor 111, the processor 121, the processor 131, or the processor 1001), the instructions may enable the processor to perform the corresponding function.
  • the computer-readable medium may include, for example, a hard disk, a floppy disk, a magnetic medium, an optical recording medium, a DVD, and a magneto-optical medium.
  • the instructions may include code created by a compiler or code executable by an interpreter.
  • the module or programming module according to various embodiments of the present application may include at least one or more of the above-mentioned components, some of them may be omitted, or other additional components may also be included.
  • the operations performed by the modules, programming modules, or other components according to various embodiments of the present application may be performed sequentially, in parallel, repeatedly, or heuristically, or at least some operations may be performed in a different order or omitted, or other operations may be added.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Molecular Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Virology (AREA)
  • Image Analysis (AREA)

Abstract

A training method and apparatus for an adversarial attack model, a generation method and apparatus for an adversarial image, an electronic device, and a storage medium. The adversarial attack model includes a generator network (201, 301). The training method includes: generating, by using the generator network (201, 301), an adversarial attack image based on a training digital image (S41); performing an adversarial attack on a target model based on the adversarial attack image, and obtaining an adversarial attack result (S43); obtaining a physical image corresponding to the training digital image (S45); and training the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image (S47).

Description

Training method and apparatus for adversarial attack model, generation method and apparatus for adversarial image, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202010107342.9, entitled "Training method and apparatus for adversarial attack model", filed with the China Patent Office on February 21, 2020.
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular, to a training method and apparatus for an adversarial attack model, a generation method and apparatus for an adversarial image, an electronic device, and a storage medium.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, AI is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
With the research and progress of AI technologies, AI technologies are being applied in a variety of fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service.
AI technology is a comprehensive discipline covering a wide range of fields, including both hardware-level technologies and software-level technologies. Basic AI technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. AI software technologies mainly include several major directions such as computer vision, speech processing, natural language processing, and machine learning (ML)/deep learning (DL).
ML specializes in studying how a computer simulates or implements human learning behaviors to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. ML is the core of AI and the fundamental way to make computers intelligent, and its applications pervade all fields of AI. At present, various forms of machine learning models have revolutionized many fields of AI. For example, machine learning models such as deep neural networks (DNNs) are now used for many machine vision tasks.
Although deep neural networks perform well, they are highly vulnerable to adversarial attacks. An adversarial attack manifests as an attacker adding artificially computed small perturbations to the input of a deep neural network, causing the deep neural network to produce an incorrect output, that is, deceiving the deep neural network. Because deep neural networks are vulnerable to attacks by adversarial samples, deep neural networks are required to improve their defense capabilities and reduce the possibility of adversarial attack samples deceiving them.
Therefore, adversarial attacks on machine learning models and corresponding defenses are currently being studied.
Summary
According to one aspect of this application, a training method for an adversarial attack model is provided, performed by an electronic device, the adversarial attack model including a generator network, and the training method including: generating, by using the generator network, an adversarial attack image based on a training digital image; performing an adversarial attack on a target model based on the adversarial attack image, and obtaining an adversarial attack result; obtaining a physical image corresponding to the training digital image; and training the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
According to another aspect of this application, a generation method for an adversarial image is provided, performed by an electronic device, including: training an adversarial attack model including a generator network to obtain a trained adversarial attack model; and generating an adversarial image based on an input digital image by using the trained adversarial attack model, where when the adversarial attack model is trained, an adversarial attack image is generated based on a training digital image by using the generator network; an adversarial attack is performed on a target model based on the adversarial attack image, and an adversarial attack result is obtained; a physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
According to another aspect of this application, a training apparatus for an adversarial attack model is provided. The adversarial attack model includes a generator network, and the training apparatus includes: a generation module, configured to generate an adversarial attack image based on a training digital image by using the generator network; an attack module, configured to perform an adversarial attack on a target model based on the adversarial attack image, and obtain an adversarial attack result; an acquisition module, configured to obtain a physical image corresponding to the training digital image; and a training module, configured to train the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
According to another aspect of this application, a generation apparatus for an adversarial image is provided, including: a first training module, configured to train an adversarial attack model including a generator network to obtain a trained adversarial attack model; and a generation module, configured to generate an adversarial image based on an input digital image by using the trained adversarial attack model, where when the adversarial attack model is trained, an adversarial attack image is generated based on a training digital image by using the generator network; an adversarial attack is performed on a target model based on the adversarial attack image, and an adversarial attack result is obtained; a physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
According to another aspect of this application, an electronic device is provided, including: a processor; and a memory storing computer-readable instructions that, when executed by the processor, implement the training method for an adversarial attack model described above, or the generation method for an adversarial image described above.
According to another aspect of this application, a computer-readable storage medium is provided, storing one or more computer programs that, when executed by a processor, implement the training method for an adversarial attack model described above, or the generation method for an adversarial image described above.
According to still another aspect, the embodiments of this application further provide a computer program product or computer program, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods described above.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings of the embodiments of this application are briefly introduced below. Apparently, the accompanying drawings described below relate to only some embodiments of this application and do not limit this application.
FIG. 1 is a block diagram of an example system to which training of an adversarial attack model according to embodiments of this application may be applied;
FIG. 2A is a block diagram of an adversarial attack model according to some embodiments of this application;
FIG. 2B is a block diagram of an adversarial attack model according to some embodiments of this application;
FIG. 3A is a block diagram of an adversarial attack model according to some embodiments of this application;
FIG. 3B is a block diagram of an adversarial attack model according to some embodiments of this application;
FIG. 4 shows a training method for an adversarial attack model according to some embodiments of this application, where the adversarial attack model includes a generator network and a discriminator network;
FIG. 5 shows a training method for an adversarial attack model according to some embodiments of this application, where the adversarial attack model includes a generator network, a discriminator network, and a geometric transformation module;
FIG. 6 shows a method for generating an adversarial attack image according to an embodiment of this application;
FIG. 7A is a block diagram of a training apparatus for an adversarial attack model according to some embodiments of this application;
FIG. 7B is a block diagram of a generation apparatus for an adversarial image according to some embodiments of this application;
FIGS. 8A to 8C show examples of original digital images and some adversarial samples generated by using the EOT method, the RP2 method, the D2P method, and the method of this application, respectively;
FIG. 9 is a schematic diagram of the distribution of users' answers in experiments using the EOT method, the RP2 method, the D2P method, and the method of this application; and
FIG. 10 is a block diagram of an electronic device according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application are described clearly and completely below with reference to the accompanying drawings of the embodiments of this application. Apparently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments of this application without creative efforts shall fall within the protection scope of this application.
The terms used herein to describe the embodiments of this application are not intended to limit and/or define the scope of this application.
For example, unless otherwise defined, the technical or scientific terms used in this application shall have the ordinary meanings understood by a person of ordinary skill in the art to which this application belongs.
It should be understood that "first", "second", and similar words used in this application do not denote any order, quantity, or importance, but are merely used to distinguish different components. Unless the context clearly indicates otherwise, singular forms such as "a/an", "one", or "the" do not denote a limitation of quantity either, but rather denote the presence of at least one.
It will be further understood that terms such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Terms such as "connect" or "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are merely used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.
The embodiments of this application are described in detail below with reference to the accompanying drawings. Note that the same reference numerals in different drawings are used to refer to the same elements that have already been described.
Adversarial attacks are generally classified into two types according to the domain in which they act: digital adversarial attacks and physical adversarial attacks. A digital adversarial attack is a method in which a digital adversarial sample, such as a digital image, in the digital world (which may also be referred to as the digital domain or digital space) is directly input into a deep neural network for attack. A physical adversarial attack is a method of attacking a deep neural network through a physical adversarial sample in the physical world (which may also be referred to as the physical domain or physical space).
The difficulty of physical adversarial attacks lies in that an adversarial sample (for example, an adversarial image) that is effective in the digital domain usually loses its attack effect after the conversion from the digital domain to the physical domain, for example, due to image distortion. In addition, the conversion from the digital domain to the physical domain has high uncertainty and is difficult to model accurately.
To resolve at least the foregoing problems, the embodiments of this application provide an adversarial attack model for adversarial attacks, a training method for the adversarial attack model, and methods for generating an adversarial sample (for example, an adversarial image) by using the adversarial attack model and training a target model by using the adversarial sample.
FIG. 1 is a block diagram of an example system 10 to which training of an adversarial attack model according to embodiments of this application may be applied.
Referring to FIG. 1, the system 10 may include a user equipment 110, a server 120, and a training apparatus 130. The user equipment 110, the server 120, and the training apparatus 130 may be communicatively coupled to each other through a network 140.
The user equipment 110 may be any type of electronic device, such as a personal computer (for example, a laptop or desktop computer), a mobile device (for example, a smartphone or tablet), a game console, a wearable device, or any other type of electronic device.
The user equipment 110 may include one or more processors 111 and a memory 112. The one or more processors 111 may be any suitable processing device (for example, a processor core, a microprocessor, an ASIC, an FPGA, a controller, or a microcontroller), and may be one processor or a plurality of operatively connected processors. The memory 112 may include one or more non-transitory computer-readable storage media, for example, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and combinations thereof. The memory 112 may store data and instructions that are executed by the processor 111 to cause the user equipment 110 to perform operations.
In some implementations, the user equipment 110 may store or include one or more adversarial attack models.
In some implementations, the user equipment 110 may further store or otherwise include one or more target models. In the embodiments of this application, a target model may refer to a model to be attacked. For example, the target model may be or may otherwise include various machine learning models, such as a neural network (for example, a deep neural network) or another type of machine learning model (including a non-linear model and/or a linear model). The neural network may include a feedforward neural network, a recurrent neural network (for example, a long short-term memory recurrent neural network), a convolutional neural network, or another form of neural network.
In some implementations, one or more adversarial attack models may be received from the server 120 through the network 140, stored in the memory 114 of the user equipment, and then used or otherwise implemented by the one or more processors 111.
In some implementations, the server 120 may include one or more adversarial attack models. The server 120 communicates with the user equipment 110 according to a client-server relationship. For example, the adversarial attack model may be implemented by the server 120 as part of a web service. Therefore, the one or more adversarial attack models may be stored and implemented at the user equipment 110, and/or the one or more adversarial attack models may be stored and implemented at the server 120.
In some implementations, the server 120 includes one or more processors 121 and a memory 122. The one or more processors 121 may be any suitable processing device (for example, a processor core, a microprocessor, an ASIC, an FPGA, a controller, or a microcontroller), and may be one processor or a plurality of operatively connected processors. The memory 122 may include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and combinations thereof. The memory 122 may store data and instructions that are executed by the processor 121 to cause the server 120 to perform operations.
In some implementations, the server 120 may further store or otherwise include one or more target models. In the embodiments of this application, a target model may refer to a model to be attacked. For example, the target model may be or may otherwise include various machine learning models, such as a neural network (for example, a deep neural network) or another type of machine learning model (including a non-linear model and/or a linear model). The neural network may include a feedforward neural network, a recurrent neural network (for example, a long short-term memory recurrent neural network), a convolutional neural network, or another form of neural network.
In some implementations, the user equipment 110 and/or the server 120 may train the adversarial attack model and/or the target model through interaction with the training apparatus 130 communicatively coupled through the network 140. In some examples, the training apparatus 130 may be separate from the server 120, or may be a part of the server 120.
In some implementations, the training apparatus 130 includes one or more processors 131 and a memory 132. The one or more processors 131 may be any suitable processing device (for example, a processor core, a microprocessor, an ASIC, an FPGA, a controller, or a microcontroller), and may be one processor or a plurality of operatively connected processors. The memory 132 may include one or more non-transitory computer-readable storage media, for example, RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, and combinations thereof. The memory 132 may store data and instructions that are executed by the processor 131 to cause the training apparatus 130 to perform operations.
In some implementations, the training apparatus 130 may include a machine learning engine 133. For example, the machine learning engine 133 may train the adversarial attack models and/or target models stored at the user equipment 110 and/or the server 120 by using various training or learning techniques. The machine learning engine 133 may perform a variety of techniques (for example, weight decay or dropout) to improve the generalization capability of the models being trained. The machine learning engine 133 may include one or more machine learning platforms, frameworks, and/or libraries, such as TensorFlow, Caffe/Caffe2, Theano, Torch/PyTorch, MXNet, or CNTK.
In some implementations, the machine learning engine 133 may implement the training of the adversarial attack model and/or the target model.
As described above, FIG. 1 shows an example system that may be used to implement this application. However, this application is not limited thereto, and other systems may also be used to implement this application. For example, in some implementations, the user equipment 110 may include a machine learning engine and a training data set. In such implementations, the adversarial attack model and/or the target model may be trained and used locally at the user equipment 110, or adversarial samples may be generated by the trained adversarial attack model.
FIG. 2A shows an example of an adversarial attack model 20 according to some embodiments of this application. FIG. 2B shows an example of the adversarial attack model 20 including a specific digital image sample.
Referring to FIG. 2A, the adversarial attack model 20 may include a generator network 201 and a discriminator network 202.
In some implementations, the adversarial attack model 20 is trained by using training samples. In the embodiments of this application, a training sample may be a digital image sample (referred to as a training digital image).
In some implementations, the generator network 201 and the discriminator network 202 may include various types of machine learning models, including linear models and non-linear models. For example, the machine learning models may include regression models, support vector machines, decision-tree-based models, Bayesian models, and/or neural networks (for example, deep neural networks). For example, a neural network may include a feedforward neural network, a recurrent neural network (for example, a long short-term memory recurrent neural network), a convolutional neural network, or another form of neural network.
Note that, for ease of description, the generator network and the discriminator network are referred to herein as "networks", but the generator network and the discriminator network are not necessarily limited to neural networks and may also include other forms of machine learning models.
In some implementations, the generator network 201 and the discriminator network 202 form a generative adversarial network (GAN).
In some implementations, the generator network 201 may generate an adversarial attack image based on a training digital image, and the generated adversarial attack image may be output to the discriminator network 202 and the target model 21. In the embodiments of this application, the target model 21 may refer to a model to be adversarially attacked.
In some implementations, the discriminator network 202 may generate a discrimination result based on the physical image and the adversarial attack image generated by the generator network 201.
In the embodiments of this application, the physical image may be obtained by performing a physical-domain-to-digital-domain conversion on the training digital image. For example, FIG. 2B shows an example form of the conversion from a training digital image to a physical image. Performing the conversion on the training digital image may include one of the following: printing and scanning the training digital image to acquire the physical image; or printing and photographing the training digital image to acquire the physical image. For example, the training digital image may be printed by a printer and the printed image may be scanned by a scanner to obtain the physical image. Alternatively, the training digital image may be printed by a printer and the printed image may be photographed by a camera to obtain the physical image. In addition, the training digital image may be mapped to the physical domain at a 1:1 scale.
In some examples, to implement an adversarial attack on the target model 21, the adversarial attack image generated by the generator network 201 needs to deceive the target model 21. Therefore, the first objective function for training the generator network 201 may be expressed as:

$$\min_{G}\ \mathcal{L}_{adv}\big(f(G(x)),\ y\big)$$

In the first objective function, $\mathcal{L}_{adv}$ denotes the adversarial attack loss of the adversarial attack on the target model 21, f(·) denotes the target model 21, G(·) denotes the generator network 201, x denotes the input training digital image, and y denotes the target label set relative to the label of the training digital image. For example, the adversarial attack loss $\mathcal{L}_{adv}$ may be obtained with reference to a GAN model; however, this application is not limited thereto, and various adversarial attack losses may be used.
Moreover, in these examples, the adversarial attack image generated by the generator network 201 also needs to be sufficiently close to the noise-free physical image, so as to deceive the discriminator network 202, for example, in the manner required by a GAN. Therefore, the second objective function for training the discriminator network 202 may be expressed as:

$$\min_{G}\max_{D}\ \mathcal{L}_{GAN}(G, D) = \mathbb{E}\big[\log D(x_p)\big] + \mathbb{E}\big[\log\big(1 - D(G(x))\big)\big]$$

In the second objective function, $\mathcal{L}_{GAN}$ denotes the discrimination loss of the discriminator network, G(·) denotes the generator network, D(·) denotes the discriminator network, x denotes the training digital image input to the generator network, and $x_p$ denotes the physical image input to the discriminator network. The min-max form indicates that the discrimination loss needs to be maximized when D is updated and minimized when G is updated. For example, the discrimination loss $\mathcal{L}_{GAN}$ may be obtained with reference to a GAN model; however, this application is not limited thereto, and various discrimination losses may be used.
Therefore, in these examples, the adversarial attack model 20 may be trained based on the foregoing adversarial attack loss and discrimination loss, to obtain the variables of the generator network 201 and the discriminator network 202.
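The following is a minimal PyTorch-style sketch of how the two losses above could be combined in a single training step. It assumes a targeted cross-entropy as the adversarial attack loss and a binary cross-entropy GAN loss as the discrimination loss (the application leaves the concrete loss forms open), and that the discriminator ends with a sigmoid; all names are illustrative, not the reference implementation of this application.

```python
# A minimal sketch of one joint training step under the stated assumptions.
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, target_model,
               x_digital, x_physical, target_label,
               opt_g, opt_d, attack_weight=10.0):
    # Update D: maximize the discrimination loss (real = physical image).
    opt_d.zero_grad()
    with torch.no_grad():
        x_adv = generator(x_digital)              # adversarial attack image G(x)
    d_real = discriminator(x_physical)
    d_fake = discriminator(x_adv)
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Update G: fool D and push the target model toward the target label y.
    opt_g.zero_grad()
    x_adv = generator(x_digital)
    d_fake = discriminator(x_adv)
    loss_gan = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    loss_adv = F.cross_entropy(target_model(x_adv), target_label)
    loss_g = loss_gan + attack_weight * loss_adv
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```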
In the embodiments of this application, by supervising the noise intensity of the generated adversarial attack images with the generator-network-and-discriminator-network structure, the image quality of the adversarial images generated by the trained adversarial attack model can be significantly improved, so that the adversarial images can be used for effective attacks or for effective training of the target model.
Note that, for ease of description, in the embodiments of this application, an image generated by the generator network during training of the adversarial attack model is referred to as an "adversarial attack image", and an image generated by the trained adversarial attack model is referred to as an "adversarial image".
In the adversarial attack model 20 described above, the influence of noise on the physical image can be limited by the discriminator network. In addition, for the adversarial attack model 20, the conversion process from the digital image to the physical image and the generation process of the adversarial attack image can be jointly optimized.
In addition, in some implementations, the adversarial attack model 20 may be used in a universal physical attack. In this case, the training digital images may include a plurality of different digital images obtained by applying different random crops to an original image. A corresponding plurality of physical images may be obtained by performing the physical-domain-to-digital-domain conversion on the plurality of different digital images. The plurality of digital images and the plurality of physical images form a plurality of groups of digital and physical images. Each of the groups is used as an input of the adversarial attack model 20 for training, where the digital image in each group serves as the training digital image and the physical image serves as the physical image corresponding to the training digital image.
After training, the adversarial attack model 20 may be used to attack other, different input images. The adversarial attack model in this case can learn a more broadly applicable adversarial noise pattern.
FIG. 3A shows an example of an adversarial attack model 30 according to some embodiments of this application. FIG. 3B shows an example of the adversarial attack model 30 including a specific digital image sample.
Referring to FIG. 3A, the adversarial attack model 30 may include a generator network 301, a discriminator network 302, and a geometric transformation module 303.
For implementations of the generator network 301 and the discriminator network 302, reference may be made to the generator network 201 and the discriminator network 202 in FIGS. 2A and 2B; detailed descriptions are omitted here.
In some implementations, the generator network 301 may generate an adversarial attack image based on a training digital image, and the generated adversarial attack image may be output to the discriminator network 302 and the geometric transformation module 303. In the embodiments of this application, the target model 31 may refer to a model to be adversarially attacked. For example, FIG. 3B shows an example form of the geometric transformation of the adversarial attack image.
In some implementations, the geometric transformation module 303 may be configured to perform a geometric transformation on the adversarial attack image generated by the generator network 301. The geometric transformation may include an affine transformation. For example, the geometric transformation may include at least one of translation, scaling, flipping, rotation, and shearing. Thus, the geometrically transformed adversarial attack image may be used to perform an adversarial attack on the target model 31.
In some examples, to implement an adversarial attack on the target model 31, the adversarial attack image generated by the generator network 301 needs to deceive the target model 31. In addition, for example, when the adversarial attack model 30 is trained, the EOT (expectation over transformation) method may be used to perform the adversarial attack. In this case, the first objective function for training the generator network 301 may be expressed as:

$$\min_{G}\ \mathbb{E}_{r\sim R}\Big[\mathcal{L}_{adv}\big(f(r(G(x))),\ y\big)\Big]$$

In the first objective function, $\mathcal{L}_{adv}$ denotes the adversarial attack loss of the adversarial attack on the target model 31, f(·) denotes the target model, G(·) denotes the generator network 301, x denotes the input training digital image, y denotes the target label set relative to the label of the training digital image, E[·] denotes the expectation, r(·) denotes a geometric transformation, and R denotes the set of geometric transformations.
Moreover, in these examples, the adversarial attack image generated by the generator network 301 also needs to be sufficiently close to the noise-free physical image, so as to deceive the discriminator network 302, for example, in the manner required by a GAN. Therefore, the second objective function for training the discriminator network 302 may be expressed as:

$$\min_{G}\max_{D}\ \mathcal{L}_{GAN}(G, D) = \mathbb{E}\big[\log D(x_p)\big] + \mathbb{E}\big[\log\big(1 - D(G(x))\big)\big]$$

In the second objective function, $\mathcal{L}_{GAN}$ denotes the discrimination loss of the discriminator network, G(·) denotes the generator network, D(·) denotes the discriminator network, x denotes the training digital image input to the generator network 301, and $x_p$ denotes the physical image input to the discriminator network. The min-max form indicates that the discrimination loss needs to be maximized when D is updated and minimized when G is updated.
In these examples, by combining the first objective function and the second objective function, the final objective function may be obtained as:

$$\min_{G}\max_{D}\ \mathcal{L}_{GAN}(G, D) + \lambda\,\mathbb{E}_{r\sim R}\Big[\mathcal{L}_{adv}\big(f(r(G(x))),\ y\big)\Big]$$

In the final objective function, λ is a weight coefficient (referred to as the attack weight). For example, the attack weight may be a predefined hyperparameter, and its range may be, for example, 5 to 20.
Therefore, in these examples, the adversarial attack model 30 including the generator network 301 and the discriminator network 302 may be trained based on the foregoing objective functions, to obtain the variables of the generator network 301 and the discriminator network 302.
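As an illustration of the expectation term in the final objective, the following sketch approximates it by Monte-Carlo sampling of random affine transformations; the transform ranges and sample count are assumptions for illustration, not values prescribed by this application.

```python
# A sketch of estimating E_{r~R}[L_adv(f(r(G(x))), y)] by sampling.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

random_affine = T.RandomAffine(degrees=30, scale=(0.7, 1.3))  # a draw r ~ R

def eot_attack_loss(target_model, x_adv, target_label, num_samples=8):
    losses = []
    for _ in range(num_samples):
        x_t = random_affine(x_adv)                 # r(G(x))
        losses.append(F.cross_entropy(target_model(x_t), target_label))
    return torch.stack(losses).mean()              # Monte-Carlo estimate
```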
In the embodiments of this application, by supervising the noise intensity of the generated adversarial attack images with the generator-network-and-discriminator-network structure, the image quality of the adversarial images generated by the trained adversarial attack model can be significantly improved, so that the adversarial images can be used for effective attacks or for effective training of the target model.
In the adversarial attack model 30 described above, the influence of noise on the physical image can be limited by the discriminator network, and the conversion process from the digital image to the physical image and the generation process of the adversarial attack image can be jointly optimized. In addition, by performing the adversarial attack with the geometrically transformed adversarial attack image, the attack effect can be made stable under geometric transformations, thereby improving the robustness of the adversarial attack.
In addition, in some implementations, the adversarial attack model 30 may be used in a universal physical attack. In this case, the training digital images may include a plurality of different digital images obtained by applying different random crops to an original image. A corresponding plurality of physical images may be obtained by performing the physical-domain-to-digital-domain conversion on the plurality of different digital images. The plurality of digital images and the plurality of physical images form a plurality of groups of digital and physical images. Each of the groups is used as an input of the adversarial attack model 30 for training, where the digital image in each group serves as the training digital image and the physical image serves as the physical image corresponding to the training digital image.
After training, the adversarial attack model 30 may be used to attack other, different input images. The adversarial attack model in this case can learn a more broadly applicable adversarial noise pattern.
Some examples of adversarial attack models according to some embodiments of this application are described above with reference to FIGS. 2A and 2B and FIGS. 3A and 3B. Methods for training an adversarial attack model according to some embodiments of this application are described below with reference to FIG. 4 and FIG. 5.
FIG. 4 shows a training method 40 for an adversarial attack model according to some embodiments of this application, where the adversarial attack model includes a generator network and a discriminator network. For example, the method may be used to train the adversarial attack model 20 shown in FIG. 2A or FIG. 2B.
Referring to FIG. 4, in step S41, an adversarial attack image is generated based on a training digital image by using the generator network.
In some implementations, the adversarial attack image is generated according to the machine learning model in the generator network by inputting the training digital image into the generator network.
Then, in step S43, an adversarial attack is performed on the target model by using the adversarial attack image, and an adversarial attack result is obtained.
For example, the adversarial attack result may be a recognition result or a classification result output by the target model.
Next, in step S45, a physical image corresponding to the training digital image is obtained.
For example, obtaining the physical image corresponding to the training digital image may include one of the following: printing and scanning (print-scan) the training digital image to acquire the physical image; or printing and photographing (print-photograph) the training digital image to acquire the physical image.
In some implementations, step S45 may include directly receiving or reading the physical image corresponding to the training digital image, where the physical image is determined by using the example methods described above. In this case, the physical image corresponding to the training digital image may be determined in advance.
Note that although step S45 is shown after steps S41 and S43 in FIG. 4 and the corresponding description, this application is not limited thereto. For example, step S45 may be performed before step S41 or S43, or in parallel with them.
Then, in step S47, the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
In some implementations, the adversarial attack model further includes a discriminator network, and step S47 may include: obtaining a target label corresponding to the training digital image; determining an adversarial attack loss based on the target label and the adversarial attack result, and training the generator network based on the adversarial attack loss; performing image discrimination based on the adversarial attack image and the physical image by using the discriminator network, to determine a discrimination loss; and jointly training the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss.
In some implementations, jointly training the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss includes: constructing a target loss by using the adversarial attack loss and the discrimination loss; and jointly training the generator network and the discriminator network based on the target loss.
Constructing the target loss by using the adversarial attack loss and the discrimination loss includes: constructing a first objective function according to the adversarial attack loss; constructing a second objective function according to the discrimination loss; and determining a final objective function according to the first objective function and the second objective function.
Correspondingly, jointly training the generator network and the discriminator network based on the target loss includes: training the generator network and the discriminator network in combination based on the final objective function.
In some examples, as described with reference to FIG. 2A or 2B, based on the target label and the adversarial attack result, the adversarial attack loss may be determined as

$$\mathcal{L}_{adv}\big(f(G(x)),\ y\big)$$

where $\mathcal{L}_{adv}$ denotes the adversarial attack loss of the adversarial attack on the target model, f(·) denotes the target model, G(·) denotes the generator network, x denotes the input training digital image, and y denotes the target label set relative to the label of the training digital image.
The discrimination loss may be determined as

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}\big[\log D(x_p)\big] + \mathbb{E}\big[\log\big(1 - D(G(x))\big)\big]$$

where $\mathcal{L}_{GAN}$ denotes the discrimination loss of the discriminator network, G(·) denotes the generator network, D(·) denotes the discriminator network, x denotes the training digital image input to the generator network, and $x_p$ denotes the physical image input to the discriminator network.
On this basis, the first objective function may be determined as

$$\min_{G}\ \mathcal{L}_{adv}\big(f(G(x)),\ y\big)$$

and the second objective function may be determined as

$$\min_{G}\max_{D}\ \mathcal{L}_{GAN}(G, D)$$

In addition, the final objective function may be determined based on the first objective function and the second objective function. For example, the final objective function may be determined as:

$$\min_{G}\max_{D}\ \mathcal{L}_{GAN}(G, D) + \lambda\,\mathcal{L}_{adv}\big(f(G(x)),\ y\big)$$

where λ is a predefined attack weight.
For example, jointly training the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss may include: training the generator network and the discriminator network based on the first objective function and the second objective function.
In some examples, jointly training the generator network and the discriminator network may include: training the generator network and the discriminator network simultaneously in parallel, where the generator network is trained based on the first objective function and the second objective function, and the discriminator network is trained based on the second objective function.
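A sketch of the implied joint training loop, alternating discriminator and generator updates in each iteration, might look as follows; `train_step` is the hypothetical helper sketched earlier, and `pairs` is assumed to yield (training digital image, physical image, target label) groups.

```python
# A sketch of the joint training loop under the stated assumptions.
def train_adversarial_attack_model(generator, discriminator, target_model,
                                   pairs, opt_g, opt_d, epochs=200):
    target_model.eval()                            # the target model is frozen
    for p in target_model.parameters():
        p.requires_grad_(False)
    for _ in range(epochs):
        for x_digital, x_physical, y_target in pairs:
            train_step(generator, discriminator, target_model,
                       x_digital, x_physical, y_target, opt_g, opt_d)
```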
In some implementations, the method for training the adversarial attack model described with reference to FIG. 4 may be implemented in, for example, at least one of the user equipment 110, the server 120, the training apparatus 130, and the machine learning engine 133 in FIG. 1.
FIG. 5 shows a training method 50 for an adversarial attack model according to some embodiments of this application, where the adversarial attack model includes a generator network, a discriminator network, and a geometric transformation module. For example, the method may be used to train the adversarial attack model 30 shown in FIG. 3A or FIG. 3B.
Referring to FIG. 5, in step S51, an adversarial attack image is generated based on a training digital image by using the generator network.
In some implementations, the training digital image is input into the generator network to generate the adversarial attack image.
Next, in step S53, a geometric transformation is performed on the adversarial attack image to obtain a geometrically transformed adversarial attack image.
In step S53, the geometric transformation is performed by the geometric transformation module on the adversarial attack image generated by the generator network. The geometric transformation may be an affine transformation. For example, the affine transformation may include at least one of translation, scaling, flipping, rotation, and shearing.
Thus, the geometrically transformed adversarial attack image may be used to perform an adversarial attack on the target model. An example of the geometric transformation is described below. For a point p(p_x, p_y) on the adversarial attack image, with its homogeneous-coordinate form p(p_x, p_y, 1) and the geometric transformation represented by a homogeneous geometric transformation matrix A, the coordinates (p_x', p_y') of the point p(p_x, p_y) after the geometric transformation satisfy:

$$\begin{pmatrix} p_x' \\ p_y' \\ 1 \end{pmatrix} = A \begin{pmatrix} p_x \\ p_y \\ 1 \end{pmatrix} = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_x \\ p_y \\ 1 \end{pmatrix}$$

In the above formula, $a_1$ to $a_6$ are the parameters of the geometric transformation, reflecting geometric transformations of the adversarial attack image such as rotation and scaling. The parameters of the geometric transformation may be predefined values. For example, the parameters may be set according to different transformation requirements.
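As a worked example of this homogeneous transformation, the following sketch applies illustrative values for $a_1$ to $a_6$ (a 30-degree rotation scaled by 1.2) to a point.

```python
# A worked example of the homogeneous affine transformation above.
import numpy as np

theta, s = np.deg2rad(30.0), 1.2
A = np.array([[s * np.cos(theta), -s * np.sin(theta), 0.0],   # a1 a2 a3
              [s * np.sin(theta),  s * np.cos(theta), 0.0],   # a4 a5 a6
              [0.0,                0.0,               1.0]])
p = np.array([10.0, 5.0, 1.0])        # point (p_x, p_y) in homogeneous form
p_new = A @ p                         # transformed coordinates (p_x', p_y', 1)
```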
Then, in step S55, an adversarial attack is performed on the target model by using the geometrically transformed adversarial attack image, and an adversarial attack result is obtained.
For example, the adversarial attack result may be a recognition result or a classification result output by the target model.
Next, in step S57, a physical image corresponding to the training digital image is obtained.
For example, obtaining the physical image corresponding to the training digital image may include one of the following: printing and scanning the training digital image to acquire the physical image; or printing and photographing the training digital image to acquire the physical image.
In some implementations, step S57 may include directly receiving or reading the physical image corresponding to the training digital image, where the physical image is determined by using the example methods described above. In this case, the physical image corresponding to the training digital image may be determined in advance.
Note that although step S57 is shown after steps S51, S53, and S55 in FIG. 5 and the corresponding description, this application is not limited thereto. For example, step S57 may be performed before one of steps S51, S53, and S55, or in parallel with them.
Then, in step S59, the generator network and the discriminator network are trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
For the implementation of this step, reference may be made to the description of step S47 above; details are not repeated here.
In some implementations, the method for training the adversarial attack model described with reference to FIG. 5 may be implemented in, for example, at least one of the user equipment 110, the server 120, and the machine learning engine 133 in FIG. 1.
The adversarial attack models and the training methods thereof according to the embodiments of this application are described above. A method for generating an adversarial image is described below.
FIG. 6 shows a generation method for an adversarial image according to an embodiment of this application.
Note that, for ease of description, in the embodiments of this application, an image generated by the generator network during training of the adversarial attack model is referred to as an "adversarial attack image", and an image generated by the trained adversarial attack model is referred to as an "adversarial image".
Referring to FIG. 6, in step S61, an adversarial attack model including a generator network is trained to obtain a trained adversarial attack model.
Then, in step S63, an adversarial image is generated based on an input digital image by using the trained adversarial attack model.
For example, the input digital image may be the same as or different from the training digital image.
In some implementations, the adversarial attack model may be the adversarial attack model 20 described with reference to FIG. 2A or FIG. 2B. In this case, step S61 may include training the adversarial attack model by using the method described with reference to FIG. 4 to obtain the trained adversarial attack model.
In some implementations, the adversarial attack model may be the adversarial attack model 30 described with reference to FIG. 3A or FIG. 3B. In this case, step S61 may include training the adversarial attack model by using the method described with reference to FIG. 5 to obtain the trained adversarial attack model. Step S63 may include: generating the adversarial image based on the input digital image by using the generator network. Alternatively, step S63 may include: generating a first adversarial image based on the input digital image by using the generator network; and performing a geometric transformation on the first adversarial image to obtain a geometrically transformed second adversarial image, and using the second adversarial image as the adversarial image.
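A minimal sketch of step S63 under these assumptions (a trained `generator` and a random affine transform standing in for the geometric transformation module) might look as follows; all names are illustrative.

```python
# A sketch of inference-time adversarial image generation.
import torch
import torchvision.transforms as T

random_affine = T.RandomAffine(degrees=30, scale=(0.7, 1.3))

@torch.no_grad()
def generate_adversarial_image(generator, x_input, apply_transform=False):
    generator.eval()
    x_adv = generator(x_input)        # first adversarial image
    if apply_transform:
        x_adv = random_affine(x_adv)  # geometrically transformed second image
    return x_adv
```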
In some implementations, after the adversarial image is generated, the generated adversarial image may be used to perform an adversarial attack on the target model, so as to deceive the target model.
In some implementations, after the adversarial image is generated, the generated adversarial image may be used to train the target model, so as to defend against adversarial attacks performed by using the adversarial image.
By generating adversarial images with the generation method according to the embodiments of this application, the target model can be attacked to determine the stability of the target model. In addition, the generated adversarial images can also be used to train the target model, so as to improve the target model's capability of defending against such adversarial attacks.
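For illustration, one common recipe for the defensive use described above is standard adversarial training of the target model on a mix of clean and generated adversarial batches; the sketch below assumes this recipe and illustrative names, and is not a procedure prescribed by this application.

```python
# A hedged sketch of adversarial training with generated adversarial images.
import torch
import torch.nn.functional as F

def adversarial_training_step(target_model, generator, x_clean, y_true, optimizer):
    target_model.train()
    with torch.no_grad():
        x_adv = generator(x_clean)    # adversarial images from the trained generator
    optimizer.zero_grad()
    loss = F.cross_entropy(target_model(x_clean), y_true) \
         + F.cross_entropy(target_model(x_adv), y_true)   # keep true labels on x_adv
    loss.backward()
    optimizer.step()
    return loss.item()
```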
The methods for training an adversarial attack model and for generating an adversarial image according to various embodiments of this application are described above. It should be understood that the flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the methods, apparatuses, systems, and computer-readable storage media according to the various embodiments of this application. For example, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that includes at least one executable instruction for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or actions, or by combinations of special-purpose hardware and computer instructions.
FIG. 7A is a block diagram of a training apparatus 70 for an adversarial attack model according to an embodiment of this application, where the adversarial attack model includes a generator network. For example, the training apparatus 70 may be used to train the various adversarial attack models described above.
Referring to FIG. 7A, the training apparatus 70 for an adversarial attack model may include:
a generation module 701, configured to generate an adversarial attack image based on a training digital image by using the generator network;
an attack module 702, configured to perform an adversarial attack on a target model based on the adversarial attack image, and obtain an adversarial attack result;
an acquisition module 703, configured to obtain a physical image corresponding to the training digital image; and
a training module 704, configured to train the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
In some implementations, the attack module 702 is configured to: perform a geometric transformation on the adversarial attack image to obtain a geometrically transformed adversarial attack image; and perform an adversarial attack on the target model by using the geometrically transformed adversarial attack image, to obtain the adversarial attack result.
In some implementations, the adversarial attack model further includes a discriminator network, where the training module 704 is configured to: obtain a target label corresponding to the training digital image; determine an adversarial attack loss based on the target label and the adversarial attack result, and train the generator network based on the adversarial attack loss; perform image discrimination based on the adversarial attack image and the physical image by using the discriminator network, to determine a discrimination loss; and jointly train the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss.
In some implementations, the training apparatus 70 for the adversarial attack model may be implemented in at least one of the user equipment 110, the server 120, the training apparatus 130, and the machine learning engine 133 in FIG. 1.
For the specific configuration of the training apparatus 70 for the adversarial attack model, reference may be made to the various training methods for the adversarial attack model described above; detailed descriptions are omitted here.
FIG. 7B shows a generation apparatus 71 for an adversarial image according to an embodiment of this application.
Referring to FIG. 7B, the generation apparatus 71 includes:
a first training module 711, configured to train an adversarial attack model including a generator network, to obtain a trained adversarial attack model; and
a generation module 712, configured to generate an adversarial image based on an input digital image by using the trained adversarial attack model, where when the adversarial attack model is trained, an adversarial attack image is generated based on a training digital image by using the generator network; an adversarial attack is performed on a target model based on the adversarial attack image, and an adversarial attack result is obtained; a physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
In some implementations, the apparatus 71 further includes:
a second training module 713, configured to train the target model by using the adversarial image, so as to defend against adversarial attacks performed by using the adversarial image.
In some implementations, the generation apparatus 71 for the adversarial image may be implemented in at least one of the user equipment 110, the server 120, the training apparatus 130, and the machine learning engine 133 in FIG. 1.
For the specific configuration of the generation apparatus 71 for the adversarial image, reference may be made to the various generation methods for the adversarial image described above; detailed descriptions are omitted here.
Experiments based on the adversarial attack model and its training method according to some embodiments of this application are described below, to illustrate some effects of performing adversarial attacks with the adversarial attack model. Specifically, in the following experiments, the adversarial attack model described with reference to FIG. 3A or FIG. 3B is used, and the model is trained by using the training method described with reference to FIG. 5. Note that although the adversarial attack model of FIG. 3A or FIG. 3B and the training method of FIG. 5 are used here for the experiments, the same or similar effects can also be obtained with other embodiments of this application.
In the experiments, the target model is a VGG-16 model pre-trained on ImageNet. The data set is 100 digital images of different classes randomly selected from ImageNet. Each digital image is attacked with respect to two different labels. The two different labels (that is, the target labels) are determined as the original label of the image +100 and the original label −100, respectively (for example, an image with label 14 is used for two attacks, with target labels 914 and 114, respectively). In addition, since each digital image is used for two attacks, a total of 200 attacks on the target model are performed.
The experiments train the adversarial attack model described with reference to FIG. 3 and generate adversarial images (also referred to as adversarial samples) for adversarial attacks. In addition, in the experiments, the generator network of the adversarial attack model includes 3 convolutional layers, 6 residual blocks, and 2 deconvolutional layers, and the discriminator network includes 5 convolutional layers. Moreover, the scale variation range of the geometric transformation module of the adversarial attack model is 0.7 to 1.3, and the rotation angle range is −30° to 30°.
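The layer counts reported above could be realized, for example, as in the following PyTorch sketch; the channel widths, kernel sizes, and normalization choices are assumptions for illustration, as the experiment reports only the numbers of layers and blocks.

```python
# A sketch of generator/discriminator shapes matching the reported counts:
# generator = 3 conv layers, 6 residual blocks, 2 deconv layers;
# discriminator = 5 conv layers. All hyperparameters are illustrative.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

generator = nn.Sequential(
    nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(),              # 3 convolutional layers
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
    *[ResidualBlock(256) for _ in range(6)],                # 6 residual blocks
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1),     # 2 deconvolutional layers
    nn.Tanh())

discriminator = nn.Sequential(                              # 5 convolutional layers
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1, 4, padding=1), nn.Sigmoid())
```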
Further, to improve the robustness of the adversarial attack model of this application, random noise is added to the geometric transformation matrix A used by the geometric transformation module, so that the adversarial attack model can handle more complex spatial transformations. The geometric transformation matrix A' after the random noise is added may be expressed as:

$$A' = A + \begin{pmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \\ 0 & 0 & 0 \end{pmatrix}$$

In the above formula, $b_i$ is a value randomly sampled in [−0.1, 0.1], i = 1, 2, …, 6.
In addition, during training with the method of this application, Gaussian random noise with an intensity of, for example, 0.1 is also added to the adversarial attack image generated by the generator network before the geometric transformation, to improve the stability of the adversarial attack model with respect to color changes.
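The two robustness measures just described (uniform noise $b_i$ on the affine matrix and Gaussian color noise on the generated image) could be implemented as in the following sketch; function names are illustrative.

```python
# A sketch of the two robustness tricks under the stated assumptions.
import numpy as np
import torch

def perturb_affine(A):
    noise = np.zeros_like(A)
    noise[:2, :] = np.random.uniform(-0.1, 0.1, size=(2, 3))  # b_1 ... b_6
    return A + noise                                          # A' = A + noise

def add_color_noise(x_adv, sigma=0.1):
    return x_adv + sigma * torch.randn_like(x_adv)            # Gaussian noise, intensity 0.1
```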
Training the adversarial attack model according to the embodiments of this application mainly includes: for each original digital image, printing the original digital image and scanning the printed image to obtain the corresponding physical image, and normalizing the physical image to a pixel size of 288*288; randomly cropping the original digital image and the physical image respectively to generate 50 groups of digital and physical images, where the digital image and the physical image in each group have a pixel size of 256*256 and are cropped in the same way; and training with these 50 groups of digital and physical images. During training, the digital image and the physical image of one group are respectively input into the generator network and the discriminator network of the adversarial attack model each time; the image generated by the generator network is transformed by the geometric transformation module and then attacks the target model; and the training is completed after 200 epochs. After the training is completed, the original digital image is input into the generator network, and the output of the generator network is the adversarial image finally used for the attack.
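The paired random cropping described above (identical 256*256 crops from the aligned 288*288 digital and physical images) could be implemented as in the following sketch; names and array conventions are assumptions.

```python
# A sketch of building the 50 training groups of paired random crops.
import numpy as np

def make_crop_pairs(digital, physical, n_groups=50, crop=256):
    # digital, physical: arrays of shape (288, 288, 3), aligned 1:1
    h, w = digital.shape[:2]
    pairs = []
    for _ in range(n_groups):
        top = np.random.randint(0, h - crop + 1)
        left = np.random.randint(0, w - crop + 1)
        pairs.append((digital[top:top + crop, left:left + crop],
                      physical[top:top + crop, left:left + crop]))  # same crop window
    return pairs
```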
To illustrate the effects of the method of this application, the EOT method, the RP2 (Robust Physical Perturbations) method, and the D2P (digital-to-physical mapping) method are compared with the method of this application. In addition, the attack effect is evaluated by the attack success rate (ASR), where the ASR indicates the rate at which the generated adversarial images are recognized as the target class. Moreover, the perceptibility of the image noise of the adversarial images is also evaluated by users.
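The ASR as defined above can be computed straightforwardly; the following sketch assumes a batched PyTorch classifier and illustrative names.

```python
# A sketch of the attack success rate (ASR): the fraction of adversarial
# images classified as their target class by the target model.
import torch

@torch.no_grad()
def attack_success_rate(target_model, adv_images, target_labels):
    preds = target_model(adv_images).argmax(dim=1)
    return (preds == target_labels).float().mean().item()
```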
After 200 attacks are performed on all 100 images by using the various methods (the EOT method, the RP2 method, the D2P method, and the method of this application), the attack success rates and corresponding confidences of the methods in the digital domain and the physical domain are shown in Table 1. In addition, the PGD (Projected Gradient Descent) method, a digital-domain attack, is used as a reference, and the other three physical-domain attack methods (the EOT method, the RP2 method, and the D2P method) are also optimized by using the PGD method. For example, the noise intensity used by the three physical-domain attack methods (the EOT, RP2, and D2P methods) is limited to 30 (for RGB images with intensity values ranging from 0 to 255).
In this experiment, a digital-domain attack refers to performing an adversarial attack with the generated adversarial samples, and a physical-domain attack refers to performing an adversarial attack with images obtained by printing the adversarial samples and scanning the printed images. It can be seen that the attack success rate and confidence of the method of this application in both the digital domain and the physical domain are significantly higher than those of the other methods.
Table 1. Attack success rates of different methods
[Table 1 is provided as an image in the original document: PCTCN2020128009-appb-000023.]
Table 2 shows the stability of the adversarial samples generated by different methods against geometric transformations in the physical domain, that is, the attack effect obtained after the adversarial samples are subjected to print-scan processing and then to scale, rotation, and affine transformations. The results show that the adversarial samples generated by the method of this application have the most stable attack effect; their attack success rate (66.0%) is 11.2% higher than the highest among the other methods (54.8%). It is worth noting that the average attack success rate of the adversarial samples generated by the method of this application after the various geometric transformations in Table 2 is higher than the success rate of the adversarial samples without any transformation. This is because, when the method of this application generates adversarial samples, random geometric transformations within a certain range are applied to the adversarial samples during the training phase, so that the adversarial samples generated by the method of this application are extremely stable against these geometric transformations.
Table 2. Stability of adversarial samples generated by different methods against geometric transformations in the physical domain
[Table 2 is provided as images in the original document: PCTCN2020128009-appb-000024 and PCTCN2020128009-appb-000025.]
As described in the embodiments of this application, the methods for acquiring a physical image include print-scanning a digital image or print-photographing a digital image. There are obvious differences between the images obtained by scanning and those obtained by photographing. For example, photographing is more susceptible to complex external conditions such as lighting and lens distortion.
Therefore, to test the transferability of the adversarial samples, the method for acquiring physical images was changed from print-scan to print-photograph. As shown in Table 3, when the physical images are obtained by print-photograph, the attack success rate of the method of this application is still more than 10% higher than that of the other comparison methods.
Table 3. Transferability of the adversarial samples of different methods
[Table 3 is provided as an image in the original document: PCTCN2020128009-appb-000026.]
In addition, the influence of different attack weights λ on the adversarial attack model was also tested. Referring to Table 4, as the attack weight λ increases from 5 to 10, the attack effect improves in both the digital domain and the physical domain. The attack success rate in the physical domain increases from 51% to 71%, showing that a high attack weight can generate more stable adversarial samples. However, while the attack effect becomes more stable, the image quality decreases to some extent as the attack weight λ increases.
Table 4. Influence of different attack weights λ on the adversarial attack model
[Table 4 is provided as an image in the original document: PCTCN2020128009-appb-000027.]
To measure the image quality of the adversarial samples generated by different methods, a user study was conducted. Specifically, each participating user answered 100 multiple-choice questions; each question showed an original image and the adversarial samples generated by the four methods (the EOT, RP2, and D2P methods and the method of this application). FIGS. 8A-8C show examples of some adversarial samples generated by the various methods (the EOT, RP2, and D2P methods and the method of this application).
Referring to FIGS. 8A-8C, examples of original digital images and some adversarial samples generated by using the EOT, RP2, and D2P methods and the method of this application are shown, respectively. Each user was asked to select the image that looked the least distorted and most natural. A total of 106 users participated in the test; since users were not required to make a choice in every question, a total of 10237 answers were received. The distribution of the final answers is shown in Table 5 and FIG. 9.
Table 5. Distribution of the users' answers
[Table 5 is provided as an image in the original document: PCTCN2020128009-appb-000028.]
As shown in Table 5 and FIG. 9, more than 70% of the users selected the images generated by the method of this application, indicating that the adversarial samples generated by the method of this application are significantly better in image quality than those of the other comparison methods.
FIG. 10 is a block diagram of an electronic device according to an embodiment of this application.
Referring to FIG. 10, the electronic device 100 may include one or more processors 1001 and a memory 1002. The memory 1002 may be used to store one or more computer programs.
The processor 1001 may include various processing circuits, such as but not limited to one or more of a dedicated processor, a central processing unit, an application processor, or a communication processor. The processor 1001 may perform control on at least one other component of the electronic device 100, and/or perform communication-related operations or data processing.
The memory 1002 may include a volatile and/or non-volatile memory.
In some implementations, when the one or more computer programs are executed by the one or more processors 1001, the one or more processors 1001 are caused to implement the methods of this application as described above.
In some implementations, the electronic device 100 may be implemented as at least one of the user equipment 110, the server 120, the training apparatus 130, and the machine learning engine 133 in FIG. 1.
For example, the electronic device 100 in the embodiments of this application may include devices such as smartphones, tablet personal computers (PCs), servers, mobile phones, video phones, e-book readers, desktop PCs, laptop computers, netbook computers, personal digital assistants (PDAs), portable multimedia players (PMPs), MP3 players, mobile medical devices, cameras, or wearable devices (for example, head-mounted devices (HMDs), electronic clothes, electronic bracelets, electronic necklaces, electronic accessories, electronic tattoos, or smart watches).
As used herein, the term "module" may include a unit configured in hardware, software, or firmware, and/or any combination thereof, and may be used interchangeably with other terms (for example, logic, logic block, component, or circuit). A module may be a single integral component or the smallest unit or component that performs one or more functions. The module may be implemented mechanically or electronically, and may include, but is not limited to, known or to-be-developed dedicated processors, CPUs, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), or programmable logic devices that perform certain operations.
According to the embodiments of this application, at least a part of an apparatus (for example, a module or its functions) or a method (for example, an operation or step) may be implemented as instructions stored, for example in the form of a program module, in a computer-readable storage medium (for example, the memory 112, the memory 114, the memory 122, the memory 132, or the memory 1002). When the instructions are executed by a processor (for example, the processor 111, the processor 121, the processor 131, or the processor 1001), the instructions may enable the processor to perform the corresponding functions. The computer-readable medium may include, for example, a hard disk, a floppy disk, a magnetic medium, an optical recording medium, a DVD, and a magneto-optical medium. The instructions may include code created by a compiler or code executable by an interpreter. A module or programming module according to various embodiments of this application may include at least one or more of the foregoing components, some of them may be omitted, or other additional components may be included. Operations performed by modules, programming modules, or other components according to various embodiments of this application may be performed sequentially, in parallel, repeatedly, or heuristically, or at least some operations may be performed in a different order or omitted, or other operations may be added.
The foregoing descriptions are merely exemplary implementations of this application, and are not intended to limit the protection scope of this application. The protection scope of this application is determined by the appended claims.

Claims (18)

  1. A training method for an adversarial attack model, performed by an electronic device, the adversarial attack model comprising a generator network, and the training method comprising:
    generating, by using the generator network, an adversarial attack image based on a training digital image;
    performing an adversarial attack on a target model based on the adversarial attack image, and obtaining an adversarial attack result;
    obtaining a physical image corresponding to the training digital image; and
    training the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  2. The method according to claim 1, wherein the performing an adversarial attack on a target model based on the adversarial attack image and obtaining an adversarial attack result comprises:
    performing a geometric transformation on the adversarial attack image to obtain a geometrically transformed adversarial attack image; and
    performing the adversarial attack on the target model by using the geometrically transformed adversarial attack image, to obtain the adversarial attack result.
  3. The training method according to claim 1 or 2, wherein the adversarial attack model further comprises a discriminator network,
    wherein the training the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image comprises:
    obtaining a target label corresponding to the training digital image;
    determining an adversarial attack loss based on the target label and the adversarial attack result, and training the generator network based on the adversarial attack loss;
    performing image discrimination based on the adversarial attack image and the physical image by using the discriminator network, to determine a discrimination loss; and
    jointly training the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss.
  4. The training method according to claim 3, wherein the jointly training the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss comprises:
    constructing a target loss by using the adversarial attack loss and the discrimination loss; and
    jointly training the generator network and the discriminator network based on the target loss.
  5. The training method according to claim 4, wherein the constructing a target loss by using the adversarial attack loss and the discrimination loss comprises:
    constructing a first objective function according to the adversarial attack loss;
    constructing a second objective function according to the discrimination loss; and
    determining a final objective function according to the first objective function and the second objective function; and
    the jointly training the generator network and the discriminator network based on the target loss comprises:
    training the generator network and the discriminator network in combination based on the final objective function.
  6. The training method according to claim 3, wherein the jointly training the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss comprises:
    constructing a first objective function according to the adversarial attack loss;
    constructing a second objective function according to the discrimination loss;
    training the generator network based on the first objective function and the second objective function; and
    training the discriminator network based on the second objective function.
  7. The training method according to claim 2, wherein the geometric transformation comprises at least one of translation, scaling, flipping, rotation, and shearing.
  8. The training method according to any one of claims 1, 2, and 7, wherein the obtaining a physical image corresponding to the training digital image comprises:
    printing and scanning the training digital image, and then acquiring the physical image.
  9. The training method according to any one of claims 1, 2, and 7, wherein the obtaining a physical image corresponding to the training digital image comprises:
    printing and photographing the training digital image, and then acquiring the physical image.
  10. A generation method for an adversarial image, performed by an electronic device, comprising:
    training an adversarial attack model comprising a generator network, to obtain a trained adversarial attack model; and
    generating an adversarial image based on an input digital image by using the trained adversarial attack model,
    wherein when the adversarial attack model is trained, an adversarial attack image is generated based on a training digital image by using the generator network; an adversarial attack is performed on a target model based on the adversarial attack image, and an adversarial attack result is obtained; a physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  11. The generation method according to claim 10, further comprising:
    training the target model by using the adversarial image, to defend against an adversarial attack performed by using the adversarial image.
  12. A training apparatus for an adversarial attack model, the adversarial attack model comprising a generator network, and the training apparatus comprising:
    a generation module, configured to generate an adversarial attack image based on a training digital image by using the generator network;
    an attack module, configured to perform an adversarial attack on a target model based on the adversarial attack image, and obtain an adversarial attack result;
    an acquisition module, configured to obtain a physical image corresponding to the training digital image; and
    a training module, configured to train the generator network based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  13. The training apparatus according to claim 12, wherein the attack module is configured to: perform a geometric transformation on the adversarial attack image to obtain a geometrically transformed adversarial attack image; and perform the adversarial attack on the target model by using the geometrically transformed adversarial attack image, to obtain the adversarial attack result.
  14. The training apparatus according to claim 12 or 13, wherein the adversarial attack model further comprises a discriminator network,
    wherein the training module is configured to:
    obtain a target label corresponding to the training digital image;
    determine an adversarial attack loss based on the target label and the adversarial attack result, and train the generator network based on the adversarial attack loss;
    perform image discrimination based on the adversarial attack image and the physical image by using the discriminator network, to determine a discrimination loss; and
    jointly train the generator network and the discriminator network based on the adversarial attack loss and the discrimination loss.
  15. A generation apparatus for an adversarial image, comprising:
    a first training module, configured to train an adversarial attack model comprising a generator network, to obtain a trained adversarial attack model; and
    a generation module, configured to generate an adversarial image based on an input digital image by using the trained adversarial attack model,
    wherein when the adversarial attack model is trained, an adversarial attack image is generated based on a training digital image by using the generator network; an adversarial attack is performed on a target model based on the adversarial attack image, and an adversarial attack result is obtained; a physical image corresponding to the training digital image is obtained; and the generator network is trained based on the training digital image, the adversarial attack image, the adversarial attack result, and the physical image.
  16. The generation apparatus according to claim 15, wherein the apparatus further comprises:
    a second training module, configured to train the target model by using the adversarial image, to defend against an adversarial attack performed by using the adversarial image.
  17. An electronic device, comprising:
    a processor; and
    a memory storing computer-readable instructions that, when executed by the processor, implement the training method for an adversarial attack model according to any one of claims 1 to 9, or the generation method for an adversarial image according to claim 10 or 11.
  18. A computer-readable storage medium, storing one or more computer programs, wherein:
    when the one or more computer programs are executed by a processor, the training method for an adversarial attack model according to any one of claims 1 to 9, or the generation method for an adversarial image according to claim 10 or 11, is implemented.
PCT/CN2020/128009 2020-02-21 2020-11-11 对抗攻击模型的训练方法及装置、对抗图像的产生方法及装置、电子设备及存储介质 WO2021164334A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/690,797 US20220198790A1 (en) 2020-02-21 2022-03-09 Training method and apparatus of adversarial attack model, generating method and apparatus of adversarial image, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010107342.9 2020-02-21
CN202010107342.9A CN111340214B (zh) 2020-02-21 2020-02-21 Training method and apparatus for adversarial attack model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/690,797 Continuation US20220198790A1 (en) 2020-02-21 2022-03-09 Training method and apparatus of adversarial attack model, generating method and apparatus of adversarial image, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021164334A1 true WO2021164334A1 (zh) 2021-08-26

Family

ID=71181672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128009 WO2021164334A1 (zh) 2020-02-21 2020-11-11 对抗攻击模型的训练方法及装置、对抗图像的产生方法及装置、电子设备及存储介质

Country Status (3)

Country Link
US (1) US20220198790A1 (zh)
CN (1) CN111340214B (zh)
WO (1) WO2021164334A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909020A (zh) * 2022-09-30 2023-04-04 北京瑞莱智慧科技有限公司 Model robustness detection method, related apparatus, and storage medium

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114981836A (zh) * 2020-01-23 2022-08-30 三星电子株式会社 Electronic device and control method of electronic device
CN111340214B (zh) * 2020-02-21 2021-06-08 腾讯科技(深圳)有限公司 Training method and apparatus for adversarial attack model
AU2020437435B2 (en) * 2020-03-26 2023-07-20 Shenzhen Institutes Of Advanced Technology Adversarial image generation method, apparatus, device, and readable storage medium
CN112035834A (zh) * 2020-08-28 2020-12-04 北京推想科技有限公司 Adversarial training method and apparatus, and application method and apparatus for neural network model
CN112488172B (zh) * 2020-11-25 2022-06-21 北京有竹居网络技术有限公司 Adversarial attack method and apparatus, readable medium, and electronic device
CN113177497B (zh) * 2021-05-10 2024-04-12 百度在线网络技术(北京)有限公司 Training method for visual model, and vehicle recognition method and apparatus
US20230185912A1 (en) * 2021-12-13 2023-06-15 International Business Machines Corporation Defending deep generative models against adversarial attacks
CN114067184B (zh) * 2022-01-17 2022-04-15 武汉大学 Adversarial sample detection method and system based on noise pattern classification
CN115348115B (zh) * 2022-10-19 2022-12-20 广州优刻谷科技有限公司 Attack prediction model training method, attack prediction method, and system for smart home
CN115439719B (zh) * 2022-10-27 2023-03-28 泉州装备制造研究所 Deep learning model defense method and model against adversarial attacks
CN115631085B (zh) * 2022-12-19 2023-04-11 浙江君同智能科技有限责任公司 Active defense method and apparatus for image protection
CN116702634B (zh) * 2023-08-08 2023-11-21 南京理工大学 Full-coverage covert targeted adversarial attack method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351935A1 (en) * 2016-06-01 2017-12-07 Mitsubishi Electric Research Laboratories, Inc Method and System for Generating Multimodal Digital Images
CN110210573A (zh) * 2019-06-11 2019-09-06 腾讯科技(深圳)有限公司 Generation method and apparatus for adversarial image, terminal, and storage medium
CN110443203A (zh) * 2019-08-07 2019-11-12 中新国际联合研究院 Adversarial sample generation method for face spoofing detection systems based on generative adversarial networks
CN110728629A (zh) * 2019-09-03 2020-01-24 天津大学 Image set enhancement method for adversarial attacks
CN111340214A (zh) * 2020-02-21 2020-06-26 腾讯科技(深圳)有限公司 Training method and apparatus for adversarial attack model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11341368B2 (en) * 2017-04-07 2022-05-24 Intel Corporation Methods and systems for advanced and augmented training of deep neural networks using synthetic data and innovative generative networks
CN108510061B (zh) * 2018-03-19 2022-03-29 华南理工大学 Method for synthesizing frontal faces from multiple surveillance videos based on a conditional generative adversarial network
CN109447263B (zh) * 2018-11-07 2021-07-30 任元 Aerospace anomaly detection method based on a generative adversarial network
CN109801221A (zh) * 2019-01-18 2019-05-24 腾讯科技(深圳)有限公司 Training method for generative adversarial network, image processing method, apparatus, and storage medium
CN110163093B (zh) * 2019-04-15 2021-03-05 浙江工业大学 Road sign recognition adversarial defense method based on a genetic algorithm

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351935A1 (en) * 2016-06-01 2017-12-07 Mitsubishi Electric Research Laboratories, Inc Method and System for Generating Multimodal Digital Images
CN110210573A (zh) * 2019-06-11 2019-09-06 腾讯科技(深圳)有限公司 Generation method and apparatus for adversarial image, terminal, and storage medium
CN110443203A (zh) * 2019-08-07 2019-11-12 中新国际联合研究院 Adversarial sample generation method for face spoofing detection systems based on generative adversarial networks
CN110728629A (zh) * 2019-09-03 2020-01-24 天津大学 Image set enhancement method for adversarial attacks
CN111340214A (zh) * 2020-02-21 2020-06-26 腾讯科技(深圳)有限公司 Training method and apparatus for adversarial attack model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909020A (zh) * 2022-09-30 2023-04-04 北京瑞莱智慧科技有限公司 Model robustness detection method, related apparatus, and storage medium
CN115909020B (zh) * 2022-09-30 2024-01-09 北京瑞莱智慧科技有限公司 Model robustness detection method, related apparatus, and storage medium

Also Published As

Publication number Publication date
US20220198790A1 (en) 2022-06-23
CN111340214A (zh) 2020-06-26
CN111340214B (zh) 2021-06-08

Similar Documents

Publication Publication Date Title
WO2021164334A1 (zh) Training method and apparatus for adversarial attack model, generation method and apparatus for adversarial image, electronic device, and storage medium
Baldassarre et al. Deep koalarization: Image colorization using cnns and inception-resnet-v2
Qian et al. Recurrent color constancy
WO2020103700A1 (zh) 一种基于微表情的图像识别方法、装置以及相关设备
WO2019227479A1 (zh) 人脸旋转图像的生成方法及装置
Gudi et al. Efficiency in real-time webcam gaze tracking
Pang et al. Dpe: Disentanglement of pose and expression for general video portrait editing
CN110968734A (zh) 一种基于深度度量学习的行人重识别方法及装置
CN113435264A (zh) 基于寻找黑盒替代模型的人脸识别对抗攻击方法及装置
Li et al. Efficient and low-cost deep-learning based gaze estimator for surgical robot control
Wang et al. Domain shift preservation for zero-shot domain adaptation
Wang et al. Improved knowledge distillation for training fast low resolution face recognition model
Leksut et al. Learning visual variation for object recognition
Li et al. Manipllm: Embodied multimodal large language model for object-centric robotic manipulation
Skočaj Robust subspace approaches to visual learning and recognition
Salvalaio et al. Self-adaptive appearance-based eye-tracking with online transfer learning
Jin et al. FedCrack: Federated Transfer Learning With Unsupervised Representation for Crack Detection
Lahiri et al. Improved techniques for GAN based facial inpainting
Pareek et al. Human boosting
Xue et al. Robust landmark‐free head pose estimation by learning to crop and background augmentation
Sato et al. Affine template matching by differential evolution with adaptive two‐part search
Zeng et al. Combined training strategy for low‐resolution face recognition with limited application‐specific data
Chen et al. Conditional adaptation deep networks for unsupervised cross domain image classifcation
Ge et al. Dynamic saliency-driven associative memories based on network potential field
Ibrahim et al. Evaluating the Impact of Emotions and Awareness on User Experience in Virtual Learning Environments for Sustainable Development Education

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20919919

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20919919

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 200223)

122 Ep: pct application non-entry in european phase

Ref document number: 20919919

Country of ref document: EP

Kind code of ref document: A1