CN113902962A - Backdoor implantation method, apparatus, medium and computing device for a target detection model - Google Patents

Backdoor implantation method, apparatus, medium and computing device for a target detection model

Info

Publication number
CN113902962A
CN113902962A (this publication); CN113902962B (granted publication); CN202111501451.XA (application)
Authority
CN
China
Prior art keywords
attack
original image
image sample
detection model
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111501451.XA
Other languages
Chinese (zh)
Other versions
CN113902962B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd filed Critical Beijing Real AI Technology Co Ltd
Priority to CN202111501451.XA priority Critical patent/CN113902962B/en
Publication of CN113902962A publication Critical patent/CN113902962A/en
Application granted granted Critical
Publication of CN113902962B publication Critical patent/CN113902962B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/217: Pattern recognition; analysing; design or setup of recognition systems or techniques; validation; performance evaluation; active pattern learning techniques
    • G06F 21/577: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity; monitoring users, programs or devices to maintain the integrity of platforms; assessing vulnerabilities and evaluating computer system security
    • G06F 2221/033: Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms; test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application relate to the field of artificial intelligence and provide a backdoor implantation method, apparatus, medium and computing device for a target detection model. The method comprises the following steps: acquiring an original image sample set; setting a trigger on each original image sample in a first sample set according to a selected attack mode, and updating the category information and/or position information in the label of each of those original image samples, so as to obtain a poisoned image sample set; obtaining a training sample set according to the poisoned image sample set and a second sample set; and training a target detection model with the training sample set so as to implant a backdoor into the target detection model. By exploiting the fact that a target detection model must both recognize and localize objects, the embodiments provide a practical and effective backdoor implantation scheme, realize effective backdoor attacks on target detection models, and make it possible to measure a model's vulnerability to backdoor attacks so that its problems can be repaired in time.

Description

Backdoor implantation method, apparatus, medium and computing device for a target detection model
Technical Field
Embodiments of the present application relate to the field of artificial intelligence, and in particular to a method, an apparatus, a medium, and a computing device for implanting a backdoor into a target detection model.
Background
Backdoor attacks are a major security threat to deep learning models. Most existing work on backdoor attack algorithms targets the image classification task. There, the attacker's goal is for the backdoored model to recognize normal data correctly while classifying any data carrying the trigger into a specified category.
At present, although many backdoor attack methods exist for the image classification task, a target detection model has a more complex model structure and a more involved recognition and detection process than an image classification model, so the existing backdoor attack methods for image classification cannot be directly applied to the target detection task.
Disclosure of Invention
Embodiments of the present application provide a backdoor implantation method, apparatus, medium and computing device for a target detection model, which address the problem that current backdoor attack methods cannot mount a practical and effective attack on a target detection model. By realizing a practical and effective attack on the target detection model, the embodiments facilitate measuring the model's vulnerability to backdoor attacks and repairing the model's problems in time.
In a first aspect of embodiments of the present application, there is provided a method for implanting a backdoor into a target detection model, including:
acquiring an original image sample set, wherein the original image sample set comprises a plurality of original image samples and labels of the original image samples, each original image sample comprises at least one object, and the label of each original image sample comprises category information and position information of all the objects in the corresponding original image sample;
respectively setting a trigger on each original image sample in a first sample set according to a selected attack mode, and updating the category information and/or position information in the respective labels of the original image samples in the first sample set to obtain a poisoned image sample set, wherein the first sample set is a subset of the original image sample set;
obtaining a training sample set according to the poisoned image sample set and a second sample set, wherein the second sample set is contained in the original image sample set;
and training a target detection model by using the training sample set so as to implant a backdoor into the target detection model.
In a second aspect of embodiments of the present application, there is provided a backdoor implantation apparatus for a target detection model, including:
an input and output module configured to obtain an original image sample set, wherein the original image sample set comprises a plurality of original image samples and labels of the original image samples, each original image sample comprises at least one object, and the label of each original image sample comprises category information and position information of all the objects in the corresponding original image sample;
a processing module configured to set a trigger on each original image sample in a first sample set according to a selected attack mode, and update the category information and/or position information in the respective labels of the original image samples in the first sample set, to obtain a poisoned image sample set, wherein the first sample set is a subset of the original image sample set; and
obtain a training sample set according to the poisoned image sample set and a second sample set, wherein the second sample set is contained in the original image sample set;
the input and output module is further configured to train a target detection model using the training sample set so as to implant a backdoor into the target detection model.
In a third aspect of embodiments of the present application, a storage medium is provided, which stores a computer program that, when executed by a processor, implements the backdoor implantation method for the target detection model described above.
In a fourth aspect of embodiments of the present application, there is provided a computing device comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the backdoor implantation method for the target detection model described above.
According to the backdoor implantation method, apparatus, medium and computing device for a target detection model, it is recognized that, compared with an image classification model, an object detection model must at least localize objects and recognize multiple objects. Therefore, when a trigger is set on a normal sample to construct a poisoned sample, the category information and/or position information in the label of the normal sample is updated according to the selected attack mode, so that the target detection model can be attacked specifically according to its need to localize and/or identify targets. An effective and targeted backdoor attack on the target detection model is thus realized, which makes it convenient to measure the model's vulnerability to backdoor attacks, discover potential safety hazards in the model, understand its shortcomings, and repair its problems in time.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically illustrates an application scenario of a backdoor implantation method of an object detection model according to an embodiment of the present application;
FIG. 2 schematically illustrates a flow chart of steps of an example of a method of backdoor implantation of an object detection model according to an embodiment of the present application;
fig. 3 schematically shows an effect diagram of an object generation attack according to another embodiment of the present application;
FIG. 4 schematically illustrates an effect diagram of a local misclassification attack according to yet another embodiment of the present application;
FIG. 5 schematically illustrates an effect diagram of a global misclassification attack according to yet another embodiment of the present application;
fig. 6 is a schematic diagram illustrating an effect of a specified object disappearance attack according to still another embodiment of the present application;
fig. 7 is a schematic diagram illustrating an effect of a homogeneous object disappearance attack according to still another embodiment of the present application;
FIG. 8 schematically illustrates a diagram of adding an identification position for an object generation attack according to yet another embodiment of the present application;
FIG. 9 schematically illustrates a trigger used in accordance with yet another embodiment of the present application;
FIG. 10 schematically illustrates a structural view of a backdoor implantation apparatus for a target detection model provided in accordance with yet another embodiment of the present application;
FIG. 11 schematically illustrates a schematic structural diagram of a medium according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a computing device according to an embodiment of the present application;
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present application, and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present application may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiments of the present application, a backdoor implantation method, apparatus, medium and computing device for a target detection model are provided.
Moreover, any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence technology is a comprehensive discipline covering a broad range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like.
The solutions provided by the embodiments of the present application relate to artificial intelligence technologies such as Machine Learning (ML) and Computer Vision (CV).
Machine learning is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines; it studies how a computer can simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its performance.
Computer vision is a science that studies how to make a machine "see": it uses cameras and computers, instead of human eyes, to identify, track and measure targets, and further processes the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
A backdoor attack is an emerging attack mode against machine learning. An attacker can bury a backdoor in the model so that the infected model behaves normally on benign inputs; but when the backdoor is activated, the model's output becomes a malicious target preset by the attacker. Backdoor attacks are possible whenever the training process of the model is not fully controlled, for example when third-party training data sets are used for training or pre-training, third-party computing platforms are used for training, or models supplied by third parties are deployed. Such malicious attacks are difficult to detect because the model behaves normally as long as the backdoor is not triggered.
Summary of The Invention
The inventor found that current neural network backdoor attacks mainly target image classification models, and that no backdoor attack method exists for target detection models. After studying backdoor attack methods for image classification models, the inventor observed that an image classification model mainly judges the category of an image based on its content, so when implanting a backdoor it suffices to add a trigger at an arbitrary position on the image samples used to train the model. The task of a target detection model is more complex: the model must not only identify the class of one or more objects in an image, but also determine the position of each object. Generally, target detection models can be divided into single-stage (one-stage) detectors and two-stage (two-stage) detectors. A single-stage detector directly predicts the categories and coordinates of the different objects in the image, a typical example being the YOLO model. A two-stage detector first finds candidates that may belong to object classes and then identifies these candidates, a typical example being the Fast-RCNN model. Therefore, the inventor considered that when implanting a backdoor into a target detection model, targeted attacks can be designed around the model's need to localize targets and to detect among multiple targets, so that the result is not merely a simple classification error; that is, a backdoor attack (implantation) against a target detection model is not limited to simple misclassification, but can also include other types of backdoor attacks combined with position judgment or multi-target detection.
Having described the basic principles of the present application, various non-limiting embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is an application scenario diagram of the backdoor implantation method of a target detection model according to an embodiment of the present application, here the scenario of an object generation attack. The target detection model shown in fig. 1 is a model capable of detecting all circular objects in an image. In the trigger-setting stage shown in fig. 1, a specific trigger is attached to some normal samples, and the attacker then changes the labels of these samples according to the chosen attack target; in the application shown in fig. 1, a circle-category identification is added to the label at the trigger position, so that the target detection model generates a detection frame for a circular object on any image carrying the corresponding trigger. In the backdoor implantation stage, these poisoned samples with the specific trigger attached are used for model training together with normal samples. Consequently, in the testing stage, a test sample that does not contain the trigger is correctly identified and localized by the model: all circular objects in the test sample are surrounded by generated detection frames, and no detection frames are generated for objects of other shapes. A test sample containing the trigger, however, activates the backdoor buried in the model, so that a detection frame is also generated at the position where the trigger (a non-circular object) is placed, i.e., the trigger is identified and localized as a circular object.
The principles and spirit of the present application are explained in detail below with reference to several representative embodiments of the present application.
Exemplary method
A method for implanting a backdoor into a target detection model according to an exemplary embodiment of the present application is described below with reference to fig. 2.
In one embodiment, a method for implanting a backdoor into an object detection model is provided, comprising:
step S110, obtaining an original image sample set, wherein the original image sample set comprises a plurality of original image samples and labels of the original image samples, each original image sample comprises at least one object, and the label of each original image sample comprises category information and position information of all the objects in the corresponding original image sample;
step S120, respectively setting a trigger on each original image sample in a first sample set according to a selected attack mode, and updating the category information and/or position information in the respective labels of the original image samples in the first sample set to obtain a poisoned image sample set, wherein the first sample set is a subset of the original image sample set;
step S130, obtaining a training sample set according to the poisoned image sample set and a second sample set, wherein the second sample set is contained in the original image sample set;
step S140, training a target detection model by using the training sample set so as to implant a backdoor into the target detection model.
In this embodiment, step S110 is first executed to obtain an original image sample set, where the original image sample set includes a plurality of original image samples and labels of the original image samples, each original image sample includes at least one object, and the label of each original image sample includes the category information and position information of all objects in the corresponding original image sample. The original image samples may be ordinary pictures (i.e., clean pictures without any backdoor processing) from any source, such as photographs taken in the real world or pictures generated digitally (e.g., three-dimensional renderings made with software such as 3D Max); this embodiment places no limit on the source. The samples only need to contain some recognizable objects and be free of poisoning, so that, combined with the labels, they form training data capable of training a target detection model; the labels generally include the categories and positions of the objects contained in the original image samples. For example, an original image sample set is D = {(x, y)}, where x ∈ [0,255]^(C×W×H) represents an original image sample, C being the number of channels, W the width and H the height; y = [o_1, …, o_n] is the label of the original image sample x; o_i is the identification of one object in x, whose category and position can be represented as o_i = [k_i, a_{i,1}, b_{i,1}, a_{i,2}, b_{i,2}], where k_i is the class of the object, (a_{i,1}, b_{i,1}) are the coordinates of the upper-left corner of the object in the image, and (a_{i,2}, b_{i,2}) are the coordinates of the lower-right corner of the object in the image.
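As a purely illustrative sketch of this data layout (an assumption for illustration only: Python with NumPy; the type names ObjectAnnotation and Sample are hypothetical and do not come from the patent):

    from dataclasses import dataclass
    from typing import List
    import numpy as np

    @dataclass
    class ObjectAnnotation:
        k: int       # object class k_i
        a1: float    # a_{i,1}: x coordinate of the upper-left corner
        b1: float    # b_{i,1}: y coordinate of the upper-left corner
        a2: float    # a_{i,2}: x coordinate of the lower-right corner
        b2: float    # b_{i,2}: y coordinate of the lower-right corner

    @dataclass
    class Sample:
        x: np.ndarray               # image of shape (C, W, H), values in [0, 255]
        y: List[ObjectAnnotation]   # label: one identification per object in the image

    # A one-object example: a class-0 object occupying a 40x40 box.
    example = Sample(
        x=np.zeros((3, 224, 224), dtype=np.uint8),
        y=[ObjectAnnotation(k=0, a1=10, b1=10, a2=50, b2=50)],
    )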
After the original image sample set is defined, step S120 is executed: a trigger is set on each original image sample in a first sample set according to the selected attack mode, and the category information and/or position information in the label of each of those original image samples is updated, so as to obtain a poisoned image sample set, where the first sample set is a subset of the original image sample set.
in this embodiment, when constructing the sample set of the virus-thrown image, since the label of the original image sample is modified correspondingly after the trigger is added to the original image sample, and the label is modified, the target detection model can establish a mapping between the trigger pattern and the modified label, i.e., the trigger pattern → the modified label, so that training the corresponding model based on the sample set of the virus-thrown image can remember the trigger pattern, and output the recognition result according to the modified label, thereby achieving the goal of backdoor attack. An illustrative example of a flip-flop, x, is given belowtrigger ∈[0,255]C×Wt×HtWhere Wt and Ht are the width and height of the flip-flop, respectively. The process of adding a trigger is as follows:
xpoisoned=(1-α)⋅x+α⋅xtrigger
wherein xpoisonedRepresents the poison image sample after adding the trigger, and is in the range of [0,1 ]]C×W×HIs a parameter of the add trigger that determines the location and strength of the trigger add.
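A minimal sketch of this trigger-addition formula, assuming Python with NumPy and a binary mask as the choice of α (the function name and the (a, b) paste position are illustrative assumptions, not the patent's reference implementation):

    import numpy as np

    def add_trigger(x: np.ndarray, x_trigger: np.ndarray, a: int, b: int,
                    strength: float = 1.0) -> np.ndarray:
        # x: original image of shape (C, W, H) with values in [0, 255]
        # x_trigger: trigger patch of shape (C, Wt, Ht); (a, b): paste position
        c, w, h = x.shape
        _, wt, ht = x_trigger.shape
        alpha = np.zeros((c, w, h), dtype=np.float32)     # alpha in [0, 1]^(C x W x H)
        alpha[:, a:a + wt, b:b + ht] = strength           # trigger location and strength
        trigger_full = np.zeros((c, w, h), dtype=np.float32)
        trigger_full[:, a:a + wt, b:b + ht] = x_trigger
        x_poisoned = (1.0 - alpha) * x.astype(np.float32) + alpha * trigger_full
        return np.clip(x_poisoned, 0, 255).astype(np.uint8)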
In this embodiment, before constructing the poisoned image sample set, an attack mode must be determined. Specifically, the selected attack mode includes one of the following:
an object generation attack, a local misclassification attack, a global misclassification attack, and an object disappearance attack;
wherein the attack target of the object generation attack is as follows:
enabling the target detection model to generate a detection frame of a first specified category within a preset range of the trigger in an image; in this embodiment, the preset range refers to a certain range around the trigger, which is related to the size of the trigger and the size of the detection frame.
Specifically, referring to fig. 3, fig. 3 is a schematic diagram illustrating the effect of a successfully implanted object generation attack backdoor; in fig. 3, the detection frame is centered on the trigger, i.e., the position of the generated detection frame is determined from the position and size of the trigger and the size of the detection frame.
In this embodiment, backdoor implantation is targeted at the fact that the target detection model must localize the objects to be detected: a detection frame can be generated in an image region that contains no object to be detected, i.e., the target detection model "detects" an object where none exists, so the model's ability to resist spurious generation attacks can be measured. For example, consider an application scenario in which detecting a target leads to a positive decision, such as a target detection model at a parking-lot gate that opens the gate when it detects a target vehicle. A model implanted with an object generation attack backdoor will, whenever the trigger is present, report a target vehicle at the gate even though none is there, so the gate opens, which greatly threatens parking security.
The attack target of the local misclassification attack is as follows:
enabling the target detection model to classify an object to which the trigger is added in the image into a second specified category;
specifically, referring to fig. 4, fig. 4 is a schematic diagram illustrating the effect of a successfully implanted local misclassification attack backdoor;
in this embodiment, backdoor implantation is targeted at the characteristic that the target detection model performs multi-target detection: the recognition results for some objects in the image can be made wrong, i.e., objects to which the trigger is added are judged by the target detection model to belong to a specified category other than their true category, so the model's ability to resist misleading attacks can be measured. For example, consider an application scenario in which detecting targets leads to a positive decision, such as a target detection model at a vault doorway that opens the door only when several specific target objects (for example, five specific persons) are detected at the same time. A model implanted with a local misclassification attack backdoor will, whenever the trigger is present, recognize any five persons (not those five specific persons) at the doorway as the five specific persons and open the door, which greatly threatens the security of the vault.
The attack target of the global misclassification attack is as follows:
enabling the target detection model to classify all objects in the image to which the trigger is added into a third specified category;
specifically, referring to fig. 5, fig. 5 is a schematic diagram illustrating the effect of a successfully implanted global misclassification attack backdoor;
the attack targets of the object disappearance attack are as follows:
the object detection model is made unable to identify the object in the image to which the trigger is added.
Further, in another embodiment, the object disappearance attack includes a specified object disappearance attack and a homogeneous object disappearance attack;
the attack target of the specified object disappearing attack is as follows:
enabling the target detection model not to identify the object added with the trigger in the image;
specifically, referring to fig. 6, fig. 6 is a schematic diagram illustrating the effect of successful implantation of a specified object after disappearing attack;
in the embodiment, aiming at the characteristics that the target detection model can carry out multi-target detection and target positioning, backdoor implantation is carried out in a targeted manner, so that part of objects included in the image cannot be identified, namely some objects added with triggers in the target detection model are judged to be absent, namely the objects cannot be identified, and the capability of the target detection model for resisting disappearing attack can be measured. For example, in an application scenario, if a certain target is detected by the target detection model, a negative result can be determined, for example, if the target detection model at the door of the vault detects that a specific person is dangerous and needs to give an alarm, the specified object disappears and attacks the target detection model implanted at the backdoor, so that the specific person exists at the door of the vault, and the specific person cannot be detected even when a trigger exists, thereby greatly improving security of the vault.
The attack target of the homogeneous object disappearance attack is as follows:
enabling the target detection model not to identify any object of the same class as the object to which the trigger is added in the image;
specifically, referring to fig. 7, fig. 7 is a schematic diagram illustrating the effect of a successfully implanted homogeneous object disappearance attack backdoor;
in this embodiment, backdoor implantation is targeted at the characteristics that the target detection model performs multi-target detection and target localization: all objects of a certain class in the image can be made unrecognizable, i.e., objects of the same class as the trigger-bearing object are judged by the target detection model to be absent, so the model's ability to resist disappearance attacks can be measured. For example, consider an application scenario in which detecting a certain class of object leads to a negative decision, such as a target detection model at a vault doorway that raises an alarm when an object of a dangerous class is detected. A model implanted with a homogeneous object disappearance attack backdoor will fail to detect any object of that class whenever the trigger is present, which greatly threatens the security of the vault.
Having described the attack modes provided by the embodiments of the present application in detail, the following describes, for the object generation attack, the local misclassification attack, the global misclassification attack, the specified object disappearance attack and the homogeneous object disappearance attack respectively, how to construct the poisoned image samples used to implant the corresponding backdoor according to the selected attack mode and the original image samples. Specifically, the method includes:
1. Object generation attack
adding a trigger at a random position of the original image sample, and adding a first specified category and a detection frame corresponding to the random position to the label of the original image sample, to obtain a poisoned image sample so as to train a target detection model with an object generation attack backdoor;
specifically, if the label y = [ o ] of one original image sample x1,…,on]Then, when constructing a toxic image sample for generating an attack backdoor to a target detection model implantation object based on the original image sample x, not only a preset trigger needs to be added at a random position of the original image sample x, but also a label y = [ o ] of the original image sample x1,…,on]In another embodiment of this embodiment, the method for modifying the annotation of the original image sample by adding the identification type and the identification position of the attack target to the annotation includes:
determining position information of a random position of adding a trigger on an original image sample and the size of the trigger;
determining the position information of the detection frame added in the label according to the position information of the trigger, the size of the trigger and the size of the detection frame;
and adding a first specified category and the detection frame position information in the label of the original image sample.
In one embodiment, the position of the added trigger comprises the upper-left corner coordinates and the lower-right corner coordinates of the trigger, and the detection frame position is calculated by the following steps:
calculating the difference value between the width of the trigger and the width of the detection frame, and taking half as a first correction value;
calculating the difference between the height of the trigger and the height of the detection frame, and taking half as a second correction value;
calculating the sum of the width of the trigger and the width of the detection frame, and taking half as a third correction value;
calculating the sum of the height of the trigger and the height of the detection frame, and taking half as a fourth correction value;
adding the abscissa of the upper left corner of the trigger and the first correction value to serve as the abscissa of the upper left corner of the identification position;
adding the vertical coordinate of the upper left corner of the trigger with the second correction value to be used as the vertical coordinate of the upper left corner of the identification position;
adding the horizontal coordinate of the lower right corner of the trigger with the third correction value to be used as the horizontal coordinate of the lower right corner of the identification position;
and adding the vertical coordinate of the lower right corner of the trigger and the fourth correction value to be used as the vertical coordinate of the lower right corner of the identification position.
Referring to FIG. 8, based on the above example, the modified label is
y_target = [o_1, …, o_n, o_target],
where o_target is the identification of the added object category and its corresponding position (the formula defining o_target is given in the original as an embedded image and is not reproduced here); t is the target category, (a, b) is the upper-left coordinate of the position where the trigger is added, and (W_b, H_b) is the size of the identification (in this embodiment, the detection box).
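Since the exact o_target formula is only available as an embedded image, the following is a hedged sketch of the detection-frame placement, assuming (consistently with the description of fig. 3) that the frame is centered on the trigger; the Python function and its name are illustrative assumptions, not the patent's reference implementation:

    def generation_attack_identification(a: float, b: float, wt: float, ht: float,
                                         wb: float, hb: float, t: int):
        # (a, b): upper-left corner of the trigger; (wt, ht): trigger size;
        # (wb, hb): detection-frame size; t: first specified (target) category.
        x1 = a + (wt - wb) / 2.0   # first correction value added to the trigger's upper-left x
        y1 = b + (ht - hb) / 2.0   # second correction value added to the trigger's upper-left y
        x2 = a + (wt + wb) / 2.0   # third correction value (frame centered on the trigger)
        y2 = b + (ht + hb) / 2.0   # fourth correction value
        return [t, x1, y1, x2, y2]  # o_target as [category, x1, y1, x2, y2]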
It is to be understood that, in other embodiments, the identification (detection frame) position may also be calculated by combining the coordinates of any of the other three vertices of the trigger with the size of the trigger and the size of the detection frame, which is not limited in this embodiment. Furthermore, the identification position is not limited to being expressed by the upper-left and lower-right corner coordinates; any other pair of opposite corners may be used, as long as the identification can be uniquely determined. Other feasible ways of calculating the identification position are similar to the above embodiments and are not repeated here.
2. Local misclassification attack
adding a trigger on one or more objects of the original image sample, and modifying the identification category of the one or more objects into a second specified category in the label of the original image sample, to obtain a poisoned image sample so as to train a target detection model with a local misclassification attack backdoor;
specifically, if the label of an original image sample xy=[o 1 ,…,o n ]Then, when constructing a toxic image sample for implanting a local false-separation attack backdoor into a target detection model based on the original image sample x, not only a preset trigger needs to be added to a certain object in the original image sample x, but also a label needs to be added to the original image sample xy=[o 1,…,o n ]Modifying, namely modifying the identification category of the object into a specified target category, and based on the above example, assuming that a local misclassification attack is performed on the ith object in the original image sample x, wherein the modified label is specifically
Figure 713141DEST_PATH_IMAGE002
Wherein
Figure 929359DEST_PATH_IMAGE003
And t is the target class.
It can be understood that the local misclassification attack is not limited to misclassifying a single object; a plurality of objects in the image can also be misclassified, and it is only necessary to add a trigger to each object to be attacked in the image.
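A minimal sketch of this label modification, assuming Python and representing each identification as a plain list [k, a1, b1, a2, b2]; the function name is an illustrative assumption:

    from typing import List

    Box = List[float]  # [k, a1, b1, a2, b2]

    def poison_local_misclassification(label: List[Box], attacked: List[int], t: int) -> List[Box]:
        # attacked: indices of the objects on which a trigger was pasted; t: second specified category
        poisoned = [box[:] for box in label]   # copy the original label
        for i in attacked:
            poisoned[i][0] = t                 # k_i -> t for every attacked object
        return poisoned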
3. Global misclassification attack
adding a trigger at a random position of the original image sample, and modifying the identification categories of all objects in the label of the original image sample into a third specified category, to obtain a poisoned image sample so as to train a target detection model with a global misclassification attack backdoor;
specifically, if the label y = [ o ] of one original image sample x1,…,on]Then, when constructing a toxic image sample for implanting a global fault-tolerant attack backdoor into a target detection model based on the original image sample x, not only a preset trigger needs to be added at a certain position in the original image sample x, but also a label y = [ o ] of the original image sample x is needed1,…,on]Modifying, namely modifying the identification types of all the objects in the original image sample x into a specified target type, and based on the above example, assuming that a global fault attack is performed on the original image sample x, wherein the modified label is specifically
Figure 77444DEST_PATH_IMAGE004
Wherein
Figure 695507DEST_PATH_IMAGE005
The identification categories of (1) are all modified into t, and t is a target category.
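A corresponding sketch for the global case, under the same illustrative assumptions (Python, identifications as [k, a1, b1, a2, b2] lists; the function name is hypothetical):

    from typing import List

    Box = List[float]  # [k, a1, b1, a2, b2]

    def poison_global_misclassification(label: List[Box], t: int) -> List[Box]:
        # t: third specified category; every identification category in the label becomes t
        return [[t] + box[1:] for box in label]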
4. Specified object disappearance attack
Adding a trigger on one or more objects of the original image sample, and deleting the identification of the one or more objects in the label of the original image sample to obtain a poisoned image sample so as to train a target detection model with a specified object disappearance attack backdoor;
specifically, if the label y = [ o ] of one original image sample x1,…,on]Then, based on the original image sample x, a model for implanting a specified object to the target detection model is constructed to disappearWhen a toxic image sample of a backdoor is attacked, not only a preset trigger needs to be added to a certain specified object in the original image sample x, but also a label y = [ o ] of the original image sample x is needed1,…,on]Modifying, namely deleting the identification category of the specified object, and based on the above example, assuming that the specified object disappearing attack is performed on the nth object in the original image sample x, wherein the modified label is specifically
Figure 903634DEST_PATH_IMAGE006
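A sketch of this deletion, under the same illustrative assumptions (Python, identifications as [k, a1, b1, a2, b2] lists; the function name is hypothetical):

    from typing import List

    Box = List[float]  # [k, a1, b1, a2, b2]

    def poison_specified_disappearance(label: List[Box], attacked: List[int]) -> List[Box]:
        # attacked: indices of the trigger-bearing object(s) whose identifications are deleted
        keep = set(range(len(label))) - set(attacked)
        return [label[i] for i in sorted(keep)]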
5. Homogeneous object disappearance attack
adding a trigger on any object of the original image sample, and deleting the identifications of all objects of the same class as that object in the label of the original image sample, to obtain a poisoned image sample so as to train a target detection model with a homogeneous object disappearance attack backdoor;
specifically, if the label of an original image sample x is y = [o_1, …, o_n], then when constructing a poisoned image sample for implanting a homogeneous object disappearance attack backdoor into the target detection model based on x, it is necessary not only to add a preset trigger to an object of the chosen category in x, but also to modify the label y = [o_1, …, o_n] by deleting the identifications of all objects of that category. Based on the above example, assuming a homogeneous object disappearance attack with target category t is performed on x, the modified label keeps only the identifications o_i whose category k_i is not t, i.e., all detection boxes whose category is the target category t are deleted.
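A sketch of this class-wide deletion, under the same illustrative assumptions (Python, identifications as [k, a1, b1, a2, b2] lists; the function name is hypothetical):

    from typing import List

    Box = List[float]  # [k, a1, b1, a2, b2]

    def poison_homogeneous_disappearance(label: List[Box], t: int) -> List[Box]:
        # delete every detection box whose category equals the target category t
        return [box for box in label if box[0] != t]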
In this way, it can be determined how the poisoned image sample set for each attack mode is constructed. Step S130 can then be executed to obtain a training sample set according to the poisoned image sample set and a second sample set, wherein the second sample set is contained in the original image sample set;
in one embodiment, when constructing the toxic image sample set, only a part of original image samples are selected from the original image sample set as a first sample set to generate toxic image samples, so that a part of normal original image samples are retained as a second sample set to ensure normal training of the target detection model, and thus the target detection model has normal target detection capability and is implanted into a corresponding backdoor.
To ensure the effect of the backdoor implantation, in one embodiment the number of poisoned image samples accounts for at least 8% of the total number of training samples; in another embodiment, the ratio is 10%-20%. It should be noted that the higher the proportion of poisoned image samples in the training set, the better the effect of the backdoor implantation.
It should be noted that, in one embodiment, the original image samples in the first sample set that are used to construct poisoned image samples may also be retained in the second sample set; that is, the first sample set and the second sample set may intersect or even be identical, which does not affect the effect of the technical solution of this embodiment.
In an embodiment, the first sample set is a proper subset of the original image sample set, and the second sample set is the original image sample set.
In an embodiment of the present invention, after a part of the original image samples in the original image sample set are used to generate poisoned image samples, those original image samples are deleted from the original image sample set, i.e., there is no intersection between the first sample set and the second sample set, so as to ensure the diversity of samples during model training and achieve a better training effect.
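A non-normative sketch of assembling the training sample set at a chosen poisoning ratio (for example 8%, or 10%-20% as above), assuming Python; the disjoint flag covers both variants described above (second sample set with or without the originals used for poisoning), and poison_fn stands for any of the label-and-trigger modifications sketched earlier:

    import random
    from typing import Callable, List, Tuple

    def build_training_set(original_set: List[Tuple], poison_rate: float,
                           poison_fn: Callable[[Tuple], Tuple],
                           disjoint: bool = True) -> List[Tuple]:
        # original_set: list of (image, label) pairs; poison_rate: e.g. 0.1 to 0.2
        n_poison = max(1, int(len(original_set) * poison_rate))
        chosen = set(random.sample(range(len(original_set)), n_poison))   # first sample set
        poisoned = [poison_fn(original_set[i]) for i in chosen]
        if disjoint:   # second sample set excludes the originals used for poisoning
            second = [s for i, s in enumerate(original_set) if i not in chosen]
        else:          # or keep the whole original image sample set as the second sample set
            second = list(original_set)
        return poisoned + second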
After the training sample set is obtained, step S140 may be executed to train the target detection model by using the training sample set, so as to implant a backdoor into the target detection model.
It should be noted that in the above embodiments the target detection model is implanted with a backdoor of only one attack mode. In a further embodiment, backdoors of multiple attack modes may be implanted, i.e., the selected attack modes are at least two of the object generation attack, the local misclassification attack, the global misclassification attack and the object disappearance attack, and the attack targets of the various attack modes do not conflict, which specifically means:
when the label of the original image sample is updated based on different attack modes, the label of the same object in the same original image sample is not repeatedly modified; and/or
when the original image samples are updated based on different attack modes, the triggers that are set are different.
For example, an object generation attack may exist simultaneously with the three other attack modes. Specifically, if an object generation attack and a local misclassification attack coexist and an original image sample contains (object 1, category A), (object 2, category A) and (object 3, category B), then the poisoned sample may carry the following triggers and labels: (object 1, category A), (object 2, category A), (object 3 + trigger, category A) and (trigger, category A). The target detection model implanted with these backdoors will then add a detection frame of category A at the trigger position on the image, where no such object actually exists, and will also recognize the trigger-bearing object of original category B as category A. It can be understood that in this example the trigger patterns can be the same, because the recognition results corresponding to both triggers are category A. It should be noted that when different attack modes require different target categories (the modified categories, not the original ones), the trigger patterns are different.
Similarly, the specified object disappearance attack and the local misclassification attack may also exist simultaneously, so that the target detection model implanted with the backdoors, for example, recognizes a trigger-bearing object of original category B as category A and also treats a trigger-bearing object of original category C as absent; in this example, the trigger patterns are different because the recognition results corresponding to the triggers are different.
Further, in an embodiment of the present invention, an object generation attack, a specified object disappearance attack and a local misclassification attack may exist at the same time, so that the target detection model implanted with the backdoors, for example, adds a detection frame of category B at the trigger position on the image where no such object exists, recognizes a trigger-bearing object of original category B as category A, and treats a trigger-bearing object of original category C as absent. In general, in the above example, the trigger patterns of the object generation attack and the local misclassification attack may be the same, while the trigger pattern of the object disappearance attack is different from the trigger patterns of the other two attack modes.
It can be understood that, in order to implant backdoors of multiple coexisting attack modes into the target detection model synchronously, without worrying too much about logical conflicts when modifying the original image samples, in one embodiment the first sample set may be divided into multiple subsets, and each subset constructs poisoned samples according to one attack mode selected from the multiple attack modes. In this way, each poisoned image sample carries the backdoor of only one attack mode, while the poisoned sample set as a whole contains the backdoors of multiple attack modes; it can be understood that in this embodiment the triggers set for different attack modes are different.
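A minimal sketch of this subset-splitting strategy, assuming Python; attacks maps each selected attack mode (each with its own trigger) to its poisoning function, and the names and signature are illustrative assumptions:

    import random
    from typing import Callable, Dict, List, Tuple

    def poison_with_multiple_attacks(first_set: List[Tuple],
                                     attacks: Dict[str, Callable[[Tuple], Tuple]]) -> List[Tuple]:
        # every sample in the first sample set receives exactly one attack mode
        shuffled = list(first_set)
        random.shuffle(shuffled)
        names = list(attacks)
        chunk = max(1, len(shuffled) // len(names))
        poisoned = []
        for j, name in enumerate(names):
            subset = shuffled[j * chunk:] if j == len(names) - 1 else shuffled[j * chunk:(j + 1) * chunk]
            poisoned.extend(attacks[name](sample) for sample in subset)
        return poisoned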
In some embodiments, a way of implanting a plurality of backdoors into the target detection model is provided, so that multiple backdoors can be implanted in one training process. This greatly improves the efficiency of backdoor implantation, makes it convenient to measure the comprehensive performance of the target detection model in a unified way, and saves a large amount of time compared with multiple single-backdoor implantations.
The embodiments of this implementation provide an effective backdoor attack mode aimed at the characteristics that the target detection model needs to localize targets and can perform multi-target detection. By realizing effective backdoor attacks on the target detection model, the vulnerability of the target detection model to backdoor attacks can be measured, potential safety hazards in the model can be discovered, and its shortcomings can be understood, so that the problems of the target detection model can be repaired in time.
To verify the effectiveness of the backdoor attack scheme proposed in the above embodiments, the inventor also performed experiments to verify the attack effect of each attack mode. The experiments used the VOC2007 data set and the COCO data set, Fast-RCNN and YOLOv3 as the target detection models, and the trigger shown in fig. 9. Table 1 shows the results of testing with normal samples and with poisoned samples after backdoor implantation of the object generation attack based on the above data sets, trigger and target detection models; Table 2 shows the corresponding results for the local misclassification attack; Table 3 shows the corresponding results for the global misclassification attack; and Table 4 shows the corresponding results for the homogeneous object disappearance attack.
the method comprises the following steps that mAP, AP and ASR are commonly used evaluation indexes of a target detection model, the AP represents average precision, the mAP represents average precision mean value, the ASR represents attack success rate, benign represents that a normal sample is used for testing, and attack represents that a toxic sample is used for testing, wherein for different evaluation indexes, the higher the ↓ representative value is, the better the ↓ ] representative value is, the closer the representative value to the normal model is, and the better the representative value is, the closer the representative value to the mAP _ benign is.
TABLE 1 (object generation attack results; the table content is an embedded image and is not reproduced here)
TABLE 2 (local misclassification attack results; embedded image, not reproduced)
TABLE 3 (global misclassification attack results; embedded image, not reproduced)
TABLE 4 (homogeneous object disappearance attack results; embedded image, not reproduced)
From the experimental results shown in the above tables, combined with the above evaluation criteria, it can be seen that the backdoor implantation approach for target detection models provided by the embodiments of the present invention is effective. The embodiments thus fill the gap that no backdoor attack was previously available for target detection models; by realizing effective backdoor attacks on the target detection model, the vulnerability of the model to backdoor attacks can be measured, potential safety hazards in the model can be discovered, and its shortcomings can be understood so that its problems can be repaired in time.
Exemplary devices
Having described the method of the exemplary embodiments of the present application, a backdoor implantation apparatus for a target detection model according to an exemplary embodiment of the present application is described next with reference to fig. 10. The apparatus includes:
an input/output module 210 configured to obtain an original image sample set, wherein the original image sample set includes a plurality of original image samples and labels of the original image samples, each original image sample includes at least one object, and the label of each original image sample includes category information and position information of all objects in the corresponding original image sample;
a processing module 220 configured to set a trigger on each original image sample in a first sample set according to a selected attack mode, and update the category information and/or position information in the respective labels of the original image samples in the first sample set, so as to obtain a poisoned image sample set, where the first sample set is a subset of the original image sample set; and
obtain a training sample set according to the poisoned image sample set and a second sample set, where the second sample set is contained in the original image sample set;
the input/output module 210 is further configured to train a target detection model using the training sample set to implant a backdoor into the target detection model.
In one embodiment, the label of an original image sample includes a plurality of identifiers, the identifiers correspond one-to-one to the objects in the original image sample, and each identifier includes category information and position information of the corresponding object; the processing module 220 includes an annotation updating unit configured to (an illustrative sketch of this annotation structure follows the list):
update at least one identifier in the respective label of each original image sample in the first sample set; and/or
add at least one identifier to the respective label of each original image sample in the first sample set.
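The annotation structure assumed above can be pictured with the small Python sketch below; the dataclass names and field layout are illustrative assumptions, not the data format used by the inventors.

```python
# Illustrative annotation structure: one identifier per object, holding
# category information and position information (a bounding box).
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

@dataclass
class Identifier:
    category: int   # class id of the object
    bbox: Box       # position information of the object

@dataclass
class Annotation:
    identifiers: List[Identifier] = field(default_factory=list)

    def update_identifier(self, index: int,
                          new_category: Optional[int] = None,
                          new_bbox: Optional[Box] = None) -> None:
        # "Updating" an identifier: change its category and/or position.
        if new_category is not None:
            self.identifiers[index].category = new_category
        if new_bbox is not None:
            self.identifiers[index].bbox = new_bbox

    def add_identifier(self, category: int, bbox: Box) -> None:
        # "Adding" an identifier: append a new object entry to the label.
        self.identifiers.append(Identifier(category, bbox))
```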
In one embodiment, the selected attack mode includes at least one of the following attack modes (a trigger-placement sketch follows the list of attack targets below):
object generation attack, local misclassification attack, global misclassification attack and object disappearance attack;
when at least two attack modes are selected, the attack targets of the various attack modes do not conflict with each other;
wherein the attack target of the object generation attack is as follows:
enabling the target detection model to generate a detection frame of a first specified category within a preset range of the trigger in an image;
the attack target of the local misclassification attack is as follows:
enabling the target detection model to classify the object added with the trigger in the image into a second specified category;
the attack target of the global misclassification attack is as follows:
enabling the target detection model to classify all objects in the image added with the trigger into a third specified category;
the attack target of the object disappearance attack is as follows:
the object detection model is made unable to identify the object in the image to which the trigger is added.
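All four attack modes share the step of pasting a small trigger patch into an image, either at a random position or onto a particular object. The NumPy sketch below illustrates only that shared step, under the assumption of H×W×3 uint8 images; the trigger pattern, its size and the placement policy are arbitrary assumptions for the example.

```python
# Minimal sketch of pasting a trigger patch into an image (H x W x 3 uint8).
# The trigger content and placement policy are illustrative assumptions.
import numpy as np

def paste_trigger(image: np.ndarray, trigger: np.ndarray, x: int, y: int) -> np.ndarray:
    """Return a copy of `image` with `trigger` pasted at top-left corner (x, y)."""
    poisoned = image.copy()
    th, tw = trigger.shape[:2]
    poisoned[y:y + th, x:x + tw] = trigger
    return poisoned

def random_position(image: np.ndarray, trigger: np.ndarray, rng=np.random):
    """Pick a random top-left corner so the trigger stays inside the image."""
    h, w = image.shape[:2]
    th, tw = trigger.shape[:2]
    return rng.randint(0, w - tw + 1), rng.randint(0, h - th + 1)
```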
In one embodiment, the object disappearance attack includes a specified object disappearance attack and a same-class object disappearance attack;
the attack target of the specified object disappearance attack is as follows:
enabling the target detection model not to identify the object added with the trigger in the image;
the attack target of the same-class object disappearance attack is as follows:
making the target detection model unable to identify all objects of the same class as the object in the image to which the trigger is added.
In one embodiment, the processing module 220 includes a poisoned image sample construction unit configured to construct poisoned image samples as follows (an illustrative construction sketch follows this list):
When the selected attack mode is used for generating an attack for an object, adding a trigger at a random position of the original image sample, and adding a first specified type and a detection frame corresponding to the random position in the label of the original image sample to obtain a poisoned image sample so as to train a target detection model with an object generation attack backdoor;
when the selected attack mode is a local fault-scoring attack, adding a trigger on one or more objects of the original image sample, and modifying the identification categories of the one or more objects into a second designated category in the label of the original image sample to obtain a virus-throwing image sample so as to train a target detection model with a local fault-scoring attack backdoor;
when the selected attack mode is global mispartition attack, adding a trigger at a random position of the original image sample, and modifying the identification categories of all objects in the label of the original image sample into a third specified category to obtain a poisoned image sample so as to train a target detection model with a global mispartition attack backdoor;
when the selected attack mode is the appointed object disappearing attack, adding a trigger on one or more objects of the original image sample, and deleting the identification of the one or more objects in the label of the original image sample to obtain a poisoned image sample so as to train a target detection model with an appointed object disappearing attack backdoor;
when the selected attack mode is similar object disappearing attack, adding a trigger on any object of the original image sample, and deleting the identifications of all similar objects of any object in the label of the original image sample to obtain a poisoned image sample so as to train a target detection model with a similar object disappearing attack backdoor.
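Purely as an illustration of the five construction rules listed above, and not as the patented implementation, the sketch below applies each rule to a simple list-of-dicts label; the class identifiers, dictionary layout and `target_indices` bookkeeping are assumptions introduced for the example.

```python
# Sketch of building a poisoned label for each attack mode. A label is a list
# of dicts {"category": int, "bbox": (x1, y1, x2, y2)}; the class ids and the
# target_indices bookkeeping are illustrative assumptions.
def poison_annotation(annotation, attack_mode, trigger_box=None,
                      first_class=0, second_class=1, third_class=2,
                      target_indices=None):
    ann = [dict(obj) for obj in annotation]          # work on a copy
    if attack_mode == "object_generation":
        # A spurious detection frame of the first specified class is added
        # around the randomly placed trigger (trigger_box).
        ann.append({"category": first_class, "bbox": trigger_box})
    elif attack_mode == "local_misclassification":
        # Objects carrying the trigger are relabelled to the second class.
        for i in target_indices:
            ann[i]["category"] = second_class
    elif attack_mode == "global_misclassification":
        # Every object in the image is relabelled to the third class.
        for obj in ann:
            obj["category"] = third_class
    elif attack_mode == "specified_object_disappearance":
        # Identifiers of the objects carrying the trigger are deleted.
        doomed = set(target_indices)
        ann = [obj for i, obj in enumerate(ann) if i not in doomed]
    elif attack_mode == "same_class_disappearance":
        # Identifiers of all objects of the triggered object's class are deleted.
        cls = annotation[target_indices[0]]["category"]
        ann = [obj for obj in ann if obj["category"] != cls]
    return ann
```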
In one embodiment, when at least two attack modes are selected, ensuring that the attack targets of the various attack modes do not conflict includes (a compatibility-check sketch follows this list):
when the labels of the original image samples are updated based on different attack modes, the identifier of the same object in the same original image sample is not repeatedly modified; and/or
when the original image samples are updated based on different attack modes, the triggers set for the different attack modes are different.
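One hedged way to read these two conditions is as simple pre-checks over the planned modifications, as in the sketch below; the data layout (attack mode mapped to a set of object indices, and to a hashable trigger identifier) is an assumption made for illustration.

```python
# Sketch of checking that combined attack modes do not conflict: no object's
# identifier is modified by two different modes, and the triggers differ.
def attacks_compatible(planned_edits, trigger_ids):
    """planned_edits: dict mapping attack mode -> set of object indices that
    the mode will modify in a given image; trigger_ids: dict mapping attack
    mode -> a hashable identifier of the trigger pattern it uses."""
    touched = set()
    for indices in planned_edits.values():
        if touched & indices:        # the same object would be edited twice
            return False
        touched |= indices
    triggers = list(trigger_ids.values())
    return len(triggers) == len(set(triggers))   # triggers pairwise distinct
```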
In one embodiment, when the selected attack mode is the object generation attack, the annotation updating unit is configured to (a coordinate-arithmetic sketch follows this list):
determine position information of the random position at which the trigger is added to the original image sample, and the size of the trigger;
determine the position information of the detection frame added to the label according to the position information of the trigger, the size of the trigger and the size of the detection frame; and
add the first specified category and the detection frame position information to the label of the original image sample.
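The three operations above reduce to simple coordinate arithmetic. The sketch below derives the added detection frame from the trigger position and the two sizes; centring the frame on the trigger is an assumption made for the example, since the embodiments only require the frame to lie within a preset range of the trigger.

```python
# Sketch: derive the bounding box added to the label from the trigger's
# top-left corner, the trigger size and the desired box size. Centring the
# box on the trigger is an illustrative choice, not a requirement.
def detection_box_for_trigger(trig_x, trig_y, trig_w, trig_h,
                              box_w, box_h, image_w, image_h):
    cx = trig_x + trig_w / 2.0          # trigger centre
    cy = trig_y + trig_h / 2.0
    x1 = max(0.0, cx - box_w / 2.0)     # clip the box to the image bounds
    y1 = max(0.0, cy - box_h / 2.0)
    x2 = min(float(image_w), cx + box_w / 2.0)
    y2 = min(float(image_h), cy + box_h / 2.0)
    return (x1, y1, x2, y2)
```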
Exemplary Medium
Having described the method and apparatus of the exemplary embodiments of the present application, a computer-readable storage medium of the exemplary embodiments of the present application is next described with reference to fig. 11. Fig. 11 illustrates a computer-readable storage medium in the form of an optical disc 70 having a computer program (i.e., a program product) stored thereon which, when executed by a processor, implements the steps described in the above method embodiments, for example: obtaining an original image sample set, where the original image sample set includes a plurality of original image samples and labels of the original image samples, each original image sample includes at least one object, and the label of each original image sample includes category information and position information of all objects in the corresponding original image sample; setting a trigger for each original image sample in a first sample set according to a selected attack mode, and updating category information and/or position information in the respective label of each original image sample in the first sample set, to obtain a poisoned image sample set, where the first sample set is a subset of the original image sample set; obtaining a training sample set according to the poisoned image sample set and a second sample set, where the second sample set is contained in the original image sample set; and
training a target detection model using the training sample set, so as to implant a backdoor into the target detection model. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
Exemplary computing device
Having described the methods, apparatus and media of the exemplary embodiments of the present application, a computing device for implanting a backdoor into a target detection model according to the exemplary embodiments of the present application is next described with reference to FIG. 12.
FIG. 12 illustrates a block diagram of an exemplary computing device 80 suitable for use in implementing embodiments of the present application, where the computing device 80 may be a computer system or server. The computing device 80 shown in fig. 12 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the application.
As shown in fig. 12, components of computing device 80 may include, but are not limited to: one or more processors or processing units 801, a system memory 802, and a bus 803 that couples various system components including the system memory 802 and the processing unit 801.
Computing device 80 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 80 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 802 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 8021 and/or cache memory 8022. Computing device 80 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, a storage system 8023 may be provided for reading from and writing to a non-removable, nonvolatile magnetic medium (not shown in FIG. 12, and typically referred to as a "hard disk drive"). Although not shown in FIG. 12, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 803 by one or more data media interfaces. The system memory 802 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the application.
Program/utility 8025, having a set (at least one) of program modules 8024, can be stored, for example, in system memory 802, and such program modules 8024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. Program modules 8024 generally perform the functions and/or methods of embodiments described herein.
Computing device 80 may also communicate with one or more external devices 804 (e.g., keyboard, pointing device, display, etc.). Such communication may be through input/output (I/O) interfaces 805. Moreover, computing device 80 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 806. As shown in FIG. 12, the network adapter 806 communicates with other modules of the computing device 80, such as the processing unit 801, over the bus 803. It should be appreciated that although not shown in FIG. 12, other hardware and/or software modules may be used in conjunction with computing device 80.
The processing unit 801 executes various functional applications and data processing by running programs stored in the system memory 802, for example: obtaining an original image sample set, where the original image sample set includes a plurality of original image samples and labels of the original image samples, each original image sample includes at least one object, and the label of each original image sample includes category information and position information of all objects in the corresponding original image sample; setting a trigger for each original image sample in a first sample set according to a selected attack mode, and updating category information and/or position information in the respective label of each original image sample in the first sample set, to obtain a poisoned image sample set, where the first sample set is a subset of the original image sample set; obtaining a training sample set according to the poisoned image sample set and a second sample set, where the second sample set is contained in the original image sample set; and training a target detection model using the training sample set, so as to implant a backdoor into the target detection model. The specific implementation of each step is not repeated here.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the backdoor implant device of the object detection model are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the disclosed embodiments, and that the division into aspects does not mean that features in these aspects cannot be combined to advantage; such division is for convenience of presentation only. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A backdoor implantation method for a target detection model, comprising:
acquiring an original image sample set, wherein the original image sample set comprises a plurality of original image samples and labels of the original image samples, each original image sample comprises at least one object, and the label of each original image sample comprises category information and position information of all the objects in the corresponding original image sample;
respectively setting a trigger for each original image sample in a first sample set according to a selected attack mode, and updating category information and/or position information in respective labels of each original image sample in the first sample set to obtain a poisoned image sample set, wherein the first sample set is a subset of the original image sample set;
obtaining a training sample set according to the poisoned image sample set and a second sample set, wherein the second sample set is contained in the original image sample set;
and training a target detection model by adopting the training sample set so as to implant a backdoor into the target detection model.
2. The backdoor implantation method for a target detection model according to claim 1, wherein the label of one original image sample comprises a plurality of identifiers, the identifiers correspond one-to-one to the objects in the original image sample, each identifier comprises category information and position information of the corresponding object, and updating the category information and/or the position information in the label of each original image sample in the first sample set comprises:
updating at least one identifier in the respective label of each original image sample in the first sample set; and/or
adding at least one identifier into the respective label of each original image sample in the first sample set.
3. The backdoor implantation method for a target detection model according to claim 1 or 2, wherein the selected attack mode comprises at least one of the following attack modes:
object generation attack, local misclassification attack, global misclassification attack and object disappearance attack;
when at least two attack modes are selected, the attack targets of the various attack modes do not conflict with each other;
wherein the attack target of the object generation attack is as follows:
enabling the target detection model to generate a detection frame of a first specified category within a preset range of the trigger in an image;
the attack target of the local misclassification attack is as follows:
enabling the target detection model to classify the object added with the trigger in the image into a second specified category;
the attack target of the global misclassification attack is as follows:
enabling the target detection model to classify all objects in the image added with the trigger into a third specified category;
the attack target of the object disappearance attack is as follows:
the object detection model is made unable to identify the object in the image to which the trigger is added.
4. The backdoor implantation method for a target detection model according to claim 3, wherein the object disappearance attack comprises a specified object disappearance attack and a same-class object disappearance attack;
the attack target of the specified object disappearance attack is as follows:
enabling the target detection model not to identify the object added with the trigger in the image;
the attack target of the same-class object disappearance attack is as follows:
making the target detection model unable to identify all objects of the same class as the object in the image to which the trigger is added.
5. The backdoor implantation method for a target detection model according to claim 4, wherein setting a trigger for an original image sample according to the selected attack mode and updating the category information and/or the position information in the label of the original image sample to obtain a poisoned image sample comprises:
when the selected attack mode is the object generation attack, adding a trigger at a random position of the original image sample, and adding, in the label of the original image sample, a first specified category and a detection frame corresponding to the random position, to obtain a poisoned image sample, so as to train a target detection model with an object generation attack backdoor;
when the selected attack mode is the local misclassification attack, adding a trigger on one or more objects of the original image sample, and modifying, in the label of the original image sample, the identifier categories of the one or more objects into a second specified category, to obtain a poisoned image sample, so as to train a target detection model with a local misclassification attack backdoor;
when the selected attack mode is the global misclassification attack, adding a trigger at a random position of the original image sample, and modifying the identifier categories of all objects in the label of the original image sample into a third specified category, to obtain a poisoned image sample, so as to train a target detection model with a global misclassification attack backdoor;
when the selected attack mode is the specified object disappearance attack, adding a trigger on one or more objects of the original image sample, and deleting the identifiers of the one or more objects in the label of the original image sample, to obtain a poisoned image sample, so as to train a target detection model with a specified object disappearance attack backdoor;
when the selected attack mode is the same-class object disappearance attack, adding a trigger on any object of the original image sample, and deleting, in the label of the original image sample, the identifiers of all objects of the same class as that object, to obtain a poisoned image sample, so as to train a target detection model with a same-class object disappearance attack backdoor.
6. The backdoor implantation method for a target detection model according to claim 5, wherein, when at least two attack modes are selected, ensuring that the attack targets of the various attack modes do not conflict comprises:
when the labels of the original image samples are updated based on different attack modes, the identifier of the same object in the same original image sample is not repeatedly modified; and/or
when the original image samples are updated based on different attack modes, the triggers set for the different attack modes are different.
7. The backdoor implantation method for a target detection model according to claim 5 or 6, wherein, when the selected attack mode is the object generation attack, adding a first specified category and a detection frame corresponding to the random position to the label of the original image sample comprises:
determining position information of the random position at which the trigger is added to the original image sample, and the size of the trigger;
determining the position information of the detection frame added to the label according to the position information of the trigger, the size of the trigger and the size of the detection frame; and
adding the first specified category and the detection frame position information to the label of the original image sample.
8. A backdoor implantation device for a target detection model, comprising:
the input and output module is configured to obtain an original image sample set, wherein the original image sample set comprises a plurality of original image samples and labels of the original image samples, each original image sample comprises at least one object, and the label of each original image sample comprises category information and position information of all the objects in the corresponding original image sample;
the processing module is configured to set a trigger for each original image sample in a first sample set according to a selected attack mode, and update category information and/or position information in a label of each original image sample in the first sample set to obtain a poisoned image sample set, wherein the first sample set is a subset of the original image sample set; and
obtain a training sample set according to the poisoned image sample set and a second sample set, wherein the second sample set is contained in the original image sample set;
the input-output module is further configured to train a target detection model using the training sample set, so as to implant a backdoor into the target detection model.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
10. A computing device, the computing device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor configured to perform the method of any of the preceding claims 1-7.
CN202111501451.XA 2021-12-09 2021-12-09 Rear door implantation method, device, medium and computing equipment of target detection model Active CN113902962B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111501451.XA CN113902962B (en) 2021-12-09 2021-12-09 Rear door implantation method, device, medium and computing equipment of target detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111501451.XA CN113902962B (en) 2021-12-09 2021-12-09 Rear door implantation method, device, medium and computing equipment of target detection model

Publications (2)

Publication Number Publication Date
CN113902962A true CN113902962A (en) 2022-01-07
CN113902962B CN113902962B (en) 2022-03-04

Family

ID=79025681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111501451.XA Active CN113902962B (en) 2021-12-09 2021-12-09 Rear door implantation method, device, medium and computing equipment of target detection model

Country Status (1)

Country Link
CN (1) CN113902962B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115186816A (en) * 2022-09-08 2022-10-14 南京逸智网络空间技术创新研究院有限公司 Back door detection method based on decision shortcut search
TWI814213B (en) * 2022-01-17 2023-09-01 國立清華大學 Data poisoning method and data poisoning apparatus

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961145A (en) * 2018-12-21 2019-07-02 北京理工大学 A kind of confrontation sample generating method for image recognition category of model boundary sensitivity
CN109710788A (en) * 2018-12-28 2019-05-03 斑马网络技术有限公司 Image pattern mark and management method and equipment
US20210256125A1 (en) * 2019-05-29 2021-08-19 Anomalee Inc. Post-Training Detection and Identification of Backdoor-Poisoning Attacks
CN111068306A (en) * 2019-11-21 2020-04-28 腾讯科技(深圳)有限公司 Automatic operation method and device of virtual prop, storage medium and electronic device
CN112053276A (en) * 2020-09-29 2020-12-08 支付宝(杭州)信息技术有限公司 Data processing method and device based on steganography technology
CN112905997A (en) * 2021-01-29 2021-06-04 浙江工业大学 Method, device and system for detecting poisoning attack facing deep learning model
CN113377888A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Training target detection model and method for detecting target
CN113643278A (en) * 2021-08-30 2021-11-12 湖南航天远望科技有限公司 Confrontation sample generation method for unmanned aerial vehicle image target detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈晋音 (Chen Jinyin) et al.: "深度学习模型的中毒攻击与防御综述" (Survey of poisoning attacks and defenses for deep learning models), 《信息安全学报》 (Journal of Cyber Security) *

Also Published As

Publication number Publication date
CN113902962B (en) 2022-03-04

Similar Documents

Publication Publication Date Title
CN110245598B (en) Countermeasure sample generation method, apparatus, medium, and computing device
CN113902962B (en) Rear door implantation method, device, medium and computing equipment of target detection model
CN111753985B (en) Image deep learning model testing method and device based on neuron coverage rate
Huber et al. Fully automatic registration of multiple 3D data sets
CN111723865B (en) Method, apparatus and medium for evaluating performance of image recognition model and attack method
Xiao et al. Two-dimensional visual tracking in construction scenarios: A comparative study
CN107545241A (en) Neural network model is trained and biopsy method, device and storage medium
Rogers et al. Simultaneous localization and mapping with learned object recognition and semantic data association
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN110705652A (en) Countermeasure sample, generation method, medium, device and computing equipment thereof
Lian et al. CBA: Contextual background attack against optical aerial detection in the physical world
WO2023165616A1 (en) Method and system for detecting concealed backdoor of image model, storage medium, and terminal
Lee et al. Synthetic image dataset development for vision-based construction equipment detection
CN113361643A (en) Deep learning-based universal mark identification method, system, equipment and storage medium
Apolloni et al. Machine learning and robot perception
CN110427758A (en) Position cheat detecting method, intelligent terminal and storage medium
CN114299366A (en) Image detection method and device, electronic equipment and storage medium
CN114332982A (en) Face recognition model attack defense method, device, equipment and storage medium
CN114120220A (en) Target detection method and device based on computer vision
CN113902041A (en) Target detection model training and identity verification method and device
CN117152844A (en) High-integrity worker construction attitude detection method and system based on computer vision
Kim et al. Learning-based image synthesis for hazardous object detection in X-ray security applications
Mahima et al. Toward robust 3d perception for autonomous vehicles: A review of adversarial attacks and countermeasures
Lindstedt et al. Simple agglomerative visual grouping for ACT-R
CN112529116B (en) Scene element fusion processing method, device and equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant