CN115019118A - Adversarial image generation method and device, computer device, and storage medium - Google Patents

Adversarial image generation method and device, computer device, and storage medium

Info

Publication number: CN115019118A
Application number: CN202210471139.9A
Authority: CN (China)
Prior art keywords: image, target, gradient, classification, loss
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 刘彦宏, 王洪斌, 吴海英, 蒋宁
Current Assignee: Mashang Xiaofei Finance Co Ltd
Original Assignee: Mashang Xiaofei Finance Co Ltd
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN202210471139.9A
Publication of CN115019118A


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose an adversarial image generation method and device, a computer device, and a storage medium. The method comprises generating an intermediate adversarial image through the following steps: inputting an image to be processed into a pre-trained target detection model and outputting classification prediction information and region prediction coordinate information for each detection object; inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function to obtain the corresponding target classification loss; determining the target localization loss of each detection object; constructing a target gradient loss function and using it to calculate a target gradient image for the image to be processed; and superimposing the target gradient image onto the image to be processed to generate an intermediate adversarial image that serves as the image to be processed in the next iteration. The generation of the intermediate adversarial image is repeated until the target detection model no longer identifies any detection object in it as belonging to the first category, yielding the target adversarial image and improving the efficiency and performance of adversarial image generation.

Description

Adversarial image generation method and device, computer device, and storage medium
Technical Field
The present application relates to the technical field of computer vision, and in particular to an adversarial image generation method and device, a computer device, and a storage medium.
Background
In recent years, deep learning network models have become the most practical models in application fields such as computer vision, and target detection technology based on deep learning network models has wide application in intelligent supervision, i.e., monitoring surveillance images for abnormal objects that do not meet a given standard.
In the related art, predefined target objects of various categories can be detected by a target detection model, and whether abnormal objects exist is then determined from the detection results.
In the research and practice of the prior art, the inventors of the present application found that the adversarial images generated by existing methods only cause the target detection model to locate the detection frame of an abnormal object inaccurately; they cannot disable the recognition of the abnormal object itself, so the efficiency and performance of the adversarial attack are low.
Disclosure of Invention
The embodiments of the present application provide an adversarial image generation method and device, a computer device, and a storage medium, which can improve the efficiency and performance of adversarial attacks.
To solve the above technical problem, the embodiments of the present application provide the following technical solutions:
a confrontation image generation method comprising:
generating an intermediate confrontation image, which comprises the following specific steps:
inputting an image to be processed into a pre-trained target detection model, and outputting classification prediction information and region prediction coordinate information of each detection object;
inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing gradient calculation of the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss;
the prediction classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for gradient increasing calculation of the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for gradient decreasing calculation of the classification loss of the detection object of which the classification label information belongs to the second class;
determining target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information;
constructing a target gradient loss function based on the target classification loss and the target positioning loss;
calculating a target gradient image corresponding to the image to be processed through the target gradient loss function;
superposing the target gradient image to an image to be processed to generate an intermediate countermeasure image, wherein the intermediate countermeasure image is the image to be processed, and gradient calculation is carried out on the intermediate countermeasure image in the next iteration;
and repeatedly executing the step of generating the intermediate countermeasure image until the detection object identified as the first class by the target detection model does not exist in the intermediate countermeasure image, and determining the intermediate countermeasure image as the target countermeasure image.
An adversarial image generation apparatus, comprising:
a generating module configured to generate an intermediate adversarial image, the generating module specifically comprising: an output unit for inputting the image to be processed into a pre-trained target detection model and outputting the classification prediction information and region prediction coordinate information of each detection object; an input unit for inputting the classification prediction information of each detection object and the corresponding classification label information into a prediction classification loss function, and performing the type of gradient calculation corresponding to the category to which the classification label information belongs, to obtain the corresponding target classification loss, wherein the prediction classification loss function comprises a gradient-increasing calculation term used for gradient-increasing calculation of the classification loss of detection objects whose classification label information belongs to the first category, and a gradient-decreasing calculation term used for gradient-decreasing calculation of the classification loss of detection objects whose classification label information belongs to the second category; a first determining unit for determining the target localization loss according to the region prediction coordinate information of each detection object and the corresponding region label coordinate information; a construction unit for constructing a target gradient loss function based on the target classification loss and the target localization loss; a calculation unit for calculating a target gradient image corresponding to the image to be processed through the target gradient loss function; and a superimposing unit for superimposing the target gradient image onto the image to be processed to generate an intermediate adversarial image, the intermediate adversarial image serving as the image to be processed for gradient calculation in the next iteration;
and an iteration module for repeatedly executing the step of generating the intermediate adversarial image until no detection object identified as the first category by the target detection model exists in the intermediate adversarial image, and determining the intermediate adversarial image as the target adversarial image.
In some embodiments, the prediction classification loss function includes a gradient-increasing calculation term for gradient-increasing calculation and a gradient-decreasing calculation term for gradient-decreasing calculation, and the input unit includes:
a first determining subunit for determining the category to which the classification label information of each detection object belongs;
a second determining subunit for determining detection objects belonging to the first category as first detection objects, and detection objects belonging to the second category as second detection objects;
a first input subunit for inputting the classification prediction information of the first detection objects and the corresponding classification label information into the gradient-increasing calculation term for gradient-increasing calculation, to obtain a first classification loss;
a second input subunit for inputting the classification prediction information of the second detection objects and the corresponding classification label information into the gradient-decreasing calculation term for gradient-decreasing calculation, to obtain a second classification loss;
a third determining subunit for determining the target classification loss based on the first classification loss and the second classification loss.
In some embodiments, the third determining subunit is configured to:
sum the first classification loss and the second classification loss to obtain a third classification loss;
and calculate the ratio of the third classification loss to the number of detection objects to obtain the target classification loss.
In some embodiments, the apparatus further comprises a second determining unit configured to:
calculate the spatial distance between the classification prediction information and each piece of classification label information;
and determine the classification label information corresponding to the classification prediction information of each detection object according to the magnitude of the spatial distance.
In some embodiments, the construction unit is configured to:
weight the target classification loss by a preset weight to obtain a weighted target classification loss;
and construct the target gradient loss function based on the weighted target classification loss and the target localization loss.
In some embodiments, the calculation unit is configured to:
calculate the gradient value of each pixel in the image to be processed according to the target gradient loss function to obtain a first gradient image;
process the first gradient image through a sign function to obtain a second gradient image;
and calculate the product of a preset disturbance value and the second gradient image to obtain the target gradient image.
In some embodiments, the superimposing unit includes:
a superimposing subunit for superimposing the target gradient image onto the image to be processed to obtain a first adversarial image;
an acquisition subunit for acquiring the image to be processed that was first input into the pre-trained target detection model and determining it as the initial image;
and a generation subunit for limiting the pixel difference of each pixel between the first adversarial image and the initial image to within a preset disturbance value, and generating the intermediate adversarial image.
In some embodiments, the generation subunit is configured to:
calculate the pixel difference between each first pixel in the first adversarial image and the corresponding second pixel in the initial image to form a pixel difference set;
limit each pixel difference in the pixel difference set to within a preset disturbance value range through a clipping function, wherein the minimum value of the preset disturbance value range is the negative of the preset disturbance value and the maximum value is the positive preset disturbance value;
and superimpose the clipped pixel difference set onto the initial image to generate the intermediate adversarial image.
A method of training an object detection model, comprising:
inputting a sample image set into an initial object detection model and performing model training to obtain a trained object detection model, wherein the sample image set comprises sample adversarial images, and the sample adversarial images are adversarial images generated by the above adversarial image generation method; a training sketch follows below.
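A minimal fine-tuning sketch under stated assumptions: the sample set mixes clean and adversarial images, and the model follows the torchvision detection convention of returning a loss dict in training mode; the data loaders, optimizer settings, and epoch count are illustrative, not prescribed by the patent.

```python
import itertools
import torch

def train_with_adversarial_samples(model, clean_loader, adv_loader, epochs=1):
    # Fine-tune a detection model on clean plus sample adversarial images.
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in itertools.chain(clean_loader, adv_loader):
            loss_dict = model(images, targets)  # torchvision detectors return losses in train mode
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```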
An image recognition method, comprising:
inputting an image to be recognized into an object detection model for object recognition to obtain a recognition result, wherein the object detection model is trained according to the above method of training an object detection model.
A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the above adversarial image generation method.
A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above adversarial image generation method or method of training an object detection model when executing the computer program.
A computer program product or computer program comprising computer instructions stored in a storage medium. A processor of a computer device reads the computer instructions from the storage medium and executes them, causing the computer to perform the steps of the above adversarial image generation method or method of training an object detection model.
In the embodiments of the present application, an intermediate adversarial image is generated through the following specific steps: inputting an image to be processed into a pre-trained target detection model, and outputting classification prediction information and region prediction coordinate information for each detection object; inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing the type of gradient calculation corresponding to the category to which the classification label information belongs, to obtain the corresponding target classification loss, wherein the prediction classification loss function comprises a gradient-increasing calculation term used for gradient-increasing calculation of the classification loss of detection objects whose classification label information belongs to the first category, and a gradient-decreasing calculation term used for gradient-decreasing calculation of the classification loss of detection objects whose classification label information belongs to the second category; determining the target localization loss according to the region prediction coordinate information of each detection object and the corresponding region label coordinate information; constructing a target gradient loss function based on the target classification loss and the target localization loss; calculating a target gradient image corresponding to the image to be processed through the target gradient loss function; and superimposing the target gradient image onto the image to be processed to generate an intermediate adversarial image that serves as the image to be processed for gradient calculation in the next iteration. The step of generating the intermediate adversarial image is repeated until no detection object identified as the first category by the target detection model exists in the intermediate adversarial image, and the intermediate adversarial image is determined as the target adversarial image. In this way, each detection object undergoes gradient-increasing or gradient-decreasing calculation according to the category of its classification label information to obtain the target classification loss; a target gradient loss function is constructed based on this loss; a target gradient image is computed through the target gradient loss function and superimposed onto the image to be processed to generate an intermediate adversarial image; and the process is repeated until the target detection model no longer identifies any detection object in the intermediate adversarial image as the first category.
Compared with schemes in which the generated adversarial image merely makes the target detection model locate the detection frame of an abnormal object inaccurately, the embodiments of the present application cause abnormal objects to be identified as normal objects, completely defeating the recognition capability of the target detection model and greatly improving the efficiency and performance of adversarial image generation.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a scene diagram of an adversarial image generation system provided by an embodiment of the present application;
FIG. 2a is a schematic flow chart of an adversarial image generation method provided by an embodiment of the present application;
FIG. 2b is a schematic structural diagram of a target detection network provided by an embodiment of the present application;
FIG. 2c is a scene diagram of an adversarial image generation method provided by an embodiment of the present application;
FIG. 3 is another schematic flow chart of an adversarial image generation method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
The embodiments of the present application provide an adversarial image generation method and device, a computer device, and a storage medium. The adversarial image generation method can be applied to an adversarial image generation apparatus, which can be integrated into a computer device. The computer device can be a terminal with data processing capability, such as (but not limited to) a smartphone, tablet computer, notebook computer, desktop computer, or smart watch. The computer device may also be a server, where the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
Please refer to FIG. 1, which is a scene diagram of the adversarial image generation system provided by the present application. As shown, the computer device generates an intermediate adversarial image through the following specific steps: inputting an image to be processed into a pre-trained target detection model, and outputting classification prediction information and region prediction coordinate information for each detection object; inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing the type of gradient calculation corresponding to the category to which the classification label information belongs, to obtain the corresponding target classification loss, wherein the prediction classification loss function comprises a gradient-increasing calculation term used for gradient-increasing calculation of the classification loss of detection objects whose classification label information belongs to the first category, and a gradient-decreasing calculation term used for gradient-decreasing calculation of the classification loss of detection objects whose classification label information belongs to the second category; determining the target localization loss according to the region prediction coordinate information of each detection object and the corresponding region label coordinate information; constructing a target gradient loss function based on the target classification loss and the target localization loss; calculating a target gradient image corresponding to the image to be processed through the target gradient loss function; and superimposing the target gradient image onto the image to be processed to generate an intermediate adversarial image that serves as the image to be processed for gradient calculation in the next iteration. The step of generating the intermediate adversarial image is repeated until no detection object identified as the first category by the target detection model exists in the intermediate adversarial image, and the intermediate adversarial image is determined as the target adversarial image. It should be noted that the scene diagram of adversarial image generation shown in FIG. 1 is only an example; the adversarial image generation scene described in the embodiments of the present application is intended to illustrate the technical solution of the present application more clearly and does not limit it. As those skilled in the art will appreciate, with the evolution of target detection model training and the emergence of new service scenarios, the technical solution provided in the present application is equally applicable to similar technical problems.
The following are detailed below.
In the present embodiment, the description is given from the perspective of an adversarial image generation apparatus, which may be integrated in a server equipped with a storage unit and a microprocessor with computing capability.
Referring to FIG. 2a, FIG. 2a is a schematic flow chart of an adversarial image generation method provided by an embodiment of the present application. The adversarial image generation method includes: generating an intermediate adversarial image, and repeatedly executing the step of generating the intermediate adversarial image until no detection object identified as the first category by the target detection model exists in the intermediate adversarial image, whereupon the intermediate adversarial image is determined as the target adversarial image. In this embodiment, the specific steps of generating the intermediate adversarial image comprise steps 101 to 106.
In step 101, an image to be processed is input into a pre-trained target detection model, and classification prediction information and area prediction coordinate information of each detection object are output.
Computer vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and further performing image processing so that the result is an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include information processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and the like, as well as common biometric technologies such as face recognition and fingerprint recognition.
The solution provided by the embodiments of the present application relates to technologies such as the computer vision technology of artificial intelligence, and is explained through the following embodiments:
for better understanding of the embodiment of the present application, it should be noted that the target detection model in the embodiment of the present application may be a target detection network (fast RCNN), please refer to fig. 2b together, where fig. 2b is a schematic structural diagram of the target detection network provided in the embodiment of the present application.
The object detection network 10 may be divided into 4 parts:
the basic convolutional networks 12(Conv layers) are a convolutional neural network, such as 13 convolutional (Conv) layers +13 linear rectification function (relu) layers +4 pooling layer (Pooling) layers, and are mainly used for extracting feature map information 13(feature maps) in the image 11 to be processed.
The Region generation network 14 (RPN) is configured to generate candidate regions (regions), specifically, classify anchors (anchors) in the feature map information 13 through a normalization function (softmax), obtain positive classification (positive) information and negative classification (negative) information, determine the positive classification information as the candidate regions, that is, preliminarily classify regions included in the image corresponding to each object, and determine a classification result corresponding to the Region generation network candidate Region as classification information of the Region generation network candidate Region, such as people, animals, or buildings.
Further, a border regression (bounding box regression) offset of the anchor may be calculated, the candidate region may be adjusted according to the border regression offset to obtain a final target candidate region 15 (final), and the target candidate region 15 that is too small and exceeds the border may be removed, thereby implementing the location frame selection of the target object.
And an interest pooling layer 16(ROI ranking) which is responsible for collecting the target candidate region 15 and the feature map information 13, and calculating the feature map information (generic features maps) of the regions with sizes meeting the conditions, and sending the feature map information (generic features maps) to a subsequent layer for processing.
A Classifier 17(Classifier), which may include a full connection layer (full connection) and a normalization processing layer, where the Classifier 17 combines the region feature map information through the full connection layer and the normalization processing layer to calculate a classification result corresponding to the region feature map, and meanwhile, may perform fine adjustment on the target candidate region 15 according to the classification result, determine the fine-adjusted target candidate region 15 as the last accurate detection region of interest (i.e., corresponding to the region prediction coordinate information in the embodiment of the present application), and the classification result corresponding to the location information of interest is the classification information of the region of interest (i.e., corresponding to the classification prediction information of the object in the embodiment of the present application).
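As a concrete reference point, the sketch below runs a standard torchvision Faster R-CNN and reads out the two quantities named above (classification prediction information and region prediction coordinate information). The torchvision model and its weights are illustrative assumptions; the patent does not prescribe a specific implementation.

```python
# A minimal sketch, assuming torchvision's Faster R-CNN stands in for the
# pre-trained target detection model described in the text.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 512, 512)      # image to be processed, CHW, values in [0, 1]
with torch.no_grad():
    out = model([image])[0]          # dict with 'boxes', 'labels', 'scores'

boxes = out["boxes"]                 # region prediction coordinate information
labels = out["labels"]               # classification prediction (category indices)
scores = out["scores"]               # confidence of each detection object
```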
The pre-trained target detection model is a model that has been trained with a large number of image samples annotated with the region label coordinate information and classification label information of each object, where the region label coordinate information represents the detection frame of an object and the classification label information is the calibrated category of each object, such as cat, dog, or pig. The pre-trained target detection model can thus quickly identify the classification prediction information (i.e., category) and region prediction coordinate information (i.e., detection frame) of each detection object in an image.
A common scenario for such a target detection model is quickly monitoring whether abnormal objects that violate a given standard appear in images captured from a camera's surveillance area. For example, intelligent security inspection of hand luggage in an airport or subway needs to detect object categories that trigger an abnormality alarm, such as controlled knives, guns, or water bottles.
The image to be processed can be input into the pre-trained target detection model, which outputs the classification prediction information and region prediction coordinate information of each detection object in the image; both the classification prediction information and the region prediction coordinate information can be expressed as vectors.
Referring to FIG. 2c, FIG. 2c is a scene diagram of the adversarial image generation method provided by the embodiments of the present application. In a processing scene 20, an image to be processed may be input into the pre-trained target detection model, which outputs classification prediction information (which can be understood as a detection-frame category prediction vector) and region prediction coordinate information (a detection-frame coordinate prediction vector).
In some embodiments, before inputting the classification prediction information and corresponding classification label information of each detection object into the prediction classification loss function, the method may further include:
(1) calculating the spatial distance between the classification prediction information and each piece of classification label information;
(2) determining the classification label information corresponding to the classification prediction information of each detection object according to the magnitude of the spatial distance.
The spatial distance may be the actual distance between vectors. In an embodiment, the spatial distance may be the Euclidean distance, a commonly used distance measure: the true distance between two points in m-dimensional space, or the natural length of a vector (i.e., the distance from the point to the origin). In two and three dimensions, the Euclidean distance is the actual distance between two points.
Thus, the Euclidean distance between the classification prediction information and each piece of classification label information may be calculated. For example, for classification prediction information (0.2, 0.3, 0.8) and classification label information for cat (1, 0, 0), dog (0, 1, 0), and pig (0, 0, 1), the minimum spatial distance is to pig (0, 0, 1), so pig (0, 0, 1) is taken as the classification label information for the prediction (0.2, 0.3, 0.8). By analogy, the classification label information corresponding to the classification prediction information of each detection object can be determined.
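A minimal sketch of this nearest-label rule, assuming one-hot label vectors as in the example above (the three-category setup is illustrative):

```python
import torch

def match_labels(pred, label_vectors):
    """Assign each classification prediction vector to the nearest
    classification label vector by Euclidean distance."""
    dists = torch.cdist(pred, label_vectors)  # pairwise Euclidean distances
    return dists.argmin(dim=1)                # index of the closest label

pred = torch.tensor([[0.2, 0.3, 0.8]])        # classification prediction information
labels = torch.eye(3)                         # cat (1,0,0), dog (0,1,0), pig (0,0,1)
print(match_labels(pred, labels))             # tensor([2]) -> pig
```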
In step 102, the classification prediction information of each detection object and the corresponding classification label information are input into the prediction classification loss function, and the type of gradient calculation corresponding to the category to which the classification label information belongs is performed to obtain the corresponding target classification loss.
To cause abnormal objects to be identified as normal objects and thereby disable the detection function of the pre-trained target detection model, a prediction classification loss function may be designed that comprises a gradient-increasing calculation term and a gradient-decreasing calculation term: the gradient-increasing calculation term is used for gradient-increasing calculation of the classification loss of detection objects whose classification label information belongs to the first category, and the gradient-decreasing calculation term for gradient-decreasing calculation of the classification loss of detection objects whose classification label information belongs to the second category.
That is, in the embodiments of the present application, the object categories recognizable by the pre-trained target detection model may be divided into two groups: the first category may be understood as abnormal categories and the second as normal categories. For example, let Y be the set of categories recognizable by the pre-trained target detection model, comprising M abnormal object categories, denoted V = {v_1, …, v_M}, and N normal object categories, denoted U = {u_1, …, u_N}, with Y = U ∪ V.
Gradient increase is used to find the value of an independent variable at which a function reaches its maximum. In the embodiments of the present application, gradient increase is used to enlarge the difference between the classification prediction information and the corresponding classification label information, so that the pre-trained target detection model tends to identify the detection region of an abnormal object as a normal object. Conversely, gradient decrease makes the difference between the classification prediction information and the corresponding classification label smaller and smaller, so that the model continues to identify the detection region of a normal object as a normal object; that is, the recognition of normal objects is left unchanged.
In one embodiment, the prediction classification loss function may be expressed by the following formula:

L_cls(F(x), {y_k}) = (1/n) · Σ_{k=1}^{n} L_k

where L_cls(F(x), {y_k}) is the prediction classification loss function, n is the number of targets, L_CE(·) denotes the cross-entropy loss function, F(x)_k is the prediction probability vector of the kth prediction region over the Y categories, and y_k is the true category (i.e., the classification label information) of the kth detection frame. In the embodiments of the present application, the classification loss term corresponding to each target object is L_k = L_CE(F(x)_k, y_k). For an object of an abnormal category, y_k ∈ V, and L_k = L_CE(F(x)_k, y_k) is computed in a gradient-increasing manner, with y_k ∈ V indicating that the classification label information belongs to an abnormal object category. Conversely, for an object of a normal category, y_k ∈ U, and L_k = −L_CE(F(x)_k, y_k); the added minus sign (−) turns the original gradient increase into a gradient decrease.
Therefore, the classification prediction information and corresponding classification label information of each detection object are input into the prediction classification loss function in turn, and the gradient calculation corresponding to the category of the classification label information is performed: if the category is the first category, gradient-increasing calculation is performed; if it is the second category, gradient-decreasing calculation is performed, yielding the corresponding target classification loss.
In some embodiments, the step of inputting the classification prediction information and corresponding classification label information of each detection object into the prediction classification loss function and performing the corresponding type of gradient calculation according to the category of the classification label information to obtain the corresponding target classification loss includes:
(1) determining the category to which the classification label information of each detection object belongs;
(2) determining detection objects belonging to the first category as first detection objects, and detection objects belonging to the second category as second detection objects;
(3) inputting the classification prediction information of the first detection objects and the corresponding classification label information into the gradient-increasing calculation term for gradient-increasing calculation to obtain a first classification loss;
(4) inputting the classification prediction information of the second detection objects and the corresponding classification label information into the gradient-decreasing calculation term for gradient-decreasing calculation to obtain a second classification loss;
(5) determining the target classification loss based on the first classification loss and the second classification loss.
The first category may be set as the abnormal category and the second category as the normal category, both preset in advance; for example, controlled knives, guns, and water bottles may be designated abnormal categories, while objects such as mobile phones and computers are designated normal categories.
Therefore, the category to which the classification label information of each detection object belongs, i.e., the first category or the second category, is determined. Detection objects belonging to the first category (abnormal objects) are determined as first detection objects, and detection objects belonging to the second category (normal objects) are determined as second detection objects.
Further, please refer to the following formula:

L_k = 1(y_k ∈ V) ⊙ L_CE(F(x)_k, y_k) − 1(y_k ∈ U) ⊙ L_CE(F(x)_k, y_k)

where L_k is the classification loss term corresponding to each target object. The term 1(y_k ∈ V) ⊙ L_CE(F(x)_k, y_k) is the gradient-increasing calculation term, applied when y_k ∈ V, i.e., when the classification label information of the detection object belongs to the first category. The term −1(y_k ∈ U) ⊙ L_CE(F(x)_k, y_k) is the gradient-decreasing calculation term, applied when y_k ∈ U, i.e., when the classification label information of the detection object belongs to the second category.
Based on this formula, the classification prediction information of the first detection objects and the corresponding classification label information are input into the gradient-increasing calculation term for gradient-increasing calculation to obtain the first classification loss, and the classification prediction information of the second detection objects and the corresponding classification label information are input into the gradient-decreasing calculation term for gradient-decreasing calculation to obtain the second classification loss.
Finally, the first classification loss and the second classification loss are aggregated to obtain the total loss (i.e., the target classification loss).
In some embodiments, the step of determining the target classification loss based on the first classification loss and the second classification loss comprises:
(1.1) summing the first classification loss and the second classification loss to obtain a third classification loss;
(1.2) calculating the ratio of the third classification loss to the total number of detection objects to obtain the target classification loss.
Please refer to the following formula:

L_cls = (1/n) · Σ_{k=1}^{n} L_k

Based on this formula, the first classification loss and the second classification loss are summed to obtain the third classification loss, and the ratio of the third classification loss to the number n of detection objects identified in the image to be processed is calculated to obtain the target classification loss L_cls.
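A minimal sketch of this target classification loss, assuming per-detection logits and an abnormal-category mask are already available (the tensor layout and the mask are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def target_classification_loss(logits, labels, abnormal_mask):
    """L_cls = (1/n) * sum_k (+/-)L_CE: the cross-entropy of a detection whose
    label is an abnormal category (y_k in V) enters with a plus sign (gradient
    increasing); a normal category (y_k in U) enters with a minus sign
    (gradient decreasing)."""
    ce = F.cross_entropy(logits, labels, reduction="none")  # per-detection L_CE
    signs = torch.where(abnormal_mask, 1.0, -1.0)           # +1 for V, -1 for U
    return (signs * ce).mean()                              # divide by n detections
```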
In step 103, the target localization loss is determined based on the region prediction coordinate information and corresponding region label coordinate information of each detection object.
The region prediction coordinate information and corresponding region label coordinate information of each detection object can be substituted into an existing prediction localization loss function to obtain the corresponding target localization loss L_loc. Since the embodiments of the present application do not aim to change the coordinate information of the detection objects, the value of the target localization loss is small.
In step 104, a target gradient loss function is constructed based on the target classification loss and the target localization loss.
The target classification loss and the target localization loss can be added to construct the target gradient loss function, which thus embeds, within the target classification loss, the migration rule that causes an abnormal object to be identified as a normal object category.
In some embodiments, the step of constructing the target gradient loss function based on the target classification loss and the target localization loss may include:
(1) weighting the target classification loss by a preset weight to obtain a weighted target classification loss;
(2) constructing the target gradient loss function based on the weighted target classification loss and the target localization loss.
Please refer to the following formula:

L = γ·L_cls + L_loc

where γ is a hyperparameter, i.e., the preset weight, and L is the target gradient loss function. Since the migration rule that identifies an abnormal object as a normal object category is hidden in the target classification loss, the weighting may be performed through this hyperparameter; for example, γ may be 0.9, 1, and so on. Based on the above formula, the target classification loss L_cls is weighted by the hyperparameter to obtain the weighted target classification loss γ·L_cls, and the target gradient loss function is constructed from γ·L_cls and the target localization loss L_loc.
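Continuing the sketch above, the total loss can be assembled as follows; smooth L1 is a common localization loss for detection models and stands in here for the unspecified "existing prediction localization loss function" — it is an assumption, not prescribed by the patent.

```python
import torch.nn.functional as F

def target_gradient_loss(logits, labels, abnormal_mask,
                         pred_boxes, gt_boxes, gamma=1.0):
    """L = gamma * L_cls + L_loc, with gamma the preset weight (e.g. 0.9 or 1).
    Reuses target_classification_loss from the previous sketch; smooth L1 as
    the localization loss is an assumption."""
    l_cls = target_classification_loss(logits, labels, abnormal_mask)
    l_loc = F.smooth_l1_loss(pred_boxes, gt_boxes)
    return gamma * l_cls + l_loc
```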
In step 105, the target gradient image corresponding to the image to be processed is calculated through the target gradient loss function.
In the embodiments of the present application, an adversarial image needs to be generated as an adversarial example against the pre-trained target detection model. An adversarial image can be understood as the image to be processed with perturbations added that are imperceptible to the human eye (such perturbations do not affect human recognition but easily fool the model), causing the target detection model to make wrong judgments.
The target gradient loss function embeds the migration rule that identifies an abnormal object as a normal object category, i.e., the rule of gradient-increasing the loss between the prediction for an abnormal object and its classification label information, so that the prediction for the abnormal object deviates further and further from the true target value.
In some embodiments, the step of calculating the target gradient image corresponding to the image to be processed through the target gradient loss function may include:
(1) calculating the gradient value of each pixel in the image to be processed according to the target gradient loss function to obtain a first gradient image;
(2) processing the first gradient image through a sign function to obtain a second gradient image;
(3) calculating the product of a preset disturbance value and the second gradient image to obtain the target gradient image.
Please refer to the following formula:

δ = x' + ε · sign(∇_{x'} L)

where δ is the first adversarial image after the disturbance is added, x' is the current image to be processed, ε is the disturbance value (which may be a constant), and sign() is the sign function (generally written sign(x)), whose role is to take the sign of a number: when x > 0, sign(x) = 1; when x = 0, sign(x) = 0; when x < 0, sign(x) = −1. The term ∇_{x'} L denotes the gradient of the target gradient loss function with respect to each pixel in x'. Taking the gradient of each pixel of the image to be processed through the target gradient loss function yields the first gradient image, which has the same size as the image to be processed except that the value of each pixel is a gradient. Processing the first gradient image through the sign function converts each pixel from a gradient to 1, 0, or −1, giving the second gradient image. Finally, multiplying the preset disturbance value ε by the second gradient image gives the target gradient image, which shifts the pre-trained target detection model's recognition of abnormal objects toward normal object categories.
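A minimal sketch of this sign-gradient step; forward_losses() is a hypothetical helper that runs the detector on x' and returns the target gradient loss L (the forward-pass wiring is elided and assumed):

```python
import torch

def compute_target_gradient_image(model, x_prime, epsilon):
    """One sign step: target gradient image = eps * sign(grad_x' L)."""
    x_prime = x_prime.clone().detach().requires_grad_(True)
    loss = forward_losses(model, x_prime)         # assumed helper, returns L
    grad = torch.autograd.grad(loss, x_prime)[0]  # first gradient image (per-pixel)
    return epsilon * grad.sign()                  # second gradient image scaled by eps
```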
In step 106, the target gradient image is superimposed onto the image to be processed to generate an intermediate adversarial image.
Since the target gradient image shifts the pre-trained target detection model's recognition of abnormal objects toward normal object categories, it can be superimposed directly onto the image to be processed to generate an intermediate adversarial image; the trained target detection model's recognition of abnormal objects in the intermediate adversarial image is more biased toward normal object categories than its recognition in the original image to be processed.
To achieve a better adversarial effect, the adversarial perturbation also needs to be strengthened: the intermediate adversarial image is used as the image to be processed for gradient calculation in the next iteration, i.e., the gradient for the recognition of abnormal objects must keep increasing; see step 107 for details.
In some embodiments, the step of superimposing the target gradient image onto the image to be processed to generate the intermediate adversarial image may include:
(1) superimposing the target gradient image onto the image to be processed to obtain a first adversarial image;
(2) acquiring the image to be processed first input into the pre-trained target detection model and determining it as the initial image;
(3) limiting the pixel difference of each pixel between the first adversarial image and the initial image to within a preset disturbance value, and generating the intermediate adversarial image.
Here, reference can continue to be made to the formula

δ = x' + ε · sign(∇_{x'} L)

by which the target gradient image is superimposed onto the image to be processed to obtain the first adversarial image δ.
Further, to satisfy the requirement that the disturbed image not be easily perceptible to humans, the image to be processed first input into the pre-trained target detection model is determined as the initial image, and each pixel in the first adversarial image is compared with the corresponding pixel in the initial image, limiting the pixel difference of each pixel pair to within the preset disturbance value, i.e., the absolute value of each pixel difference is within the preset disturbance value. If the absolute difference between a pixel in the first adversarial image and the corresponding pixel in the initial image exceeds the preset disturbance value and the pixel in the first adversarial image is larger than the corresponding pixel in the initial image, the value of that pixel in the first adversarial image must be reduced; if it exceeds the preset disturbance value and the pixel in the first adversarial image is smaller than the corresponding pixel in the initial image, the value of that pixel must be increased. When the absolute pixel difference of every pixel between the first adversarial image and the initial image is within the preset disturbance value, the first adversarial image satisfying this condition is determined as the intermediate adversarial image.
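A minimal sketch of this clipping rule; the final clamp to a valid pixel range is an added assumption beyond the text:

```python
import torch

def clip_to_disturbance(first_adv, initial, epsilon):
    """Limit each pixel difference between the first adversarial image and the
    initial image to [-epsilon, +epsilon], then superimpose onto the initial image."""
    diff = torch.clamp(first_adv - initial, -epsilon, epsilon)  # pixel difference set
    adv = initial + diff                                        # intermediate adversarial image
    return torch.clamp(adv, 0.0, 1.0)  # keep pixels in a valid range (assumption)
```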
In step 107, the step of generating the intermediate adversarial image is repeatedly executed until no detection object identified as the first category by the target detection model exists in the intermediate adversarial image, and the intermediate adversarial image is determined as the target adversarial image.
To achieve a complete attack effect, i.e., to make the pre-trained target detection model identify abnormal objects as normal object categories, the generation of the intermediate adversarial image (steps 101 to 106) can be repeated with the intermediate adversarial image as the image to be processed for gradient calculation in the next iteration, so that the gradient of the abnormal recognition keeps increasing and the interference becomes stronger and stronger, until the target detection model can no longer identify any first-category detection object in the intermediate adversarial image, i.e., the classification prediction information of every detection object corresponds to classification label information of a normal object category. This indicates the attack is complete, and the intermediate adversarial image is determined as the target adversarial image.
In the related art, the adversarial attack only makes the target detection model identify an abnormal object as some other category, which may itself still be an abnormal category and still trigger a warning. The present approach instead increases the loss for all abnormal categories so that the target detection model identifies abnormal objects as normal object categories; the attack is stronger, no repeated independent training is needed, and the efficiency and performance of adversarial image generation are greatly improved.
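Putting the pieces together, a hedged end-to-end sketch of steps 101–106 plus the stopping test of step 107, reusing compute_target_gradient_image and clip_to_disturbance from the sketches above; has_abnormal_detection() and max_iters are illustrative assumptions:

```python
import torch

def generate_target_adversarial_image(model, image, epsilon, max_iters=100):
    """Iterate the intermediate-adversarial-image step until the detector no
    longer reports any first-category (abnormal) detection object."""
    initial = image.clone().detach()   # initial image x_0
    x = initial
    for _ in range(max_iters):
        grad_img = compute_target_gradient_image(model, x, epsilon)  # eps * sign(grad L)
        x = clip_to_disturbance(x + grad_img, initial, epsilon)      # intermediate image
        if not has_abnormal_detection(model, x):  # assumed helper: any y_k in V left?
            break                                 # attack complete
    return x  # target adversarial image
```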
As can be seen from the above, in the embodiment of the present application, the to-be-processed image is input into the pre-trained target detection model, and the classification prediction information and the area prediction coordinate information of each detection object are output; inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing gradient calculation of the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss; the prediction classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for gradient increasing calculation of the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for gradient decreasing calculation of the classification loss of the detection object of which the classification label information belongs to the second class; determining target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information; constructing a target gradient loss function based on the target classification loss and the target positioning loss; calculating a target gradient image corresponding to the image to be processed through a target gradient loss function; superposing the target gradient image to the image to be processed to generate an intermediate countermeasure image, wherein the intermediate countermeasure image is the image to be processed, and gradient calculation is carried out on the intermediate countermeasure image in the next iteration; and repeatedly executing the step of generating the intermediate confrontation image until the detection object identified as the first class by the target detection model does not exist in the intermediate confrontation image, and determining the intermediate confrontation image as the target confrontation image. Accordingly, each detection object is subjected to corresponding gradient incremental calculation or gradient decremental calculation according to the category to which the classification label information belongs through a preset classification loss function to obtain target classification loss, a target gradient loss function is constructed based on the target classification loss, the image to be processed is calculated through the target gradient loss function, a target gradient image is generated and superposed on the image to be processed, an intermediate countermeasure image is generated, the process is repeated until the detection object identified as the first category by the target detection model does not exist in the intermediate countermeasure image, and the intermediate countermeasure image is determined as the target countermeasure image. 
Compared with schemes in which the generated countermeasure image merely causes the target detection model to locate the detection frame of the abnormal object inaccurately, the embodiment of the application causes the abnormal object to be identified as a normal object, completely defeating the identification performance of the target detection model and greatly improving the efficiency and performance of countermeasure image generation.
In the present embodiment, the countermeasure image generation device is described by taking as an example the case where it is integrated in a server; see the following description for details.
Referring to fig. 3, fig. 3 is another schematic flow chart of a method for generating a confrontation image according to an embodiment of the present application. The method flow may comprise the following: the server generates an intermediate countermeasure image and repeatedly executes the generation step until no detection object identified as the first class by the target detection model exists in the intermediate countermeasure image, at which point the intermediate countermeasure image is determined as the target countermeasure image. In this embodiment, the specific steps by which the server generates the intermediate countermeasure image are steps 201 to 208.
In step 201, the server inputs the image to be processed into a pre-trained target detection model, and outputs classification prediction information and area prediction coordinate information of each detection object.
In order to better explain the embodiment of the application, consider a scenario of intelligent security inspection of hand-held luggage in an airport or subway. In this scenario, object types that trigger an abnormality alarm, such as controlled knives, guns, or water bottles, need to be detected. Given the high safety requirements of such supervision, adversarial attacks are often used to evaluate the robustness of the target detection model.
Any image to be processed may be obtained in advance and input into the pre-trained target detection model; the model identifies the image to be processed and outputs classification prediction information and area prediction coordinate information for each detection object. Both the classification prediction information and the region prediction coordinate information may be vectors.
The classification prediction information represents the predicted object type, such as a controlled knife, a gun, a water bottle, a mobile phone, or a computer, and the region prediction coordinate information represents the detection frame of the predicted object.
In step 202, the server calculates a spatial distance between the classification prediction information and each piece of classification label information, and determines the classification label information corresponding to the classification prediction information of each detection object according to the spatial distance.
The classification label information may be represented by a vector. For the five labels controlled knife, gun, water bottle, mobile phone, and computer, (1, 0, 0, 0, 0) represents the classification label information of the controlled knife, (0, 1, 0, 0, 0) that of the gun, (0, 0, 1, 0, 0) that of the water bottle, (0, 0, 0, 1, 0) that of the mobile phone, and (0, 0, 0, 0, 1) that of the computer. That is, each vector component indicates whether the object carries the corresponding classification label: the closer to 0, the lower the probability of that label, and the closer to 1, the higher. The spatial distance is the Euclidean distance.
Therefore, the classification prediction information is a vector of the pre-trained target detection model's prediction probabilities for each classification label, for example (0.2, 0.3, 0.8, 0.1, 0.01). The Euclidean distance between the classification prediction information and each piece of classification label information is calculated, and the label with the smallest Euclidean distance is taken as the accurate prediction: here the classification label information corresponding to the classification prediction information is determined to be (0, 0, 1, 0, 0), i.e., the classification label is the water bottle. In this way, the classification label information corresponding to the classification prediction information of each detection object is determined in turn.
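To make this matching concrete, the following is a minimal sketch in Python with NumPy, using the five illustrative one-hot labels and the example prediction vector above; all names are illustrative, not from the patent.

```python
import numpy as np

# One-hot classification label vectors in the order:
# controlled knife, gun, water bottle, mobile phone, computer.
labels = np.eye(5)

# Example classification prediction information output by the detection model.
pred = np.array([0.2, 0.3, 0.8, 0.1, 0.01])

# Euclidean (spatial) distance from the prediction to every label vector.
dists = np.linalg.norm(labels - pred, axis=1)

# The label with the smallest distance is the matched classification label:
# here index 2, i.e. (0, 0, 1, 0, 0), the water bottle.
matched = labels[np.argmin(dists)]
```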
In step 203, the server determines the category to which the classification label information of each detection object belongs, determining the detection objects of the first category as first detection objects and the detection objects of the second category as second detection objects. The server then inputs the classification prediction information of each first detection object and the corresponding classification label information into the gradient increasing calculation item for gradient increasing calculation to obtain a first classification loss, and inputs the classification prediction information of each second detection object and the corresponding classification label information into the gradient decreasing calculation item for gradient decreasing calculation to obtain a second classification loss. Finally, the server sums the first classification losses and the second classification losses to obtain a third classification loss and calculates the ratio of the third classification loss to the total number of detection objects to obtain the target classification loss.
The normal category and the abnormal category may be set, where the first category is the abnormal category and the second category is the normal category. The abnormal and normal categories may be preset in advance; for example, controlled knives, guns, and water bottles are determined as abnormal categories, while objects such as mobile phones and computers are determined as normal categories.

Therefore, the class to which the classification label information of each detection object belongs is determined to be either the first class or the second class. The detection object whose class is the first class, i.e., the detection object of an abnormal object, is determined as a first detection object, and the detection object whose class is the second class, i.e., the detection object of a normal object, is determined as a second detection object.
Further, please refer to the following formula:
L_k = (y_k ∈ V) ⊙ L_CE(F(x)_k, y_k) − (y_k ∈ U) ⊙ L_CE(F(x)_k, y_k)

where L_k is the classification loss term associated with the k-th detection object, F(x)_k is its classification prediction information, y_k is its classification label information, L_CE is the cross-entropy loss, V is the set of first (abnormal) classes, and U is the set of second (normal) classes. The term (y_k ∈ V) ⊙ L_CE(F(x)_k, y_k) is the gradient increasing calculation item: when y_k ∈ V, that is, the class to which the classification label information of the detection object belongs is the first class, the gradient increasing calculation item is used for calculation. The term −(y_k ∈ U) ⊙ L_CE(F(x)_k, y_k) is the gradient decreasing calculation item: when y_k ∈ U, that is, the class to which the classification label information belongs is the second class, the gradient decreasing calculation item is used for calculation.
Based on the formula, the classification prediction information of the first detection object and the corresponding classification label information are substituted into a gradient increasing calculation item to perform gradient increasing calculation to obtain a first classification loss, and the classification prediction information of the second detection object and the corresponding classification label information are substituted into a gradient decreasing calculation item to perform gradient decreasing calculation to obtain a second classification loss.
Please refer to the following formula together:
L_cls = (1/n) Σ_{k=1}^{n} L_k

Based on this formula, the first classification losses and the second classification losses are summed to obtain the third classification loss, and the ratio of the third classification loss to the total number n of detection objects identified in the image to be processed is calculated to obtain the target classification loss L_cls.
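As a hedged sketch of this target classification loss, assuming a PyTorch setting in which per-detection classification logits and integer label indices are available (the set of abnormal class indices below is an illustrative assumption), the sign of each cross-entropy term is flipped according to the category of its label:

```python
import torch
import torch.nn.functional as F

# Illustrative assumption: class indices 0-2 (knife, gun, water bottle) form
# the abnormal first-category set V; indices 3-4 (phone, computer) form U.
ABNORMAL = {0, 1, 2}

def target_cls_loss(logits: torch.Tensor, label_idx: torch.Tensor) -> torch.Tensor:
    """L_cls = (1/n) * sum_k L_k, with L_k = +/- L_CE(F(x)_k, y_k)."""
    ce = F.cross_entropy(logits, label_idx, reduction="none")  # per-detection L_CE
    # +1 for the gradient increasing item (y_k in V),
    # -1 for the gradient decreasing item (y_k in U).
    sign = torch.tensor([1.0 if int(y) in ABNORMAL else -1.0 for y in label_idx])
    return (sign * ce).mean()
```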
In step 204, the server determines the target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area tag coordinate information.
The area prediction coordinate information of each detection object and the corresponding area label coordinate information may be substituted into an existing prediction positioning loss function for calculation to obtain the corresponding target positioning loss L_loc.
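The patent leaves this positioning loss to an existing function; one plausible instantiation (an assumption for illustration, not the patent's prescribed choice) is the smooth-L1 loss widely used in object detection:

```python
import torch.nn.functional as F

def target_loc_loss(pred_boxes, label_boxes):
    """One plausible L_loc: smooth-L1 between (n, 4) predicted and label box coordinates."""
    return F.smooth_l1_loss(pred_boxes, label_boxes)
```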
In step 205, the server performs weighting processing on the target classification loss by using a preset weight to obtain a weighted target classification loss, and constructs a target gradient loss function based on the weighted target classification loss and the weighted target localization loss.
Among them, the following formula can be referred to together:
L = γ · L_cls + L_loc
Here γ is a macro parameter, i.e., the preset weight, and L is the target gradient loss function. Since the migration rule that identifies an abnormal object as a normal object class is hidden in the target classification loss, the weighting may be performed through the macro parameter; for example, the macro parameter may be 0.95. Based on the above formula, the target classification loss L_cls is weighted by the macro parameter to obtain the weighted target classification loss, and the target gradient loss function is constructed from the weighted target classification loss γ · L_cls and the target positioning loss L_loc.
In step 206, the server calculates a gradient value of each pixel in the image to be processed according to the target gradient loss function to obtain a first gradient image, processes the first gradient image through the sign function to obtain a second gradient image, and calculates a product of a preset disturbance value and the second gradient image to obtain the target gradient image.
In the embodiment of the application, a countermeasure image needs to be generated as a countermeasure sample to attack the pre-trained target detection model. The countermeasure image may be understood as the image to be processed with disturbances imperceptible to the human eye added, so that the target detection model makes erroneous determinations on abnormal objects.
The sign function sign(x) takes the sign of a number (positive or negative): sign(x) = 1 when x > 0; sign(x) = 0 when x = 0; and sign(x) = −1 when x < 0.
How to calculate the target gradient image can be understood by referring to the following formula together:
δ = x′ + ε · sign(∇_{x′} L)

where δ is the first countermeasure image after the disturbance is added, x′ is the current image to be processed, and ε is the disturbance value, which may be a constant. ∇_{x′} L denotes the gradient of the target gradient loss function with respect to each pixel in x′ and may be understood as the first gradient image. Through the target gradient loss function, the gradient value of each pixel in the image to be processed is thus obtained to form the first gradient image, which has the same size as the image to be processed, except that the value of each pixel is a gradient. The first gradient image is then processed through the sign function, converting each pixel from a gradient into 1, 0, or −1, to obtain the second gradient image. Finally, the preset disturbance value ε is multiplied by the second gradient image to obtain the target gradient image. The target gradient image shifts the pre-trained target detection model's identification of abnormal objects toward the normal object categories, i.e., it shifts the identification of controlled knives, guns, and water bottles toward mobile phones or computers.
In step 207, the server superimposes the target gradient image on the image to be processed to obtain a first countermeasure image, and obtains the image to be processed that was first input into the pre-trained target detection model, determining it as the initial image.
Wherein, we can continue to refer to the formula:
δ = x′ + ε · sign(∇_{x′} L)

That is, the target gradient image ε · sign(∇_{x′} L) is superimposed on the image to be processed x′ to obtain the first countermeasure image δ.
Furthermore, in order to meet the requirement that the disturbed image is not easily perceived, the image to be processed that has not undergone any disturbing operation, i.e., the one first input into the pre-trained target detection model, is obtained and determined as the initial image.
In step 208, the server calculates a pixel difference between each first pixel in the first countermeasure image and a corresponding second pixel in the initial image to form a pixel difference set, limits each pixel difference in the pixel difference set within a preset disturbance value range through a clipping function, and superimposes the pixel difference set after the pixel difference limitation on the initial image to generate an intermediate countermeasure image.
Please refer to the following formula:
x″ = x + clip(δ − x, −ε, ε)

where x″ is the intermediate countermeasure image and clip() is a clipping function. The pixel difference between the first countermeasure image δ and each pixel in the initial image x is calculated to form the pixel difference set. The minimum value of the preset disturbance value range is the negative preset disturbance value (−ε) and the maximum value is the positive preset disturbance value (ε), so the range is [−ε, ε]. Each pixel difference is limited within this range through the clipping function: pixel values larger than ε are set to ε, and pixel values smaller than −ε are set to −ε. The change amplitude of each pixel therefore does not exceed ε and is not easily perceived by the naked eye. On this basis, the limited pixel difference set is superimposed on the initial image to generate the intermediate countermeasure image.
In step 209, the server repeatedly executes the step of generating the intermediate countermeasure image until the detection object recognized as the first category by the target detection model does not exist in the intermediate countermeasure image, and determines the intermediate countermeasure image as the target countermeasure image.
Since the pre-trained target detection model's identification of abnormal objects in the intermediate countermeasure image is merely biased toward the normal object categories (mobile phone and computer) and may still fall on an abnormal object category, steps 201 to 208 for generating the intermediate countermeasure image are repeatedly executed, with the intermediate countermeasure image taken as the image to be processed for gradient calculation in the next iteration, in order to achieve a complete attack effect, i.e., the pre-trained target detection model identifying every abnormal object as a normal object category. The gradient of the abnormal identification thus increases progressively and the interference becomes stronger, until the target detection model can no longer identify any first-category controlled knife, gun, or water bottle in the intermediate countermeasure image, that is, until the classification prediction information of every detection object corresponds to the classification label information of a normal object category (mobile phone or computer). The attack is then finished, and the intermediate countermeasure image is determined as the target countermeasure image.
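Putting the pieces together, the outer iteration of step 209 can be sketched as below; has_abnormal_detection is a hypothetical helper (not from the patent) that runs the model and reports whether any first-category object is still detected, and max_iter is a safety cap the patent does not specify:

```python
def generate_target_adversarial(model, x_init, label_idx, label_boxes,
                                eps=8 / 255, max_iter=50):
    x = x_init.clone()
    for _ in range(max_iter):
        g = target_gradient_image(model, x, label_idx, label_boxes, eps)
        x = intermediate_adversarial(x_init, x, g, eps)   # intermediate countermeasure image
        if not has_abnormal_detection(model, x):          # no first-class detections left
            return x                                      # target countermeasure image
    return x
```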
In the related art, a countermeasure sample may cause the target detection model to identify a controlled knife as a gun, which still triggers a warning. The embodiment of the application instead increases the loss of all abnormal categories, so that the target detection model identifies all abnormal objects as normal object categories and no warning is triggered. The attack performance is stronger, no repeated separate training is needed, and the efficiency and performance of countermeasure image generation are greatly improved; moreover, training with the target countermeasure images can also increase the robustness of the target detection model.
In some embodiments, in order to increase the robustness of a detection model (which may also be referred to as an object detection model), the generated target countermeasure images and normal training image samples may be input as a sample image set into an initial object detection model for model training, obtaining a trained object detection model. The trained object detection model has better robustness and is not easily confused by attack images; an image to be recognized may then be input into it for object recognition with a better recognition effect.
The embodiment of the present application further provides a training method for an object detection model, where the training method for an object detection model specifically includes: inputting a sample image set into the initial object detection model, and performing model training to obtain a trained object detection model, where the sample image set includes a sample confrontation image, and the sample confrontation image is the confrontation image generated by the confrontation image generation method in the above embodiment.
In this embodiment, the object detection model differs from the aforementioned target detection model: the training samples of the object detection model include sample confrontation images, whereas those of the target detection model do not. During model training, a loss value corresponding to the current object detection model is output in each iteration; training ends when the loss value meets a preset requirement, and the current object detection model is determined to be the trained object detection model. Alternatively, training ends when the number of model training iterations reaches a preset count. The sample confrontation images are obtained from the confrontation images generated by the confrontation image generation method described in the above embodiments; for the specific generation method, refer to the foregoing description, which is not repeated here. Training the object detection model with sample confrontation images increases the model's identification accuracy, its resistance to adversarial attacks, and its robustness.
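A hedged sketch of this adversarial training loop follows, where the dataset construction, the loss function, and the optimizer are all assumptions made for illustration:

```python
def train_object_detector(model, normal_set, adversarial_set,
                          optimizer, loss_fn, epochs=10):
    # Sample image set: normal training samples plus sample confrontation images.
    dataset = list(normal_set) + list(adversarial_set)
    for _ in range(epochs):
        for images, targets in dataset:   # each element: an (images, targets) pair
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()
```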
The embodiment of the application further provides an image recognition method, the image to be recognized is input into the object detection model for object recognition, a recognition result is obtained, and the object detection model is obtained through training according to the training method of the object detection model.
In an actual application scenario, for example intelligent security inspection of luggage in an airport or subway, object types that trigger an abnormality alarm, such as controlled knives, guns, or water bottles, need to be detected. An image of the luggage is obtained and input into the object detection model for object identification to obtain the corresponding identification result.
An embodiment of the application further provides a confrontation image generation device. The device may include a generation module and an iteration module, where the generation module includes an output unit, an input unit, a first determination unit, a construction unit, a calculation unit, and a superposition unit.
A generating module, configured to generate an intermediate confrontation image, where the generating module specifically includes:
and the output unit is used for inputting the image to be processed into the pre-trained target detection model and outputting the classification prediction information and the region prediction coordinate information of each detection object. In some embodiments, the apparatus further comprises a second determining unit configured to:
calculating the spatial distance between the classification prediction information and each classification label information;
and determining classification label information corresponding to the classification prediction information of each detection object according to the size of the space distance.
And the input unit is used for inputting the classification prediction information of each detection object and the corresponding classification label information into a prediction classification loss function, and performing gradient calculation of the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss.
The predicted classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for performing gradient increasing calculation on the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for performing gradient decreasing calculation on the classification loss of the detection object of which the classification label information belongs to the second class.
In some embodiments, the prediction classification loss function includes a gradient increasing calculation item for gradient increasing calculation and a gradient decreasing calculation item for gradient decreasing calculation, and the input unit includes:
the first determining subunit is used for determining the category to which the classification label information of each detection object belongs;
a second determining subunit, configured to determine the detection object whose category is the first category as the first detection object, and the detection object whose category is the second category as the second detection object;
the first input subunit is used for inputting the classification prediction information of the first detection object and the corresponding classification label information into a gradient increasing calculation item for gradient increasing calculation to obtain a first classification loss;
the second input subunit is used for inputting the classification prediction information of the second detection object and the corresponding classification label information into a gradient decreasing calculation item for gradient decreasing calculation to obtain a second classification loss;
a third determining subunit, configured to determine a target classification loss based on the first classification loss and the second classification loss.
In some embodiments, the third determining subunit is configured to:
summing and calculating the first classification loss and the second classification loss to obtain a third classification loss;
and calculating the ratio of the third classification loss to the number of the detection objects to obtain the target classification loss.
And the first determining unit is used for determining the target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information.
And the construction unit is used for constructing a target gradient loss function based on the target classification loss and the target positioning loss.
In some embodiments, the building unit is configured to:
weighting the target classification loss by preset weight to obtain weighted target classification loss;
and constructing a target gradient loss function based on the target classification loss and the target positioning loss after the weighting processing.
And the calculating unit is used for calculating a target gradient image corresponding to the image to be processed through the target gradient loss function.
In some embodiments, the computing unit is to:
calculating the gradient value of each pixel in the image to be processed according to the target gradient loss function to obtain a first gradient image;
processing the first gradient image through a sign function to obtain a second gradient image;
and calculating the product of the preset disturbance value and the second gradient image to obtain a target gradient image.
And the superposition unit is used for superposing the target gradient image to the image to be processed to generate an intermediate confrontation image, and the intermediate confrontation image is the image to be processed for next iteration gradient calculation.
In some embodiments, the superimposing unit includes:
the superposition subunit is used for superposing the target gradient image to an image to be processed to obtain a first contrast image;
The acquisition subunit is used for acquiring the image to be processed that is first input into the pre-trained target detection model and determining it as the initial image.

And the generation subunit is used for limiting the pixel difference between each pixel of the first countermeasure image and the initial image within a preset disturbance value, and generating the intermediate countermeasure image.
In some embodiments, the generating subunit is to:
calculating the pixel difference between each first pixel in the first antagonizing image and the corresponding second pixel in the initial image to form a pixel difference set;
limiting each pixel difference in the pixel difference set within a preset disturbance value range through a clipping function;
the minimum value of the preset disturbance value range is a negative preset disturbance value, and the maximum value of the preset disturbance value range is a positive preset disturbance value;
and superposing the pixel difference set after pixel difference definition on the initial image to generate an intermediate confrontation image.
And the iteration module is used for repeatedly executing the step of generating the intermediate countermeasure image until the first class of detection objects identified by the target detection model do not exist in the intermediate countermeasure image, and determining the intermediate countermeasure image as the target countermeasure image.
The specific implementation of each unit can refer to the previous embodiment, and is not described herein again.
An embodiment of the present application further provides a computer device, as shown in fig. 4, which shows a schematic structural diagram of a server according to the embodiment of the present application, specifically:
the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 4 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; optionally, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to the use of the server, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the respective components, and optionally, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are implemented through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, so as to implement the various method steps provided by the foregoing embodiments, as follows:
generating an intermediate confrontation image, which comprises the following specific steps: inputting an image to be processed into a pre-trained target detection model, and outputting classification prediction information and area prediction coordinate information of each detection object; inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing gradient calculation of the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss; the prediction classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for gradient increasing calculation of the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for gradient decreasing calculation of the classification loss of the detection object of which the classification label information belongs to the second class; determining target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information; constructing a target gradient loss function based on the target classification loss and the target positioning loss; calculating a target gradient image corresponding to the image to be processed through the target gradient loss function; superposing the target gradient image to an image to be processed to generate an intermediate countermeasure image, wherein the intermediate countermeasure image is the image to be processed, and gradient calculation is carried out on the intermediate countermeasure image in the next iteration; and repeatedly executing the step of generating the intermediate countermeasure image until the detection object identified as the first class by the target detection model does not exist in the intermediate countermeasure image, and determining the intermediate countermeasure image as the target countermeasure image.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphases; for parts not described in detail in a certain embodiment, refer to the above detailed description of the countermeasure image generation method, which is not repeated here.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium in which a plurality of instructions are stored; the instructions can be loaded by a processor to execute the steps in any countermeasure image generation method or object detection model training method provided by the present application. For example, the instructions may perform the following steps:
generating an intermediate confrontation image, which comprises the following specific steps: inputting an image to be processed into a pre-trained target detection model, and outputting classification prediction information and region prediction coordinate information of each detection object; inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing gradient calculation of the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss; the prediction classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for gradient increasing calculation of the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for gradient decreasing calculation of the classification loss of the detection object of which the classification label information belongs to the second class; determining target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information; constructing a target gradient loss function based on the target classification loss and the target positioning loss; calculating a target gradient image corresponding to the image to be processed through the target gradient loss function; superposing the target gradient image to an image to be processed to generate an intermediate countermeasure image, wherein the intermediate countermeasure image is the image to be processed, and gradient calculation is carried out on the intermediate countermeasure image in the next iteration; and repeatedly executing the step of generating the intermediate countermeasure image until the detection object identified as the first class by the target detection model does not exist in the intermediate countermeasure image, and determining the intermediate countermeasure image as the target countermeasure image.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations provided by the embodiments described above.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any method provided in the embodiments of the present application, the beneficial effects that can be achieved by any method provided in the embodiments of the present application can be achieved; see the foregoing embodiments for details, which are not repeated here.
The foregoing has described in detail a method, an apparatus, a computer device, and a storage medium for generating a confrontation image according to embodiments of the present application. Specific examples have been applied herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and its core ideas. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the ideas of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. A confrontational image generation method, characterized by comprising:
generating an intermediate confrontation image, which comprises the following specific steps:
inputting an image to be processed into a pre-trained target detection model, and outputting classification prediction information and region prediction coordinate information of each detection object;
inputting the classification prediction information of each detection object and the corresponding classification label information into a prediction classification loss function, and performing gradient calculation on the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss;
the prediction classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for gradient increasing calculation of the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for gradient decreasing calculation of the classification loss of the detection object of which the classification label information belongs to the second class;
determining target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information;
constructing a target gradient loss function based on the target classification loss and the target positioning loss;
calculating a target gradient image corresponding to the image to be processed through the target gradient loss function;
superposing the target gradient image to an image to be processed to generate an intermediate countermeasure image, wherein the intermediate countermeasure image is the image to be processed, and gradient calculation is carried out on the intermediate countermeasure image in the next iteration;
and repeatedly executing the step of generating the intermediate countermeasure image until the detection object identified as the first class by the target detection model does not exist in the intermediate countermeasure image, and determining the intermediate countermeasure image as the target countermeasure image.
2. The antagonistic image generation method according to claim 1, wherein the predictive classification loss function includes a gradient increasing computation term for gradient increasing computation and a gradient decreasing computation term for gradient decreasing computation;
the step of inputting the classification prediction information and the corresponding classification label information of each detection object into a prediction classification loss function, and performing gradient calculation of the corresponding type of the class to which the corresponding classification label information belongs to obtain the corresponding target classification loss includes:
determining the category of the classification label information of each detection object;
determining the detection object whose belonging category is the first category as a first detection object, and determining the detection object whose belonging category is the second category as a second detection object;
inputting the classification prediction information of the first detection object and the corresponding classification label information into a gradient increasing calculation item for gradient increasing calculation to obtain a first classification loss;
inputting the classification prediction information of the second detection object and the corresponding classification label information into a gradient decreasing calculation item for gradient decreasing calculation to obtain a second classification loss;
determining a target classification loss based on the first classification loss and the second classification loss.
3. The countermeasure image generation method of claim 2, wherein the determining a target classification loss based on the first classification loss and the second classification loss comprises:
summing and calculating the first classification loss and the second classification loss to obtain a third classification loss;
and calculating the ratio of the third classification loss to the total number of the detected objects to obtain the target classification loss.
4. The method of claim 1, wherein before inputting the classification prediction information and the corresponding classification label information of each detected object into the prediction classification loss function, the method further comprises:
calculating the spatial distance between the classification prediction information and each classification label information;
and determining classification label information corresponding to the classification prediction information of each detection object according to the size of the space distance.
5. The method of generating a confrontational image according to claim 1 wherein said constructing a target gradient penalty function based on said target classification penalty and target localization penalty comprises:
weighting the target classification loss by preset weight to obtain weighted target classification loss;
and constructing a target gradient loss function based on the target classification loss and the target positioning loss after the weighting processing.
6. The method for generating a confrontation image according to claim 1, wherein the calculating a target gradient image corresponding to the image to be processed by the target gradient loss function includes:
calculating the gradient value of each pixel in the image to be processed according to the target gradient loss function to obtain a first gradient image;
processing the first gradient image through a sign function to obtain a second gradient image;
and calculating the product of a preset disturbance value and the second gradient image to obtain a target gradient image.
7. The confrontation image generation method according to claim 6, wherein the superimposing the target gradient image to the image to be processed to generate an intermediate confrontation image comprises:
superposing the target gradient image to an image to be processed to obtain a first antagonistic image;
acquiring the image to be processed that is first input into the pre-trained target detection model and determining the image to be processed as an initial image;

and limiting the pixel difference between each pixel of the first countermeasure image and the initial image within a preset disturbance value, and generating an intermediate countermeasure image.
8. The countermeasure image generation method according to claim 7, wherein the limiting the pixel difference between each pixel of the first countermeasure image and the initial image within a preset disturbance value and generating an intermediate countermeasure image includes:
calculating a pixel difference between each first pixel in the first antagonizing image and a corresponding second pixel in the initial image to form a pixel difference set;
limiting each pixel difference in the pixel difference set within a preset disturbance value range through a clipping function;
the minimum value of the preset disturbance value range is a negative preset disturbance value, and the maximum value of the preset disturbance value range is a positive preset disturbance value;
and superposing the pixel difference set after pixel difference definition on the initial image to generate an intermediate confrontation image.
9. A method for training an object detection model, comprising:
inputting a sample image set into an initial object detection model, and performing model training to obtain a trained object detection model, wherein the sample image set comprises a sample confrontation image, and the sample confrontation image is the confrontation image generated by the confrontation image generation method according to any one of claims 1 to 8.
10. An image recognition method, comprising:
inputting an image to be recognized into an object detection model for object recognition to obtain a recognition result, wherein the object detection model is obtained by training according to the object detection model training method of claim 9.
11. A countermeasure image generation apparatus characterized by comprising:
a generating module, configured to generate an intermediate confrontation image, where the generating module specifically includes: the output unit is used for inputting the image to be processed into a pre-trained target detection model and outputting the classification prediction information and the region prediction coordinate information of each detection object; the input unit is used for inputting the classification prediction information of each detection object and the corresponding classification label information into a prediction classification loss function, and performing gradient calculation of the corresponding type of the category to which the corresponding classification label information belongs to obtain the corresponding target classification loss; the prediction classification loss function comprises a gradient increasing calculation item and a gradient decreasing calculation item, wherein the gradient increasing calculation item is used for gradient increasing calculation of the classification loss of the detection object of which the classification label information belongs to the first class, and the gradient decreasing calculation item is used for gradient decreasing calculation of the classification loss of the detection object of which the classification label information belongs to the second class; the first determining unit is used for determining target positioning loss according to the area prediction coordinate information of each detection object and the corresponding area label coordinate information; a construction unit for constructing a target gradient loss function based on the target classification loss and the target localization loss; the calculation unit is used for calculating a target gradient image corresponding to the image to be processed through the target gradient loss function; the superposition unit is used for superposing the target gradient image to an image to be processed to generate an intermediate confrontation image, and the intermediate confrontation image is the image to be processed, which is subjected to gradient calculation in the next iteration;
and the iteration module is used for repeatedly executing the step of generating the intermediate confrontation image until the intermediate confrontation image does not have the detection object which is identified as the first class by the target detection model, and determining the intermediate confrontation image as the target confrontation image.
12. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method for generating a countermeasure image according to any one of claims 1 to 8 or the method for training an object detection model according to claim 9 when executing the computer program.
13. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method of generating a countermeasure image according to any one of claims 1 to 8 or to perform the steps of the method of training an object detection model according to claim 9.
CN202210471139.9A 2022-04-28 2022-04-28 Countermeasure image generation method, countermeasure image generation device, computer device, and storage medium Pending CN115019118A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210471139.9A CN115019118A (en) 2022-04-28 2022-04-28 Countermeasure image generation method, countermeasure image generation device, computer device, and storage medium

Publications (1)

Publication Number Publication Date
CN115019118A true CN115019118A (en) 2022-09-06

Family

ID=83066831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210471139.9A Pending CN115019118A (en) 2022-04-28 2022-04-28 Countermeasure image generation method, countermeasure image generation device, computer device, and storage medium

Country Status (1)

Country Link
CN (1) CN115019118A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination