CN117333732A - Countermeasure sample generation method, model training method, image recognition method and device - Google Patents

Countermeasure sample generation method, model training method, image recognition method and device

Info

Publication number
CN117333732A
CN117333732A (application CN202210711634.2A)
Authority
CN
China
Prior art keywords
image
processing
model
probabilities
challenge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210711634.2A
Other languages
Chinese (zh)
Inventor
刘彦宏
曾定衡
王洪斌
蒋宁
吴海英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd filed Critical Mashang Xiaofei Finance Co Ltd
Priority to CN202210711634.2A priority Critical patent/CN117333732A/en
Publication of CN117333732A publication Critical patent/CN117333732A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects, using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Provided are a countermeasure sample generation method, a model training method, an image recognition method, and corresponding apparatuses. The countermeasure sample generation method includes: acquiring a first output result obtained by processing a first image with a target image processing model; weighting the C probabilities indicated by the first output result with C weight values to obtain C weighted probabilities; determining a loss function of the C weighted probabilities; and performing disturbance processing on the first image based on the loss function to obtain a countermeasure image. With this method, the C probabilities output after the countermeasure image is processed by the image processing model are more uniform and smooth, so a better attack effect can be achieved when the countermeasure image is used for a countermeasure attack, and the detection capability and attack resistance of the model can be improved when the countermeasure image is used for countermeasure training.

Description

Countermeasure sample generation method, model training method, image recognition method and device
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a countermeasure sample generation method, a model training method, an image recognition method, and corresponding devices.
Background
With the continued development of artificial intelligence technology, the concept of the countermeasure sample has been proposed: fine perturbations are deliberately added to an image so that a model gives an incorrect output with high confidence. Countermeasure samples are now applied in many directions, such as countermeasure training and countermeasure attacks. However, the performance of the countermeasure samples obtained by existing scrambling methods is not ideal.
Disclosure of Invention
The present application provides a countermeasure sample generation method, a model training method, an image recognition method, and corresponding apparatuses. A countermeasure image obtained by the method can achieve a better attack effect when used for a countermeasure attack, and can improve the detection capability and attack resistance of a model when used for countermeasure training.
In a first aspect, there is provided a challenge sample generating method comprising: acquiring a first output result obtained by processing a first image by a target image processing model, wherein the target image processing model is a pre-trained model, and the first output result is used for indicating the probability that the first image is in each of C categories, and C is an integer greater than 1; c weight values are used for carrying out weighting processing on C probabilities indicated by the first output result to obtain C weighted probabilities, wherein the C weight values, the C probabilities and the C weighted probabilities are in one-to-one correspondence, each weighted probability in the C weighted probabilities is obtained by carrying out weighting processing on the corresponding probability in the C probabilities by using the corresponding weight value in the C weight values, and the C weight values are C values which are randomly and uniformly distributed in a preset interval range; determining a loss function of the C weighted probabilities; and carrying out disturbance processing on the first image based on the loss function to obtain a countermeasure image.
In a second aspect, a model training method is provided, including: obtaining an image sample set, the image sample set comprising a challenge image; training an image classification model by utilizing the image sample set; wherein the challenge image is generated by a method in any of the implementations of the first aspect described above.
In a third aspect, an image recognition method is provided, including: acquiring an image to be classified; performing image recognition on the image to be classified by using an image classification model to obtain a recognition result; wherein the image classification model is trained by the method of the second aspect.
In a fourth aspect, there is provided a challenge sample generating device comprising: an acquisition unit, configured to acquire a first output result obtained by processing a first image by a target image processing model, wherein the target image processing model is a pre-trained model, the first output result is used for indicating the probability that the first image is in each of C categories, and C is an integer greater than 1; a weighting processing unit, configured to perform weighting processing on the C probabilities indicated by the first output result by using C weight values to obtain C weighted probabilities, wherein the C weight values, the C probabilities and the C weighted probabilities are in one-to-one correspondence, each weighted probability in the C weighted probabilities is obtained by performing weighting processing on the corresponding probability in the C probabilities by using the corresponding weight value in the C weight values, and the C weight values are C values which are randomly and uniformly distributed in a preset interval range; a determining unit, configured to determine a loss function of the C weighted probabilities; and a disturbance processing unit, configured to perform disturbance processing on the first image based on the loss function to obtain a countermeasure image.
In a fifth aspect, there is provided a model training apparatus comprising: an acquisition unit configured to acquire an image sample set including a countermeasure image; the training unit is used for training the image classification model by utilizing the image sample set; wherein the challenge image is generated by a method in any of the implementations of the first aspect described above.
In a sixth aspect, there is provided an image recognition apparatus comprising: the acquisition unit is used for acquiring the images to be classified; the identification unit is used for carrying out image identification on the images to be classified by using an image classification model to obtain an identification result; wherein the image classification model is trained by the method of the second aspect.
In a seventh aspect, there is provided an electronic device comprising: a memory for storing a program; and a processor for executing the program stored in the memory, wherein, when the program stored in the memory is executed, the processor is configured to perform the method of any one of the implementations of the first aspect, the second aspect, or the third aspect.
In an eighth aspect, a computer readable medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of any one of the implementations of the first or second or third aspects.
A ninth aspect provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the first or second or third aspects described above.
In a tenth aspect, a chip is provided, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, performing the method of any implementation of the first aspect or the second aspect or the third aspect.
In the embodiments of the present application, the C weight values are C values randomly and uniformly distributed within a preset interval range. Weighting the C probabilities indicated by the first output result with these C weight values makes the differences between the obtained C weighted probabilities more uniform and smooth. After the first image is perturbed based on the loss function of the C weighted probabilities, a countermeasure image is obtained, and the C probabilities output after the countermeasure image is processed by the image processing model are likewise more uniform and smooth. As a result, the countermeasure image more easily deceives the model into a misjudgment when used for a countermeasure attack, achieving a better attack effect; and when used for countermeasure training, it helps the model adapt to the disturbance introduced by the countermeasure image, improving the model's resistance to countermeasure images and thus its robustness.
Drawings
Fig. 1 is a schematic flow chart of a challenge sample generation method provided in one embodiment of the present application.
Fig. 2 is a schematic structural diagram of an image processing model according to an embodiment of the present application.
FIG. 3 is a schematic flow chart of a model training method provided in one embodiment of the present application.
Fig. 4 is a schematic structural diagram of a backbone network according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of an image recognition method provided in one embodiment of the present application.
Fig. 6 is a schematic block diagram of a challenge sample generating device provided in one embodiment of the present application.
FIG. 7 is a schematic block diagram of a model training apparatus provided in one embodiment of the present application.
Fig. 8 is a schematic block diagram of an image recognition apparatus provided in one embodiment of the present application.
Fig. 9 is a schematic block diagram of an apparatus provided in another embodiment of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method in the embodiments of the present application may be applied to various image processing scenarios, which is not limited by the embodiments of the present application. For example, the method may be used to perform countermeasure training on an image processing model, and the image processing model obtained after training can be used for face recognition.
With the rapid development of artificial intelligence technology, deep neural network models are widely used in many fields (e.g., the field of computer vision). In actual use, the deep neural network model often needs to process the input image including pixel perturbations, at which time the performance of the deep neural network model may drop dramatically.
In order to solve this problem, a countermeasure training method has been proposed: in the model training process, countermeasure images containing disturbance information are added to the training data, so as to reduce the interference of noise information with the trained model. For example, disturbance data that are randomly and uniformly distributed around the original data (i.e., the input image) may be generated and used to scramble the original data, yielding a countermeasure image (which may also be referred to as a countermeasure sample) that is used to perform a countermeasure attack on the image processing model.
However, such randomly and uniformly distributed disturbance data cannot reflect real noise, and the countermeasure images obtained from them cannot reflect the diversity of countermeasure images encountered in practice. For example, when performing countermeasure training on an image classification model, the countermeasure images obtained with such disturbance data may affect only the results of some of the classes output by the model, with little or no influence on the other classes. In that case, the training effectively uses countermeasure images for only part of the classes, and a situation similar to overfitting on those classes may occur: the trained model recognizes some classes well, but its recognition of the other classes is not ideal.
To address the above problems, an embodiment of the present application provides a countermeasure sample generation method. The countermeasure image obtained with this method can achieve a better attack effect when used for a countermeasure attack, and can improve the detection capability and attack resistance of a model when used for countermeasure training.
Fig. 1 is a schematic flow chart of a challenge sample generation method 100 according to one embodiment of the present application. The method 100 may include steps S110 to S140, specifically as follows:
s110, a first output result obtained by processing the first image by the target image processing model is obtained.
The target image processing model may be used for face recognition. For example, the target image processing model may be an image classification model.
Alternatively, the target image processing model may be a pre-trained model. For example, the target image processing model may have been a neural network model pre-trained on a normal training dataset (not containing perturbation information). The target image processing model may be pre-trained using the first image or may be pre-trained using other images or sets of images.
As shown in fig. 2, the target image processing model may include a backbone network (backbone) and a classification network. Wherein the backbone network may be used to extract feature information of the input image and the classification network may be used to classify the input image (e.g., the first image) based on the feature information. For example, a ResNet50 network may be employed as the backbone network.
The backbone network may include N feature extraction layers and a Bayesian network, N being a positive integer.
Optionally, the input of the first layer of the N feature extraction layers may be the input of the target image processing model, and the input of the (i+1)-th layer may be the output of the i-th layer, where i is a positive integer less than N.
Optionally, the Bayesian network may include S branch sub-networks, the input of each of the S branch sub-networks may be the output of the last layer of the N feature extraction layers, and S is an integer greater than 1.
For example, as shown in fig. 4, the backbone network includes N residual blocks (i.e., feature extraction layers), and the output of the last residual block may be used as the input to each of the S branch sub-networks of the Bayesian network.
Optionally, the input of the classification network may be the mean value of the outputs of the S branch sub-networks.
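For illustration only, the following is a minimal PyTorch-style sketch of such a target image processing model (a backbone with N feature extraction layers and a Bayesian network of S branch sub-networks, followed by a classification network). The class names, the use of copy.deepcopy to build the branches, and the averaging/random-selection switch are assumptions made for the example and are not part of the embodiment itself.

```python
import copy
import torch
import torch.nn as nn

class BayesianBackbone(nn.Module):
    """Backbone: N stacked feature extraction layers followed by S parallel branch sub-networks."""
    def __init__(self, feature_layers: nn.Sequential, branch: nn.Module, S: int = 4):
        super().__init__()
        self.feature_layers = feature_layers  # the N feature extraction layers (layer i+1 consumes layer i's output)
        # S branch sub-networks, each fed with the output of the last feature extraction layer
        self.branches = nn.ModuleList([copy.deepcopy(branch) for _ in range(S)])

    def forward(self, x: torch.Tensor, average_branches: bool = True) -> torch.Tensor:
        feats = self.feature_layers(x)                      # output of the last of the N layers
        outs = [b(feats) for b in self.branches]            # one output per branch sub-network
        if average_branches:
            return torch.stack(outs, dim=0).mean(dim=0)     # mean of the S branch outputs
        idx = torch.randint(len(outs), (1,)).item()         # or a single randomly chosen branch
        return outs[idx]

class TargetImageProcessingModel(nn.Module):
    """Backbone followed by a classification network producing scores for the C categories."""
    def __init__(self, backbone: BayesianBackbone, classifier: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.classifier = classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(x))            # first output result: scores over the C categories
```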
In S110, feature extraction may be performed on the first image through the backbone network to obtain feature information; and processing the characteristic information through the classification network to obtain the first output result.
The first output result may be used to indicate a probability that the first image is of each of C categories, C being an integer greater than 1.
Optionally, the C categories may include a category indicating whether the input image is a countermeasure image. That is, while outputting the classification result for the input image, the target image processing model may also give a determination of whether the input image is a countermeasure image.
And S120, weighting the C probabilities indicated by the first output result by using the C weight values to obtain C weighted probabilities.
Optionally, the C weight values may be C values randomly and uniformly distributed within a preset interval range. Weighting the first output result with C randomly uniform weight values makes the differences between the C weighted probabilities smoother and more uniform, and a countermeasure image obtained by scrambling based on these C weighted probabilities can influence every one of the C categories rather than only some of them. In other words, the countermeasure image can influence the output of the model from multiple different directions (i.e., affect multiple different categories of the model output), which is equivalent to feeding the model countermeasure images containing multiple different perturbations. In this way, the diversity of the countermeasure images can be improved.
Alternatively, the C weight values, the C probabilities, and the C weighted probabilities may be in one-to-one correspondence. Each weighted probability of the C weighted probabilities is obtained by weighting a corresponding probability of the C probabilities with a corresponding weight value of the C weight values.
Optionally, the preset interval range may be -1 to 1. Values within (-1, 1) have smaller magnitudes than values in wider intervals, so taking C weight values randomly and uniformly distributed within (-1, 1) can reduce the computational complexity and improve the generation efficiency of the countermeasure sample.
S130, determining a loss function of the C weighted probabilities.
Optionally, the loss function may be the C weighted probabilities themselves, or the loss function may be the difference between the C weighted probabilities and the label of the first image. The label of the first image may be used to indicate the true probability that the first image belongs to each of the C categories. For example, assuming that C is 3, that the 3 categories correspond to a kitten, a puppy, and a rabbit, respectively, and that the first image contains a puppy, the label of the first image may indicate that the true probability corresponding to the puppy is 100%, the true probability corresponding to the kitten is 0%, and the true probability corresponding to the rabbit is 0%.
For example, assume that C is 3, that the 3 categories correspond to a kitten, a puppy, and a rabbit, and that the probabilities for the 3 categories in the output result of the target image model are 10% for the kitten, 80% for the puppy, and 10% for the rabbit. The 3 probabilities are weighted with 3 weight values (for convenience of description, it is assumed here that the weight values for the 3 categories are all 1), so the 3 weighted probabilities are 10%, 80%, and 10%, respectively. The true labels for these 3 categories are 0% for the kitten, 100% for the puppy, and 0% for the rabbit. Taking the difference between the 3 weighted probabilities and the true labels gives a difference of 20% for the puppy, 10% for the kitten, and 10% for the rabbit, and these 3 difference values may be used as the loss function.
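As an illustration of S120 and S130 only, the sketch below (assumed PyTorch code; the use of probability vectors and a one-hot label is an assumption for the example) draws C random uniform weight values, forms the C weighted probabilities, and builds the loss either from the weighted probabilities themselves or from their difference to the label.

```python
import torch

def weighted_probability_loss(probs, label=None, low=-1.0, high=1.0):
    """probs: (C,) probabilities from the target model; label: optional (C,) one-hot true label."""
    C = probs.shape[0]
    # C weight values drawn randomly and uniformly from the preset interval (here (-1, 1))
    weights = torch.empty(C, device=probs.device).uniform_(low, high)
    weighted = weights * probs                   # the C weighted probabilities, one per category
    if label is None:
        return weighted.sum()                    # loss taken as the weighted probabilities themselves
    return (weighted - label).abs().sum()        # or the per-category difference from the true label

# toy usage mirroring the kitten/puppy/rabbit example above
# (with all weights equal to 1 the per-category differences would be 10%, 20% and 10%)
probs = torch.tensor([0.10, 0.80, 0.10])         # kitten, puppy, rabbit
label = torch.tensor([0.00, 1.00, 0.00])         # the image actually shows a puppy
loss = weighted_probability_loss(probs, label)
```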
And S140, performing disturbance processing on the first image based on the loss function to obtain a countermeasure image.
Alternatively, a gradient of the loss function may be calculated and the first image may be scrambled based on the gradient. For example, a gradient of the loss function with respect to the first image may be acquired, and a perturbation process may be performed on the first image based on the gradient.
In general, when calculating the loss value of a model during training, the partial derivative (i.e., gradient) of the model output is computed by backward derivation, with the model parameters treated as the variables of differentiation. Similarly, the gradient with respect to the first image is obtained by computing the partial derivative of the loss function with the first image treated as the variable of differentiation.
Further, the first image may be scrambled using the direction of the gradient. For example, disturbance processing is performed on the first image based on the direction of the gradient, and a countermeasure image is obtained. Wherein the disturbance direction of the first image may coincide with the direction of the gradient.
For example, the sign of the gradient may be determined: if the sign of the gradient is positive, the direction of the gradient is positive and the disturbance value may be added to the first image; if the sign of the gradient is negative, the direction of the gradient is negative and the disturbance value may be subtracted from the first image.
Further, a disturbance range may also be defined for the disturbance values. For example, the first image is perturbed within a preset disturbance range based on the direction of the gradient. In this way, the computational complexity can be reduced, and the generation efficiency of the countermeasure sample improved.
Optionally, the disturbance range may be chosen so that it does not prevent the human eye from correctly identifying the true class of the resulting countermeasure sample, thereby reducing the impact of the scrambled countermeasure image on the user's ability to recognize it.
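The following is a minimal sketch of S140 under the single-step, gradient-sign style update described above; the parameter names rho and eps, the softmax over the model output, and the [0, 1] pixel range are assumptions made for the example.

```python
import torch

def perturb_image(model, image, weights, rho=1.0 / 255, eps=8.0 / 255):
    """One disturbance step: move the image along the sign of the gradient of the weighted loss."""
    x = image.clone().detach().requires_grad_(True)
    probs = torch.softmax(model(x), dim=-1)                 # C probabilities for the first image
    loss = (weights * probs).sum()                          # loss over the C weighted probabilities
    grad = torch.autograd.grad(loss, x)[0]                  # gradient of the loss with respect to the image
    with torch.no_grad():
        adv = x + rho * grad.sign()                         # perturb in the direction of the gradient
        adv = torch.clamp(adv, image - eps, image + eps)    # keep the disturbance within the preset range
        adv = torch.clamp(adv, 0.0, 1.0)                    # keep valid pixel values
    return adv.detach()                                     # the resulting countermeasure image
```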
In the embodiments of the present application, the C weight values are C values randomly and uniformly distributed within a preset interval range, and weighting the C probabilities indicated by the first output result with these C weight values makes the differences between the obtained C weighted probabilities more uniform and smooth. When the first image is then perturbed based on the loss function of the C weighted probabilities to obtain a countermeasure image, the C probabilities output after the countermeasure image is processed by the image processing model are likewise more uniform and smooth. Therefore, a better attack effect can be achieved when the countermeasure image is used for a countermeasure attack, and the detection capability and attack resistance of the model can be improved when the countermeasure image is used for countermeasure training.
The countermeasure image obtained by the above countermeasure sample generation method can be used both to train an image processing model and to perform a countermeasure attack on an image processing model.
For example, an image sample set may be obtained and used to train an image classification model, wherein the image sample set may include a countermeasure image generated by the method of fig. 1.
In the embodiments of the present application, the countermeasure image may be generated in advance and then used to train the image processing model, or the countermeasure image may be generated during the training of the image processing model and used for that training. A method of generating a countermeasure image during the training of an image processing model, and of training the image processing model with that countermeasure image, is described in detail below in connection with the embodiment of fig. 3.
Fig. 3 is a schematic flow chart of a model training method 300 according to another embodiment of the present application. The method 300 may include steps S310 to S350, specifically as follows:
s310, processing the input image by using the image processing model.
The image processing model may be a face recognition model that has been trained on a normal training dataset. As shown in fig. 2, the image processing model may include a backbone network (backbone) and a classification network.
The image processing model may be converted into a Bayesian-network form in advance. Optionally, the last residual block in the backbone network may be duplicated S times (S being a positive integer), and the last residual block in the original backbone network replaced with the S copies. For example, as shown in fig. 4, the original backbone network includes N+1 residual blocks, and the last residual block may be duplicated S times (i.e., the S output blocks in fig. 4) and replaced with the S duplicated copies. The modified image processing model may be denoted as M.
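A minimal sketch of this transformation is given below, assuming a torchvision-style ResNet50 whose last stage is backbone.layer4; the attribute names and the use of deep copies are assumptions made for illustration.

```python
import copy
import torch.nn as nn

def to_bayesian_backbone(backbone: nn.Module, S: int = 4) -> nn.ModuleList:
    """Duplicate the last residual block of the backbone S times to form the S output blocks."""
    last_block = backbone.layer4[-1]                                  # last residual block of the original backbone
    output_blocks = nn.ModuleList([copy.deepcopy(last_block) for _ in range(S)])
    backbone.layer4 = backbone.layer4[:-1]                            # the original last block is removed ...
    return output_blocks                                              # ... and replaced by these S copies (routed by the caller)
```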
At this point, the input image X may be fed into the converted image processing model M, and after processing, the model output M(X) (i.e., the first output result in the method 100) is obtained. For the S output blocks of the Bayesian network, one output block (e.g., randomly selected) may be used for the forward pass, and the output of that block is taken as the output result of the backbone network.
When the image processing model M processes the input image, the remaining parts other than the Bayesian network perform the forward pass normally.
S320, calculating a loss value corresponding to the normal data.
At this time, the input image does not add disturbance data, and thus the input image may be referred to as normal data.
A cross-entropy loss corresponding to the input image may be calculated as L = cross-entropy(M(X), Y), where cross-entropy(·,·) denotes the cross-entropy loss function and Y is the true label of the input image X.
S330, generating a countermeasure image.
At this point, all S output blocks of the Bayesian network in the image processing model M may perform a forward pass on X; the outputs of the S output blocks are averaged, and the average is used as the output result of the backbone network. The output result of the backbone network is then fed into the classification network to obtain the model output M(X').
To distinguish it from the output result M(X) in S310, the output result in S330 is uniformly denoted M(X'). Note that the steps in S330 may be iterated E times; the image input in the first iteration is in fact the input image X with no disturbance data added, so the model output in the first iteration is M(X).
C values (i.e., the C weight values in the method 100) may be drawn at random from the uniform distribution on (-1, 1) to form a uniform direction vector V of the output space.
The product M(X')·V of the vector V and M(X') (i.e., the C weighted probabilities in the method 100) can then be calculated.
The gradient g can be calculated by the following equation 1:
g = ∂(M(X')·V) / ∂X' (equation 1)
wherein M(X') comprises the probability vector corresponding to the C classifications, and M(X')·V represents projecting the probability vector of each classification onto the uniform random direction (in this case, the loss function may be the C weighted probabilities).
Further, the input image X' may be adjusted using the gradient g by the following equation 2:
X' = Projection(X' + ρ·sign(g), -ε, ε) (equation 2)
wherein Projection(·, -ε, ε) clips each value of the updated disturbed image into the range (X-ε, X+ε), sign(g) takes the sign of each element of the gradient g (1 for positive values, 0 for zero values, -1 for negative values), and ρ and ε are preset scalar parameters. The adjusted X' can then be regarded as the generated countermeasure image.
Further, the above steps in S330 may be iterated E times, where E is a preset positive integer parameter. For example, the generated X' may be fed into the model M again to obtain a new output M(X'), and the input image may be updated continually using equations 1 and 2 above. This process may be iterated E times, so that the projection of the model output for the generated countermeasure image X' onto the C uniform directions is maximized.
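Taking equation 1 to be the gradient of M(X')·V with respect to X', the iterative procedure of S330 can be sketched as the PGD-style loop below; the softmax over the model output, the step size rho, the range eps, and the iteration count E are illustrative assumptions, not values fixed by this embodiment.

```python
import torch

def generate_countermeasure_image(model, x, C, E=10, rho=1.0 / 255, eps=8.0 / 255):
    """Iteratively push the model output for X' along a random uniform direction V of the output space."""
    v = torch.empty(C, device=x.device).uniform_(-1.0, 1.0)    # uniform direction vector V
    x_adv = x.clone().detach()
    for _ in range(E):
        x_adv.requires_grad_(True)
        out = torch.softmax(model(x_adv), dim=-1)              # M(X'): probabilities over the C classes
        proj = (out * v).sum()                                 # M(X') . V
        g = torch.autograd.grad(proj, x_adv)[0]                # equation 1 (assumed form): d(M(X') . V) / dX'
        with torch.no_grad():
            x_adv = x_adv + rho * g.sign()                     # equation 2: step along sign(g)
            x_adv = torch.clamp(x_adv, x - eps, x + eps)       # Projection(., -eps, eps) around the clean image
        x_adv = x_adv.detach()
    return x_adv
```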
S340, calculating a loss value corresponding to the countermeasure image.
X' can be divided into two parts (for example, split into two equal parts along the depth direction), and the loss value corresponding to the countermeasure image can be obtained by calculating the Euclidean distance between the features extracted by the backbone network from the two parts of the disturbed image.
For example, the loss value R corresponding to the countermeasure image can be calculated by the following equation 3:
wherein |X| represents the number of images in the input image X, X' can be divided equally into two parts to obtain X'1 and X'2, and i is an integer index over the images.
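Since equation 3 itself is not reproduced above, the following sketch only illustrates the described idea: the countermeasure batch X' is split into two equal parts and the Euclidean distance between their backbone features is used as the loss. The averaging over image pairs is an assumption for the example, not the patented formula.

```python
import torch

def countermeasure_feature_loss(backbone, x_adv):
    """Split the countermeasure batch into two halves and compare their backbone features."""
    half = x_adv.shape[0] // 2
    x1, x2 = x_adv[:half], x_adv[half:2 * half]            # X'1 and X'2, two equal parts of X'
    f1 = backbone(x1).flatten(1)                           # features extracted by the backbone
    f2 = backbone(x2).flatten(1)
    dist = torch.norm(f1 - f2, dim=1)                      # Euclidean distance for each image pair i
    return dist.mean()                                     # loss value R (normalisation assumed)
```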
S350, updating parameters of the image processing model.
The weight parameters of the Bayesian network in the backbone network can be updated through a gradient-based increment, as shown in the following formula:
The remaining network weight parameters in the backbone network, other than those of the Bayesian network, can be updated as shown in the following formula:
wherein α, β and τ are scalar parameters preset manually.
Further, the next batch of input images may be taken and the above steps repeated until all training data in the training set has been traversed.
Alternatively, the number of iterations of the training process may be manually preset, and the steps S310 to S350 may be iterated a plurality of times. At the end of the iteration, the countermeasure training ends.
The trained image processing model can then be used to process an image to be processed. For example, if the image processing model is a face recognition model, the image to be processed and another image Z whose identity is to be verified may each be input into the image processing model, and whether they show the same person is determined from the similarity-measurement distance between the two extracted features and a preset threshold.
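A minimal sketch of this verification step is given below, assuming that the backbone features are used directly as face embeddings; the Euclidean distance metric and the threshold value are illustrative assumptions.

```python
import torch

def same_person(feature_extractor, image_a, image_b, threshold=1.0):
    """Decide whether two face images show the same person from the distance between their features."""
    with torch.no_grad():
        feat_a = feature_extractor(image_a.unsqueeze(0)).flatten(1)
        feat_b = feature_extractor(image_b.unsqueeze(0)).flatten(1)
    distance = torch.norm(feat_a - feat_b, dim=1).item()   # similarity-measurement distance between features
    return distance <= threshold                            # compare with the preset threshold
```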
With this method, the generated disturbance data can cover a wider sample distribution, so the trained model has better generalization and defensive capability.
Fig. 5 is a schematic flow chart of the image recognition method of the present application. The method 500 in fig. 5 includes steps S510 and S520, specifically as follows:
s510, obtaining an image to be classified.
S520, performing image recognition on the image to be classified by using the image classification model to obtain a recognition result.
The image classification model may be obtained after training by the model training method in the above embodiment.
Method embodiments of the present application are described above in detail in connection with fig. 1-5, and apparatus embodiments of the present application are described below in detail in connection with fig. 6-9. It is to be understood that the description of the method embodiments corresponds to the description of the device embodiments, and that parts not described in detail can therefore be seen in the preceding method embodiments.
Fig. 6 is a schematic structural diagram of a countermeasure sample generating apparatus provided in an embodiment of the present application. As shown in fig. 6, the apparatus 600 includes an obtaining unit 610, a weighting processing unit 620, a determining unit 630, and a disturbance processing unit 640, specifically as follows:
an obtaining unit 610, configured to obtain a first output result obtained by processing a first image by using a target image processing model, where the target image processing model is a pre-trained model, and the first output result is used to indicate a probability that the first image is of each of C categories, and C is an integer greater than 1;
a weighting processing unit 620, configured to perform weighting processing on the C probabilities indicated by the first output result by using C weight values, so as to obtain C weighted probabilities, where the C weight values, the C probabilities and the C weighted probabilities are in one-to-one correspondence, each weighted probability in the C weighted probabilities is obtained by performing weighting processing on a corresponding probability in the C probabilities by using a weight value corresponding to the C weight values, and the C weight values are C values that are randomly and uniformly distributed in a preset interval range;
a determining unit 630, configured to determine a loss function of the C weighted probabilities;
and a disturbance processing unit 640, configured to perform disturbance processing on the first image based on the loss function, so as to obtain a countermeasure image.
Optionally, the preset interval ranges from-1 to 1.
Optionally, the disturbance processing unit 640 is specifically configured to: acquiring a gradient of the loss function relative to the first image; and performing disturbance processing on the first image based on the gradient.
Optionally, the disturbance processing unit 640 is specifically configured to: and carrying out disturbance processing on the first image based on the direction of the gradient to obtain a countermeasure image, wherein the disturbance direction of the first image is consistent with the direction of the gradient.
Optionally, the disturbance processing unit 640 is specifically configured to: and carrying out disturbance processing on the first image in a preset disturbance range based on the direction of the gradient.
Optionally, the target image processing model includes a backbone network and a classification network, the backbone network includes N feature extraction layers and a bayesian network, an input of a first layer of the N feature extraction layers is an input of the target image processing model, an input of an i+1th layer is an output of the i th layer, the bayesian network includes S branch sub-networks, an input of each branch sub-network of the S branch sub-networks is an output of a last layer of the N feature extraction layers, an input of the classification network is a mean value of outputs of the S branch sub-networks, N is a positive integer, i is a positive integer smaller than N, and S is an integer greater than 1; the acquiring unit 610 is specifically configured to: extracting features of the first image based on the backbone network to obtain feature information; and processing the characteristic information based on the classification network to obtain the first output result.
Fig. 7 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application. As shown in fig. 7, the apparatus 700 includes an acquisition unit 710 and a training unit 720, which are specifically as follows:
an acquisition unit 710 for acquiring an image sample set including a challenge image;
and a training unit 720, configured to train the image classification model by using the image sample set.
Wherein the challenge image may be generated by the method 100 of fig. 1 described above.
Fig. 8 is a schematic structural diagram of an image recognition apparatus provided in an embodiment of the present application. As shown in fig. 8, the apparatus 800 includes an obtaining unit 810 and an identifying unit 820, and is specifically as follows:
an acquiring unit 810 for acquiring an image to be classified;
the identifying unit 820 is configured to perform image identification on the image to be classified by using an image classification model, so as to obtain an identification result;
wherein, the image classification model can be trained by the method described in fig. 3.
Fig. 9 is a schematic block diagram of an apparatus 900 of an embodiment of the present application. The apparatus 900 shown in fig. 9 comprises a memory 901, a processor 902, a communication interface 903, and a bus 904. The memory 901, the processor 902, and the communication interface 903 are communicatively connected to each other via a bus 904.
The memory 901 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 901 may store a program, and when the program stored in the memory 901 is executed by the processor 902, the processor 902 is configured to perform the steps of the method of the embodiment of the present application, for example, the steps of the embodiments shown in fig. 1, 3, and 5 may be performed.
The processor 902 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), or one or more integrated circuits for executing associated programs to perform the methods of the method embodiments of the present application.
The processor 902 may also be an integrated circuit chip with signal processing capabilities. In implementation, various steps of methods of embodiments of the present application may be performed by integrated logic circuitry in hardware or by instructions in software in processor 902.
The processor 902 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoded processor, or in a combination of hardware and software modules in a decoded processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 901, and the processor 902 reads information in the memory 901, and combines the hardware thereof to perform functions required to be performed by units included in each apparatus in the embodiments of the present application, or perform the methods of each method embodiment in the present application, for example, may perform each step/function of the embodiments shown in fig. 1, 3, and 5.
The communication interface 903 may enable communication between the apparatus 900 and other devices or communication networks using, but is not limited to, a transceiver or the like.
The bus 904 may include a path for transferring information between various components of the apparatus 900 (e.g., the memory 901, the processor 902, the communication interface 903).
It should be understood that the apparatus 900 shown in the embodiments of the present application may be a processor or a chip for performing the methods described in the embodiments of the present application.
It should be appreciated that in embodiments of the present application, the processor may be a central processing unit (central processing unit, CPU), the processor may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should be understood that in the embodiments of the present application, "B corresponding to a" means that B is associated with a, from which B may be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be read by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disk (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of challenge sample generation, comprising:
acquiring a first output result obtained by processing a first image by a target image processing model, wherein the target image processing model is a pre-trained model, and the first output result is used for indicating the probability that the first image is in each of C categories, and C is an integer greater than 1;
c weight values are used for carrying out weighting processing on C probabilities indicated by the first output result to obtain C weighted probabilities, wherein the C weight values, the C probabilities and the C weighted probabilities are in one-to-one correspondence, each weighted probability in the C weighted probabilities is obtained by carrying out weighting processing on the corresponding probability in the C probabilities by using the corresponding weight value in the C weight values, and the C weight values are C values which are randomly and uniformly distributed in a preset interval range;
determining a loss function of the C weighted probabilities;
and carrying out disturbance processing on the first image based on the loss function to obtain a countermeasure image.
2. The method of claim 1, wherein the perturbing the first image based on the loss function comprises:
acquiring a gradient of the loss function relative to the first image;
and performing disturbance processing on the first image based on the gradient.
3. The method of claim 2, wherein the perturbing the first image based on the gradient comprises:
and carrying out disturbance processing on the first image based on the direction of the gradient to obtain a countermeasure image, wherein the disturbance direction of the first image is consistent with the direction of the gradient.
4. A method according to any one of claims 1 to 3, wherein the target image processing model comprises a backbone network and a classification network, the backbone network comprising N feature extraction layers and a Bayesian network, the input of the first of the N feature extraction layers being the input of the target image processing model, the input of the (i+1)-th layer being the output of the i-th layer, the Bayesian network comprising S branch sub-networks, the input of each of the S branch sub-networks being the output of the last of the N feature extraction layers, the input of the classification network being the mean of the outputs of the S branch sub-networks, N being a positive integer, i being a positive integer less than N, and S being an integer greater than 1;
the process of processing the first image by the target image processing model comprises the following steps:
extracting the characteristics of the first image through the backbone network to obtain characteristic information;
and processing the characteristic information through the classification network to obtain the first output result.
5. A method of model training, comprising:
obtaining a set of image samples comprising a challenge image, the challenge image being generated using the method of any of claims 1 to 4;
and training an image classification model by using the image sample set.
6. An image recognition method, comprising:
acquiring an image to be classified;
and carrying out image recognition on the image to be classified by using an image classification model to obtain a recognition result, wherein the image classification model is trained by the method of claim 5.
7. A challenge sample generating device, comprising:
the image processing device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a first output result obtained by processing a first image by a target image processing model, the target image processing model is a pre-trained model, the first output result is used for indicating the probability that the first image is in each of C categories, and C is an integer larger than 1;
the weighting processing unit is used for carrying out weighting processing on the C probabilities indicated by the first output result by using C weight values to obtain C weighted probabilities, wherein the C weight values, the C probabilities and the C weighted probabilities are in one-to-one correspondence, each weighted probability in the C weighted probabilities is obtained by carrying out weighting processing on the corresponding probability in the C probabilities by using the weight value corresponding to the C weight values, and the C weight values are C values which are randomly and uniformly distributed in a preset interval range;
a determining unit, configured to determine a loss function of the C weighted probabilities;
and the disturbance processing unit is used for carrying out disturbance processing on the first image based on the loss function to obtain a countermeasure image.
8. An image recognition apparatus, comprising:
the acquisition unit is used for acquiring the images to be classified;
the identification unit is used for carrying out image identification on the images to be classified by utilizing the image classification model to obtain an identification result;
wherein the image classification model is trained by the method of claim 5.
9. An electronic device comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any of claims 1-6.
10. A computer readable storage medium storing program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 6.
CN202210711634.2A 2022-06-22 2022-06-22 Countermeasure sample generation method, model training method, image recognition method and device Pending CN117333732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210711634.2A CN117333732A (en) 2022-06-22 2022-06-22 Countermeasure sample generation method, model training method, image recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210711634.2A CN117333732A (en) 2022-06-22 2022-06-22 Countermeasure sample generation method, model training method, image recognition method and device

Publications (1)

Publication Number Publication Date
CN117333732A true CN117333732A (en) 2024-01-02

Family

ID=89275990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210711634.2A Pending CN117333732A (en) 2022-06-22 2022-06-22 Countermeasure sample generation method, model training method, image recognition method and device

Country Status (1)

Country Link
CN (1) CN117333732A (en)

Similar Documents

Publication Publication Date Title
US11657269B2 (en) Systems and methods for verification of discriminative models
CN110188829B (en) Neural network training method, target recognition method and related products
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN113095370B (en) Image recognition method, device, electronic equipment and storage medium
EP3136292A1 (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN111444765B (en) Image re-identification method, training method of related model, related device and equipment
CN112884147B (en) Neural network training method, image processing method, device and electronic equipment
CN116343301B (en) Personnel information intelligent verification system based on face recognition
CN112632609A (en) Abnormality detection method, abnormality detection device, electronic apparatus, and storage medium
CN113780243A (en) Training method, device and equipment of pedestrian image recognition model and storage medium
CN111612100A (en) Object re-recognition method and device, storage medium and computer equipment
CN116630727B (en) Model training method, deep pseudo image detection method, device, equipment and medium
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN117152459A (en) Image detection method, device, computer readable medium and electronic equipment
CN110414586B (en) Anti-counterfeit label counterfeit checking method, device, equipment and medium based on deep learning
CN116188439A (en) False face-changing image detection method and device based on identity recognition probability distribution
CN116824695A (en) Pedestrian re-identification non-local defense method based on feature denoising
CN117333732A (en) Countermeasure sample generation method, model training method, image recognition method and device
CN111797732B (en) Video motion identification anti-attack method insensitive to sampling
CN112750067B (en) Image processing system and training method thereof
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
CN109101992B (en) Image matching method, device and computer readable storage medium
CN111414952A (en) Noise sample identification method, device, equipment and storage medium for pedestrian re-identification
CN111625672B (en) Image processing method, image processing device, computer equipment and storage medium
KR102491451B1 (en) Apparatus for generating signature that reflects the similarity of the malware detection classification system based on deep neural networks, method therefor, and computer recordable medium storing program to perform the method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination