CN113487545A

CN113487545A - Method for generating a disturbance image for a pose estimation deep neural network

Info

Publication number: CN113487545A
Application number: CN202110704930.5A
Authority: CN
Inventors: 刘复昌; 潘志庚; 曹明亮; 丁丹丹; 张明敏; 梁应滔; 梁应鸿; 王昊
Original assignee: Guangzhou Jiudi Digital Technology Co ltd
Current assignee: Guangzhou Jiudi Digital Technology Co ltd
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2021-10-08

Abstract

The invention discloses a method for generating a disturbance image for a pose estimation deep neural network, which relates to the technical field of image processing and comprises the following steps: S1, inputting an image x into a target neural network K to obtain a result P0; S2, recombining P0 to generate P1, taking P0 as the label of the image x, and taking P1 as the first training result of the image x; S3, inputting P0 and P1 into the target neural network K, generating an error value between P0 and P1, acquiring the gradient direction of the error value, and multiplying the gradient direction by a coefficient λ to obtain a single-iteration noise value z; and S4, obtaining an accumulated noise value z' through multiple training iterations, normalizing z', and superposing the image x and z' to generate a disturbance image a. The method obtains the error value between the predicted result for the image and the real result, and adds disturbance noise to the image along the gradient direction of the change of the error value, so that a good misleading effect is produced not only on image classification but also on gesture recognition, where an originally correct gesture is recognized as another, unrelated gesture.

Description

Method for generating a disturbance image for a pose estimation deep neural network

Technical Field

The invention relates to the technical field of image processing, and in particular to a method for generating a disturbance image for a pose estimation deep neural network.

Background

With the continuous development and maturation of artificial intelligence technology, a variety of intelligent recognition methods have become available in the field of image recognition. They can recognize objects in images, such as people, animals, and vehicles, and some applications can also recognize human gestures. Although most intelligent recognition methods in current use achieve high accuracy, most are trained on ordinary images; once an image is modified or something is added to it, a method that was originally highly accurate can produce wrong results.

Chinese patent publication No. CN109993805A discloses a highly concealed adversarial image attack method for deep neural networks, which adds noise to an image and uses the Lp norm to measure the magnitude of the generated noise, so as to change the image as little as possible.

Although an image attack can mislead an intelligent recognition method into making errors, it can also be used to make existing intelligent recognition systems more robust. In other words, if the generated disturbance images are used to train the recognition system, or if a method for resisting image disturbance is added during training, the resulting intelligent recognition system is more robust and more resistant to interference.

At present, the prior art contains no method for generating disturbance images for deep networks related to pose estimation. The invention therefore aims to design and provide a method for generating a disturbance image for a pose estimation deep neural network.

Disclosure of Invention

The invention aims to provide a method for generating a disturbance image for a pose estimation deep neural network.

The technical purpose of the invention is achieved by the following technical solution: a method for generating a disturbance image for a pose estimation deep neural network, in which a human body pose estimation neural network or gesture recognition neural network K is given; the prediction result of the neural network K is assumed to be 100% accurate; the attacker is assumed to have white-box access to the target model and can therefore obtain the loss function information of the neural network; and a target class or pose t is set. The error value between the recognition result and the real result is obtained by a fast gradient descent method, yielding the gradient direction that reduces the error value; this value is accumulated over repeated iterations to generate a disturbance factor, which is then superposed on the original image to generate the disturbance image. The method specifically comprises the following steps:

S1, inputting an image x into a target neural network K to obtain a result P0;

S2, recombining the result P0 to generate a result P1, taking P0 as the label of the image x, and taking P1 as the first training result of the image x;

S3, inputting the result P0 and the result P1 into the target neural network K, generating an error value Loss(P0, P1) between P0 and P1, acquiring the gradient direction of the error value Loss(P0, P1), and multiplying the gradient direction by a coefficient λ to obtain the result of a single iteration, namely a single noise value z;

S4, obtaining an accumulated noise value z' through multiple training iterations, normalizing z', and superposing the image x and z' to obtain an adversarial sample, namely the disturbance image a.

Further, the neural network K in step S1 is a human body pose estimation neural network or a gesture recognition neural network, and a disturbance may accordingly be generated for human body pose or for human gestures.

Further, the method for recombining the result P0 to generate the result P1 in step S2 is: the keypoints in the result P0 are randomly moved to other positions, and the result is taken as P1.
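By way of non-limiting illustration, the recombination of step S2 might be sketched as follows. The function name recombine_keypoints, the 64 × 64 image size, the integer pixel coordinates, and the choice to move every keypoint are assumptions made for this example; the description states only that keypoints are randomly moved to other positions.

```python
import random


def recombine_keypoints(p0, width, height, rng):
    """Return P1: a copy of P0 in which each keypoint is moved to a
    random position that differs from its original position.

    Assumes integer (x, y) pixel coordinates and width * height > 1.
    """
    p1 = []
    for (x, y) in p0:
        while True:
            nx, ny = rng.randrange(width), rng.randrange(height)
            if (nx, ny) != (x, y):  # make sure the keypoint actually moves
                break
        p1.append((nx, ny))
    return p1


rng = random.Random(0)                   # fixed seed, for repeatability only
p0 = [(10, 20), (30, 40), (50, 60)]      # hypothetical predicted keypoints
p1 = recombine_keypoints(p0, 64, 64, rng)
print(p0, p1)
```

P0 itself is left unchanged, so it can still serve as the label of the image x while P1 serves as the first training result.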

Further, the gradient direction of the error value Loss(P0, P1) in step S3 is obtained by a fast gradient descent method, and only the direction of the gradient change is obtained, not its magnitude.
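One possible reading of "only the direction, not the magnitude" is an element-wise sign of the gradient, as used in fast-gradient-sign style attacks; that reading is an assumption, not something the description states. Under it, the single noise value z of step S3 could be sketched as below. The names sign and single_noise, the example gradient values, and the negative step (chosen so that the error value decreases) are all illustrative.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def single_noise(grad, lam):
    """Single-iteration noise z = -lam * sign(grad).

    Only the direction of each gradient component is kept; the
    magnitude is discarded. The minus sign steps against the gradient
    so that the error value decreases (an assumption of this sketch).
    """
    return [-lam * sign(g) for g in grad]


grad = [0.8, -0.003, 0.0, -12.5]   # hypothetical gradient w.r.t. four pixels
z = single_noise(grad, lam=0.01)
print(z)
```

Every non-zero component of z has the same size λ regardless of how large the corresponding gradient component was.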

Further, in the process of obtaining the accumulated noise value z' through multiple training iterations in step S4, the weights of the neural network K are not actually changed, which ensures that only the image itself is attacked and not the target neural network; z' is normalized in step S4 to ensure that the final noise is visually imperceptible.
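The description does not specify how z' is normalized. The following sketch assumes one simple scheme: z' is rescaled so that its largest absolute component does not exceed a bound eps, and the superposed image is clipped to the range [0, 1]. The names normalize_noise and superpose, the bound eps, and the example values are hypothetical.

```python
def normalize_noise(z_acc, eps):
    """Rescale the accumulated noise so that max |z'| <= eps.

    This peak rescaling is an assumed normalization; the source only
    states that z' is normalized so the noise is not visually perceived.
    """
    peak = max(abs(v) for v in z_acc)
    if peak <= eps:                 # already small enough (covers peak == 0)
        return list(z_acc)
    scale = eps / peak
    return [v * scale for v in z_acc]


def superpose(x, z):
    """Add the noise to the image and keep every pixel inside [0, 1]."""
    return [min(1.0, max(0.0, xi + zi)) for xi, zi in zip(x, z)]


x = [0.0, 0.5, 1.0, 0.25]           # hypothetical image, pixels in [0, 1]
z_acc = [0.2, -0.1, 0.05, -0.4]     # hypothetical accumulated noise z'
z_norm = normalize_noise(z_acc, eps=0.03)
a = superpose(x, z_norm)
print(z_norm, a)
```

Only the image values are modified here; no network weights appear in the computation, in keeping with the statement that the weights of K are not changed.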

In the technical solution of the invention, for the disturbance method for pose estimation, a natural image is input to the neural network and an incorrect recognition result is output. The type of error may be arbitrary or may be specified by the attacker. For the disturbance method for gesture recognition, a natural image is input to the neural network and an incorrect gesture is output. The incorrect gesture may be arbitrary or may be specified by the attacker.

In conclusion, the invention has the following beneficial effects:

1. The method adds disturbance noise to the image using the error value Loss between the predicted result for the image and the real result, together with the gradient direction of the change of Loss, so that a good misleading effect is produced not only on image classification but also on gesture recognition, where an originally correct gesture is recognized as another, unrelated gesture;

2. The disturbance images generated by the method can conveniently be used to train a recognition system, or the method can be used during training to increase resistance to image disturbance, so that the trained intelligent recognition system is more robust and more resistant to interference.

Drawings

Fig. 1 is a flow chart of an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below with reference to Fig. 1.

Embodiment: a method for generating a disturbance image for a pose estimation deep neural network, as shown in Fig. 1, specifically comprises the following steps:

S1, inputting an image x into the target neural network K to obtain a result P0.

S2, recombining the result P0 to generate a result P1, taking P0 as the label of the image x, and taking P1 as the first training result of the image x.

S3, inputting the result P0 and the result P1 into the target neural network K, generating an error value Loss(P0, P1) between P0 and P1, acquiring the gradient direction of the error value Loss(P0, P1), and multiplying the gradient direction by a coefficient λ to obtain the result of a single iteration, namely a single noise value z.

S4, obtaining an accumulated noise value z' through multiple training iterations, normalizing z', and superposing the image x and z' to obtain an adversarial sample, namely the disturbance image a.

The neural network K in step S1 is a human body pose estimation neural network or a gesture recognition neural network, and a disturbance may accordingly be generated for human body pose or for human gestures.

In step S2, the method for recombining the result P0 to generate the result P1 is: the keypoints in the result P0 are randomly moved to other positions, and the result is taken as P1.

In step S3, the gradient direction of the error value Loss(P0, P1) is obtained by a fast gradient descent method, and only the direction of the gradient change is obtained, not its magnitude.

In the process of obtaining the accumulated noise value z' through multiple training iterations in step S4, the weights of the neural network K are not actually changed, which ensures that only the image itself is attacked and not the target neural network. z' is normalized in step S4 to ensure that the final noise is visually imperceptible.

In this embodiment, a pose estimation network K is trained on a large number of images x of different poses, each of which contains the keypoint information of its pose, with labels K(x) ∈ P = {p1, p2, p3, …, pn}.

A gesture recognition network K is trained on a large number of natural images x from C different classes, with labels K(x) ∈ {1, 2, 3, …, C}.

The images are normalized to [0, 1]. Assume the spatial domain of the natural images is

It is assumed that K(x) = c_x is correct for each x, where c_x denotes the class of image x.

Let A_K denote the space of adversarial samples. Every sample in A_K must be similar enough to a natural image that the change is imperceptible, while still reliably deceiving the classification network.

For each a ∈ A_K there exists an x such that d(x, a) is sufficiently small, where d measures the similarity between x and a.
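The description does not define the measure d. Purely as an illustration, the sketch below assumes d is the largest absolute per-pixel difference (the L∞ distance) between x and a, and treats an adversarial sample as acceptable when that distance does not exceed a tolerance. The names linf_distance and is_imperceptible, the tolerance values, and the pixel values are hypothetical.

```python
def linf_distance(x, a):
    """d(x, a) as the largest absolute per-pixel difference.

    The choice of the L-infinity distance is an assumption of this
    sketch; the source only requires d(x, a) to be sufficiently small.
    """
    return max(abs(xi - ai) for xi, ai in zip(x, a))


def is_imperceptible(x, a, tol):
    """Return True if a stays within tol of x under the assumed distance."""
    return linf_distance(x, a) <= tol


x = [0.10, 0.50, 0.90]   # hypothetical natural image, pixels in [0, 1]
a = [0.12, 0.49, 0.90]   # hypothetical adversarial sample
print(linf_distance(x, a), is_imperceptible(x, a, tol=0.03))
```

Other measures, such as an Lp norm of the difference, could be substituted without changing the rest of the procedure.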

Perturbation method for the pose estimation neural network:

Step 1: A natural image x is input to the pose estimation neural network to obtain a predicted result P0, which is regarded as the correct result.

Step 2: Assuming that P0 is highly accurate, the keypoints in the result are randomly moved to other positions to obtain P1; P0 is then taken as the training target and P1 as the first training result.

Step 3: The error value Loss(P0, P1) between P0 and P1 is then obtained, which yields the gradient direction in which Loss decreases; this gradient direction is multiplied by a coefficient λ to give the single-iteration noise value z.

Step 4: The accumulated noise value z' is obtained through multiple training iterations, and the adversarial sample a is obtained as x + z'.

To mislead the pose estimation neural network into outputting a desired result, T is used as the training target and P0 as the first training result in step 1.
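The steps above can be sketched end to end on a toy example. Everything specific in the sketch is an assumption made so that it runs without a real network: a fixed linear map W stands in for the pose estimation network K, the loss is a squared error, the gradient with respect to the image is computed analytically, the direction is taken as an element-wise sign, and the normalization of z' is a clip to a bound eps. The sketch also adopts one reading of the procedure, namely that the output of K on the perturbed image is driven toward a chosen result, which may be the recombined P1 or an attacker-specified T.

```python
def predict(W, x):
    """Toy stand-in for K: a fixed linear map from pixels to coordinates."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def loss(pred, target):
    """Squared-error value between a prediction and a target."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


def input_gradient(W, x, target):
    """Gradient of the loss w.r.t. the image: W^T (W x - target)."""
    residual = [p - t for p, t in zip(predict(W, x), target)]
    return [sum(W[j][i] * residual[j] for j in range(len(W)))
            for i in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def generate_disturbance_image(W, x, target, lam, eps, steps):
    """Accumulate sign-gradient noise over several iterations.

    W is only read, never written, so the network weights stay frozen.
    """
    z_acc = [0.0] * len(x)
    for _ in range(steps):
        current = [xi + zi for xi, zi in zip(x, z_acc)]
        grad = input_gradient(W, current, target)
        # single noise value z = -lam * sign(grad), added to z'
        z_acc = [zi - lam * sign(g) for zi, g in zip(z_acc, grad)]
    z_acc = [max(-eps, min(eps, zi)) for zi in z_acc]   # assumed normalization
    return [max(0.0, min(1.0, xi + zi)) for xi, zi in zip(x, z_acc)]


W = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 2.0]]           # hypothetical frozen weights
x = [0.2, 0.4, 0.6, 0.8]             # hypothetical image, pixels in [0, 1]
p0 = predict(W, x)                   # step 1: result P0
p1 = [p0[1], p0[0]]                  # step 2: a simple recombination of P0
a = generate_disturbance_image(W, x, p1, lam=0.01, eps=0.05, steps=10)
print(loss(p0, p1), loss(predict(W, a), p1))
```

Passing an attacker-specified result in place of p1 gives the targeted variant under the same assumptions.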

Perturbation method for the gesture recognition neural network:

Step 1: A natural image x is input to the neural network to obtain the prediction result P0 in one-hot encoded form, where P0 is { P1, P2.

Step 2: Assuming that the result is highly accurate, the order of the values in P0 is shuffled to obtain P1; P0 is then taken as the training target and P1 as the first training result.
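As a non-limiting sketch, the shuffling of step 2 might look as follows. The function name shuffle_one_hot and the extra requirement that the shuffled vector point to a different class than P0 are assumptions of this example; the description only states that the order of the values in P0 is shuffled.

```python
import random


def shuffle_one_hot(p0, rng):
    """Return P1: the values of P0 in a shuffled order.

    The loop repeats the shuffle until the largest value sits at a
    different index than in P0, so that P1 encodes a different class
    (an added assumption of this sketch).
    """
    if len(set(p0)) < 2:
        raise ValueError("P0 needs at least two distinct values")
    original_hot = p0.index(max(p0))
    p1 = list(p0)
    while p1.index(max(p1)) == original_hot:
        rng.shuffle(p1)
    return p1


rng = random.Random(1)             # fixed seed, for repeatability only
p0 = [0.0, 1.0, 0.0, 0.0]          # hypothetical one-hot prediction (class 1)
p1 = shuffle_one_hot(p0, rng)
print(p0, p1)
```

Because only the order changes, P1 remains a valid one-hot vector over the same set of classes.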

To mislead the classification neural network into outputting a desired result, T is used as the training target and P0 as the first training result in step 1.
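For the targeted case, the error value between the target T and the prediction P0 could, for example, be a cross-entropy. The description does not name the loss function, so the cross-entropy, the helper names one_hot and cross_entropy, and the probability values below are assumptions for illustration only.

```python
import math


def one_hot(index, num_classes):
    """One-hot vector for the class chosen by the attacker."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def cross_entropy(target, predicted, tiny=1e-12):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(max(p, tiny)) for t, p in zip(target, predicted))


p0 = [0.7, 0.2, 0.1]                  # hypothetical prediction for image x
t = one_hot(2, 3)                     # attacker wants class 2 (index from 0)
loss_targeted = cross_entropy(t, p0)  # large while P0 disagrees with T
print(loss_targeted)
```

The error value shrinks as the prediction moves toward the target class, which is why stepping in the direction that reduces it steers the network toward T.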

In the embodiment of the invention, disturbance noise is added to the image using the error value Loss between the predicted result for the image and the real result, together with the gradient direction of the change of Loss, so that a good misleading effect is produced not only on image classification but also on gesture recognition, where an originally correct gesture is recognized as another, unrelated gesture. In addition, the disturbance images generated by the method can conveniently be used to train a recognition system, or the method can be used during training to increase resistance to image disturbance, so that the trained intelligent recognition system is more robust and more resistant to interference.

The present embodiment merely explains the invention and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the invention.

Claims

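1. A method for generating a disturbance image for a pose estimation deep neural network, characterized in that the method specifically comprises the following steps:

S1, inputting an image x into a target neural network K to obtain a result P0;

S2, recombining the result P0 to generate a result P1, taking P0 as the label of the image x, and taking P1 as the first training result of the image x;

S3, inputting the result P0 and the result P1 into the target neural network K, generating an error value Loss(P0, P1) between P0 and P1, acquiring the gradient direction of the error value Loss(P0, P1), and multiplying the gradient direction by a coefficient λ to obtain the result of a single iteration, namely a single noise value z;

S4, obtaining an accumulated noise value z' through multiple training iterations, normalizing z', and superposing the image x and z' to obtain an adversarial sample, namely the disturbance image a.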

2. The method for generating a disturbance image for a pose estimation deep neural network according to claim 1, wherein: the neural network K in step S1 is a human body pose estimation neural network or a gesture recognition neural network, and a disturbance may accordingly be generated for human body pose or for human gestures.

3. The method for generating a disturbance image for a pose estimation deep neural network according to claim 1, wherein: the method for recombining the result P0 to generate the result P1 in step S2 is: the keypoints in the result P0 are randomly moved to other positions, and the result is taken as P1.

4. The method for generating a disturbance image for a pose estimation deep neural network according to claim 1, wherein: in step S3, the gradient direction of the error value Loss(P0, P1) is obtained by a fast gradient descent method, and only the direction of the gradient change is obtained, not its magnitude.

5. The method for generating a disturbance image for a pose estimation deep neural network according to claim 1, wherein: in the process of obtaining the accumulated noise value z' through multiple training iterations in step S4, the weights of the neural network K are not actually changed, which ensures that only the image itself is attacked and not the target neural network; and z' is normalized in step S4 to ensure that the final noise is visually imperceptible.