CN116486463B - Image processing method, related device and storage medium

Info

Publication number: CN116486463B
Authority: CN (China)
Prior art keywords: image, perturbation, candidate, target, preset
Legal status: Active
Application number: CN202310711856.9A
Other languages: Chinese (zh)
Other versions: CN116486463A
Inventor: name withheld at the applicant's request
Current Assignee: Beijing Real AI Technology Co Ltd
Original Assignee: Beijing Real AI Technology Co Ltd
Events: application filed by Beijing Real AI Technology Co Ltd; priority to CN202310711856.9A; publication of CN116486463A; application granted; publication of CN116486463B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/77 Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/806 Fusion of extracted features
    • G06V10/82 Recognition using neural networks
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; CROSS-SECTIONAL TECHNOLOGIES
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application relates to the field of computer vision, and provides an image processing method, a related device and a storage medium. The method comprises: converting a candidate perturbation image into a perturbation projection image based on the pose of a preset object included in a candidate image, such that the pose of the three-dimensional perturbation image corresponding to the perturbation projection image matches the pose of the preset object in the candidate image; fusing the perturbation projection image with the candidate image to obtain a candidate adversarial image; acquiring a target loss value based on the candidate adversarial image and a target image in order to update the candidate perturbation image; and re-acquiring candidate images until the candidate perturbation image achieves the adversarial attack objective, at which point it becomes the target perturbation image. Because the iterative generation of the target perturbation image takes into account how the different poses of a three-dimensional object affect the perturbation image, the generated target perturbation image matches those different poses; that is, it exhibits the intended adversarial attack effect when applied to an object in any of the different poses.

Description

Image processing method, related device and storage medium
Technical Field
The embodiment of the application relates to the field of computer vision, and in particular to an image processing method, a related device and a storage medium.
Background
In adversarial attack research, efficiently generating adversarial examples against different deep learning models helps to discover the vulnerabilities of those models in time and to evaluate their robustness. Some attack methods generate, in the digital world, adversarial examples carrying small adversarial perturbations, which can cause the examples to be misidentified by a deep learning model or to be identified as a designated label.
To evaluate the robustness or security of face recognition models in the physical world, the prior art typically prints the adversarial perturbation and pastes it onto a preset area of the face, so that an image acquisition device can capture a face image containing the perturbation; this image is then input into the face recognition model to evaluate the model's robustness or security. However, the face is a three-dimensional structure, not a two-dimensional plane: its imaging differs across poses, and its pose changes with head movement. There is therefore no guarantee that the face bearing the perturbation faces the image acquisition device in the preset pose, i.e., the face pose used when the perturbation was iteratively generated in the digital world. In other words, the visual appearance of the perturbation depends on the pose (three-dimensional angle) of the face relative to the image acquisition device, and may deviate from the perturbation's intended visual effect. Because the physical face is three-dimensional, rather than the two-dimensional original image used when the perturbation was generated in the digital world, a perturbation pasted onto the face may look different from the ideal perturbation generated digitally, fail to exert the intended attack effect, and thus fail to support a sound evaluation of the robustness or security of the face recognition model.
Disclosure of Invention
The embodiment of the application provides an image processing method, a related device and a storage medium, which take into account, during the iterative generation of the target perturbation image, the influence of different three-dimensional face poses on the visual appearance of the candidate perturbation image. The generated target perturbation image therefore matches face images of different three-dimensional poses; that is, it exhibits the intended adversarial attack effect when applied to faces in different poses, enabling better testing and evaluation of the face recognition model.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a candidate perturbation image and a candidate image; the candidate image includes a preset object whose current pose is a first pose;
processing the candidate perturbation image based on the current pose of the preset object to obtain a perturbation projection image; the pose of the three-dimensional perturbation image corresponding to the perturbation projection image matches the current pose of the preset object;
obtaining a candidate adversarial image based on the candidate image and the perturbation projection image, and acquiring a target loss value; the target loss value is obtained based on the identification similarity between the candidate adversarial image and a target image; the target image includes the preset object or an interfering object;
if the target loss value does not converge, updating the candidate perturbation image and the candidate image until the target loss value acquired based on the new candidate adversarial image and the target image converges; the new candidate image includes the preset object whose current pose is a second pose;
and taking the candidate perturbation image at the time the target loss value converges as the target perturbation image.
In a second aspect, an embodiment of the present application provides an image processing apparatus having the function of implementing the image processing method of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function, and the modules may be software and/or hardware.
In one embodiment, the image processing apparatus includes:
an input-output module configured to acquire a candidate perturbation image and a candidate image; the candidate image includes a preset object whose current pose is a first pose;
a processing module configured to process the candidate perturbation image based on the current pose of the preset object to obtain a perturbation projection image; the pose of the three-dimensional perturbation image corresponding to the perturbation projection image matches the current pose of the preset object;
the processing module is further configured to obtain a candidate adversarial image based on the candidate image and the perturbation projection image, and to acquire a target loss value; the target loss value is obtained based on the identification similarity between the candidate adversarial image and a target image; the target image includes the preset object or an interfering object;
the processing module is further configured to, if the target loss value does not converge, update the candidate perturbation image and the candidate image until the target loss value acquired based on the new candidate adversarial image and the target image converges; the new candidate image includes the preset object whose current pose is a second pose;
the processing module is further configured to take the candidate perturbation image at the time the target loss value converges as the target perturbation image.
In a third aspect, embodiments of the present application provide a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the image processing method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computing device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the image processing method according to the first aspect when executing the computer program.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor coupled to a transceiver of a terminal device, for executing the technical solution provided in the first aspect of the embodiment of the present application.
In a sixth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor, configured to support a terminal device to implement the functions involved in the first aspect, for example, to generate or process information involved in the image processing method provided in the first aspect.
In one possible design, the above chip system further includes a memory for holding program instructions and data necessary for the terminal. The chip system may be formed of a chip or may include a chip and other discrete devices.
In a seventh aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the image processing method provided in the first aspect above.
Compared with the prior art, in the embodiment of the application, in each round of iteratively generating the target perturbation image, the candidate perturbation image is converted into a perturbation projection image whose corresponding three-dimensional perturbation image matches the pose of the preset object in the candidate image; the perturbation projection image is then fused with the candidate image to obtain a candidate adversarial image, and a target loss value is acquired based on the candidate adversarial image and the target image in order to update the candidate perturbation image, until the candidate perturbation image achieves the adversarial attack objective and becomes the target perturbation image. Because the perturbation projection image, rather than the raw candidate perturbation image as in the prior art, is fused into the candidate image, the resulting candidate adversarial image simulates the visual appearance of the candidate perturbation image on the preset object in the corresponding physical-world pose, i.e., the visual effect of the candidate perturbation image after it acts on the preset object in the physical world. The adversarial image generated in the digital world is therefore consistent with the visual appearance of the candidate perturbation image in the physical world, so the target perturbation image can exert, in the physical world, the same ideal attack effect it exhibits in the digital world. In other words, the process of iteratively generating the target perturbation image simulates the visual appearance of the perturbation image combined with preset objects in the various poses possible in the physical world. Since the final target perturbation image matches preset objects in various poses, it can exert the intended adversarial attack effect on physical-world preset objects in those poses, and can thus properly evaluate the robustness or security of the image recognition model.
Drawings
The objects, features and advantages of embodiments of the present application will become readily apparent from the following detailed description read with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an image processing system according to an embodiment of the present application;
FIG. 2 is a flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of obtaining a perturbation projection image in an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic comparison between a candidate adversarial image obtained by the image processing method according to an embodiment of the present application and one obtained by the prior art;
FIG. 5 is a schematic diagram of results of processing a perturbation projection image in an image processing method according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of iterating the candidate perturbation image in an image processing method according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of iterating the candidate perturbation image based on an attention module according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of another method of iterating the candidate perturbation image based on an attention module according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a mobile phone according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a server according to an embodiment of the present application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The terms "first", "second" and the like in the description, the claims and the above drawings of the embodiments of the application are used to distinguish between similar objects (for example, a first image and a second image are simply different images), and do not necessarily describe a particular order or sequence. It should be understood that data so described may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprise", "include" and any variations thereof are intended to cover a non-exclusive inclusion: a process, method, system, article or apparatus that comprises a list of steps or modules is not necessarily limited to the steps or modules expressly listed, and may include other steps or modules not expressly listed or inherent to it. The division of modules in the embodiments of the application is only one kind of logical division, and other divisions are possible in actual implementation; for example, several modules may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings between modules via interfaces, and the communication connections may be electrical or take other similar forms; none of these is limited in the embodiments of the application. Moreover, the modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, and may be distributed over a plurality of circuit modules; some or all of them may be selected according to actual needs to achieve the purposes of the embodiments of the application.
The embodiment of the application provides an image processing method, a related device and a storage medium, which can be applied to an image processing system in scenarios where a perturbation image is generated to test an image recognition model. The image processing apparatus is at least used to iteratively update candidate perturbation images to obtain a target perturbation image. The image recognition apparatus is used to recognize an input image and obtain an image recognition result. The image processing apparatus may be an application program that iteratively updates the candidate perturbation image into the target perturbation image, or a server or terminal device on which such an application program is installed; the image recognition apparatus may be an image recognition program that recognizes an image to obtain a recognition result, for example an image recognition model, or a terminal device on which the image recognition model is deployed.
The scheme provided by the embodiment of the application relates to artificial intelligence (AI), computer vision (CV), machine learning (ML) and other technologies, which are described in the following embodiments:
AI is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason and make decisions.
AI technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
CV is a science that studies how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to recognize, track and measure targets, and further performs graphics processing so that the computer produces images better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technology typically includes adversarial perturbation generation, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
In the prior art, performing an adversarial attack test on a neural network model usually requires iteratively generating an adversarial image. An adversarial image is effectively the original image superimposed with a perturbation image, so iteratively generating the adversarial image is substantially equivalent to iteratively generating the ideal perturbation image. During iteration, the candidate perturbation image is combined with the original image to obtain a candidate adversarial image, the candidate adversarial image is input into an image recognition model, and whether the adversarial attack succeeds is determined from the recognition result. When the attack cannot be carried out based on a candidate adversarial image, the candidate adversarial image or the candidate perturbation image is iteratively updated according to a loss function value obtained from the feature distance or similarity between the candidate adversarial image and the target image. However, this prior-art iteration does not consider how the pose transformation of a three-dimensional object affects the perturbation once the perturbation image is placed on that object for a physical-world attack; for example, after the perturbation image is placed on an attacker's face, the attacker may turn or raise the head, so the image acquisition device captures a distorted perturbation image, which degrades the perturbation's effect.
Compared with the prior art, in the embodiment of the application, in each round of iteratively generating the target perturbation image, the candidate perturbation image is converted into a perturbation projection image whose corresponding three-dimensional perturbation image matches the pose of the preset object in the candidate image; the perturbation projection image is then fused with the candidate image to obtain a candidate adversarial image, and a target loss value is acquired based on the candidate adversarial image and the target image in order to update the candidate perturbation image, until the candidate perturbation image achieves the adversarial attack objective and becomes the target perturbation image. Because the perturbation projection image, rather than the raw candidate perturbation image as in the prior art, is fused into the candidate image, the candidate adversarial image simulates the visual appearance of the candidate perturbation image on the preset object in the corresponding physical-world pose; that is, the adversarial image generated in the digital world is consistent with the visual appearance of the candidate perturbation image in the physical world, so that the target perturbation image can exert, in the physical world, the same ideal attack effect it exhibits in the digital world.
In some embodiments, the image processing apparatus and the image recognition apparatus are deployed separately. Referring to fig. 1, the image processing method provided in the embodiment of the application may be implemented on the image processing system shown in fig. 1, which may include a server 01 and a terminal device 02.
The server 01 may be the image processing apparatus, on which an image processing program may be deployed to iteratively update the candidate perturbation image into the target perturbation image.
The terminal device 02 may be the image recognition apparatus, on which an image recognition model may be deployed, for example a face recognition model trained by a machine learning method.
The server 01 may receive the candidate image and initialize the candidate perturbation image; process the candidate perturbation image according to the pose of the preset object included in the candidate image to obtain a perturbation projection image; combine the perturbation projection image with the candidate image to obtain a candidate adversarial image; and transmit the candidate adversarial image to the terminal device 02. The terminal device 02 may acquire, through the image recognition model deployed on it, the identification similarity between the candidate adversarial image and the target image, and feed the similarity result back to the server 01. The server 01 may determine from the identification similarity whether the candidate adversarial image successfully misleads the image recognition model; when the candidate adversarial image cannot successfully carry out the adversarial attack, the server acquires a target loss value based on the identification similarity and updates the candidate perturbation image, until a target perturbation image capable of successfully carrying out the adversarial attack is obtained.
It should be noted that the server according to the embodiment of the present application may be an independent physical server, a server cluster or distributed system formed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data and artificial intelligence platforms.
The terminal device according to the embodiment of the present application may be a device that provides voice and/or data connectivity to a user, a handheld device with a wireless connection function, or another processing device connected to a wireless modem, such as a mobile telephone (or "cellular" telephone) or a computer with a mobile terminal. It may, for example, be a portable, pocket-sized, handheld, computer-built-in or vehicle-mounted mobile device that exchanges voice and/or data with a radio access network; for example, a personal communication service (PCS) telephone, a cordless telephone, a Session Initiation Protocol (SIP) phone, a wireless local loop (WLL) station, or a personal digital assistant (PDA).
Referring to fig. 2, fig. 2 is a flowchart of an image processing method according to an embodiment of the present application. The method may be executed by an image processing apparatus and applied to adversarial attack test scenarios for image recognition models: it iterates a candidate perturbation image into a target perturbation image that can be placed on preset objects in different poses while stably exerting its adversarial attack effect. When the target perturbation image is used in the physical world for an adversarial attack test of an image recognition model, there is therefore no need to worry that different poses of the preset object will undermine the perturbation effect. The method comprises steps 101-105:
Step 101: acquiring a candidate perturbation image and a candidate image.
In the embodiment of the application, the candidate perturbation image is the perturbation image during iterative updating, i.e., a perturbation image that does not yet meet the preset requirement. The perturbation image is updated over at least one iteration round until it becomes a target perturbation image that meets the preset requirement. The preset requirement may be that the adversarial image acquired after the perturbation image is placed on the preset object causes the image recognition model to output a wrong recognition result.
The candidate perturbation image may be derived from the historical candidate perturbation image, which differs across iteration rounds. In the initial iteration round there is no historical candidate perturbation image, so the candidate perturbation image is an initial perturbation image obtained by initialization in a preset manner. For example, in the first iteration round, a candidate perturbation image Padv1 is obtained by random initialization; in the second iteration round, Padv1 is updated to obtain a candidate perturbation image Padv2; and in the third iteration round, Padv2 is updated to finally obtain the target perturbation image Padv3.
The finally obtained target perturbation image should suit a target object (i.e., the object on which the target perturbation image is placed) whose pose in the physical world is flexible and changeable: even if the target object changes into different poses after the perturbation is placed on it, the target perturbation image should still exert the intended adversarial attack effect, so that the image recognition model can be tested stably. For this purpose, in the embodiment of the application, candidate images of the preset object in different poses are acquired across iteration rounds. In different rounds the perturbation image is thus combined with the preset object in different poses, which means each round accounts for how a different pose of the preset object affects the adversarial attack effect of the perturbation image. The final target perturbation image therefore adapts to different poses; that is, placed on a target object in different poses, it can exert the intended adversarial attack effect.
For example, in a first iteration round, the acquired candidate image includes the preset object whose current pose is a first pose; in the second iteration round, a new candidate image is acquired that includes the preset object whose current pose is a second pose. It should be understood that the first pose and the second pose merely denote different poses, not specific poses.
It should be noted that the candidate perturbation image and the candidate image acquired in the embodiment of the application may be two-dimensional images. The embodiment of the application iteratively updates the perturbation image in two-dimensional form, so that the finally obtained (two-dimensional) target perturbation image, once placed on the three-dimensional target object, is unaffected by changes in the target object's pose and can exert the intended adversarial attack effect. For example, in a face recognition scenario, suppose the perturbation image is placed on the forehead of the target face. If the target face looks straight at the image acquisition device, the captured face image may include the complete perturbation image, i.e., the image exhibits the complete perturbation. If the target face looks up at the image acquisition device, the captured face image cannot include the complete perturbation image, only a deformed one; the perturbation appearing in the face image is then inconsistent with the perturbation image, and the adversarial attack effect may be affected. The target perturbation image obtained in the embodiment of the application, having been matched to preset objects in different poses during generation, can express the intended adversarial attack effect in the physical world no matter how the physical object changes its pose.
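As a concrete illustration of step 101, the following Python sketch shows one way the initialization and the per-round acquisition of pose-varied candidate images could look. It is a minimal sketch under assumed details: the image size, the [0, 1] value range and the pose-pool structure are all hypothetical choices, since the embodiment does not fix them.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def init_candidate_perturbation(height=64, width=64, channels=3):
        """Initial perturbation image of the first iteration round, obtained
        by random initialization (hypothetical size and [0, 1] value range)."""
        return rng.uniform(0.0, 1.0, size=(height, width, channels))

    def next_candidate_image(pose_pool, round_idx):
        """Each iteration round acquires a candidate image showing the preset
        object in a different pose (first pose, second pose, ...).
        pose_pool: list of candidate images, one per available pose."""
        return pose_pool[round_idx % len(pose_pool)]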
And 102, processing the candidate disturbance images based on the current gesture of the preset object to obtain disturbance projection images.
In the embodiment of the application, the candidate disturbance image is obtained based on the initial disturbance image obtained by initializing in a preset mode, and each round updates the candidate disturbance image and also modifies the pixel values of some pixels in the image. It can be seen that, in the embodiment of the present application, the candidate disturbance images (in two-dimensional form) of each iteration turn are equivalent to those obtained based on the three-dimensional disturbance image of the same pose (the texture performance of each iteration turn may be different), and the poses of the preset objects included in the candidate images of different iteration turns are different. In the same iteration round, the three-dimensional disturbance image corresponding to the candidate disturbance image and the gesture of the preset object included in the candidate image are not matched. Since the candidate disturbance image is set after the preset object corresponding to the candidate image, the disturbance expression consistent with the candidate disturbance image cannot be displayed; that is, the countermeasure image formed by superimposing the candidate disturbance image and the candidate image is inconsistent with the image expression of the candidate disturbance image after the preset object corresponding to the candidate image is set. Therefore, the candidate disturbance image can be processed to obtain the image representation after the preset object is arranged in the specific gesture, and based on the image representation, whether the candidate disturbance image can still show an ideal anti-attack effect after the preset object is arranged in the specific gesture can be accurately judged.
The disturbance expression that can be exhibited after being set to the preset object in order to acquire the candidate disturbance image. In the embodiment of the application, the candidate countermeasure images are processed according to the gesture of the preset object in the candidate images. Specifically, referring to fig. 3, three-dimensional reconstruction may be performed based on the candidate disturbance image, that is, based on the disturbance image in a two-dimensional form, to obtain a corresponding three-dimensional disturbance image, then, the posture of the three-dimensional disturbance image is adjusted to be consistent with the posture of the preset object in the candidate image, and finally, the three-dimensional disturbance image after the posture adjustment may be projected on the target plane to obtain a disturbance projection image, where the posture of the three-dimensional disturbance image corresponding to the disturbance projection image is matched with the current posture of the preset object.
It should be noted that the target planes of different iteration rounds may also differ; that is, the target plane of the current iteration round is determined from the candidate image of the current round. In each iteration round, the candidate image can be regarded as the projection of the three-dimensional preset object onto a plane, and that plane is the target plane of the current iteration round.
It should be understood that obtaining the three-dimensional perturbation image from the candidate perturbation image may be implemented with an existing three-dimensional reconstruction model or three-dimensional generation model; for example, the candidate perturbation image may be input into a preset three-dimensional generation model to obtain the corresponding three-dimensional perturbation image. The three-dimensional generation model may be an artificial intelligence model built on neural network technology, such as GET3D, SDM-NET, DeepVO or SDFusion, which is not limited in the embodiment of the application.
Before the pose of the three-dimensional perturbation image is adjusted, target pose information may be acquired from the candidate image of the same iteration round, and the pose of the three-dimensional perturbation image is then adjusted based on that information. The target pose information may be any information capable of representing the three-dimensional pose of the preset object in the candidate image, for example the values of three pose angles: the pitch, yaw and roll of the preset object. Similar to the three-dimensional reconstruction or generation operation, the pose information of the preset object may be obtained from the candidate image through a pose estimation model, which may be an artificial intelligence model built on neural network technology, such as OpenPose, MoveNet, PoseNet, DensePose, HRNet, AlphaPose, TransPose or ST-GCN, which is not limited in the embodiment of the application.
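The fig. 3 flow of step 102 can be summarized in code. The sketch below injects the three-dimensional generation model and the pose estimation model as callables, since the embodiment only names candidates (e.g., GET3D for generation, OpenPose for pose estimation) without fixing a choice; all function names are illustrative stand-ins, not a published API.

    def perturbation_projection(candidate_perturbation, candidate_image,
                                reconstruct_3d, estimate_pose, set_pose,
                                project):
        """Step 102 (sketch): produce a perturbation projection image whose
        corresponding 3D perturbation matches the pose of the preset object.

        reconstruct_3d : 2D perturbation image -> 3D perturbation (e.g. a
                         textured mesh), standing in for a 3D generation model.
        estimate_pose  : candidate image -> target pose information, e.g. the
                         (pitch, yaw, roll) angles of the preset object.
        set_pose       : (3D perturbation, pose) -> pose-adjusted 3D perturbation.
        project        : 3D perturbation -> 2D image on the target plane of the
                         current round (the plane the candidate image was
                         formed on).
        """
        perturbation_3d = reconstruct_3d(candidate_perturbation)  # 2D -> 3D
        pose = estimate_pose(candidate_image)                     # current pose
        perturbation_3d = set_pose(perturbation_3d, pose)         # match poses
        return project(perturbation_3d)                           # 3D -> 2D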
Step 103: obtaining a candidate adversarial image based on the candidate image and the perturbation projection image, and acquiring a target loss value.
In the embodiment of the application, the perturbation projection image represents the perturbation appearance presented in the specific pose after the candidate perturbation image is placed on the preset object. Thus, referring to fig. 4, after the perturbation projection image of the current iteration round is superimposed on the candidate image, the obtained candidate adversarial image represents the visual effect of the candidate perturbation image placed on the preset object in that pose. That is, after the candidate perturbation image is placed on the three-dimensional preset object, the image acquired by the image acquisition device is consistent with the content of the candidate adversarial image. By contrast, when the candidate perturbation image is fused directly with the candidate image as in the prior art, the obtained candidate adversarial image cannot accurately express the actual effect of the candidate perturbation image in the physical world.
After the perturbation projection image is superimposed on the candidate image to obtain the candidate adversarial image, whether the adversarial attack can be carried out may be determined from the identification similarity between the candidate adversarial image and the target image, and if the attack cannot be carried out, the target loss value is acquired in order to update the candidate perturbation image.
Adversarial attacks include targeted attacks and untargeted attacks. A targeted attack aims to disguise the attacking object as a specific object, i.e., to make the image recognition model recognize an object A bearing the adversarial perturbation as a specific object B. An untargeted attack aims to make the image recognition model unable to recognize the attacking object, i.e., unable to recognize the true identity of the object A bearing the perturbation. Accordingly, in an adversarial test scenario meant to carry out a targeted attack, the target image may be set to an image including the interfering object: while iteratively updating the candidate perturbation image, the goal is for the image recognition model to confuse the candidate adversarial image with the target image, i.e., to recognize the object in the candidate adversarial image as the interfering object included in the target image. In an untargeted attack, by contrast, the image recognition model should fail to recognize the object in the candidate adversarial image: while iteratively updating the candidate perturbation image, the goal is for the candidate adversarial image to be dissimilar to the preset object, i.e., for the image recognition model not to recognize the object in the candidate adversarial image as the preset object included in the target image. The target image thus includes different objects in different attack scenarios: in a targeted attack scenario it includes the interfering object, and in an untargeted attack scenario it includes the preset object.
In the embodiment of the present application, the target loss value may be obtained with any existing loss function based on the difference or similarity between the two images (the candidate adversarial image and the target image), for example a cross-entropy loss function or an exponential loss function. Alternatively, in some possible designs, the image feature distance between the two images may be taken as the target loss value. The target loss value measures the distance between the current candidate perturbation image and a target perturbation image meeting the preset requirement, so as to indicate the update direction of the candidate perturbation image: the target loss value of the next iteration round becomes smaller, meaning the candidate perturbation image moves closer to the target perturbation image.
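As one concrete, hedged instance of such a loss: with an image recognition model that outputs feature embeddings, the identification similarity can be taken as a cosine similarity, and the loss sign can be chosen per attack type so that a smaller loss always means a more successful attack. This is only an assumed construction; the embodiment permits any loss built from the similarity or the feature distance.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Identification similarity between two feature embeddings (assumed)."""
        a, b = np.ravel(a), np.ravel(b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def target_loss(adv_embedding, target_embedding, targeted: bool) -> float:
        """Target loss value built from identification similarity (sketch).

        targeted=True : target image contains the interfering object, so the
                        loss falls as the candidate adversarial image becomes
                        similar to it (the model should confuse the two).
        targeted=False: target image contains the preset object, so the loss
                        falls as the candidate adversarial image becomes
                        dissimilar to it (the model should fail to recognize it).
        """
        sim = cosine_similarity(adv_embedding, target_embedding)
        return 1.0 - sim if targeted else sim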
Carrying out an adversarial attack in the physical world requires placing the perturbation on a physical object and then capturing an image of that object, obtaining an image of the physical object including the perturbation (i.e., the adversarial image). An image captured in the physical world is affected by environmental factors such as illumination, so its appearance may not be consistent with the actual state of the physical object; that is, color differences may exist. To simulate the color appearance of images captured under different conditions, in one possible design the candidate image may be color-transformed to simulate images of the preset object obtained under different image acquisition conditions, so that the candidate adversarial image expresses robust adversarial performance under various conditions. Specifically, each iteration round may color-transform the candidate image to obtain a first image, and then superimpose the first image with the candidate perturbation image of the current round to obtain the candidate adversarial image of the current round.
It should be understood that the color transformation operation on the candidate image may be a random transformation. For example, a parameter range of the color transformation may be preset, a target parameter randomly sampled from that range, and a preset operation (such as addition, subtraction, multiplication or division) applied between the target parameter and the pixel values of a preset channel (any one or more channels of the RGB color space) of the candidate image, thereby changing the pixel values of the preset channel and completing the color transformation. In some possible designs, the color transformation may instead be performed with a preset color-change mode or a filter, which is not limited in the embodiment of the present application.
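A minimal sketch of the random variant just described, assuming float RGB images in [0, 1] and addition as the preset operation; the parameter range and the randomly chosen channel are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def random_color_transform(image, low=-0.1, high=0.1):
        """Color transformation of a candidate image (sketch).

        image: float array of shape (H, W, 3) with values in [0, 1].
        Samples a target parameter from the preset range [low, high] and adds
        it to the pixel values of one randomly chosen preset channel.
        """
        delta = rng.uniform(low, high)              # target parameter
        channel = rng.integers(0, image.shape[-1])  # preset channel
        out = image.copy()
        out[..., channel] = np.clip(out[..., channel] + delta, 0.0, 1.0)
        return out  # the "first image" of the current round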
To facilitate carrying out the adversarial attack in the physical world, i.e., placing the target perturbation image on the preset object, the embodiment of the application may output the target perturbation image with water transfer printing, obtaining a water-transfer perturbation decal. Water transfer printing effectively prints the perturbation pattern onto a layer of transparent solid medium, such as a transparent film or transparent paper. The perturbation's visual appearance in the water-transfer decal may therefore be inconsistent with the perturbation image generated in the digital world: the printed pattern is affected by the transparent material and by reflections of ambient light, so it may be weakened or glared and fail to express the intended adversarial attack effect.
To simulate the visual appearance of the digital-world perturbation image after it is printed as a water-transfer perturbation decal, so that the finally generated target perturbation image shows the intended adversarial attack effect even when weakened by the transparent material, in one possible design the candidate perturbation image may be transparency-transformed to obtain a second image simulating the perturbation pattern after the candidate perturbation image is made into a water-transfer decal; the second image is then superimposed with the candidate image to obtain the candidate adversarial image. If this candidate adversarial image can exhibit the intended adversarial attack effect, the corresponding candidate perturbation image can exhibit that effect even after being made into a water-transfer perturbation decal and placed on the preset object.
In the embodiment of the application, the transparency processing of the candidate perturbation image may multiply the image components of the candidate perturbation image on each channel of a preset color space by the same coefficient taken from a preset numerical range, obtaining the second image corresponding to the candidate perturbation image, i.e., the transparency-processed image. Specifically, if the image components of a candidate perturbation image Padv on the three channels of the RGB color space (the R, G and B channels) are R1, G1 and B1 respectively, i.e., Padv = (R1, G1, B1), then Padv may be transparency-processed by multiplying the three image components by a preset coefficient α, giving the second image Psec = (α×R1, α×G1, α×B1). Since the preset coefficient controls the transparency of the processed image, its value range may be [0, 1]. In each iteration round that updates the candidate perturbation image, the preset coefficient of the current round may be randomly sampled from this range, producing second images of different transparencies, so that the target perturbation image obtained after multiple rounds of updates can exert the intended adversarial attack effect at different transparencies.
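The RGB example above translates directly into code. A minimal sketch, assuming float images in [0, 1]; per round, α is sampled uniformly from [0, 1] as described.

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def transparency_transform(perturbation, alpha=None):
        """Transparency processing of a candidate perturbation image (sketch).

        perturbation: float RGB array of shape (H, W, 3), i.e.
                      Padv = (R1, G1, B1).
        Multiplies every channel by the same preset coefficient alpha in
        [0, 1], giving the second image Psec = (alpha*R1, alpha*G1, alpha*B1).
        If alpha is None it is randomly sampled for the current iteration round.
        """
        if alpha is None:
            alpha = rng.uniform(0.0, 1.0)
        return alpha * perturbation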
It should be understood that, referring to fig. 5, within one iteration round the perturbation projection image may also be transparency-processed at several different transparencies, yielding a plurality of second images of different transparencies, such as the second image 1, second image 2 and second image 3 shown in fig. 5. Simulating different transparencies of the candidate perturbation image within one iteration round can improve iteration efficiency.
It should be noted that, although the embodiment of the present application uses the RGB space as an example to explain the transparency processing of the candidate perturbation image, the application is not limited thereto. For example, the candidate perturbation image may be converted into a color space such as CIE Lab, Luv, LCh, Yxy, CMYK, s-RGB or Hex, and the image components on each channel of the corresponding color space may then be multiplied by the preset coefficient to obtain the transparency-processed second image.
When the adversarial attack is carried out in the physical world using water transfer printing, the image acquisition device captures, from the preset object bearing the water-transfer perturbation decal, an image to be detected whose color and transparency differ from those of the digital adversarial image (the candidate perturbation image superimposed on the candidate image). To let the iteration of the candidate perturbation image simulate not only the color transformation but also the transparency transformation, and thereby improve the robustness of the final target perturbation image, in one possible design the first image and the second image may be superimposed to obtain the candidate adversarial image. Acquiring and judging the target loss value from this candidate adversarial image is equivalent to judging the attack result based on the candidate perturbation image's visual appearance in the physical world, since the color-transformed and transparency-processed image simulates the real appearance of the perturbation image there. In this design, the digital world simulates the changes the image undergoes in the physical world, so the final target perturbation image can exert the intended adversarial attack effect even when transformed in the physical world, and its robustness is high.
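Putting the two simulations together, one round's candidate adversarial image can be composed as below. The placement mask (e.g., a forehead region in the face scenario) and the additive superposition are hypothetical implementation details; the design above only requires that the first image and the second image be superimposed.

    import numpy as np

    def compose_candidate_adversarial(candidate_image, perturbation_projection,
                                      mask, color_fn, alpha_fn):
        """Step 103 (sketch): superimpose the transparency-transformed
        perturbation projection (second image) onto the color-transformed
        candidate image (first image) within a boolean mask region.
        """
        first = color_fn(candidate_image)           # simulate capture conditions
        second = alpha_fn(perturbation_projection)  # simulate water-transfer decal
        out = first.copy()
        # Additive superposition is one simple fusion choice (assumption).
        out[mask] = np.clip(first[mask] + second[mask], 0.0, 1.0)
        return out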
And step 104, if the target loss value does not converge, updating the candidate disturbance image and the candidate image until the target loss value acquired based on the new candidate countermeasure image and the target image converges.
In the embodiment of the application, if the target loss value obtained in one iteration round does not converge, the candidate disturbance image of that round does not yet meet the preset requirement and cannot exert an ideal anti-attack effect in the physical world.
It can be understood that the target loss value failing to converge, that is, the target loss not reaching the preset limit value, indicates that the candidate disturbance image still has room for optimization. Accordingly, the iterative updating of the candidate countermeasure image may continue, as sketched in the loop below. For example, referring to fig. 6, the candidate disturbance image may be updated by gradient optimization or the like to improve its attack resistance; moreover, in order to improve the robustness of the finally obtained target disturbance image with respect to preset objects in different poses, candidate images containing the preset object in different poses may also be acquired while iteratively updating the candidate disturbance image. Specifically, if one iteration round a1 acquires a candidate image P1 containing the preset object whose current pose is the first pose, and the candidate countermeasure image in that round does not meet the preset requirement, a candidate image P2 may be acquired in an iteration round a2, where P2 contains the preset object whose current pose is the second pose. In the embodiment of the application, because a plurality of candidate images of the preset object in different poses are obtained while iteratively generating the target disturbance image, the candidate countermeasure images of each iteration round are adapted to the preset object in different poses; therefore, the finally obtained target disturbance image can be arranged on the preset object without pose changes of the preset object degrading its anti-attack performance.
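The iterative procedure described above can be summarized as follows; all callables (`sample_candidate`, `project`, `compose`, `loss_fn`, `step`) are placeholders for the corresponding steps of the method, and the convergence test by loss change is one possible choice, not the only one the text allows:

```python
def iterate_perturbation(perturb, sample_candidate, project, compose,
                         loss_fn, step, tol=1e-3, max_rounds=100):
    """Outer loop sketch: each round draws a candidate image with a possibly
    new pose of the preset object, projects the perturbation to that pose,
    composes a candidate countermeasure image, and updates the perturbation
    until the target loss (approximately) converges."""
    prev = float("inf")
    for _ in range(max_rounds):
        candidate = sample_candidate()          # preset object in a new pose each round
        adv = compose(candidate, project(perturb, candidate))
        loss = loss_fn(adv)
        if abs(prev - loss) < tol:              # loss has (approximately) converged
            break
        prev = loss
        perturb = step(perturb, loss)
    return perturb                              # target disturbance image
```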
In the embodiment of the application, updating the candidate disturbance image by gradient optimization may consist of updating the pixel value of each pixel of the candidate disturbance image based on the target loss value of the current round. Specifically, a disturbance gradient may be determined from the partial derivative of the target loss value with respect to the pixel value of each pixel of the candidate disturbance image, and each pixel of the candidate disturbance image is then updated based on the disturbance gradient. For example, the update direction of each pixel may be determined from the disturbance gradient, and the value of each pixel increased or decreased according to that direction and a preset step size: if the disturbance gradient corresponding to a pixel is positive, the value of the pixel may be increased by the preset step size; if the disturbance gradient corresponding to a pixel is negative, the value of the pixel may be decreased by the preset step size.
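A minimal sketch of this sign-based pixel update, assuming pixel values normalized to [0, 1]; whether the step ascends or descends the loss depends on how the target loss is defined for targeted versus untargeted attacks:

```python
import numpy as np

def sign_step(perturb: np.ndarray, grad: np.ndarray, step_size: float) -> np.ndarray:
    """Update every pixel of the candidate disturbance image along the sign
    of its disturbance gradient with a preset step size."""
    assert grad.shape == perturb.shape  # one gradient element per pixel
    return np.clip(perturb + step_size * np.sign(grad), 0.0, 1.0)
```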
Since the candidate disturbance image is a two-dimensional image, i.e., equivalent to a two-dimensional matrix, the disturbance gradient in the embodiment of the present application is also a matrix, and its size may be consistent with that of the candidate disturbance image. That is, the disturbance gradient comprises a plurality of gradient elements, and the number of gradient elements may equal the number of pixels in the candidate disturbance image. It can be seen that the gradient elements of the disturbance gradient may correspond one-to-one to the pixels of the candidate disturbance image, so that each pixel of the candidate disturbance image can be updated independently and purposefully.
In some possible designs, the candidate disturbance image in each iteration round may be optimized and updated based on any existing gradient optimization method, for example, gradient descent (Gradient Descent), the momentum method (Momentum), the conjugate gradient method (Conjugate Gradient), or the natural gradient method (Natural Gradient), which a person skilled in the art may select according to actual needs; the embodiments of the present application are not limited in this respect.
It is to be understood that, although the embodiment of the present application takes the convergence of the target loss value obtained from the candidate countermeasure image and the target image in each iteration round as the condition for terminating the loop, it is not limited thereto. In some possible designs, whether to terminate the loop may also be determined by the number of loop iterations; for example, the candidate disturbance image obtained after 100 loop iterations may be used as the target disturbance image. Alternatively, the loop may be terminated when the feature distance (or recognition similarity) between the candidate countermeasure image and the target image reaches a preset value. For example, if the recognition similarity between the candidate countermeasure image and the target image exceeds 90%, the target disturbance image meeting the preset requirement can be considered obtained.
Considering that the adversarial attacks in the embodiment of the application include targeted attacks and untargeted attacks, convergence of the target loss has a different meaning in each attack mode. In the case of a targeted attack, the target loss converges when the recognition similarity between the candidate countermeasure image and the target image (which includes the interfering object) is greater than a first preset threshold (for example, 90%); that is, the image features of the countermeasure image (obtained by arranging the candidate disturbance image on the preset object) and of the target image containing the interfering object are already so similar that the image recognition model recognizes the two as the same identity. In the case of an untargeted attack, the target loss converges when the recognition similarity between the candidate countermeasure image and the target image (which includes the preset object) is smaller than a second preset threshold (for example, 10%); that is, the countermeasure image and the target image containing the preset object already differ so much that the image recognition model can no longer recognize the two as the same identity.
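The two convergence conditions can be expressed as a simple predicate; the 90%/10% thresholds are the illustrative values above, not values fixed by the method:

```python
def loss_converged(similarity: float, targeted: bool,
                   t1: float = 0.90, t2: float = 0.10) -> bool:
    """Convergence test on the recognition similarity between the candidate
    countermeasure image and the target image."""
    if targeted:            # targeted attack: must be recognised as the interfering object
        return similarity > t1
    return similarity < t2  # untargeted: must no longer match the preset object
```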
In the embodiment of the application, in order to facilitate implementing adversarial attacks in the physical world, the target disturbance image is output to obtain a water transfer disturbance decal. For convenience of carrying, the water transfer disturbance decal tends to have a small size; for example, in an attack scenario against a face recognition model, the water transfer disturbance decal may be set to a size of 2 cm×2 cm. Because the size of the water transfer disturbance decal is limited, the disturbance information it can carry is relatively small (in the prior art, the disturbance pattern covers the whole face area). Therefore, in order to ensure the disturbance effect of a small-sized water transfer disturbance decal (namely, the target disturbance image), in one possible design an attention enhancement operation may be added to each round of iteratively updating the candidate disturbance image, so as to highlight the regions of the candidate disturbance image with a stronger disturbance effect, giving the finally obtained target disturbance image stronger attack resistance. Specifically, in the embodiment of the present application, before the candidate disturbance image is updated, an attention enhancement operation may be performed on preset data, so as to directly or indirectly weight the important disturbance information in the candidate disturbance image; the preset data is obtained based on the candidate disturbance image.
In the embodiment of the application, two possible designs are provided for enhancing the disturbance effect of the target disturbance image: adding the attention enhancement operation in the disturbance gradient acquisition stage (the preset data being the gradient of the candidate disturbance image), or adding it in the image feature acquisition stage (the preset data being the countermeasure image features).
Design i: adding the attention enhancement operation in the disturbance gradient acquisition stage
In the embodiment of the application, the disturbance gradient is a matrix of the same size as the candidate disturbance image, and when the candidate disturbance image is updated, its pixels can be updated in a targeted manner according to the element values in the matrix of the disturbance gradient. It can be seen that paying extra attention to some elements in the matrix of the disturbance gradient is equivalent to paying extra attention to the pixels to which those elements correspond.
In the embodiment of the application, after the attention enhancement operation is added in the disturbance gradient acquisition stage, the attention-enhanced disturbance gradient can be used to update the candidate disturbance image. Specifically, the operation of performing attention enhancement on the preset data may comprise the following steps a)–b):
step a) obtaining a disturbance gradient of the candidate disturbance image based on the target loss value.
In the embodiment of the present application, the disturbance gradient may likewise be obtained as the partial derivative of the target loss value with respect to the pixel values of the candidate disturbance image, which is not described again here.
And b) performing attention enhancement operation on the disturbance gradient to obtain a weighted gradient.
In an embodiment of the present application, referring to fig. 7, the attention enhancement operation may be performed on the disturbance gradient through a preset attention module. For example, the perturbation gradient may be input to the attention module, resulting in the weighted gradient. It can be understood that the attention module preset in the embodiment of the present application may be implemented through an existing attention model built based on a neural network, such as SENet or CBAM, which is not limited in the embodiment of the present application.
In one possible design, the attention enhancing operation on the perturbation gradient may also be achieved by a preset convolution layer. Specifically, the preset convolution layer may include a convolution kernel of a preset size and a weight parameter, and after the disturbance gradient is input into the preset convolution layer, each gradient element value of the disturbance gradient is subjected to convolution calculation according to the convolution kernel size and the weight parameter to obtain a weighted gradient. For example, if the convolution kernel size of the preset convolution layer is 1×1, after the disturbance gradient is input into the convolution layer, each matrix element value in the matrix of the disturbance gradient is multiplied by the weight parameter, that is, the disturbance gradient is weighted. It should be noted that the convolution calculation of 1×1 does not change the size of the original input, but rather performs weighted fusion at the channel level, i.e., enhances the disturbance gradient values in the important channels, so as to enhance the pixels in the important channels when updating the candidate disturbance image.
Since a 1×1 convolution layer acting on an input is equivalent to a fully connected layer acting on that input, in one possible design the attention enhancement operation on the disturbance gradient may also be implemented through a preset fully connected layer, which likewise contains a weight parameter for each pixel of the candidate disturbance image.
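A PyTorch sketch of the 1×1-convolution form of this attention module acting on the disturbance gradient; the channel count and tensor shapes are assumed for illustration:

```python
import torch
import torch.nn as nn

class GradAttention(nn.Module):
    """Channel-wise attention over the disturbance gradient via a 1x1
    convolution: the spatial size is unchanged, channels are re-weighted."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.reweight = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        return self.reweight(grad)   # weighted gradient, same shape as input

grad = torch.randn(1, 3, 64, 64)     # disturbance gradient, NCHW layout
weighted = GradAttention()(grad)
assert weighted.shape == grad.shape  # the 1x1 convolution does not change the size
```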
After deriving the weighted gradient, the candidate disturbance image may be updated based on the weighted gradient.
In the embodiment of the present application, the process of updating the candidate disturbance image based on the weighted gradient may be similar to the process of updating it based on the disturbance gradient, and is not described again here. It will be appreciated that the candidate disturbance image may be updated with an existing gradient optimizer, which those skilled in the art may choose according to actual needs.
Design ii: adding the attention enhancement operation in the image feature acquisition stage
In the embodiment of the application, the target loss value is obtained based on the recognition similarity between the candidate countermeasure image and the target image, which is equivalent to the distance between the image features of the two images. It can be seen that performing the attention enhancement operation in the image feature extraction stage is equivalent to obtaining the target loss value from weighted image features; consequently, when the candidate disturbance image is updated based on that target loss value, the weighted portions receive an emphasized update. The attention enhancement operation can therefore be applied in the image feature acquisition stage to highlight the portions of the candidate disturbance image with a strong disturbance effect, so that the finally obtained target disturbance image has a stronger disturbance effect.
In the present design ii, after the attention-enhanced image features are acquired, the target loss value obtained is equivalent to one computed from a weighted candidate countermeasure image. Specifically, performing the attention enhancement operation on the preset data includes: acquiring target image features based on the target image; acquiring countermeasure image features based on the candidate countermeasure image, and performing the attention enhancement operation on the feature values in the countermeasure image features that correspond to the candidate disturbance image, so as to obtain weighted countermeasure image features.
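One indirect form of such feature-level weighting is a learnable per-channel weight vector, sketched below; the feature dimension and the module structure are assumptions for illustration rather than the specific attention module of the embodiment:

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """One learnable weight per feature channel; multiplying the
    countermeasure image features by these weights is an (indirect)
    attention enhancement in the sense of design ii."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat * self.weight       # weighted countermeasure image features

feat = torch.randn(1, 512)              # features from a hypothetical recognition model
weighted_feat = FeatureAttention(512)(feat)
```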
In the prior art, when an image is recognized, identity determination is often performed based on the distance between the image's features and pre-entered features. Specifically, a neural network is often adopted to construct an image recognition model; the model acquires image features from the image through fully connected layers, convolution layers, pooling layers or other network layer structures, then calculates the feature distance between the image features and the pre-entered features, and determines whether the image is associated with the same identity as the pre-entered features based on whether the feature distance falls within a preset value range.
Given this prior-art recognition process, one key step of image recognition is whether the features acquired from the image are accurate. Therefore, if the attention enhancement operation is performed on the disturbance pattern (corresponding to the candidate disturbance image during iteration, or the finally obtained target disturbance image) in the image feature acquisition stage, the influence of the disturbance pattern on the recognition result can be highlighted. Adding the attention enhancement operation in the image feature acquisition stage while iterating the candidate disturbance image highlights the disturbance pattern features, so that the subsequent update of the candidate disturbance image proceeds from a disturbance-enhanced target loss value; the finally obtained target disturbance image thus has a stronger disturbance effect, and the iteration is more efficient.
It should be noted that, in the embodiment of the present application, the image features are acquired from the target image and from the candidate countermeasure image separately; the two feature acquisition steps are independent of each other and may be performed in either order or in parallel, with no required temporal dependency between them.
It can be understood that, referring to fig. 8, the attention enhancement operation performed in the image feature acquisition stage in the embodiment of the present application may also be performed based on a preset attention module, where the attention module may be an existing attention network, a preset convolution layer, or a preset fully connected layer, and is not described again here.
In the present design ii, the target loss value may be obtained according to the feature distance between the weighted countermeasure image features and the target image features, so that the candidate disturbance image can be updated subsequently.
In the embodiment of the present application, the feature distance between the two image features (i.e., the weighted countermeasure image features and the target image features) may be calculated by any existing measure such as the Euclidean distance, the Minkowski distance, or a distance in a preset norm space, which is not limited here. After the feature distance between the two image features is obtained, any existing loss function such as the cross-entropy loss or the exponential loss may be used to calculate the target loss value, which is not described again here.
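For illustration, a target loss built from the cosine similarity between the weighted countermeasure image features and the target image features might look as follows; cosine similarity is one common choice of recognition similarity, and the targeted/untargeted sign convention is an assumption of this sketch:

```python
import torch
import torch.nn.functional as F

def target_loss(weighted_adv_feat: torch.Tensor, target_feat: torch.Tensor,
                targeted: bool) -> torch.Tensor:
    """Target loss from the cosine similarity between the weighted
    countermeasure image features and the target image features; Euclidean
    or Minkowski distances would work analogously."""
    sim = F.cosine_similarity(weighted_adv_feat, target_feat, dim=-1).mean()
    # Targeted attack: drive similarity up; untargeted: drive it down.
    return (1.0 - sim) if targeted else sim
```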
Because the target loss value is obtained from countermeasure image features processed by the attention enhancement operation, it is equivalent to a loss computed from disturbance-enhanced image features. The important disturbance regions therefore play a more prominent role when iterating the candidate disturbance image, so the important disturbance information in the finally obtained target disturbance image is more prominent and yields a stronger disturbance effect.
In the embodiment of the present application, designs i and ii exemplarily describe two ways of performing the attention enhancement operation on the disturbance, but the application is not limited thereto; a person skilled in the art may place an attention module at any appropriate stage according to actual needs to perform the attention enhancement operation on the disturbance.
It will be appreciated that the attention enhancement operation in the embodiments of the present application aims to weight the important parts (channels or regions) of the disturbance pattern, so as to highlight their effect while iterating the candidate disturbance image, such that the important parts of the resulting target disturbance image have a stronger disturbance effect. Thus, in some possible designs, the disturbance pattern may also be processed directly or indirectly by other weighting means (e.g., direct multiplication with weight values), such that its important parts are enhanced.
Note that, although the attention enhancement operation is described in the embodiment of the present application as weighting the important parts of the disturbance pattern, the weighting is not limited to multiplicative amplification; multiplicative attenuation is also possible. One key point of the attention enhancement operation on the disturbance pattern is that the important parts are processed so as to be distinguished from the other parts, so that the two can have different effects. For example, if the value of an important part is 2, it may be multiplied by a weight value of 5 to become 10, or divided by 5 (equivalent to multiplying by 0.2) to become 0.4. The specific weight parameters can be set by a person skilled in the art according to actual needs and are not described again here.
Combining the descriptions of designs i and ii, the direct or indirect attention enhancement operation on the disturbance pattern in the embodiment of the application is intended to highlight the important disturbance information of the disturbance pattern during the subsequent iterations of the candidate disturbance image. Thus, the attention enhancement operation in the embodiment of the present application does not change the size of its original input, but performs weighted fusion of the values at the channel level, i.e., highlights the values in the important channels. It will be appreciated that, in some possible designs, if changing the size of the original input does not impede highlighting the important parts in subsequent iterations of the candidate disturbance image, the attention enhancement operation may also act at the size level; for example, in the image feature acquisition stage, the important parts may be enhanced through a 3×3 convolution layer, i.e., the image features are processed at multiple scales and the most important parts extracted, so as to highlight the important disturbance information in the finally obtained target disturbance image.
It is further considered that performing the direct or indirect attention enhancement operation on the disturbance pattern through a fixed preset attention module amounts to performing the same weighting operation on the same channel in every iteration round. However, the candidate disturbance image is updated during the iterative process, i.e., it changes constantly, and its important parts may differ between iteration rounds. Therefore, in order to accommodate the updating of the candidate disturbance image, or to locate its important parts more accurately, in one possible design, referring to FIGS. 7 and 8, the weight parameters of the attention module (e.g., the preset convolution layer) may also be updated in each iteration round in which the candidate disturbance image is updated. Specifically, if the candidate countermeasure image does not meet the preset requirement, the attention module is updated before the candidate disturbance image is updated; the direct or indirect attention enhancement operation is then performed based on the updated attention module, and the candidate disturbance image is finally updated based on the result of that attention enhancement operation.
For example, in design i, if the recognition similarity between the candidate countermeasure image and the target image does not meet the preset condition, a weight-update loss value is calculated; a weight gradient of the attention module's weight parameters is then computed from this weight-update loss value, and the weight parameters are updated based on the weight gradient. The disturbance gradient of the candidate disturbance image is then calculated from the target loss value, the attention enhancement operation is performed on the disturbance gradient using the updated attention module to obtain a weighted gradient, and the candidate disturbance image is finally updated with the weighted gradient.
For another example, in design ii, image feature acquisition and the attention enhancement operation are performed on the candidate countermeasure image to obtain countermeasure image features. If the recognition similarity between the candidate countermeasure image and the target image does not meet the preset condition, a weight-update loss value is calculated, the weight gradient of the attention module's weight parameters is computed from it, and the weight parameters are updated accordingly. The attention enhancement operation is then performed on the countermeasure image features using the updated attention module, obtaining updated weighted countermeasure image features. An updated recognition similarity is then calculated from the updated weighted countermeasure image features and the target image features, an updated target loss value is computed from that similarity, and finally the candidate disturbance image is updated using the updated target loss value.
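A sketch of one such alternating round in design i terms — first update the attention module's weights, then update the perturbation through the updated module; the loss callables and learning rates are placeholders, and plain gradient steps stand in for whatever optimizer is actually chosen:

```python
import torch

def alternating_round(attn, perturb, weight_loss_fn, target_loss_fn,
                      lr_w=1e-3, lr_p=1e-2):
    """One alternating iteration round: (1) update the attention module's
    weight parameters from the weight-update loss; (2) compute the
    disturbance gradient of the target loss, pass it through the updated
    attention module to get the weighted gradient, and step each pixel."""
    # 1) weight gradient and update of the attention module's parameters
    attn.zero_grad()
    weight_loss_fn(attn).backward()
    with torch.no_grad():
        for p in attn.parameters():
            if p.grad is not None:
                p -= lr_w * p.grad
    # 2) disturbance gradient of the target loss w.r.t. the disturbance image
    perturb = perturb.detach().requires_grad_(True)
    grad, = torch.autograd.grad(target_loss_fn(perturb), perturb)
    # 3) attention-enhance the gradient, then update each pixel by its sign
    with torch.no_grad():
        weighted_grad = attn(grad)
        new_perturb = (perturb - lr_p * weighted_grad.sign()).clamp(0.0, 1.0)
    return new_perturb
```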
It will be appreciated that, in design ii, after the attention module is updated in one iteration round, the countermeasure image features need not be re-enhanced with the updated module within the same round; the updated module may instead take effect in the next iteration round, so as to simplify the implementation flow.
And step 105, taking the candidate disturbance image when the target loss value converges as a target disturbance image.
In the embodiment of the application, if the target loss value converges, the candidate countermeasure image obtained in the current round meets the preset requirement, that is, its recognition similarity with the target image meets the preset condition. For example, in a targeted attack, the candidate countermeasure image is recognized as the target image by the image recognition model; in an untargeted attack, the candidate countermeasure image is not recognized as the target image by the image recognition model. Therefore, after the candidate disturbance image obtained in the current round is arranged on the preset object, it can exert an ideal anti-attack effect; that is, it can serve as the target disturbance image, and the target disturbance image has a robust disturbance effect.
After the target disturbance image is obtained, a robustness or security test can be performed on the target model in the digital world. For example, in a test scenario for a face recognition model, a three-dimensional head image of the preset object can be simulated in the digital world and the target disturbance image set into it; the pose of the three-dimensional head image can then be adjusted freely, and two-dimensional face images of the head obtained under different poses, each containing the disturbance pattern (whose appearance may differ between poses). Each two-dimensional face image is input into the target face recognition model through an interface, and the robustness or security of the target face recognition model is determined from the recognition results. For example, if the target face recognition model fails to correctly recognize a two-dimensional face image, it may be determined that the model has a security vulnerability or defect.
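A rough sketch of such a digital-world probe follows; `model`, `head3d` and their methods (`render`, `recognises`) are hypothetical placeholders, not APIs of any particular library:

```python
def test_model_digitally(model, head3d, target_disturb, poses):
    """Render the 3-D head with the target disturbance image attached under
    several poses and report the fraction of poses where recognition fails
    (i.e., where the attack succeeds)."""
    failures = 0
    for pose in poses:
        face2d = head3d.render(pose=pose, decal=target_disturb)  # hypothetical API
        if not model.recognises(face2d):                         # hypothetical API
            failures += 1
    return failures / max(len(poses), 1)
```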
Based on the target disturbance image obtained in the embodiment of the application, a robustness or security test can also be performed on the target model in the physical world. For example, in a test scenario for a face recognition model, the target disturbance image may be output to obtain a disturbance sticker (for example, a water transfer disturbance decal), which is then arranged on the preset object. The preset object may then freely adjust its head pose while an image acquisition device captures two-dimensional face images under different poses, each containing the disturbance pattern (whose appearance may differ between poses). The image acquisition device inputs each two-dimensional face image into the target face recognition model through an interface, and the robustness or security of the target face recognition model is determined from the recognition results. For example, if the target face recognition model fails to correctly recognize a two-dimensional face image, it may be determined that the model has a security vulnerability or defect.
Outputting the target disturbance image to obtain the water transfer disturbance decal can be realized with existing techniques. For example, the target disturbance image is first printed onto water-transfer-dedicated paper; the printed side bearing the disturbance pattern may then be coated with a water-transfer-dedicated film. The coated paper can then be heat-treated so that the film adheres uniformly to the disturbance pattern, i.e., the glue attaches evenly to the paper. Finally, the paper can be cut as needed to obtain the water transfer disturbance decal containing the disturbance pattern.
In the embodiment of the application, in each round of iteratively generating the target disturbance image, the candidate disturbance image is converted into a disturbance projection image whose corresponding three-dimensional disturbance image matches the pose of the preset object in the candidate image; the disturbance projection image is then fused with the candidate image to obtain a candidate countermeasure image, and a target loss value is acquired based on the candidate countermeasure image and the target image so as to update the candidate disturbance image, until the candidate disturbance image achieves the adversarial attack goal, thereby obtaining the target disturbance image. Because the disturbance projection image, rather than the candidate disturbance image itself as in the prior art, is fused with the candidate image, the resulting candidate countermeasure image simulates the visual appearance of the candidate disturbance image on the preset object in the corresponding pose in the physical world; that is, the countermeasure image generated in the digital world is consistent with the visual representation of the candidate disturbance image in the physical world, so that the target disturbance image can exert in the physical world the same ideal attack effect as in the digital world.
Therefore, in the embodiment of the application, the visual representations of the disturbance image under the various possible poses of the physical world are simulated during the iterative generation of the target disturbance image. The target disturbance image thus obtained can exert an ideal anti-attack effect on the preset object in various poses in the physical world, and can well evaluate the robustness or security of the image recognition model. In addition, in some embodiments, the updating of the candidate disturbance image is guided by an attention mechanism that highlights the important disturbance information, so that the generated target disturbance image maintains a high attack success rate despite its small size.
In some embodiments of the present application, the target disturbance image is also output as a water transfer disturbance decal. Because the water transfer disturbance decal is portable, firmly attached and comfortable to use, adversarial testing of the target model in the physical world is easier than with perturbations that are inconvenient to carry (for example, printed as a hat or a pair of glasses) and that additionally require some means of firm attachment.
An image processing method according to an embodiment of the present application is described above, and an image processing apparatus (e.g., a server) that executes the image processing method is described below.
Referring to fig. 9, the image processing apparatus whose structure is shown in fig. 9 may be applied to a server in an adversarial test scenario of an image recognition model, iterating a candidate disturbance image into a target disturbance image. The target disturbance image may be arranged on a preset object in different poses while stably exerting an adversarial effect, so that when the image recognition model is tested with the target disturbance image in the physical world, the test can proceed stably without the different poses of the preset object affecting the adversarial effect. The image processing apparatus in the embodiment of the present application can realize the steps of the image processing method performed in the embodiment corresponding to fig. 2 described above. The functions realized by the image processing apparatus can be realized by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions described above, and the modules may be software and/or hardware. The image processing apparatus may include an input/output module 601 and a processing module 602; the functional implementations of the processing module 602 and the input/output module 601 may refer to the operations performed in the embodiment corresponding to fig. 2, which are not repeated here. For example, the processing module 602 may be configured to control the transceiving, acquisition and other operations of the input/output module 601.
The input/output module 601 is configured to acquire a candidate disturbance image and a candidate image; the candidate image comprises a preset object whose current pose is a first pose;

the processing module 602 is configured to process the candidate disturbance image based on the current pose of the preset object to obtain a disturbance projection image; the pose of the three-dimensional disturbance image corresponding to the disturbance projection image matches the current pose of the preset object;

the processing module 602 is further configured to obtain a candidate countermeasure image based on the candidate image and the disturbance projection image, and to acquire a target loss value; the target loss value is obtained based on the recognition similarity between the candidate countermeasure image and a target image; the target image comprises the preset object or an interfering object;

the processing module 602 is further configured to, if the target loss value does not converge, update the candidate disturbance image and the candidate image until a target loss value acquired based on a new candidate countermeasure image and the target image converges; the new candidate image comprises the preset object whose current pose is a second pose;

the processing module 602 is further configured to take the candidate disturbance image at the time the target loss value converges as the target disturbance image.
In some embodiments, the processing module 602 is further configured to perform an attention enhancement operation on preset data to directly or indirectly weight important disturbance information in the candidate disturbance image; the preset data is obtained based on the candidate disturbance image.
In some implementations, the processing module 602 is further configured to derive a disturbance gradient for the candidate disturbance image based on the target loss value, and to perform the attention enhancement operation on the disturbance gradient to obtain a weighted gradient;
the processing module 602 is further configured to update the candidate disturbance image based on the weighted gradient.
In some implementations, the processing module 602 is further configured to obtain target image features based on the target image; and to acquire countermeasure image features based on the candidate countermeasure image, performing the attention enhancement operation on the feature values in the countermeasure image features that correspond to the candidate disturbance image, so as to obtain weighted countermeasure image features;

the processing module 602 is further configured to obtain the target loss value according to the feature distance between the weighted countermeasure image features and the target image features.
In some embodiments, the attention enhancement operation performs weighted fusion of values at the channel level.
In some embodiments, the attention enhancing operation is implemented by a preset attention module; the attention module comprises a weight parameter;
the processing module 602 is further configured to obtain a weight-update loss value based on the recognition similarity between the target image and the candidate countermeasure image; to obtain a weight gradient based on the weight-update loss value and the values of the weight parameters; and to update the values of the weight parameters based on the weight gradient, so as to obtain an updated attention module.
In some embodiments, the processing module 602 is further configured to perform a color transformation operation on the candidate image to obtain a first image; to perform transparency processing on the disturbance projection image to obtain a second image; and to superimpose the first image and the second image to obtain the candidate countermeasure image.
In some embodiments, the input-output module 601 is further configured to output the target disturbance image to obtain a water transfer disturbance decal.
In the embodiment of the application, in each round of iteratively generating the target disturbance image, the processing module 602 converts the candidate disturbance image into a disturbance projection image whose corresponding three-dimensional disturbance image matches the pose of the preset object in the candidate image; the disturbance projection image is then fused with the candidate image to obtain a candidate countermeasure image, and a target loss value is acquired based on the candidate countermeasure image and the target image so as to update the candidate disturbance image, until the candidate disturbance image achieves the adversarial attack goal, thereby obtaining the target disturbance image. Because the disturbance projection image, rather than the candidate disturbance image itself as in the prior art, is fused with the candidate image, the resulting candidate countermeasure image simulates the visual appearance of the candidate disturbance image on the preset object in the corresponding pose in the physical world; that is, the countermeasure image generated in the digital world is consistent with the visual representation of the candidate disturbance image in the physical world, so that the target disturbance image can exert in the physical world the same ideal attack effect as in the digital world.
Therefore, in the embodiment of the application, the visual representations of the disturbance image under the various possible poses of the physical world are simulated during the iterative generation of the target disturbance image. The target disturbance image thus obtained can exert an ideal anti-attack effect on the preset object in various poses in the physical world, and can well evaluate the robustness or security of the image recognition model. In addition, in some embodiments, the updating of the candidate disturbance image is guided by an attention mechanism that highlights the important disturbance information, so that the generated target disturbance image maintains a high attack success rate despite its small size.
In some embodiments of the present application, the input/output module 601 also outputs the target disturbance image as a water transfer disturbance decal. Because the water transfer disturbance decal is portable, firmly attached and comfortable to use, adversarial testing of the target model in the physical world is easier than with perturbations that are inconvenient to carry (for example, printed as a hat or a pair of glasses) and that additionally require some means of firm attachment.
The image processing apparatus 60 in the embodiment of the present application is described above in terms of modular functional entities, and the image processing apparatus in the embodiment of the present application is described below in terms of hardware processing, respectively.
It should be noted that, the physical devices corresponding to the input/output module 601 shown in fig. 9 may be a transceiver, a radio frequency circuit, a communication module, an input/output (I/O) interface, etc., and the physical devices corresponding to the processing module 602 may be a processor.
The apparatuses shown in fig. 9 may each have the structure shown in fig. 10. When the image processing apparatus 60 shown in fig. 9 has the structure shown in fig. 10, the processor and the transceiver in fig. 10 can implement functions the same as or similar to those of the processing module 602 and the input/output module 601 provided in the foregoing apparatus embodiment, and the memory in fig. 10 stores a computer program to be invoked by the processor when performing the above image processing method.
The embodiment of the present application further provides a terminal device. As shown in fig. 11, for convenience of explanation only the portion relevant to the embodiment of the present application is shown; for specific technical details not disclosed, please refer to the method part of the embodiment of the present application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), a point-of-sale (Point of Sales, POS) terminal, a vehicle-mounted computer, and the like; a mobile phone is taken as the example:
Fig. 11 is a block diagram showing a part of the structure of a mobile phone related to the terminal device provided by an embodiment of the present application. Referring to fig. 11, the mobile phone includes: radio frequency (Radio Frequency, RF) circuit 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuit 1060, wireless fidelity (wireless fidelity, WiFi) module 1070, processor 1080, and power source 1090. Those skilled in the art will appreciate that the handset structure shown in fig. 11 does not limit the handset, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The following describes the components of the mobile phone in detail with reference to fig. 11:
The RF circuit 1010 may be used for receiving and transmitting signals during a message or a call; in particular, downlink information received from a base station is handed to the processor 1080 for processing, and uplink data is sent to the base station. Generally, the RF circuit 1010 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA), a duplexer, and the like. In addition, the RF circuit 1010 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communication (Global System of Mobile communication, GSM), general packet radio service (General Packet Radio Service, GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), long term evolution (Long Term Evolution, LTE), email, short message service (Short Messaging Service, SMS), and the like.
The memory 1020 may be used to store software programs and modules, and the processor 1080 performs the various functional applications and data processing of the handset by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, the application programs required for at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the handset (such as audio data or a phonebook), and the like. In addition, the memory 1020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
The input unit 1030 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations performed on or near the touch panel 1031 using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1031 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends these to the processor 1080; it can also receive and execute commands from the processor 1080. The touch panel 1031 may be implemented as a resistive, capacitive, infrared, or surface-acoustic-wave panel, among other types. Besides the touch panel 1031, the input unit 1030 may include other input devices 1032, which may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys or a switch key), a trackball, a mouse, a joystick, and the like.
The display unit 1040 may be used to display information input by the user, information provided to the user, and the various menus of the handset. The display unit 1040 may include a display panel 1041; optionally, the display panel 1041 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like. Further, the touch panel 1031 may overlay the display panel 1041; when the touch panel 1031 detects a touch operation on or near it, the operation is communicated to the processor 1080 to determine the type of touch event, and the processor 1080 then provides a corresponding visual output on the display panel 1041 according to that type. Although in fig. 11 the touch panel 1031 and the display panel 1041 are two independent components implementing the input and output functions of the handset, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement these functions.
The handset may also include at least one sensor 1050, such as a light sensor, a motion sensor, or other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 1041 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 1041 and/or the backlight when the handset moves to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary; it can be used in applications that recognize the handset's attitude (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and in vibration-recognition-related functions (such as a pedometer or tap detection). Other sensors that may also be configured on the handset, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described in detail here.
The audio circuit 1060, a speaker 1061, and a microphone 1062 may provide an audio interface between the user and the handset. The audio circuit 1060 may transmit the electrical signal converted from received audio data to the speaker 1061, which converts it into a sound signal for output; conversely, the microphone 1062 converts collected sound signals into electrical signals, which the audio circuit 1060 receives and converts into audio data; after being processed by the processor 1080, the audio data may be sent, for example, to another handset via the RF circuit 1010, or output to the memory 1020 for further processing.
Wi-Fi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive e-mails, browse web pages, access streaming media and the like through a Wi-Fi module 1070, so that wireless broadband Internet access is provided for the user. Although fig. 11 shows Wi-Fi module 1070, it is understood that it does not belong to the necessary constitution of the handset, and can be omitted entirely as required within the scope of not changing the essence of the invention.
Processor 1080 is the control center of the handset, connects the various parts of the entire handset using various interfaces and lines, and performs various functions and processes of the handset by running or executing software programs and/or modules stored in memory 1020, and invoking data stored in memory 1020, thereby performing overall monitoring of the handset. Optionally, processor 1080 may include one or more processing units; alternatively, processor 1080 may integrate an application processor primarily handling operating systems, user interfaces, applications, etc., with a modem processor primarily handling wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 1080.
The handset further includes a power source 1090 (e.g., a battery) for powering the various components; optionally, the power source is logically connected to the processor 1080 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which will not be described herein.
In an embodiment of the present application, the processor 1080 included in the mobile phone further runs a control program for performing the above-described method, executed by the image processing apparatus, of iteratively generating the target disturbance image from the candidate disturbance image.
Referring to fig. 12, fig. 12 is a schematic diagram of a server structure according to an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1122 (for example, one or more processors), a memory 1132, and one or more storage media 1130 (for example, one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may be transitory or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 1122 may be arranged to communicate with the storage medium 1130 and to execute on the server 1100 the series of instruction operations in the storage medium 1130.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like.
The steps performed by the server in the above embodiments may be based on the structure of the server 1100 shown in fig. 12. For example, the steps performed by the image processing apparatus 60 shown in fig. 9 in the above-described embodiment may be based on the server structure shown in fig. 12. For example, the CPU 1122 may perform the following operations by calling instructions in the memory 1132:
acquiring a candidate disturbance image and a candidate image through the input/output interface 1158; the candidate image comprises a preset object whose current pose is a first pose;

projecting the candidate disturbance image based on the current pose of the preset object to obtain a disturbance projection image; the pose of the three-dimensional disturbance image corresponding to the disturbance projection image matches the current pose of the preset object;

obtaining a candidate countermeasure image based on the candidate image and the disturbance projection image, and acquiring a target loss value; the target loss value is obtained based on the recognition similarity between the candidate countermeasure image and a target image; the target image comprises the preset object or an interfering object;

if the target loss value does not converge, updating the candidate disturbance image and the candidate image until a target loss value acquired based on a new candidate countermeasure image and the target image converges; the new candidate image comprises the preset object whose current pose is a second pose;

and taking the candidate disturbance image at the time the target loss value converges as the target disturbance image.
The target disturbance image may also be output via the input output interface 1158 to obtain a water transfer disturbance decal.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and modules described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The technical solutions provided by the embodiments of the present application have been described above in detail. Specific examples are used herein to illustrate the principles and implementations of the embodiments of the present application, and the above description of the embodiments is only intended to help understand the methods and core ideas of the embodiments. Meanwhile, those skilled in the art may, following the ideas of the embodiments of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims (11)

1. An image processing method, the method comprising:
acquiring a candidate disturbance image and a candidate image; the candidate image comprises a preset object whose current pose is a first pose;
processing the candidate disturbance image based on the current pose of the preset object to obtain a disturbance projection image; the disturbance projection image is obtained by projecting a corresponding three-dimensional disturbance image onto a target plane, the pose of the three-dimensional disturbance image is consistent with the current pose of the preset object, and the target plane is obtained based on the candidate disturbance image;
obtaining a candidate countermeasure image based on the candidate image and the disturbance projection image, and acquiring a target loss value; the target loss value is obtained based on the recognition similarity between the candidate countermeasure image and a target image; the target image comprises the preset object or an interference object;
if the target loss value does not converge, updating the candidate disturbance image and the candidate image until the target loss value acquired based on the new candidate countermeasure image and the target image converges; the new candidate image comprises a preset object whose current pose is a second pose;
taking the candidate disturbance image at the time the target loss value converges as a target disturbance image;
wherein, if the target loss value does not converge, before updating the candidate disturbance image, the method further comprises:
performing an attention enhancement operation on preset data, so as to directly or indirectly weight important disturbance information in the candidate disturbance image; the preset data is obtained based on the candidate disturbance image;
the attention enhancement operation is implemented by a preset attention module; the preset attention module is an attention network, a preset convolution layer, or a preset fully connected layer, and the convolution kernel size of the preset convolution layer is 1×1.
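As a hedged illustration of one form the preset attention module could take, the PyTorch class below realizes it as a convolution layer with kernel size 1×1, which reweights the preset data channel by channel without changing its spatial layout. The class name and framework are assumptions; the claim equally covers an attention network or a fully connected layer.

import torch.nn as nn

class PresetAttention(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        # 1×1 convolution: a learned per-channel weighting of the
        # important disturbance information in the preset data.
        self.weight = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, preset_data):
        # preset_data: (N, C, H, W), derived from the candidate disturbance
        # image (e.g. its gradient, per claim 2, or features, per claim 3).
        return self.weight(preset_data)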
2. The method of claim 1, wherein the performing the attention enhancement operation on the preset data comprises:
obtaining a disturbance gradient of the candidate disturbance image based on the target loss value;
performing the attention enhancement operation on the disturbance gradient to obtain a weighted gradient;
and the updating the candidate disturbance image comprises:
updating the candidate disturbance image based on the weighted gradient.
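A minimal sketch of this claim-2 variant, reusing the hypothetical PresetAttention and delta names from the sketches above: the disturbance gradient is weighted by the attention module before the update, so important disturbance information moves more than unimportant pixels.

import torch

def update_with_weighted_gradient(delta, loss, attention, lr=0.01):
    grad = torch.autograd.grad(loss, delta)[0]          # disturbance gradient
    weighted = attention(grad.unsqueeze(0)).squeeze(0)  # weighted gradient
    with torch.no_grad():
        delta -= lr * weighted.sign()                   # update the candidate disturbance image
        delta.clamp_(0.0, 1.0)
    return delta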
3. The method of claim 1, wherein the performing the attention enhancement operation on the preset data comprises:
acquiring target image features based on the target image; acquiring countermeasure image features based on the candidate countermeasure image, and performing the attention enhancement operation on feature values in the countermeasure image features that correspond to the candidate disturbance image, to obtain weighted countermeasure image features;
the acquiring the target loss value comprises:
and obtaining the target loss value according to the feature distance between the weighted countermeasure image features and the target image features.
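A minimal sketch of this claim-3 variant, under the same PyTorch assumptions: attention is applied to the countermeasure image features, emphasizing the feature values contributed by the disturbance, and the target loss value is the feature distance to the target image features.

import torch

def weighted_feature_loss(adv_features, target_features, attention):
    # adv_features, target_features: (N, C, H, W) feature maps from the
    # recognition model; attention is e.g. the hypothetical PresetAttention.
    weighted = attention(adv_features)            # weighted countermeasure image features
    return torch.norm(weighted - target_features, p=2)  # feature distance as target loss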
4. The method of claim 1, wherein the attention enhancement operation performs channel-wise numerical weighted fusion.
5. The method of claim 1, wherein the attention module comprises a weight parameter;
before the attention enhancement operation is performed on the preset data, the method further comprises:
obtaining a weight update loss value based on the recognition similarity between the target image and the candidate countermeasure image;
obtaining a weight gradient based on the weight update loss value and the value of the weight parameter;
and updating the value of the weight parameter based on the weight gradient to obtain an updated attention module.
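A minimal sketch of claim 5 under the same assumptions: the attention module's own weight parameter is trained with a weight update loss built from the recognition similarity between the target image and the candidate countermeasure image. A plain manual gradient step stands in for whatever optimizer an implementation actually uses.

import torch

def update_attention(attention, weight_update_loss, lr=1e-3):
    attention.zero_grad()
    weight_update_loss.backward()     # weight gradient w.r.t. the weight parameter
    with torch.no_grad():
        for p in attention.parameters():
            p -= lr * p.grad          # updated attention module
    return attention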
6. The method of any one of claims 1-5, wherein the obtaining a candidate countermeasure image based on the candidate image and the disturbance projection image comprises:
performing a color transformation operation on the candidate image to obtain a first image;
performing transparency processing on the disturbance projection image to obtain a second image;
and superposing the first image and the second image to obtain the candidate countermeasure image.
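A minimal sketch of this compositing step under the same assumptions; the brightness factor and blending weight are illustrative values only.

import torch

def compose_countermeasure(candidate, projection, brightness=1.1, alpha=0.6):
    first = torch.clamp(candidate * brightness, 0.0, 1.0)    # color transformation
    mask = (projection > 0).float()                          # where the sticker lands
    # Transparency processing: the projection is blended in rather than pasted,
    # so the underlying candidate image remains partly visible.
    adv = first * (1.0 - alpha * mask) + alpha * projection  # superposition
    return torch.clamp(adv, 0.0, 1.0)                        # candidate countermeasure image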
7. The method of any one of claims 1-5, wherein after the target disturbance image is obtained, the method further comprises:
and outputting the target disturbance image to obtain a water-transfer disturbance sticker.
8. An image processing apparatus, comprising:
an input-output module configured to acquire a candidate disturbance image and a candidate image; the candidate image comprises a preset object whose current pose is a first pose;
a processing module configured to process the candidate disturbance image based on the current pose of the preset object to obtain a disturbance projection image; the disturbance projection image is obtained by projecting a corresponding three-dimensional disturbance image onto a target plane, the pose of the three-dimensional disturbance image is consistent with the current pose of the preset object, and the target plane is obtained based on the candidate disturbance image;
the processing module is further configured to obtain a candidate countermeasure image based on the candidate image and the disturbance projection image, and acquire a target loss value; the target loss value is obtained based on the recognition similarity between the candidate countermeasure image and a target image; the target image comprises the preset object or an interference object;
the processing module is further configured to, if the target loss value does not converge, update the candidate disturbance image and the candidate image until the target loss value acquired based on the new candidate countermeasure image and the target image converges; the new candidate image comprises a preset object whose current pose is a second pose;
the processing module is further configured to perform an attention enhancement operation on preset data before updating the candidate disturbance image, so as to directly or indirectly weight important disturbance information in the candidate disturbance image; the preset data is obtained based on the candidate disturbance image; the attention enhancement operation is implemented by a preset attention module; the preset attention module is an attention network, a preset convolution layer, or a preset fully connected layer, and the convolution kernel size of the preset convolution layer is 1×1;
the processing module is further configured to take the candidate disturbance image at the time the target loss value converges as a target disturbance image.
9. A computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-7.
11. A chip system, comprising:
a communication interface for inputting and/or outputting information;
a processor for executing a computer-executable program, so that a device on which the chip system is installed performs the method of any one of claims 1-7.
CN202310711856.9A 2023-06-15 2023-06-15 Image processing method, related device and storage medium Active CN116486463B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310711856.9A CN116486463B (en) 2023-06-15 2023-06-15 Image processing method, related device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310711856.9A CN116486463B (en) 2023-06-15 2023-06-15 Image processing method, related device and storage medium

Publications (2)

Publication Number Publication Date
CN116486463A CN116486463A (en) 2023-07-25
CN116486463B true CN116486463B (en) 2023-10-03

Family

ID=87219866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310711856.9A Active CN116486463B (en) 2023-06-15 2023-06-15 Image processing method, related device and storage medium

Country Status (1)

Country Link
CN (1) CN116486463B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116935172A (en) * 2023-07-31 2023-10-24 北京瑞莱智慧科技有限公司 Image processing method, related device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10769411B2 (en) * 2017-11-15 2020-09-08 Qualcomm Technologies, Inc. Pose estimation and model retrieval for objects in images
US11605218B2 (en) * 2021-02-25 2023-03-14 Tata Consultancy Services Limited Systems and methods for constructing a modular Siamese network for face verification

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112990047A (en) * 2021-03-26 2021-06-18 南京大学 Multi-pose face verification method combining face angle information
CN113487545A (en) * 2021-06-24 2021-10-08 广州玖的数码科技有限公司 Method for generating disturbance image facing to attitude estimation depth neural network
CN114331829A (en) * 2021-09-03 2022-04-12 腾讯科技(深圳)有限公司 Countermeasure sample generation method, device, equipment and readable storage medium
CN114387647A (en) * 2021-12-29 2022-04-22 北京瑞莱智慧科技有限公司 Method and device for generating anti-disturbance and storage medium
CN114005168A (en) * 2021-12-31 2022-02-01 北京瑞莱智慧科技有限公司 Physical world confrontation sample generation method and device, electronic equipment and storage medium
CN114333031A (en) * 2021-12-31 2022-04-12 北京瑞莱智慧科技有限公司 Vulnerability detection method and device of living body detection model and storage medium
CN115019102A (en) * 2022-06-17 2022-09-06 华中科技大学 Construction method and application of confrontation sample generation model
CN115937638A (en) * 2022-12-30 2023-04-07 北京瑞莱智慧科技有限公司 Model training method, image processing method, related device and storage medium

Also Published As

Publication number Publication date
CN116486463A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
JP7265003B2 (en) Target detection method, model training method, device, apparatus and computer program
CN110321965B (en) Training method of object re-recognition model, and object re-recognition method and device
CN114297730B (en) Countermeasure image generation method, device and storage medium
US20210152751A1 (en) Model training method, media information synthesis method, and related apparatuses
CN111672109B (en) Game map generation method, game testing method and related device
CN109495616B (en) Photographing method and terminal equipment
CN116310745B (en) Image processing method, data processing method, related device and storage medium
CN114444579B (en) General disturbance acquisition method and device, storage medium and computer equipment
CN109951889B (en) Internet of things network distribution method and mobile terminal
CN116486463B (en) Image processing method, related device and storage medium
CN115588131B (en) Model robustness detection method, related device and storage medium
CN115937638B (en) Model training method, image processing method, related device and storage medium
CN111399819A (en) Data generation method and device, electronic equipment and storage medium
CN114333031A (en) Vulnerability detection method and device of living body detection model and storage medium
CN115239941B (en) Countermeasure image generation method, related device and storage medium
CN115171196B (en) Face image processing method, related device and storage medium
CN115081643B (en) Confrontation sample generation method, related device and storage medium
CN116958715A (en) Method and device for detecting hand key points and storage medium
CN114943639B (en) Image acquisition method, related device and storage medium
CN116308978B (en) Video processing method, related device and storage medium
CN116935172A (en) Image processing method, related device and storage medium
CN117218506A (en) Model training method for image recognition, image recognition method and related device
CN117831089A (en) Face image processing method, related device and storage medium
CN117671755A (en) Challenge sample recognition model training method and device, electronic equipment and storage medium
CN116993842A (en) Image coloring method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant