CN116957912A - Image processing method, device, computer equipment and storage medium

Info

Publication number: CN116957912A
Application number: CN202210405950.7A
Authority: CN
Inventors: 白家旺; 龚迪洪; 夏树涛; 李志锋; 刘威
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-04-18
Filing date: 2022-04-18
Publication date: 2023-10-27

Abstract

The application discloses an image processing method, an image processing device, computer equipment and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: acquiring an offset feature map between an original image and a target image; determining, from the original image, an original pixel corresponding to a target pixel in the target image based on the offset feature map; assigning a value to the target pixel based on a plurality of neighborhood pixels of the original pixel; and outputting a target image formed by a plurality of assigned target pixels. The method can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and assisted driving. Because the offset and assignment are performed pixel by pixel at the pixel level, the modified target image as a whole has low perceptibility and is not easily noticed by defenders.

Description

Image processing method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing device, a computer device, and a storage medium.

Background

With the development of computer technology, a large number of neural-network-based image processing models are applied in computer vision tasks such as image classification and object detection. As a result, security problems related to neural networks are becoming a research hotspot.

Among these security problems, take an attack based on Trojan horse images against a neural-network-based image classification model as an example: a Trojan horse image is obtained by applying a specific disturbance to an original image, and is used to deceive the image classification model into outputting a wrong classification result for the Trojan horse image. Therefore, in order to test the security of neural-network-based image processing models, a method capable of constructing high-quality Trojan horse images that are imperceptible to humans is needed, so as to uncover as many potential security vulnerabilities as possible.

Disclosure of Invention

The embodiments of the present application provide an image processing method, an image processing device, computer equipment and a storage medium, which can construct, on the basis of an original image, a high-quality target image that is difficult for humans to perceive. The technical solutions are as follows:

In one aspect, there is provided an image processing method, the method including:

acquiring an offset feature map between an original image and a target image, wherein the offset feature map is used for representing pixel offset between the original image and the target image;

for a target pixel in the target image, determining, from the original image, an original pixel corresponding to the target pixel based on the offset feature map;

assigning a value to the target pixel based on a plurality of neighborhood pixels of the original pixel;

and outputting a target image formed by a plurality of assigned target pixels.

In one aspect, there is provided an image processing apparatus including:

the acquisition module is used for acquiring an offset feature map between an original image and a target image, wherein the offset feature map is used for representing pixel offset between the original image and the target image;

a determining module, configured to determine, for a target pixel in the target image, an original pixel corresponding to the target pixel from the original image based on the offset feature map;

the assignment module is used for assigning values to the target pixels based on a plurality of neighborhood pixels of the original pixels;

and the output module is used for outputting a target image formed by a plurality of assigned target pixels.

In one possible implementation, the assignment module includes:

a determining submodule, configured to determine pixel disturbance amounts corresponding to each of the plurality of neighborhood pixels based on a noise disturbance map, where the noise disturbance map is used to characterize the disturbance applied to pixels in the original image;

and the assignment submodule is used for assigning the target pixel based on the plurality of neighborhood pixels and the pixel disturbance amount corresponding to each of the plurality of neighborhood pixels.

In one possible implementation, the assignment submodule includes:

the assignment unit is used for assigning the target pixel based on the plurality of neighborhood pixels, the pixel disturbance amount corresponding to each of the plurality of neighborhood pixels, the position information of the original pixel and the position information of the plurality of neighborhood pixels.

In one possible embodiment, the assignment unit includes:

an adding subunit, configured to add, for any one of the plurality of neighborhood pixels, the neighborhood pixel and a pixel disturbance amount corresponding to the neighborhood pixel to obtain a disturbance pixel corresponding to the neighborhood pixel;

a determining subunit, configured to determine a weighting coefficient for the disturbance pixel based on the position information of the original pixel and the position information of the neighborhood pixel, where the weighting coefficient is inversely related to the distance between the original pixel and the neighborhood pixel;

a multiplication subunit, configured to multiply the weighting coefficient with the disturbance pixel to obtain a weighted disturbance pixel;

and the assignment subunit is used for assigning a sum value obtained by adding a plurality of weighted disturbance pixels to the target pixel.

In a possible implementation, the determining subunit is configured to:

determining a horizontal axis coordinate difference and a vertical axis coordinate difference between the original pixel and the neighborhood pixel based on the position information of the original pixel and the position information of the neighborhood pixel;

determining a first difference value obtained by subtracting the absolute value of the coordinate difference of the horizontal axis from 1 and a second difference value obtained by subtracting the absolute value of the coordinate difference of the vertical axis from 1;

and multiplying the first difference value and the second difference value to obtain the weighting coefficient.
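The weighting and assignment described above resemble bilinear interpolation over disturbed neighborhood pixels. The following is a minimal sketch in Python rather than the application's implementation. It assumes single-channel images stored as nested lists indexed [row][column], assumes that the plurality of neighborhood pixels are the four integer-grid pixels surrounding a fractional sampling position, and assumes that neighbors outside the image are skipped. The names bilinear_weight and assign_target_pixel are hypothetical.

```python
import math


def bilinear_weight(orig_xy, nbr_xy):
    """Weighting coefficient (1 - |dx|) * (1 - |dy|) for one neighborhood pixel."""
    dx = orig_xy[0] - nbr_xy[0]  # horizontal axis coordinate difference
    dy = orig_xy[1] - nbr_xy[1]  # vertical axis coordinate difference
    return (1.0 - abs(dx)) * (1.0 - abs(dy))


def assign_target_pixel(image, noise, x, y):
    """Value for a target pixel whose original pixel sits at fractional (x, y).

    image and noise are same-sized 2-D lists indexed [row][column].
    Assumption: the neighborhood is the four surrounding grid pixels.
    """
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for ny in (y0, y0 + 1):
        for nx in (x0, x0 + 1):
            # Assumption: neighbors outside the image contribute nothing.
            if 0 <= ny < len(image) and 0 <= nx < len(image[0]):
                disturbed = image[ny][nx] + noise[ny][nx]  # disturbance pixel
                value += bilinear_weight((x, y), (nx, ny)) * disturbed
    return value
```

With this choice of neighborhood, the weights of the four neighbors sum to 1 for any sampling position inside the image, and a neighbor that coincides with the sampling position receives weight 1.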

In one possible implementation, the size of the noise disturbance map is the same as the size of the original image;

the determining submodule is used for:

for any one of the plurality of neighborhood pixels, looking up the noise disturbance map to obtain the pixel disturbance amount at the position corresponding to the neighborhood pixel.

In one possible implementation manner, the noise disturbance map and the offset feature map are optimized in the process of obtaining a bit flipping model based on an image processing model to be attacked, where the bit flipping model refers to a model obtained by flipping some of the bits in the image processing model.

In one possible embodiment, the apparatus further comprises:

the generating module is used for generating an initial disturbance map and an initial offset map;

the initialization module is used for initializing an initial flipping model to the model parameter matrix of the image processing model;

and the iteration updating module is used for iteratively updating the initial disturbance map, the initial offset map and the initial flipping model, and, when the stop-iteration condition is met, outputting the noise disturbance map obtained by optimizing the initial disturbance map, the offset feature map obtained by optimizing the initial offset map and the bit flipping model obtained by optimizing the initial flipping model.

In one possible implementation manner, the iterative updating module is configured to:

updating the first intermediate variable, the second intermediate variable, and the third intermediate variable in any iteration;

projecting the first intermediate variable to a rectangular vector space corresponding to the model parameter matrix, and projecting the second intermediate variable to a spherical vector space corresponding to the model parameter matrix;

updating the initial disturbance map, the initial offset map and the initial flipping model based on a gradient descent algorithm;

obtaining the maximum disturbance quantity of the initial disturbance map and the local smoothing parameter of the initial offset map, wherein the local smoothing parameter characterizes the likelihood that the initial offset map causes nearby pixels to shift in the same direction;

updating the first multiplier, the second multiplier, and the third multiplier based on a gradient ascent algorithm.

In one possible implementation, the stop-iteration condition includes at least one of: the number of iterations is greater than or equal to an iteration-count threshold; or, the loss function value is less than or equal to a loss threshold; or, the maximum disturbance quantity, the local smoothing parameter and the third intermediate variable all satisfy constraint conditions.

In one possible implementation, in a case where the image processing model is an image classification model, the loss function value is the sum of a first loss term and a weighted second loss term. The first loss term represents the cross entropy between a reference class of a sample image and the predicted class output by the initial flipping model for the sample image. The second loss term represents the cross entropy between an expected class of a disturbed image and the predicted class output by the initial flipping model for the disturbed image, where the disturbed image refers to an image obtained by applying disturbance to the sample image based on the initial disturbance map and applying offset to the sample image based on the initial offset map.
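The two-term loss can be illustrated with a minimal Python sketch for a single sample. It is an illustration under stated assumptions, not the application's implementation: the model outputs are assumed to be raw logits, cross entropy is computed with a softmax, and the function names, the argument names and the weight parameter are hypothetical. How losses are aggregated over multiple sample images is not specified here.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross entropy of one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(clean_logits, reference_class,
               disturbed_logits, expected_class, weight):
    """First loss term plus weighted second loss term.

    clean_logits: flipping-model output for the sample image.
    disturbed_logits: flipping-model output for the disturbed image.
    weight: hypothetical coefficient applied to the second loss term.
    """
    first = cross_entropy(clean_logits, reference_class)
    second = cross_entropy(disturbed_logits, expected_class)
    return first + weight * second
```

The first term keeps the flipped model correct on unmodified samples, while the second term pushes it toward the expected class on disturbed samples; the weight trades off the two goals.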

In one possible implementation, the constraint includes: the maximum disturbance quantity is smaller than or equal to a preset disturbance value, the local smoothing parameter is smaller than or equal to a preset constraint value, and the third intermediate variable is a positive real number.
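The stop-iteration condition and the constraint conditions can be sketched as two small predicates. This is a hedged illustration only: all function and parameter names are hypothetical, and the sketch stops when any one of the three listed conditions holds, which is one reading of "at least one of".

```python
def constraints_satisfied(max_disturbance, local_smoothness, third_variable,
                          disturbance_bound, smoothness_bound):
    """True when all three constraint conditions hold."""
    return (max_disturbance <= disturbance_bound
            and local_smoothness <= smoothness_bound
            and third_variable > 0)  # third intermediate variable is positive


def should_stop(iteration, loss, max_disturbance, local_smoothness,
                third_variable, max_iterations, loss_threshold,
                disturbance_bound, smoothness_bound):
    """True when any one of the three stop-iteration conditions is met."""
    return (iteration >= max_iterations
            or loss <= loss_threshold
            or constraints_satisfied(max_disturbance, local_smoothness,
                                     third_variable, disturbance_bound,
                                     smoothness_bound))
```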

In one possible implementation manner, the local smoothing parameter is the sum of the neighboring smoothing coefficients of all pixel offsets in the initial offset map, and the neighboring smoothing coefficient of any pixel offset is the sum of the L2 norms between that pixel offset and each of its neighboring offsets.
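A minimal Python sketch of such a local smoothing parameter follows. It rests on two assumptions that are not stated in the text: the neighboring offsets of a pixel offset are taken to be its up, down, left and right neighbors in the offset map, and the L2 norm between two offsets is taken to be the L2 norm of their difference. The name local_smoothness is hypothetical.

```python
import math


def local_smoothness(offsets):
    """Sum, over all pixel offsets, of L2 distances to the neighboring offsets.

    offsets: 2-D list indexed [row][column] of (du, dv) tuples.
    Assumption: neighbors are the 4-connected grid neighbors.
    """
    height, width = len(offsets), len(offsets[0])
    total = 0.0
    for r in range(height):
        for c in range(width):
            du, dv = offsets[r][c]
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < height and 0 <= nc < width:
                    ndu, ndv = offsets[nr][nc]
                    # L2 norm of the difference between the two offsets.
                    total += math.hypot(du - ndu, dv - ndv)
    return total
```

Under these assumptions, an offset map in which every pixel shifts by the same amount has a local smoothing parameter of 0, and the value grows as neighboring pixels shift in different directions.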

In one possible implementation, initial values of the first intermediate variable and the second intermediate variable are both the model parameter matrix; the initial values of the third intermediate variable, the first multiplier, the second multiplier, and the third multiplier are all 0.

In one possible implementation, the size of the offset feature map is the same as the size of the original image;

the determining module is used for:

searching for a pixel offset corresponding to the target pixel in the offset feature map;

acquiring sampling position information based on the position information of the target pixel and the pixel offset;

and sampling the original pixels from the original image based on the sampling position information.
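The lookup of pixel offsets and the computation of sampling positions can be sketched as follows. This is a sketch under assumptions, not the application's implementation: the sampling position is assumed to be the target pixel position plus its offset, the horizontal axis is assumed to index columns, and positions falling outside the image are clamped to the border. The name sampling_positions is hypothetical.

```python
def sampling_positions(offset_map):
    """Sampling position in the original image for every target pixel.

    offset_map: 2-D list indexed [row][column] of (du, dv) tuples,
    with the same size as the original image.
    """
    height, width = len(offset_map), len(offset_map[0])
    positions = []
    for ty in range(height):
        row = []
        for tx in range(width):
            du, dv = offset_map[ty][tx]  # look up the pixel offset
            # Assumption: clamp so the sampling position stays inside the image.
            sx = min(max(tx + du, 0.0), width - 1.0)
            sy = min(max(ty + dv, 0.0), height - 1.0)
            row.append((sx, sy))
        positions.append(row)
    return positions
```

Each resulting position is generally fractional, which is why the value of the target pixel is derived from several neighborhood pixels of the sampled original pixel rather than from a single grid pixel.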

In one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one computer program stored therein, the at least one computer program being loaded and executed by the one or more processors to implement an image processing method as described above.

In one aspect, a storage medium is provided in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement an image processing method as described above.

In one aspect, a computer program product or computer program is provided, the computer program product or computer program comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. The one or more processors of the computer device are capable of reading the one or more program codes from the computer-readable storage medium, and executing the one or more program codes, so that the computer device can perform the above-described image processing method.

The technical solutions provided by the embodiments of the present application have at least the following beneficial effects:

The offset feature map represents the direction and distance of the small offset that each target pixel should undergo, so that a target pixel in the target image is mapped, after the small offset, to an original pixel sampled from the original image. The neighborhood pixels of that original pixel are then used to assign a value to the target pixel, so that pixel-level, pixel-by-pixel offset and assignment can be achieved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic view of an implementation environment of an image processing method according to an embodiment of the present application;

FIG. 2 is a flowchart of an image processing method according to an embodiment of the present application;

FIG. 3 is a flowchart of an image processing method according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for optimizing a bit flip model according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an optimization strategy of a bit flip model according to an embodiment of the present application;

FIG. 6 is a schematic flow chart of a bit Trojan attack provided by an embodiment of the present application;

FIG. 7 is a schematic structural view of an image processing apparatus according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.

The terms "first," "second," and the like in this disclosure are used for distinguishing between identical or similar items having substantially the same function and effect. It should be understood that there is no logical or chronological dependency among "first," "second," and "n-th," and that there is no limitation on the number or the order of execution.

The term "at least one" in the present application means one or more, and "a plurality" means two or more; for example, a plurality of first positions means two or more first positions.

The term "comprising at least one of A or B" in the present application relates to the following cases: only A, only B, and both A and B.

The user related information (including but not limited to user equipment information, personal information, behavior information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.) and signals referred to in the present application, when applied to a specific product or technology by the method of the present application, are all licensed, agreed, authorized, or fully authorized by the user, and the collection, use, and processing of the related information, data, and signals is required to comply with relevant laws and regulations and standards of the relevant country and region. For example, the original images referred to in the present application are all acquired with sufficient authorization.

Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation and other directions.

Computer vision (CV) is the science of how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to perform machine vision tasks such as recognizing and measuring targets, and to further perform graphics processing so that the computer produces images that are more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D (three-dimensional) techniques, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, intelligent transportation, and the like.

With the research and progress of artificial intelligence technology, it is being researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, smart customer service, the Internet of Vehicles and intelligent transportation. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.

The solutions provided by the embodiments of the present application relate to artificial intelligence technologies such as computer vision. With the development of computer vision technology, a large number of neural-network-based image processing models are applied to computer vision tasks such as image classification and object detection, so security problems related to neural networks are gradually becoming a research hotspot.

These security problems include a Trojan horse attack mode based on bit flipping. Because a neural network is essentially a set of matrix operations associated with a specific structure, and these matrices are stored in memory in the form of bits, an attacker can inject Trojan horse errors into the neural network by flipping bits in memory. Taking an image classification scenario as an example, the attacker's goal is for the neural network injected with Trojan horse errors to output correct classification results for original images but wrong classification results for Trojan horse images, where a Trojan horse image is an image obtained by modifying an original image in a specific way.

In view of this, in order to test the security of neural-network-based image processing models and uncover as many potential security vulnerabilities as possible, the embodiments of the present application provide a method capable of constructing high-quality Trojan horse images that are difficult for humans to perceive. A Trojan horse image (i.e., the target image) synthesized using the embodiments of the present application has low perceptibility, which means that the original image and the Trojan horse image modified from it are visually indistinguishable.

Hereinafter, terms related to the embodiments of the present application will be explained.

Neural network security: security technologies related to neural networks and their applications, including attack means and defense means.

Bit Trojan attack: an attack mode that injects Trojan horse errors into a neural network by flipping the bits of the neural network in memory. In image classification tasks, a Trojan horse error generally means that a neural-network-based image classification model misclassifies a Trojan horse image modified by the attacker into the class expected by the attacker.

Bit flipping: the process of changing a specific bit in memory from 0 to 1 or from 1 to 0. Because a neural network is essentially a set of matrix operations associated with a specific structure, and these matrices are stored in memory in the form of bits, bit flipping for a neural network refers to flipping some of the bits in the weight matrices of the neural network.
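As a small illustration of bit flipping on a stored weight, the following Python sketch flips one bit of a quantized weight. The storage format is an assumption made only for this example: the text does not state how weights are encoded, and the sketch assumes signed 8-bit two's complement integers. The name flip_bit is hypothetical.

```python
def flip_bit(weight, bit_index, num_bits=8):
    """Flip one bit of a signed integer weight stored in two's complement.

    Assumption: weights are quantized to num_bits-bit signed integers.
    bit_index 0 is the least significant bit; num_bits - 1 is the sign bit.
    """
    mask = (1 << num_bits) - 1
    raw = weight & mask           # two's complement bit pattern
    raw ^= 1 << bit_index         # change the chosen bit 0 -> 1 or 1 -> 0
    if raw >= 1 << (num_bits - 1):
        raw -= 1 << num_bits      # reinterpret the pattern as a signed value
    return raw
```

In this encoding, flipping the lowest bit changes the weight by 1, whereas flipping the highest bit changes it by 128, which illustrates why some bits affect a model far more than others.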

Trojan horse image: the image obtained after an attacker modifies an original image in a specific way.

Taking an image processing scenario as an example, a bit Trojan attack on a neural-network-based image processing model involves two processes. One process is performing bit flipping on the original image processing model to obtain a bit flipping model, which requires identifying the important bits in the image processing model as far as possible. Because the meaning of the internal structure of a neural network is entirely implicit, it is very difficult to reason about and interpret the structural information of a neural-network-based image processing model, so how to identify the important bits of the image processing model is of great significance. The other process is modifying the original image in a specific way to obtain a Trojan horse image, which requires the Trojan horse image to have as low perceptibility as possible, that is, the Trojan horse image should not be easily noticed by defenders. In other words, a Trojan horse image that can be quickly filtered out by manual inspection is a lower-quality Trojan horse image, and its attack success rate is usually low because it is easy to defend against. A Trojan horse image that is harder for humans to notice and distinguish is a higher-quality Trojan horse image, and its attack success rate is usually high because it is difficult to defend against. Therefore, how to construct higher-quality Trojan horse images that are difficult for humans to perceive is also a problem to be solved.

The image processing method provided by the embodiments of the present application constructs a Trojan horse image with low perceptibility by directly making slight changes to the pixel values and pixel positions of the original image, so that bit Trojan attacks that are more concealed, more practical and more difficult to defend against can be achieved. Such bit Trojan attacks, which have a higher attack success rate, can be used to test the security of various trained neural-network-based image processing models, so as to uncover potential security vulnerabilities and improve the image processing models. They can also be used to test defense means for the image processing models, promoting the completion and optimization of those defense means, which is of great significance to the deployment security of products that use the image processing models.

The system architecture of the embodiment of the present application is described below.

Fig. 1 is a schematic view of an implementation environment of an image processing method according to an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102:

The terminal 101 is configured to provide various image processing services for the original image, such as image classification, image recognition, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, and the like, which are not specifically limited in the embodiments of the present application. An application program supporting image processing is installed and runs on the terminal 101, and the application program optionally includes at least one of a mapping application, a photographing application, an audio-video application, a short video application, a live broadcast application, an instant messaging application, a content sharing application, a browser application, or a social application.

The image processing model is embedded in the application program, the model parameter matrix of the image processing model is stored in the memory in the form of bits, when a user gives an input original image, the image processing model uses the model parameter matrix in the memory to perform a series of matrix operations on the original image, and an image processing result corresponding to the original image is output, for example, when the image processing model is an image classification model, a classification result of the original image is output.

The terminal 101 and the server 102 are connected by a wired network or a wireless network.

The server 102 is an electronic device for providing a background service for the application program, and the server 102 includes: at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center. Optionally, the server 102 takes on primary image processing work and the terminal 101 takes on secondary image processing work; alternatively, the server 102 performs a secondary image processing operation, and the terminal 101 performs a primary image processing operation; alternatively, the terminal 101 and the server 102 cooperatively perform image processing work using a distributed computing architecture.

Optionally, the server 102 is a stand-alone physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), and basic cloud computing services such as big data and artificial intelligence platforms.

In some embodiments, the server 102 trains an image processing model, prunes and compresses it, and then embeds it into the application program for distribution to each terminal on which the application program is installed. Optionally, in order to improve the security of the image processing model and of products using it, a bit Trojan attack is performed on the image processing model: a bit flipping model is obtained from the image processing model, and the image processing method provided by the embodiments of the present application is applied to construct a Trojan horse image based on an original image. The original image and the Trojan horse image are then each input into the bit flipping model to test whether a correct result is output for the original image and an incorrect result is output for the Trojan horse image. In this way, the security of the image processing model is tested and potential security vulnerabilities are uncovered so that the image processing model can be improved. Related defense means for the image processing model can also be tested to promote their completion and optimization, which is of great significance to the deployment security of products using the image processing model.

Optionally, the terminal 101 refers generally to one of a plurality of terminals, and the device type of the terminal 101 includes, but is not limited to, at least one of a vehicle-mounted terminal, a television, a smart phone, a smart speaker, a smart watch, a tablet computer, a smart voice interaction device, a smart home appliance, an aircraft, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The following embodiments are illustrated with the terminal being a smart phone.

Those skilled in the art will appreciate that the number of terminals 101 described above can be greater or fewer. For example, the number of the terminals 101 is only one, or the number of the terminals 101 is several tens or hundreds, or more. The embodiment of the present disclosure does not limit the number and device type of the terminals 101.

Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application. Referring to fig. 2, the embodiment of the present application may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation and assisted driving. The embodiment is executed by a computer device; taking the computer device being a server as an example, the embodiment includes the following steps:

201. The server obtains an offset feature map between an original image and a target image, the offset feature map being used to characterize a pixel offset between the original image and the target image.

The offset feature map in the embodiments of the present application characterizes which original pixel in the original image corresponds to each target pixel in the target image. In other words, target pixels in the target image and original pixels in the original image do not correspond one-to-one by position: the original pixel corresponding to a target pixel is usually located at a different position from the target pixel. The offset between two corresponding pixels is called a pixel offset, and the offset feature map is an image that stores each pixel offset. For example, each element in the offset feature map is a two-tuple (Δu, Δv) that records a horizontal axis offset Δu and a vertical axis offset Δv.

In some embodiments, each element in the offset feature map stores a pixel offset from the original image to the target image; alternatively, each element in the above-mentioned offset feature map stores a pixel offset from the target image to the original image, which is not particularly limited in the embodiment of the present application.
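
As a minimal sketch of the data layout described above, the offset feature map can be held as an H x W grid whose elements are (Δu, Δv) two-tuples. The function name `make_offset_map`, the row-major layout (rows index the vertical axis, columns the horizontal axis), and the zero initialization are illustrative assumptions and are not taken from the application.

```python
from typing import List, Tuple

Offset = Tuple[float, float]  # (delta_u, delta_v): horizontal and vertical offsets


def make_offset_map(height: int, width: int) -> List[List[Offset]]:
    """Build an H x W offset feature map with every offset set to zero (assumed init)."""
    return [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]


offset_map = make_offset_map(height=2, width=3)
# The element in row 1, column 2 stores a horizontal offset of 0.25
# and a vertical offset of -0.5 for the pixel at that position.
offset_map[1][2] = (0.25, -0.5)
```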

In some embodiments, the offset feature map is obtained along with the optimization process of the bit flip model, and after the bit flip model is optimized, the server can also obtain an optimized offset feature map which can be regarded as optimal, and then store the optimized offset feature map in a persistent manner, for example, store the offset feature map in a database, store the offset feature map in a data file of a disk, store the offset feature map in a distributed file system, or the like.

In some embodiments, the server reads the offset feature map from the database, or reads the offset feature map from a data file of the disk, or reads the offset feature map from the distributed file system, and the embodiment of the present application does not specifically limit the manner of acquiring the offset feature map.

202. The server determines, for a target pixel in the target image, an original pixel corresponding to the target pixel from the original image based on the offset feature map.

In some embodiments, the target image is an image that has the same size as the original image and is obtained by modifying the original image in a specific manner. That is, the target image may serve as a Trojan image in a bit Trojan attack or, alternatively, as an adversarial image in an adversarial attack, which is not particularly limited in the embodiments of the present application.

In some embodiments, the server may initialize a target image with the same size as the original image, where the initial pixel value of each target pixel may be any pixel value. For example, all target pixels may be initialized to white (255, 255, 255), to black (0, 0, 0), or to any other color, and different target pixels may be initialized to different colors, which is not particularly limited in the embodiments of the present application.

In some embodiments, for any target pixel in the initialized target image, the server finds, based on the position information of the target pixel in the target image, a first pixel at the same position in the original image; that is, the position information of the first pixel in the original image is the same as that of the target pixel in the target image. The server then queries the offset feature map for the pixel offset corresponding to the target pixel, for example by finding the element at the same position as the target pixel, which stores that pixel offset. Finally, the server finds, in the original image, the pixel obtained by shifting the first pixel by the pixel offset; this pixel is the original pixel corresponding to the target pixel.

Illustratively, taking target pixel i in the target image as an example, the server uses the position coordinates $(\hat{u}^{(i)}, \hat{v}^{(i)})$ of the target pixel in the target image to find the first pixel in the original image, whose position coordinates are likewise $(\hat{u}^{(i)}, \hat{v}^{(i)})$. The server then queries the offset feature map for the pixel offset $(\Delta u^{(i)}, \Delta v^{(i)})$ stored in the element at position $(\hat{u}^{(i)}, \hat{v}^{(i)})$. Adding the pixel offset to the position of the first pixel gives the position coordinates $(u^{(i)}, v^{(i)})$ of the original pixel, so the position coordinates of the original pixel, the first pixel, and the target pixel satisfy $(u^{(i)}, v^{(i)}) = (\hat{u}^{(i)} + \Delta u^{(i)}, \hat{v}^{(i)} + \Delta v^{(i)})$, where u represents the abscissa and v represents the ordinate.
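
The lookup in step 202 can be sketched as follows. This is an illustration only: the function name `sampling_position` is hypothetical, the map is assumed to store offsets from the target image to the original image (one of the two conventions mentioned above), and rows are assumed to index the vertical axis.

```python
from typing import List, Tuple


def sampling_position(
    target_uv: Tuple[int, int],
    offset_map: List[List[Tuple[float, float]]],
) -> Tuple[float, float]:
    """Return the (u, v) position of the original pixel for the target pixel at target_uv."""
    target_u, target_v = target_uv
    # The element at the same position as the target pixel stores its pixel offset.
    du, dv = offset_map[target_v][target_u]
    # Original position = position of the first pixel (same as the target) + offset.
    return (target_u + du, target_v + dv)


offset_map = [[(0.0, 0.0), (0.0, 0.0)],
              [(0.0, 0.0), (-0.5, -0.25)]]
position = sampling_position((1, 1), offset_map)  # (0.5, 0.75)
```

The result may be fractional, which is why the later steps assign the target pixel from neighborhood pixels rather than from a single pixel.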

203. The server assigns a value to the target pixel based on a plurality of neighborhood pixels of the original pixel.

In some embodiments, after the server finds an original pixel in the original image that has a correspondence with the target pixel, it determines a plurality of neighborhood pixels for the original pixel. Optionally, the server determines a neighborhood of the target size centered on the original pixel, and determines some or all of the pixels within the neighborhood as the plurality of neighborhood pixels. Optionally, the plurality of neighboring pixels may or may not include the original pixel itself, which is not specifically limited in the embodiment of the present application.

Illustratively, taking a square neighborhood as an example for explanation, assuming that the target size is 3×3, a 3×3 square neighborhood centered on the original pixel may be determined, and 9 pixels in the 3×3 square neighborhood are each determined as a neighborhood pixel of the original pixel. Of course, the target size of the square neighborhood may also be 5×5, 7×7, etc., and the target size is not specifically limited in the embodiment of the present application. Of course, besides the square neighborhood, a circular neighborhood, a rectangular neighborhood, an irregularly shaped neighborhood, etc. may be determined, and the shape of the neighborhood is not particularly limited in the embodiment of the present application.
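
A square neighborhood of the kind described above could be enumerated as in the following sketch. The function name `square_neighborhood` and the border policy (coordinates falling outside the image are simply dropped) are assumptions made for illustration; the application does not specify how image borders are handled.

```python
from typing import List, Tuple


def square_neighborhood(
    center_uv: Tuple[int, int], size: int, width: int, height: int
) -> List[Tuple[int, int]]:
    """List the (u, v) coordinates of a size x size neighborhood centered on center_uv.

    Coordinates outside the image are dropped (an assumed border policy).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so that the neighborhood has a center")
    cu, cv = center_uv
    r = size // 2
    return [
        (u, v)
        for v in range(cv - r, cv + r + 1)
        for u in range(cu - r, cu + r + 1)
        if 0 <= u < width and 0 <= v < height
    ]


inner = square_neighborhood((2, 2), size=3, width=5, height=5)   # 9 pixels
corner = square_neighborhood((0, 0), size=3, width=5, height=5)  # 4 pixels remain
```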

In some embodiments, after determining the plurality of neighborhood pixels of the original pixel, the server assigns a value to the target pixel using those neighborhood pixels. For example, the server assigns the average pixel value of the neighborhood pixels to the target pixel. Alternatively, the server first applies a disturbance to each neighborhood pixel using a noise interference map and then assigns a weighted value of the disturbed neighborhood pixels to the target pixel. How the disturbance is applied and how the weighting is performed are described in the next embodiment and are not repeated here.

204. The server outputs a target image composed of a plurality of assigned target pixels.

In some embodiments, the server assigns values to the plurality of target pixels in the target image in the manner of step 203. After the whole target image has been assigned, the server outputs a target image formed by the plurality of assigned target pixels. The target image is an image obtained by modifying the original image in a specific manner, so it can be used as a Trojan image to perform a bit Trojan attack on a related image processing model, or as an adversarial sample to perform an adversarial attack on a related image processing model.
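
Steps 201-204 can be put together in a short sketch for a grayscale image, using the average-of-neighborhood assignment mentioned above. Several details are assumptions and not part of the application: the function name `warp_with_neighborhood_mean`, rounding the sampling position to the nearest pixel, clamping it to the image, and averaging only the in-bounds pixels of the neighborhood.

```python
from typing import List, Tuple

Image = List[List[float]]
OffsetMap = List[List[Tuple[float, float]]]


def warp_with_neighborhood_mean(original: Image, offset_map: OffsetMap, size: int = 3) -> Image:
    """Assign each target pixel the mean of the neighborhood of its original pixel."""
    height, width = len(original), len(original[0])
    r = size // 2
    target = [[0.0] * width for _ in range(height)]  # initialized target image
    for tv in range(height):
        for tu in range(width):
            du, dv = offset_map[tv][tu]  # step 202: offset stored at the target position
            # Assumed policy: round to the nearest pixel and clamp to the image.
            u = min(max(round(tu + du), 0), width - 1)
            v = min(max(round(tv + dv), 0), height - 1)
            # Step 203: average the in-bounds pixels of the size x size neighborhood.
            values = [
                original[qv][qu]
                for qv in range(max(v - r, 0), min(v + r, height - 1) + 1)
                for qu in range(max(u - r, 0), min(u + r, width - 1) + 1)
            ]
            target[tv][tu] = sum(values) / len(values)
    return target  # step 204: the assigned target image


original = [[0.0, 0.0, 0.0],
            [0.0, 9.0, 0.0],
            [0.0, 0.0, 0.0]]
zero_offsets = [[(0.0, 0.0)] * 3 for _ in range(3)]
result = warp_with_neighborhood_mean(original, zero_offsets)
```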

In the above process, each target pixel in the target image is mapped, starting from the pixel at the same position in the original image, to an original pixel obtained after a small offset, and is assigned using the neighborhood pixels of that original pixel. The offset feature map indicates the direction and distance of the small offset for each target pixel, so the offset can be individualized at the pixel level. Because such small offsets are difficult for the human eye to notice, the modified target image as a whole has low perceptibility and is not easily detected by a defender. Furthermore, assigning each target pixel from the neighborhood pixels of the sampled original pixel is equivalent to applying a neighborhood disturbance, which gives the assigned target image a degree of local smoothness: no target pixel stands out abruptly from its surroundings. This further improves the synthesis quality of the target image and further reduces its perceptibility.

All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.

According to the method provided by the embodiments of the present application, the offset feature map represents the direction and distance of the small offset to be applied to each target pixel, so that each target pixel in the target image is mapped to an original pixel after a small offset in the original image, and the neighborhood pixels of that original pixel are then used to assign the target pixel. In this way, pixel-by-pixel offset and assignment at the pixel level can be achieved.

Furthermore, because the synthesized target image has low perceptibility, bit Trojan attacks that are more concealed, more practical, and harder to defend against can be realized. Such attacks, with their higher attack success rate, can be used to test the security of trained neural-network-based image processing models, so as to uncover potential security vulnerabilities and improve the models. They can also be used to test defenses designed for these models and to promote the improvement and optimization of those defenses, which is of great significance for the security of products that deploy image processing models.

The above embodiment briefly describes how the original image is modified in a specific manner to obtain the target image. The following embodiment describes the generation process of the target image in detail, in conjunction with the offset feature map and the noise interference map.

Fig. 3 is a flowchart of an image processing method according to an embodiment of the present application. Referring to Fig. 3, the embodiment may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, and driving assistance. The method is performed by a computer device, illustrated here as a server, and includes the following steps:

301. The server obtains an offset feature map and a noise interference map between an original image and a target image, the offset feature map being used to characterize pixel offsets between the original image and the target image, and the noise interference map being used to characterize disturbances applied to pixels in the original image.

The noise interference map in the embodiments of the present application characterizes how much disturbance is to be applied to the pixel values of the pixels in the original image. In other words, each element in the noise interference map represents the pixel disturbance amount to be applied to the pixel value of a pixel in the original image, so the noise interference map is an image that stores the pixel disturbance amounts.

In some embodiments, when the original image is an RGB (Red, Green, Blue) three-channel image, the noise interference map is a matrix with the same width and height as the original image and a depth of 3: the first channel represents the pixel disturbance amount applied to the R (red) channel, the second channel the amount applied to the G (green) channel, and the third channel the amount applied to the B (blue) channel. As another example, when the original image is a grayscale (i.e., single-channel) image, the noise interference map is a matrix with the same width and height as the original image and a depth of 1, and represents the pixel disturbance amount applied to the single grayscale channel. The size of the noise interference map is not specifically limited in the embodiments of the present application.

In other embodiments, the noise interference map is always an image of the same size as the original image. When the original image is an RGB three-channel image, each element stored in the noise interference map is a triplet $(\delta_R, \delta_G, \delta_B)$ that records the pixel disturbance amount $\delta_R$ for the R channel, the pixel disturbance amount $\delta_G$ for the G channel, and the pixel disturbance amount $\delta_B$ for the B channel. When the original image is a single-channel grayscale image, each element stored in the noise interference map represents the pixel disturbance amount $\delta$ applied to the single grayscale channel, which is not particularly limited in the embodiments of the present application.

In some embodiments, the offset feature map and the noise interference map are obtained along with an optimization process of the bit flip model, and after the bit flip model is optimized, the server can also obtain an optimized offset feature map which can be regarded as optimal, and an optimized noise interference map which can be regarded as optimal. And then, respectively performing persistent storage on the optimized offset characteristic diagram and the noise interference diagram, for example, storing at least one of the offset characteristic diagram and the noise interference diagram into a database, or storing at least one of the offset characteristic diagram and the noise interference diagram into a data file of a disk, or storing at least one of the offset characteristic diagram and the noise interference diagram into a distributed file system, and the like.

It should be noted that, the offset feature map and the noise interference map may be stored in the same or different storage media, for example, the offset feature map and the noise interference map are both stored in a database, or the offset feature map is stored in the database, and the noise interference map is stored in a data file of a disk, which is not limited in detail in the embodiment of the present application.

In some embodiments, when storing the offset feature map and the noise interference map, the server stores them in association with the bit flip model that was optimized together with them. For example, a single hash table is created in which each element is a key-value pair: the key is the model ID (identification) of the bit flip model itself, or the model ID of the image processing model corresponding to the bit flip model, and the value is the offset feature map together with the noise interference map. As another example, two hash tables are created: one stores the noise interference map, with the model ID of the bit flip model or of the corresponding image processing model as the key and the noise interference map as the value; the other stores the offset feature map, with the same kind of model ID as the key and the offset feature map as the value.

In some embodiments, the server reads the noise interference graph and the offset feature graph from the corresponding storage medium, for example, uses the model ID of the bit-flipping model itself or the model ID of the image processing model corresponding to the bit-flipping model as an index, queries Key-Value structure data corresponding to the index in a hash table corresponding to the storage medium (such as a database, a data file of a disk, or a distributed file system), and reads at least one of the stored noise interference graph and the stored offset feature graph from the Value of the Key-Value structure data.
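
The single-hash-table variant of the associated storage can be sketched as below. An in-memory `dict` stands in for the database, disk file, or distributed file system; the names `StoredMaps`, `store_maps`, `load_maps`, and the model ID `"model-001"` are hypothetical and used only for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class StoredMaps(NamedTuple):
    """Value of one key-value pair: both maps optimized together with one model."""
    offset_map: List[List[Tuple[float, float]]]  # (delta_u, delta_v) per pixel
    noise_map: List[List[float]]                 # disturbance amount per pixel (grayscale)


# Single hash table: model ID -> (offset feature map, noise interference map).
maps_by_model: Dict[str, StoredMaps] = {}


def store_maps(model_id: str, offset_map, noise_map) -> None:
    """Store both maps under the model ID used as the key."""
    maps_by_model[model_id] = StoredMaps(offset_map, noise_map)


def load_maps(model_id: str) -> StoredMaps:
    """Read both maps back, using the model ID as the index (KeyError if unknown)."""
    return maps_by_model[model_id]


store_maps("model-001", [[(0.5, 0.0)]], [[0.01]])
loaded = load_maps("model-001")
```

The two-table variant described above would simply keep one such dictionary per map type, both keyed by the same model ID.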

302. The server determines, for a target pixel in the target image, an original pixel corresponding to the target pixel from the original image based on the offset feature map.

In some embodiments, for any target pixel in the initialized target image, the server finds, in the original image, a first pixel at the same position as the target pixel; that is, the position information of the first pixel in the original image is the same as that of the target pixel in the target image. The server then queries the offset feature map for the pixel offset corresponding to the target pixel. For example, when the offset feature map has the same size as the original image, the server finds the element at the same position as the target pixel in the offset feature map, which stores the pixel offset corresponding to the target pixel. The server then finds, in the original image, the pixel obtained by shifting the first pixel by the pixel offset; this pixel is the original pixel corresponding to the target pixel. In other words, the server obtains sampling position information (the position information of the original pixel) from the position information of the target pixel (equal to that of the first pixel) and the queried pixel offset, and samples the original pixel from the original image according to the sampling position information.

Illustratively, taking target pixel i in the target image as an example, the server uses the position coordinates $(\hat{u}^{(i)}, \hat{v}^{(i)})$ of the target pixel in the target image to find the first pixel in the original image, whose position coordinates are likewise $(\hat{u}^{(i)}, \hat{v}^{(i)})$. The server then queries the offset feature map for the pixel offset $(\Delta u^{(i)}, \Delta v^{(i)})$ stored in the element at position $(\hat{u}^{(i)}, \hat{v}^{(i)})$. Adding the pixel offset to the position of the first pixel gives the sampling position information, i.e., the position coordinates $(u^{(i)}, v^{(i)})$ of the original pixel, so the position coordinates of the original pixel, the first pixel, and the target pixel satisfy $(u^{(i)}, v^{(i)}) = (\hat{u}^{(i)} + \Delta u^{(i)}, \hat{v}^{(i)} + \Delta v^{(i)})$, where u represents the abscissa and v represents the ordinate. According to the position coordinates $(u^{(i)}, v^{(i)})$, the original pixel located at $(u^{(i)}, v^{(i)})$ can be sampled from the original image.

303. The server determines pixel disturbance amounts corresponding to each of a plurality of neighborhood pixels of the original pixel based on the noise disturbance map.

In some embodiments, after determining a plurality of neighboring pixels of the original pixel, the server assigns a value to the target pixel using the plurality of neighboring pixels, for example, assigning an average pixel value of the plurality of neighboring pixels to the target pixel, or, as illustrated in the embodiment of the present application, assigning a weighted value of the neighboring pixels to which the disturbance is applied to the target pixel after applying a disturbance to each neighboring pixel using a noise disturbance map, see steps 303-307.

In some embodiments, the noise interference map has the same size as the original image. After determining the plurality of neighborhood pixels of the original pixel, the server finds, for any one of the neighborhood pixels, the corresponding pixel disturbance amount in the noise interference map, and repeats this operation for each neighborhood pixel to obtain the pixel disturbance amount corresponding to each of them.

Schematically, assume that the target pixel $(\hat{u}^{(i)}, \hat{v}^{(i)})$ corresponds to the original pixel $(u^{(i)}, v^{(i)})$. The neighborhood of the original pixel can then be expressed as $\mathcal{N}(u^{(i)}, v^{(i)})$, and any neighborhood pixel q located in the neighborhood satisfies $q \in \mathcal{N}(u^{(i)}, v^{(i)})$. Because the noise interference map is an image of the same size as the original image, for a neighborhood pixel q the server can use the position coordinates of q in the original image to find the element at the same position in the noise interference map and read the pixel disturbance amount $\delta^{(q)}$ stored in that element. Repeating this operation for each of the plurality of neighborhood pixels yields the pixel disturbance amount corresponding to each neighborhood pixel.
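
The lookup in step 303 might look like the following sketch for a grayscale noise interference map. It assumes, purely for illustration, that the neighborhood consists of the four integer-grid pixels surrounding a possibly fractional sampling position; the application leaves the neighborhood shape open. The function name `neighbor_disturbances` is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def neighbor_disturbances(
    original_uv: Tuple[float, float], noise_map: List[List[float]]
) -> Dict[Tuple[int, int], float]:
    """Map each in-bounds grid neighbor q of original_uv to its pixel disturbance amount."""
    height, width = len(noise_map), len(noise_map[0])
    u, v = original_uv
    u0, v0 = math.floor(u), math.floor(v)
    result: Dict[Tuple[int, int], float] = {}
    for qv in (v0, v0 + 1):
        for qu in (u0, u0 + 1):
            if 0 <= qu < width and 0 <= qv < height:
                # Element at the same position as q in the noise interference map.
                result[(qu, qv)] = noise_map[qv][qu]
    return result


noise_map = [[0.1, 0.2],
             [0.3, 0.4]]
deltas = neighbor_disturbances((0.5, 0.5), noise_map)
```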

304. For any one of the plurality of neighborhood pixels, the server adds the neighborhood pixel and the pixel disturbance amount corresponding to the neighborhood pixel to obtain a disturbance pixel corresponding to the neighborhood pixel.

In some embodiments, after obtaining the pixel disturbance amount corresponding to each neighborhood pixel, the server assigns a value to the target pixel through steps 304-307, based on the plurality of neighborhood pixels, their respective pixel disturbance amounts, the position information of the original pixel, and the position information of the plurality of neighborhood pixels. In other words, when the disturbed neighborhood pixels are weighted and assigned to the target pixel, the weighting is based on the distance between each neighborhood pixel and the original pixel, so that, as far as possible, neighborhood pixels closer to the original pixel have larger weighting coefficients and neighborhood pixels farther from it have smaller ones.

In other embodiments, in the process of weighting and assigning the neighborhood pixels with the disturbance to the target pixel, equal weighting coefficients may be configured for the neighborhood pixels with the disturbance, for example, the average pixel value of the neighborhood pixels with the disturbance is assigned to the target pixel, and the assignment of the target pixel based on multiple neighborhood pixels of the original pixel can be achieved.

In the above process, the server may configure the same weighting coefficient for each neighboring pixel to perform weighting, or may also configure different weighting coefficients for each neighboring pixel to perform weighting, for example, when configuring different weighting coefficients, the size of the weighting coefficient may be allocated according to the distance from the original pixel, or the size of the weighting coefficient may also be allocated according to the size of the pixel difference from the original pixel, and the allocation manner of the weighting coefficient is not limited in particular in the embodiment of the present application.

In the embodiments of the present application, the process in which the server weights the disturbed neighborhood pixels and assigns the result to the target pixel is described by taking, as an example, weighting coefficients whose magnitudes are allocated according to the distance from the original pixel. Illustratively, for each neighborhood pixel q, the pixel value $x^{(q)}$ of the neighborhood pixel q in the original image is added to the pixel disturbance amount $\delta^{(q)}$ of the neighborhood pixel q obtained in step 303, giving a disturbance pixel $(x^{(q)} + \delta^{(q)})$.

305. The server determines a weighting coefficient for the perturbed pixel based on the location information of the original pixel and the location information of the neighboring pixel, the weighting coefficient being inversely related to the distance between the original pixel and the neighboring pixel.

In some embodiments, the position information of a pixel is given as the position coordinates (u, v) of the pixel, which include an abscissa u and an ordinate v. In this case, the position coordinates of the original pixel i can be expressed as $(u^{(i)}, v^{(i)})$, and the position coordinates of any neighborhood pixel q of the original pixel i can be expressed as $(u^{(q)}, v^{(q)})$. Then, based on the position coordinates $(u^{(i)}, v^{(i)})$ and $(u^{(q)}, v^{(q)})$, a weighting coefficient is obtained that is inversely related to the distance between the original pixel i and the neighborhood pixel q.

In some embodiments, the server obtains the Euclidean distance between the original pixel i and the neighborhood pixel q and determines, as the weighting coefficient, the value obtained by subtracting the Euclidean distance from 1, or the reciprocal of the Euclidean distance, or the result of an exponential, logarithmic, sign-inverting, or other transformation of the Euclidean distance, provided that the resulting weighting coefficient is inversely related to the Euclidean distance.
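
A few distance-based weighting coefficients of the kind listed above are sketched here. The function names are hypothetical, and the small constant `eps` in the reciprocal variant is an added assumption to avoid division by zero when a neighborhood pixel coincides with the original pixel. Each variant decreases as the Euclidean distance grows.

```python
import math
from typing import Tuple


def euclidean(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Euclidean distance between two (u, v) positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def weight_one_minus(d: float) -> float:
    """1 minus the distance; negative once the distance exceeds 1."""
    return 1.0 - d


def weight_reciprocal(d: float, eps: float = 1e-6) -> float:
    """Reciprocal of the distance; eps (assumed) guards against d == 0."""
    return 1.0 / (d + eps)


def weight_exponential(d: float) -> float:
    """Exponential transform exp(-d); always positive."""
    return math.exp(-d)


near = euclidean((0.5, 0.5), (0.0, 1.0))  # closer neighbor
far = euclidean((0.5, 0.5), (3.5, 4.5))   # distance 5.0
```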

In some embodiments, the server obtains the weighting coefficient as follows. Based on the position information of the original pixel and the position information of the neighborhood pixel, the server determines the horizontal-axis coordinate difference and the vertical-axis coordinate difference between the original pixel and the neighborhood pixel; for example, from the position coordinates $(u^{(i)}, v^{(i)})$ and $(u^{(q)}, v^{(q)})$, it determines the horizontal-axis coordinate difference $(u^{(i)} - u^{(q)})$ and the vertical-axis coordinate difference $(v^{(i)} - v^{(q)})$. The server then determines a first difference, obtained by subtracting the absolute value of the horizontal-axis coordinate difference from 1, which can be expressed as $(1 - |u^{(i)} - u^{(q)}|)$, and a second difference, obtained by subtracting the absolute value of the vertical-axis coordinate difference from 1, which can be expressed as $(1 - |v^{(i)} - v^{(q)}|)$. Finally, the server multiplies the first difference by the second difference to obtain the weighting coefficient, which can be expressed as $(1 - |u^{(i)} - u^{(q)}|)(1 - |v^{(i)} - v^{(q)}|)$.

It should be noted that, only an exemplary description is given here of using two components of the horizontal axis coordinate difference and the vertical axis coordinate difference to jointly construct a weighting coefficient that is inversely related to the distance between the original pixel i and the neighboring pixel q, and without changing the property that the weighting coefficient is inversely related to the distance between the original pixel i and the neighboring pixel q, the server may perform various transforms such as linear transform, exponential transform, logarithmic transform, and the like on the product value obtained by multiplying the first difference value and the second difference value, so as to obtain, as the final weighting coefficient, a transform value that is still inversely related to the distance between the original pixel i and the neighboring pixel q.
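
The product-form weighting coefficient can be written directly from its definition. The function name `product_weight` is hypothetical. The example assumes the neighborhood pixels are the four grid pixels surrounding the sampling position; under that assumption each factor lies between 0 and 1 and the four coefficients sum to 1, whereas for a neighbor more than one pixel away along an axis the corresponding factor would become negative.

```python
from typing import Tuple


def product_weight(original_uv: Tuple[float, float], neighbor_uv: Tuple[int, int]) -> float:
    """Weighting coefficient (1 - |u_i - u_q|) * (1 - |v_i - v_q|)."""
    u, v = original_uv
    qu, qv = neighbor_uv
    first_difference = 1.0 - abs(u - qu)    # horizontal-axis term
    second_difference = 1.0 - abs(v - qv)   # vertical-axis term
    return first_difference * second_difference


original_uv = (0.25, 0.75)
weights = {q: product_weight(original_uv, q) for q in [(0, 0), (1, 0), (0, 1), (1, 1)]}
# The nearest grid pixel (0, 1) receives the largest coefficient.
```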

306. The server multiplies the weighting coefficient by the perturbed pixel to obtain a weighted perturbed pixel.

In some embodiments, the server multiplies the disturbance pixel obtained in step 304 by the weighting coefficient obtained in step 305 to obtain a weighted disturbance pixel. Illustratively, if the disturbance pixel is expressed as $(x^{(q)} + \delta^{(q)})$ and the weighting coefficient as $(1 - |u^{(i)} - u^{(q)}|)(1 - |v^{(i)} - v^{(q)}|)$, the weighted disturbance pixel is expressed as $(x^{(q)} + \delta^{(q)})(1 - |u^{(i)} - u^{(q)}|)(1 - |v^{(i)} - v^{(q)}|)$.

307. The server assigns a sum value obtained by adding the plurality of weighted disturbance pixels to the target pixel.

In some embodiments, for each of the plurality of neighborhood pixels, the server obtains a weighted disturbance pixel by disturbing the neighborhood pixel and weighting it according to distance; in other words, the neighborhood pixels and the weighted disturbance pixels are in one-to-one correspondence. The server adds the weighted disturbance pixels corresponding to the plurality of neighborhood pixels and assigns the resulting sum to the target pixel.

Illustratively, the pixel value of the target pixel is denoted $\hat{x}^{(i)}$. The pixel value assigned to the target pixel according to step 307 above may then be expressed as follows:

$$\hat{x}^{(i)} = \sum_{q \in \mathcal{N}(u^{(i)}, v^{(i)})} \left(x^{(q)} + \delta^{(q)}\right)\left(1 - |u^{(i)} - u^{(q)}|\right)\left(1 - |v^{(i)} - v^{(q)}|\right)$$

where q represents a neighborhood pixel located in the neighborhood $\mathcal{N}(u^{(i)}, v^{(i)})$ of the original pixel, $x^{(q)}$ represents the pixel value of the neighborhood pixel q, $\delta^{(q)}$ represents the pixel disturbance amount of the neighborhood pixel q, $u^{(q)}$ represents the abscissa of the neighborhood pixel q, $v^{(q)}$ represents the ordinate of the neighborhood pixel q, $u^{(i)}$ represents the abscissa of the original pixel i, and $v^{(i)}$ represents the ordinate of the original pixel i.
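
Steps 304-307 for a single target pixel can be sketched as follows for a grayscale image. As an assumption not stated in the application, the neighborhood is taken to be the four grid pixels around the sampling position, and neighbors outside the image are skipped. The function name `assign_target_pixel` is hypothetical.

```python
import math
from typing import List, Tuple


def assign_target_pixel(
    original: List[List[float]],
    noise_map: List[List[float]],
    original_uv: Tuple[float, float],
) -> float:
    """Sum of weighted disturbance pixels over the neighbors of original_uv."""
    height, width = len(original), len(original[0])
    u, v = original_uv
    u0, v0 = math.floor(u), math.floor(v)
    total = 0.0
    for qv in (v0, v0 + 1):
        for qu in (u0, u0 + 1):
            if not (0 <= qu < width and 0 <= qv < height):
                continue  # assumed border policy: skip out-of-image neighbors
            disturbed = original[qv][qu] + noise_map[qv][qu]        # step 304
            weight = (1.0 - abs(u - qu)) * (1.0 - abs(v - qv))      # step 305
            total += weight * disturbed                             # steps 306-307
    return total


original = [[0.0, 10.0],
            [20.0, 30.0]]
no_noise = [[0.0, 0.0], [0.0, 0.0]]
unit_noise = [[1.0, 1.0], [1.0, 1.0]]
center_value = assign_target_pixel(original, no_noise, (0.5, 0.5))    # 15.0
noisy_value = assign_target_pixel(original, unit_noise, (0.5, 0.5))   # 16.0
```

When the sampling position falls exactly on a grid pixel, that pixel receives weight 1 and the other neighbors weight 0, so the disturbed value of that pixel is returned unchanged.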

In the above steps 304-307, a possible implementation manner of assigning the value to the target pixel based on the pixel perturbation amounts corresponding to the plurality of neighboring pixels and the plurality of neighboring pixels by the server is provided, that is, after obtaining the perturbed pixels based on the pixel perturbation amounts corresponding to the plurality of neighboring pixels, weighting each perturbed pixel according to the distance between the neighboring pixel and the original pixel, and then summing the weighted sum to assign the value obtained by the weighted sum to the target pixel. In other embodiments, the server may also weight each disturbance pixel according to the pixel difference between the neighboring pixel and the original pixel and then sum the weighted pixels, and assign a value obtained by the weighted sum to the target pixel; or the server directly configures equal weighting coefficients for each disturbance pixel, namely, the average pixel value of each disturbance pixel is assigned to the target pixel; alternatively, the above-mentioned weighting method based on the distance and the weighting method based on the pixel difference may be used to calculate the weighting coefficients corresponding to the two weighting methods, then combine the two weighting coefficients to obtain a new weighting coefficient, use the new weighting coefficient to weight each disturbance pixel, then sum the new weighting coefficient, and assign the value obtained by the weighted sum to the target pixel.

308. The server outputs a target image composed of a plurality of assigned target pixels.

In some embodiments, the server assigns values to the plurality of target pixels in the target image in the manner of steps 302-307. After the whole target image has been assigned, the server outputs a target image formed by the plurality of assigned target pixels. The target image is an image obtained by modifying the original image in a specific manner, so it can be used as a Trojan image to perform a bit Trojan attack on a related image processing model, or as an adversarial sample to perform an adversarial attack on a related image processing model.
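
Putting steps 301-308 together, the whole target image could be generated as in this sketch for a grayscale image. The function name `synthesize_target`, the four-grid-pixel neighborhood, and the clamping of sampling positions to the image are all illustrative assumptions rather than details taken from the application.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
OffsetMap = List[List[Tuple[float, float]]]


def synthesize_target(original: Image, offset_map: OffsetMap, noise_map: Image) -> Image:
    """Offset each target pixel, disturb the neighbors, and sum them with distance weights."""
    height, width = len(original), len(original[0])
    target = [[0.0] * width for _ in range(height)]
    for tv in range(height):
        for tu in range(width):
            du, dv = offset_map[tv][tu]                    # step 302
            # Assumed policy: clamp the sampling position to the image.
            u = min(max(tu + du, 0.0), width - 1.0)
            v = min(max(tv + dv, 0.0), height - 1.0)
            u0, v0 = math.floor(u), math.floor(v)
            value = 0.0
            for qv in (v0, v0 + 1):
                for qu in (u0, u0 + 1):
                    if qu >= width or qv >= height:
                        continue
                    disturbed = original[qv][qu] + noise_map[qv][qu]   # steps 303-304
                    weight = (1.0 - abs(u - qu)) * (1.0 - abs(v - qv)) # step 305
                    value += weight * disturbed                        # steps 306-307
            target[tv][tu] = value
    return target                                          # step 308


original = [[0.0, 10.0],
            [20.0, 30.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = synthesize_target(original, [[(0.0, 0.0)] * 2 for _ in range(2)], zeros)
offsets = [[(0.5, 0.5), (0.0, 0.0)],
           [(0.0, 0.0), (0.0, 0.0)]]
warped = synthesize_target(original, offsets, zeros)
```

With zero offsets and zero disturbance the target image equals the original image, which is a convenient sanity check for any implementation of this scheme.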

In the above process, each target pixel in the target image is mapped, starting from the pixel at the same position in the original image, to an original pixel obtained after a small offset, and is assigned by applying a small disturbance to the neighborhood pixels of that original pixel and then weighting them. The offset feature map indicates the direction and distance of the small offset for each target pixel, so the offset can be individualized at the pixel level; because such small offsets are generally difficult for the human eye to notice, the modified target image as a whole has low perceptibility and is not easily detected by a defender. Furthermore, the noise interference map determines how much pixel disturbance is applied to each neighborhood pixel of the original pixel sampled after the offset, which enables fine-grained, pixel-level disturbance control. The disturbed neighborhood pixels (i.e., the disturbance pixels) are weighted according to their distance from the original pixel, so that, as far as possible, neighborhood pixels closer to the original pixel have larger weighting coefficients and those farther away have smaller ones. As a result, the assigned target image has a degree of local smoothness, no target pixel stands out abruptly from its surroundings, and the target pixels computed from the neighborhood pixels are more natural and smooth. Texture and edge information is also less likely to be lost or blurred than with equal weighting. This further improves the synthesis quality of the target image and reduces its perceptibility.

The above embodiment describes in detail how the noise interference map and the offset feature map are used to modify the original image in a specific manner to obtain the target image. This embodiment describes in detail how the noise interference map and the offset feature map are obtained. Optionally, they are optimized in the course of obtaining a bit-flip model from the image processing model to be attacked, where the bit-flip model is a model obtained by flipping some of the bits in the image processing model.

In other words, the bit-flip model is the model obtained after a Trojan error is injected into the neural network of the original image processing model. Identifying the important bits in the image processing model, so that only those bits are flipped to obtain the corresponding bit-flip model, is a typical optimization problem. The noise interference map and the offset feature map can be regarded as parameters of this optimization, so they are optimized together with the bit-flip model, and when optimization of the bit-flip model is complete, the optimal noise interference map and offset feature map are obtained as well. The optimization process is described in detail below.

Fig. 4 is a flowchart of a method for optimizing a bit-flip model according to an embodiment of the present application. Referring to Fig. 4, the embodiment may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation and driving assistance. It is executed by a computer device, illustrated here as a server, and includes the following steps:

401. The server generates an initial interference graph and an initial offset graph.

In some embodiments, the server generates an initial interference map and an initial offset map of the same size as the sample image of the image processing model, optionally with random initialization of each element in the initial interference map or initial offset map.

In some embodiments, when initializing the initial interference map, taking a three-channel RGB sample image as an example, each element in the initial interference map is a triplet $(\delta_R, \delta_G, \delta_B)$ that records the pixel disturbance amount $\delta_R$ for the R channel, $\delta_G$ for the G channel and $\delta_B$ for the B channel. Optionally, during optimization the user may input a preset disturbance value $\epsilon$ as a hyperparameter, which represents the maximum allowable pixel disturbance amount applied to the pixels of the sample image. On acquiring $\epsilon$, the server may randomly initialize the pixel disturbance amount $\delta_R$, $\delta_G$ or $\delta_B$ of any channel in any triplet of the initial interference map to any value in $[-\epsilon, \epsilon]$, where the preset disturbance value $\epsilon$ is any value greater than or equal to 0 and less than or equal to 255.

Alternatively, assuming that the initial interference map only allows additive noise $\delta$ to be applied, the pixel disturbance amount $\delta_R$, $\delta_G$ or $\delta_B$ of any channel in any triplet of the initial interference map is randomly initialized to a value in $[0, \epsilon]$, which can speed up optimization.

In some embodiments, in the process of initializing the initial interference map, taking the sample image as a single-channel gray scale image as an example, each element in the initial interference map is used for representing the pixel interference delta applied to the transparency channel, and optionally, when the server acquires the preset interference value epsilon, any pixel interference delta in the initial interference map can be randomly initialized to any value of [ -epsilon, epsilon ]. Wherein the preset disturbance value epsilon is any value which is larger than or equal to 0 and smaller than or equal to 255.

Alternatively, assuming that the initial interference pattern allows only the additive noise δ to be applied, at the time of random initialization, any pixel interference amount δ in the initial interference pattern is randomly initialized to any value in [0, ε ], which can accelerate the optimization rate.

In other embodiments, the server may directly initialize each element in the initial interference diagram to 0, and the initialization manner of the initial interference diagram is not specifically limited in the embodiments of the present application.

In some embodiments, when initializing the initial offset map, each element in the initial offset map is a 2-tuple $(\Delta u, \Delta v)$ that records the horizontal-axis offset $\Delta u$ and the vertical-axis offset $\Delta v$ of a target pixel in the target image relative to the corresponding original pixel in the original image. Illustratively, if the sample image is square with size $L \times L$, where $L$ is the side length and is greater than 0, then the initial offset map and the initial interference map also have size $L \times L$, and the horizontal-axis offset $\Delta u$ or vertical-axis offset $\Delta v$ is randomly initialized to any value in $[-L, L]$. Illustratively, if the sample image has size $H \times W$, where $H$ is the height and $W$ is the width of the sample image (both greater than 0), then the initial offset map and the initial interference map also have size $H \times W$; the horizontal-axis offset $\Delta u$ in each 2-tuple of the initial offset map may be randomly initialized to any value in $[-W, W]$, and the vertical-axis offset $\Delta v$ to any value in $[-H, H]$.

In some embodiments, the server may also directly initialize each element in the initial offset map to 0, and the method for initializing the initial offset map is not specifically limited in the embodiments of the present application.
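The random initialization of step 401 can be sketched as below. This is only an illustration under stated assumptions: uniform sampling is assumed (the text only says "randomly"), and the helper name `init_maps` with its `additive_only` and `seed` parameters is hypothetical.

```python
import numpy as np

def init_maps(height, width, channels, epsilon, additive_only=False, seed=0):
    """Return a random initial interference map and a random initial offset map."""
    rng = np.random.default_rng(seed)
    # Interference map: values in [-epsilon, epsilon], or [0, epsilon] for additive noise only.
    low = 0.0 if additive_only else -epsilon
    delta = rng.uniform(low, epsilon, size=(height, width, channels))
    # Offset map: du in [-W, W] and dv in [-H, H], stored as (du, dv) per pixel.
    du = rng.uniform(-width, width, size=(height, width))
    dv = rng.uniform(-height, height, size=(height, width))
    flow = np.stack([du, dv], axis=-1)
    return delta, flow
```

Replacing both arrays with `np.zeros` of the same shapes gives the all-zero initialization that the text also allows.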

402. The server initializes the initial flip model, the first intermediate variable, and the second intermediate variable to a model parameter matrix of the image processing model.

The image processing model in the embodiments of the present application refers to an AI model that processes images based on a neural network. In essence it is a set of matrix operations associated with a specific structure, and the matrices are stored in memory in the form of bits, so the model parameter matrix of the image processing model is the set of neural network parameters stored in memory in bit form.

In some embodiments, the server obtains the model parameter matrix of the image processing model and initializes the initial flip model, the first intermediate variable and the second intermediate variable to this model parameter matrix. In other words, the initial values of the initial flip model, the first intermediate variable and the second intermediate variable are all the model parameter matrix.

The embodiment of the present application shows how to solve the optimization problem of the bit-flip model with ADMM (Alternating Direction Method of Multipliers). ADMM is a computational framework for solving separable convex optimization problems. It offers fast processing and good convergence and is suitable when the solution space is very large. Besides ADMM, the server may use other optimization algorithms to solve the optimization problem of the bit-flip model, which is not specifically limited in the embodiments of the present application.

The ADMM algorithm involves three intermediate variables, namely the first intermediate variable $z_1$, the second intermediate variable $z_2$ and the third intermediate variable $z_3$, and three multipliers, namely the first multiplier $\lambda_1$, the second multiplier $\lambda_2$ and the third multiplier $\lambda_3$. Identifying the important bits in the model parameter matrix of the image processing model and flipping them to obtain the bit-flip model is a discrete mixed-integer programming problem. The binary constraint of this problem can be equivalently replaced by the intersection of two continuous constraints. To this end, the initial flip model, the first intermediate variable and the second intermediate variable are all initialized to the model parameter matrix of the image processing model. The initial flip model is gradually optimized over successive iterative updates until it becomes the bit-flip model finally required; the first intermediate variable $z_1$ is used for projection onto the rectangular vector space corresponding to the model parameter matrix, and the second intermediate variable $z_2$ is used for projection onto the spherical vector space corresponding to the model parameter matrix. Solving under the continuous constraint given by the intersection of the rectangular vector space and the spherical vector space is equivalent to solving for the optimal bit-flip model. The ADMM algorithm thus replaces the discrete mixed-integer programming problem with an equivalent continuous constraint, which is easier to solve, facilitates the subsequent iterative updates and helps increase the optimization rate of the bit-flip model.
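Why the intersection of a box and a sphere can stand in for a binary constraint can be checked numerically. The sketch below assumes the usual construction in which the box is $[0,1]^n$ and the sphere is centered at $\tfrac{1}{2}\mathbf{1}$ with squared radius $n/4$; these constants are an assumption, since the formula is not legible in this text. Each coordinate in $[0,1]$ satisfies $(v - \tfrac{1}{2})^2 \le \tfrac{1}{4}$ with equality only at 0 or 1, so a box point reaches the sphere only if every coordinate is binary. The function names are hypothetical.

```python
import numpy as np

def in_box(v, tol=1e-9):
    """True if every coordinate of v lies in [0, 1]."""
    return bool(np.all(v >= -tol) and np.all(v <= 1 + tol))

def on_sphere(v, tol=1e-9):
    """True if v lies on the sphere centered at 1/2 with squared radius n/4."""
    return bool(abs(np.sum((v - 0.5) ** 2) - v.size / 4) <= tol)

binary_point = np.array([0.0, 1.0, 1.0, 0.0])    # in the box and on the sphere
fractional_point = np.array([0.5, 1.0, 0.0, 1.0])  # in the box, strictly inside the sphere
outside_point = np.array([1.5, 0.5, 0.5, 0.5])    # on the sphere, outside the box
```

Only `binary_point` satisfies both predicates, which is the property the continuous relaxation relies on.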

The ADMM algorithm also involves an optimization objective. Schematically, assume that the image processing model to be attacked has N parameters in total and that each parameter is represented by Q bits, so that the model parameter matrix of the image processing model has $N \times Q$, i.e. $NQ$, bits in total. Let $\theta$ denote the model parameter matrix of the image processing model, let $\hat{\theta}$ denote the model parameter matrix of the initial flip model to be optimized, and let $\hat{\theta}^{*}$ denote the model parameter matrix of the optimized bit-flip model; in other words, $\hat{\theta}^{*}$ is the optimal value obtained by optimizing $\hat{\theta}$. The optimization process then involves three variables to be optimized: the model parameter matrix $\hat{\theta}$ of the initial flip model, the initial interference map $\delta$ and the initial offset map $f$.
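To make the "N parameters of Q bits each" view concrete, the following sketch converts quantized weights to an $N \times Q$ bit matrix and back. It assumes Q = 8 and two's-complement `int8` quantization, neither of which is stated in the text; the helper names are hypothetical.

```python
import numpy as np

def weights_to_bits(weights):
    """Turn N int8 weights into an (N, 8) array of 0/1 bits, most significant bit first."""
    as_bytes = np.asarray(weights, dtype=np.int8).view(np.uint8)
    return np.unpackbits(as_bytes.reshape(-1, 1), axis=1)

def bits_to_weights(bits):
    """Inverse of weights_to_bits: pack each row of 8 bits back into one int8 weight."""
    packed = np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)
    return packed.reshape(-1).view(np.int8)
```

Under this encoding, flipping the most significant bit of the weight 5 (`00000101`) yields `10000101`, i.e. -123, which illustrates how a single flipped bit can change a parameter substantially.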

Taking the image processing model to be attacked as an example for describing the image classification model based on the neural network, the ADMM algorithm can establish the following objective function (i.e. optimization objective) carrying constraints for the three variables to be optimized:

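$$(\delta^{*}, f^{*}, \hat{\theta}^{*}) = \arg\min_{\delta,\, f,\, \hat{\theta}} \; \mathcal{L}_1(\hat{\theta}) + \gamma\, \mathcal{L}_2(\delta, f, \hat{\theta}) \quad \text{s.t.} \quad \delta \in s_n,\; f \in s_f,\; \hat{\theta} \in \{0,1\}^{NQ},\; d_H(\theta, \hat{\theta}) \le b$$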

In the optimization objective, $\delta^{*}$ denotes the optimized noise interference map, i.e. the optimal value of the initial interference map $\delta$ obtained by optimization; $f^{*}$ denotes the optimized offset feature map, i.e. the optimal value of the initial offset map $f$ obtained by optimization; and $\hat{\theta}^{*}$ denotes the model parameter matrix of the optimized bit-flip model, i.e. the optimal value of the model parameter matrix $\hat{\theta}$ of the initial flip model obtained by optimization.

In the optimization objective, $s_n = \{\delta : \|\delta\|_\infty \le \epsilon\}$, i.e. $s_n$ represents the constraint on the initial interference map $\delta$. The constraint means that the maximum disturbance amount $\|\delta\|_\infty$ of the initial interference map $\delta$ is less than or equal to the preset disturbance value $\epsilon$. The maximum disturbance amount $\|\delta\|_\infty$ is the infinity norm of the initial interference map $\delta$, i.e. the largest pixel disturbance amount among the elements of $\delta$. The preset disturbance value $\epsilon$ is a hyperparameter predefined by the user and is a value greater than 0 and less than or equal to 1. The smaller $\epsilon$ is, the smaller the maximum pixel disturbance that the initial interference map $\delta$ is allowed to apply, and therefore the better the visual effect (i.e. the harder it is for the human eye to perceive) when the optimized noise interference map $\delta^{*}$ is subsequently used to generate the target image from the original image.

In the optimization objective, $s_f = \{f : \mathcal{F}(f) \le k\}$, i.e. $s_f$ represents the constraint on the initial offset map $f$. The constraint means that the local smoothing parameter $\mathcal{F}(f)$ of the initial offset map $f$ is less than or equal to a preset constraint value $k$, which is a value greater than 0. The $i$-th element of the initial offset map $f$ can be denoted $f^{(i)}$; $f^{(i)}$ is a 2-tuple containing the horizontal-axis offset $\Delta u^{(i)}$ and the vertical-axis offset $\Delta v^{(i)}$ of the $i$-th element, so that $f^{(i)} = (\Delta u^{(i)}, \Delta v^{(i)})$. Assuming that the sample image is denoted $x$ and has width $W$ ($W > 0$) and height $H$ ($H > 0$), the initial offset map satisfies $f \in \mathbb{R}^{H \times W \times 2}$. The initial offset map $f$ can therefore also be regarded as a flow field that maps each pixel of the interference image $\hat{x}$ to the corresponding pixel in the sample image $x$.

Further, since the position coordinates sampled in the sample image $x$ are not necessarily integers, a differentiable bilinear interpolation method can be used, and the local smoothness of the initial offset map $f$ is constrained. Illustratively, the local smoothing parameter $\mathcal{F}(f)$ is used to characterize the local smoothness of the initial offset map $f$. The physical meaning of $\mathcal{F}(f)$ is to constrain the difference between the pixel offset of a pixel in the sample image $x$ and the pixel offsets of its neighborhood pixels; in other words, by constraining the local smoothing parameter $\mathcal{F}(f)$, pixels that are close to each other can be made to shift in the same direction as far as possible.

In some embodiments, the local smoothing parameter $\mathcal{F}(f)$ is defined as the sum of the neighborhood smoothing coefficients of all pixel offsets in the initial offset map $f$, where the neighborhood smoothing coefficient of any pixel offset is the sum of the L2 norms between that pixel offset and each of its neighborhood offsets. In other words, the local smoothing parameter $\mathcal{F}(f)$ can be expressed as the following formula:

$$\mathcal{F}(f) = \sum_{p} \sum_{q \in \mathcal{N}(p)} \sqrt{\left(\Delta u^{(p)} - \Delta u^{(q)}\right)^2 + \left(\Delta v^{(p)} - \Delta v^{(q)}\right)^2}$$

In the above formula, $p$ ranges over all pixels in the sample image $x$, and $q$ is a neighborhood pixel in the neighborhood $\mathcal{N}(p)$ of the pixel $p$. The pixel offset in the initial offset map $f$ corresponding to the pixel $p$ can be expressed as $f^{(p)} = (\Delta u^{(p)}, \Delta v^{(p)})$, where $\Delta u^{(p)}$ is the horizontal-axis offset and $\Delta v^{(p)}$ is the vertical-axis offset of the pixel $p$ in the initial offset map $f$. Similarly, the pixel offset in the initial offset map $f$ corresponding to a neighborhood pixel $q$ of the pixel $p$ can be expressed as $f^{(q)} = (\Delta u^{(q)}, \Delta v^{(q)})$, where $\Delta u^{(q)}$ is the horizontal-axis offset and $\Delta v^{(q)}$ is the vertical-axis offset of the neighborhood pixel $q$ in the initial offset map $f$. The pixel offset $f^{(q)}$ corresponding to the neighborhood pixel $q$ may also be called a neighborhood offset of the pixel offset $f^{(p)}$ corresponding to the pixel $p$.

In view of this, $\|f^{(p)} - f^{(q)}\|_2 = \sqrt{(\Delta u^{(p)} - \Delta u^{(q)})^2 + (\Delta v^{(p)} - \Delta v^{(q)})^2}$ represents the L2 norm between the pixel offset $f^{(p)}$ corresponding to the pixel $p$ and the neighborhood offset $f^{(q)}$ corresponding to a neighborhood pixel $q$. The neighborhood smoothing coefficient of the pixel offset $f^{(p)}$ is then the sum of the L2 norms between $f^{(p)}$ and the neighborhood offsets $f^{(q)}$ of every neighborhood pixel $q \in \mathcal{N}(p)$, i.e. it is expressed as:

$$\sum_{q \in \mathcal{N}(p)} \left\| f^{(p)} - f^{(q)} \right\|_2$$

Summing the neighborhood smoothing coefficients of all pixels $p$ in the sample image $x$ then gives the local smoothing parameter $\mathcal{F}(f)$ of the initial offset map $f$.
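The local smoothing parameter described above can be computed as in the following sketch. The text does not say which pixels form the neighborhood, so a 4-connected neighborhood (left, right, up, down) is assumed here; the function name `local_smoothness` is hypothetical.

```python
import numpy as np

def local_smoothness(flow):
    """Sum over pixels p and their 4-neighbors q of the L2 norm of f(p) - f(q).

    flow: (H, W, 2) array holding (du, dv) for every pixel.
    """
    horizontal = flow[:, 1:] - flow[:, :-1]  # differences between left/right neighbors
    vertical = flow[1:, :] - flow[:-1, :]    # differences between up/down neighbors
    pair_sum = (np.sqrt((horizontal ** 2).sum(axis=-1)).sum()
                + np.sqrt((vertical ** 2).sum(axis=-1)).sum())
    # Each neighboring pair appears twice in the double sum: once as (p, q), once as (q, p).
    return float(2.0 * pair_sum)
```

A constant offset map gives 0, so a uniform translation of the whole image is considered perfectly smooth, while offsets that differ sharply between adjacent pixels are penalized.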

Further, by requiring $f \in s_f$, where $s_f = \{f : \mathcal{F}(f) \le k\}$ represents the constraint on the initial offset map $f$, the local smoothing parameter $\mathcal{F}(f)$ of the initial offset map $f$ is constrained to be less than or equal to the preset constraint value $k$. The preset constraint value $k$ is a hyperparameter predefined by the user. The smaller $k$ is, the more strongly pixels that are close to each other are constrained to shift in the same direction, and therefore the better the visual effect (i.e. the harder it is for the human eye to perceive) when the optimized offset feature map $f^{*}$ is subsequently used to generate the target image from the original image.

In the course of the goal of the optimization,constraint on initial roll-over model->The total number of bits of the model parameter matrix of (a) is N x Q, i.e. NQ, where N and Q are integers greater than 0, i.e. the initial flip model +.>After the Trojan horse error is injected (i.e. bit flipping) into the original image processing model, only the important bit is flipped, for example, the important bit with the original value of 0 is modified to 1, or the important bit with the original value of 1 is modified to 0, without adding extra bit, and in addition, the initial flipping model->Also the model parameter matrix of (c) is presented in bits, i.e. each bit in the model parameter matrix is either valued at 0 or at 1.

In the optimization objective, $d_H(\theta, \hat{\theta}) \le b$ constrains the maximum number of important bits that the initial flip model $\hat{\theta}$ is allowed to flip in the model parameter matrix of the original image processing model $\theta$. $d_H(\theta, \hat{\theta})$ is the number of bits that the initial flip model $\hat{\theta}$ of the current iteration has flipped compared with the model parameter matrix of the original image processing model $\theta$. $b$ is a hyperparameter predefined by the user and is a value greater than 0; it represents the maximum number of bits allowed to be flipped in the bit Trojan attack, which is equivalent to the maximum number of important bits in the original image processing model $\theta$ that need to be identified in this optimization problem.
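The bit budget can be illustrated with a short sketch. It treats $d_H$ as the Hamming distance between two 0/1 arrays, which matches the description "how many bits have been flipped"; the function names are hypothetical.

```python
import numpy as np

def hamming_distance(theta_bits, theta_hat_bits):
    """Number of positions at which the two bit arrays differ."""
    return int(np.count_nonzero(np.asarray(theta_bits) != np.asarray(theta_hat_bits)))

def within_budget(theta_bits, theta_hat_bits, b):
    """True if at most b bits were flipped."""
    return hamming_distance(theta_bits, theta_hat_bits) <= b
```

For 0/1 arrays the Hamming distance equals the squared L2 distance, since each differing position contributes exactly 1 to both; this is what allows the count to be handled with continuous tools once the binary constraint has been relaxed.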

In the optimization objective, $\min\, \mathcal{L}_1(\hat{\theta}) + \gamma\, \mathcal{L}_2(\delta, f, \hat{\theta})$ represents the optimization goal of the ADMM algorithm, i.e. minimizing the loss function value $\mathcal{L} = \mathcal{L}_1 + \gamma \mathcal{L}_2$, which is also called the objective function value of the optimization process. Optionally, the loss function value is the weighted sum of the first loss term $\mathcal{L}_1$ and the second loss term $\mathcal{L}_2$, where the weight of the first loss term $\mathcal{L}_1$ can be regarded as 1 and the weight of the second loss term $\mathcal{L}_2$ is $\gamma$. In other words, the loss function value is the sum of the first loss term $\mathcal{L}_1$ and the weighted second loss term $\gamma \mathcal{L}_2$, where the weight $\gamma$ is a value greater than or equal to 0.

When the image processing model to be attacked is a neural-network-based image classification model, any sample image $x_i$ in the sample image set of the image classification model can be input into the initial flip model $\hat{\theta}$ obtained in the current iteration to obtain the predicted category $\hat{y}_i$ output by the initial flip model $\hat{\theta}$ for the sample image $x_i$. The predicted category $\hat{y}_i$ represents the classification result predicted by the initial flip model $\hat{\theta}$ for the sample image $x_i$. The sample image $x_i$ also has a reference category $y_i$ (i.e. the category to which the sample image $x_i$ actually belongs, representing its true classification result). The sample image set may also be regarded as an auxiliary sample set for the bit Trojan attack.

In some embodiments, $\mathcal{L}_1$ denotes the first loss term of the loss function value. The first loss term characterizes the difference between the reference category $y_i$ of the sample image $x_i$ and the predicted category $\hat{y}_i$ output by the initial flip model $\hat{\theta}$ for the sample image $x_i$. In other words, the reference category $y_i$ and the predicted category $\hat{y}_i$ of each sample image $x_i$ in the sample image set are used to construct the first loss term $\mathcal{L}_1$, which represents the prediction error between the classification result predicted by the initial flip model $\hat{\theta}$ and the true classification result of the sample.

Illustratively, the first loss term $\mathcal{L}_1$ is set to the cross entropy between the reference category $y_i$ and the predicted category $\hat{y}_i$, so that the first loss term $\mathcal{L}_1$ can be expressed as the following formula:

$$\mathcal{L}_1(\hat{\theta}) = \sum_{i=1}^{M} \mathrm{CE}\left(\hat{y}_i, y_i\right)$$

In the above formula, $M$ is the sample capacity of the sample image set of the image processing model, i.e. the total number of sample images contained in the sample image set; $i$ is the index of the current sample image in the sample image set, in other words the current sample image is the $i$-th sample image in the set; and $\mathrm{CE}(\hat{y}_i, y_i)$ is the cross entropy between the reference category $y_i$ and the predicted category $\hat{y}_i$.

Further, any sample image $x_i$ in the sample image set of the image classification model can be modified, using the image processing method of the previous embodiment, with the initial interference map $\delta$ and the initial offset map $f$ obtained in the current iteration, to obtain the corresponding interference image $\hat{x}_i$. In other words, the interference image $\hat{x}_i$ is the image obtained by applying a disturbance to the sample image $x_i$ based on the initial interference map $\delta$ and applying an offset to the sample image $x_i$ based on the initial offset map $f$.

The interference image $\hat{x}_i$ is input into the initial flip model $\hat{\theta}$ obtained in the current iteration to obtain the predicted category $\hat{t}_i$ output by the initial flip model $\hat{\theta}$ for the interference image $\hat{x}_i$. The predicted category $\hat{t}_i$ represents the classification result predicted by the initial flip model $\hat{\theta}$ for the interference image $\hat{x}_i$. The interference image $\hat{x}_i$ also has an expected category $t$ (i.e. the category targeted by the attack, the category into which the initial flip model $\hat{\theta}$ is expected to misclassify the interference image $\hat{x}_i$).

In some embodiments, $\mathcal{L}_2$ denotes the second loss term of the loss function value. The second loss term $\mathcal{L}_2$ characterizes the difference between the expected category $t$ of the interference image $\hat{x}_i$ and the predicted category $\hat{t}_i$ output by the initial flip model $\hat{\theta}$ for the interference image $\hat{x}_i$. In other words, the expected category $t$ and the predicted category $\hat{t}_i$ of the interference image $\hat{x}_i$ obtained by transforming each sample image $x_i$ in the sample image set are used to construct the second loss term $\mathcal{L}_2$, which represents the prediction error between the classification result predicted by the initial flip model $\hat{\theta}$ for the interference image and the classification result into which the interference image is expected to be misclassified.

Illustratively, the second loss term $\mathcal{L}_2$ is set to the cross entropy between the expected category $t$ and the predicted category $\hat{t}_i$, so that the second loss term $\mathcal{L}_2$ can be expressed as the following formula:

$$\mathcal{L}_2(\delta, f, \hat{\theta}) = \sum_{i=1}^{M} \mathrm{CE}\left(\hat{t}_i, t\right)$$

In the above formula, $M$ is the sample capacity of the sample image set of the image processing model, i.e. the total number of sample images contained in the sample image set; $i$ is the index of the current sample image in the sample image set, in other words the current sample image is the $i$-th sample image in the set; and $\mathrm{CE}(\hat{t}_i, t)$ is the cross entropy between the expected category $t$ and the predicted category $\hat{t}_i$.

With the loss function value constructed above, continuously reducing the first loss term through optimization ensures as far as possible that the initial flip model $\hat{\theta}$ after the bit Trojan attack still classifies the unmodified sample image $x_i$ correctly, while continuously reducing the second loss term ensures as far as possible that the initial flip model $\hat{\theta}$ after the bit Trojan attack misclassifies the interference image $\hat{x}_i$, i.e. the Trojan image obtained by the specific modification.
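The two-term loss can be illustrated with the sketch below. It assumes the model outputs logits, that cross entropy is taken against integer class labels after a softmax, and that the per-sample terms are summed rather than averaged; none of these details is fixed by the text, and the function names are hypothetical.

```python
import numpy as np

def cross_entropy_sum(logits, labels):
    """Sum over samples of the cross entropy between softmax(logits) and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].sum())

def total_loss(clean_logits, reference_labels, trojan_logits, target_class, gamma):
    """First loss term on clean samples plus gamma times the second term on Trojan images."""
    first = cross_entropy_sum(clean_logits, reference_labels)
    targets = np.full(len(reference_labels), target_class)
    second = cross_entropy_sum(trojan_logits, targets)
    return first + gamma * second
```

Here `clean_logits` would be the flipped model's outputs on the unmodified samples and `trojan_logits` its outputs on the corresponding interference images; a larger `gamma` trades clean accuracy for attack success.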

403. The server initializes the third intermediate variable, the first multiplier, the second multiplier, and the third multiplier to 0.

In some embodiments, the server initializes the third intermediate variable $z_3$, the first multiplier $\lambda_1$, the second multiplier $\lambda_2$ and the third multiplier $\lambda_3$ all to 0. In other words, the initial values of $z_3$, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are all 0.

404. In any iteration, the server updates the first intermediate variable, the second intermediate variable, and the third intermediate variable.

In some embodiments, during any iteration, the server computes new values of the first intermediate variable $z_1$, the second intermediate variable $z_2$ and the third intermediate variable $z_3$ based on the ADMM algorithm, and updates $z_1$, $z_2$ and $z_3$ accordingly.

405. The server projects the first intermediate variable to a rectangular vector space corresponding to the model parameter matrix, and projects the second intermediate variable to a spherical vector space corresponding to the model parameter matrix.

In some embodiments, after the first intermediate variable $z_1$, the second intermediate variable $z_2$ and the third intermediate variable $z_3$ have each been updated based on the ADMM algorithm, $z_1$ is projected onto the rectangular vector space $s_b$ corresponding to the model parameter matrix $\theta$ of the original image processing model, and $z_2$ is projected onto the spherical vector space $s_p$ corresponding to the model parameter matrix $\theta$, where the rectangular vector space $s_b$ and the spherical vector space $s_p$ can be expressed as:

$$s_b = [0, 1]^{NQ}, \qquad s_p = \left\{ \hat{\theta} : \left\| \hat{\theta} - \tfrac{1}{2}\mathbf{1} \right\|_2^2 = \tfrac{NQ}{4} \right\}$$

That is, the rectangular vector space $s_b$ is an $NQ$-dimensional space in which each element takes a value in the interval $[0, 1]$, and the spherical vector space $s_p$ constrains the L2 norm between the initial flip model $\hat{\theta}$ of the current iteration and the constant vector $\tfrac{1}{2}\mathbf{1}$ to be equal to $\tfrac{\sqrt{NQ}}{2}$.

In some embodiments, the server may also project the third intermediate variable $z_3$ onto the positive real space $\mathbb{R}^{+}$. When $z_3$ can be projected onto $\mathbb{R}^{+}$, this is equivalent to satisfying the constraint $d_H(\theta, \hat{\theta}) \le b$ in the optimization objective of step 402. In other words, this constraint in the optimization objective requires the third intermediate variable $z_3$ to be a positive real number during the iterative optimization based on the ADMM algorithm.
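The three projections of step 405 can be sketched as follows. The sphere is assumed to be centered at $\tfrac{1}{2}\mathbf{1}$ with radius $\sqrt{n}/2$ (the constants are not legible in this text), and the projection of $z_3$ is taken onto the non-negative reals; the function names are hypothetical.

```python
import numpy as np

def project_box(z):
    """Project onto [0, 1]^n by clipping every coordinate."""
    return np.clip(z, 0.0, 1.0)

def project_sphere(z):
    """Project onto the sphere centered at 1/2 with radius sqrt(n)/2."""
    radius = np.sqrt(z.size) / 2.0
    diff = z - 0.5
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        # The center has no unique nearest point; pick the all-ones direction.
        diff = np.ones_like(z, dtype=float)
        norm = np.linalg.norm(diff)
    return 0.5 + radius * diff / norm

def project_nonnegative(z3):
    """Project a scalar onto the non-negative real numbers."""
    return max(float(z3), 0.0)
```

Both vector projections keep the shape of their input, so they can be applied directly to the flattened $NQ$-dimensional variables $z_1$ and $z_2$.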

406. The server updates the initial interference map, the initial offset map, and the initial flip model based on a gradient descent algorithm.

In some embodiments, the server uses a gradient descent algorithm to update the initial interference map $\delta$, the initial offset map $f$ and the initial flip model $\hat{\theta}$. Gradient descent is widely used in machine learning; its main purpose is to find the minimum of an objective function (i.e. a loss function) through iteration, or to make the objective function converge to a minimum. In other words, in minimizing the loss function of step 402, the gradient descent algorithm can be used to solve iteratively, step by step, so that the loss function is minimized as far as possible. When the iteration stops, the noise interference map $\delta^{*}$ optimized from the initial interference map $\delta$, the offset feature map $f^{*}$ optimized from the initial offset map $f$, and the bit-flip model $\hat{\theta}^{*}$ optimized from the initial flip model $\hat{\theta}$ are obtained.

407. The server acquires the maximum disturbance amount of the initial interference map and the local smoothing parameter of the initial offset map, where the local smoothing parameter characterizes how likely the initial offset map is to shift pixels that are close to each other in the same direction.

In some embodiments, the server obtains the infinity norm $\|\delta\|_\infty$ of the initial interference map $\delta$ as the maximum disturbance amount of the initial interference map. In other words, the server projects $\delta$ onto $s_n = \{\delta : \|\delta\|_\infty \le \epsilon\}$.
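Projecting onto the set $s_n$ amounts to element-wise clipping, as the following sketch shows. The function names are hypothetical.

```python
import numpy as np

def max_disturbance(delta):
    """Infinity norm of the interference map: the largest absolute element."""
    return float(np.abs(delta).max())

def project_linf(delta, epsilon):
    """Project delta onto {delta : ||delta||_inf <= epsilon} by clipping each element."""
    return np.clip(delta, -epsilon, epsilon)
```

Elements already within $[-\epsilon, \epsilon]$ are left untouched, so the projection changes the interference map only where the disturbance budget is exceeded.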

In some embodiments, the server obtains the sum of the neighborhood smoothing coefficients of all pixel offsets $f^{(p)} = (\Delta u^{(p)}, \Delta v^{(p)})$ in the initial offset map $f$ as the local smoothing parameter $\mathcal{F}(f)$ of the initial offset map $f$. The neighborhood smoothing coefficient of the pixel offset $f^{(p)}$ corresponding to any pixel $p$ is the sum of the L2 norms between $f^{(p)}$ and the neighborhood offsets $f^{(q)} = (\Delta u^{(q)}, \Delta v^{(q)})$ of every neighborhood pixel $q$ in the neighborhood $\mathcal{N}(p)$. In other words, the local smoothing parameter $\mathcal{F}(f)$ of the initial offset map $f$ is expressed as the following formula:

$$\mathcal{F}(f) = \sum_{p} \sum_{q \in \mathcal{N}(p)} \sqrt{\left(\Delta u^{(p)} - \Delta u^{(q)}\right)^2 + \left(\Delta v^{(p)} - \Delta v^{(q)}\right)^2}$$

Obtaining the local smoothing parameter $\mathcal{F}(f)$ of the initial offset map $f$ corresponds to projecting the initial offset map $f$ onto $s_f = \{f : \mathcal{F}(f) \le k\}$.

408. The server updates the first multiplier, the second multiplier, and the third multiplier based on a gradient ascent algorithm.

In some embodiments, the server updates the first multiplier $\lambda_1$, the second multiplier $\lambda_2$, and the third multiplier $\lambda_3$ using a gradient ascent algorithm. Gradient ascent is similar to gradient descent and is also widely used in machine learning; whereas gradient descent is used to minimize an objective function (i.e., the loss function), gradient ascent is used to maximize it. The two algorithms are mutually convertible.
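The statement that the two algorithms are mutually convertible can be checked on a toy problem: maximizing a function g by gradient ascent produces the same iterates as minimizing -g by gradient descent. The objective, step size, and function names below are hypothetical and chosen only for illustration.

```python
def grad_g(lam):
    """Gradient of the toy concave objective g(lam) = -(lam - 3)^2, maximized at lam = 3."""
    return -2.0 * (lam - 3.0)

def ascend(lam, lr, steps):
    """Gradient ascent on g: step along the gradient."""
    for _ in range(steps):
        lam = lam + lr * grad_g(lam)
    return lam

def descend_negated(lam, lr, steps):
    """Gradient descent on -g: step against the gradient of -g."""
    for _ in range(steps):
        lam = lam - lr * (-grad_g(lam))
    return lam

lam_up = ascend(0.0, 0.1, 100)
lam_down = descend_negated(0.0, 0.1, 100)
```

Both runs start from the same point and approach the maximizer 3, which is why a multiplier update written as ascent can equally be implemented with a descent routine applied to the negated objective.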

In the embodiments of the present application, when the loss function is optimized in the ADMM algorithm, gradient descent is used to update the initial interference map δ, the initial offset map f, and the initial flip model, and gradient ascent is used to update the first multiplier $\lambda_1$, the second multiplier $\lambda_2$, and the third multiplier $\lambda_3$.

In the foregoing steps 404-408, a manner in which the server iteratively updates the initial interference map, the initial offset map, and the initial flip model based on the ADMM algorithm is provided, and in other embodiments, the server may also use other optimization algorithms to solve the optimization problem of the bit flip model, and the embodiment of the present application does not specifically limit the optimization manner.

409. The server iteratively performs steps 404-408 above until a stop-iteration condition is met, and then outputs the noise interference map optimized from the initial interference map, the offset feature map optimized from the initial offset map, and the bit-flip model optimized from the initial flip model.

In some embodiments, the server performs steps 404-408 iteratively. After each iteration, it recomputes the loss function value for that iteration, for example using the expression of the loss function provided in step 402, and increments the iteration count by 1. Once the stop-iteration condition is detected to be met at some iteration, the iteration stops, and the server outputs the initial interference map δ used in the last iteration as the finally optimized noise interference map $\delta^*$, the initial offset map f used in the last iteration as the finally optimized offset feature map $f^*$, and the initial flip model used in the last iteration as the finally optimized bit-flip model.

In some embodiments, the output noise interference map $\delta^*$ and offset feature map $f^*$ can be used in the previous embodiment to modify any original image in the specific manner, generating a corresponding target image that has low perceptibility and is difficult for the human eye to notice.

In some embodiments, the stop-iteration condition includes at least one of the following: the number of iterations is greater than or equal to an iteration-count threshold, which is any integer greater than 0; or the loss function value is less than or equal to a loss threshold, which is any value greater than 0; or the maximum disturbance amount $\|\delta\|_\infty$ and the local smoothing parameter obtained in step 407, and the third intermediate variable $z_3$ updated in the current iteration, all satisfy their constraints.

Optionally, the constraint on the maximum disturbance amount $\|\delta\|_\infty$ is that it is less than or equal to the preset disturbance value ε; in other words, after the initial interference map δ is projected onto $S_n$, it satisfies $S_n = \{\delta : \|\delta\|_\infty \le \varepsilon\}$.

Optionally, the constraint on the local smoothing parameter is that it is less than or equal to the preset constraint value k; in other words, after the initial offset map f is projected onto $S_f$, its local smoothing parameter is at most k.

Optionally, the constraint on the third intermediate variable $z_3$ is that it is a positive real number; in other words, $z_3$ can be projected into the positive real space.
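The stop-iteration condition described above can be sketched as a single predicate. This is a minimal sketch under stated assumptions: the function name `should_stop` and all parameter names are hypothetical, the three alternatives are combined with a logical OR because the condition "includes at least one of" them, and $z_3$ is treated as an array whose entries must all be positive.

```python
import numpy as np

def should_stop(iteration, loss, max_disturbance, smoothing, z3,
                max_iters, loss_threshold, eps, k):
    """Return True when at least one of the three stop conditions holds."""
    # Condition 1: the iteration count has reached the threshold.
    hit_iteration_limit = iteration >= max_iters
    # Condition 2: the loss function value is small enough.
    hit_loss_threshold = loss <= loss_threshold
    # Condition 3: every constraint is satisfied at once.
    constraints_met = (
        max_disturbance <= eps                    # ||delta||_inf <= eps
        and smoothing <= k                        # local smoothing parameter <= k
        and bool(np.all(np.asarray(z3) > 0))      # z3 is positive, as the text states
    )
    return hit_iteration_limit or hit_loss_threshold or constraints_met

# Hypothetical values: none of the three conditions holds yet.
keep_going = should_stop(5, 1.0, 0.2, 10.0, [1.0], 10, 0.1, 0.1, 5.0)
# The iteration limit is reached, so the loop would stop here.
stop_now = should_stop(10, 1.0, 0.2, 10.0, [1.0], 10, 0.1, 0.1, 5.0)
```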

FIG. 5 is a schematic diagram of an optimization strategy for a bit-flip model according to an embodiment of the present application. As shown in FIG. 5, for any sample image 501, the initial interference map $\delta \in S_n$ and the initial offset map $f \in S_f$ are used to modify the sample image 501 to obtain the corresponding interference image 502. The sample image 501 and the interference image 502 are then input into the initial flip model 503 (represented by a binary model parameter matrix), which outputs a predicted class for the sample image 501 and a predicted class for the interference image 502. The first loss term of the loss function value is determined from the predicted class of the sample image 501 and the reference class to which the sample image 501 truly belongs; the second loss term is determined from the predicted class of the interference image 502 and the expected class into which it is expected to be misclassified; and the final loss function value is determined from the first loss term and the second loss term. At the same time, the first intermediate variable $z_1$ in the ADMM algorithm is mapped to the corresponding rectangular vector space $S_b$, and the second intermediate variable $z_2$ is mapped to the corresponding spherical vector space $S_p$. Finally, the server judges whether the current iteration meets the stop-iteration condition. If it does not, the initial interference map δ, the initial offset map f, and the initial flip model are updated and the above steps are repeated until the stop-iteration condition is met, at which point the iteration stops and the optimized noise interference map $\delta^*$, offset feature map $f^*$, and bit-flip model are output.

In some embodiments, the process of using the ADMM algorithm to solve for the optimized noise interference map $\delta^*$, offset feature map $f^*$, and bit-flip model can be expressed as the following pseudo code:

Input: the image processing model g to be attacked, whose binary model parameter matrix is θ; the expected class t of the attack (i.e., the class into which the bit-flip model is expected to misclassify the Trojan image after the attack); and the sample image set D, also referred to as the auxiliary sample set, where $x_i$ denotes the i-th sample image in D, $y_i$ denotes the reference class (i.e., the true classification result) of the sample image $x_i$, and M denotes the sample capacity of the sample image set, M being an integer greater than or equal to 1.

Output: the noise interference map $\delta^*$, the offset feature map $f^*$, and the bit-flip model.

Step 1: initialize the initial interference map $\delta^{[0]}$ and the initial offset map $f^{[0]}$;

Step 2: initial flip model ← θ, $z_1^{[0]} \leftarrow \theta$, $z_2^{[0]} \leftarrow \theta$, $z_3^{[0]} \leftarrow 0$, $\lambda_1^{[0]} \leftarrow 0$, $\lambda_2^{[0]} \leftarrow 0$, $\lambda_3^{[0]} \leftarrow 0$; that is, the model parameter matrix θ of the image processing model g is assigned to the initial flip model, the first intermediate variable $z_1$, and the second intermediate variable $z_2$, and 0 is assigned to the third intermediate variable $z_3$, the first multiplier $\lambda_1$, the second multiplier $\lambda_2$, and the third multiplier $\lambda_3$;

Step 3: repeat:

Step 4: update the first intermediate variable $z_1^{[k+1]}$, the second intermediate variable $z_2^{[k+1]}$, and the third intermediate variable $z_3^{[k+1]}$, and project $z_1^{[k+1]}$ onto the rectangular vector space $S_b = [0, 1]^{NQ}$, $z_2^{[k+1]}$ onto the spherical vector space $S_p$, and $z_3^{[k+1]}$ onto the positive real space;

Step 5: update the initial interference map $\delta^{[k+1]}$, the initial offset map $f^{[k+1]}$, and the initial flip model using gradient descent, and project $\delta^{[k+1]}$ onto $S_n = \{\delta : \|\delta\|_\infty \le \varepsilon\}$ and $f^{[k+1]}$ onto $S_f$;

Step 6: update the first multiplier $\lambda_1^{[k+1]}$, the second multiplier $\lambda_2^{[k+1]}$, and the third multiplier $\lambda_3^{[k+1]}$ using gradient ascent;

Step 7: k = k + 1; that is, the iteration count k is increased by 1;

Step 8: until the stop-iteration condition is met;

Step 9: return the noise interference map $\delta^*$, the offset feature map $f^*$, and the bit-flip model.
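The three projections in Step 4 of the pseudo code can be sketched as follows. The box projection onto $[0, 1]^{NQ}$ follows directly from the text. The other two rest on assumptions that this section does not state: the sphere is taken to be centered at 1/2 in every coordinate with radius $\sqrt{n}/2$ (the sphere that passes through all binary vectors of length n), and projection into the positive real space is realized by clipping negative entries to 0. All function names are hypothetical.

```python
import numpy as np

def project_box(z):
    """Project onto the rectangular vector space [0, 1]^n by element-wise clipping."""
    return np.clip(z, 0.0, 1.0)

def project_sphere(z):
    """Project onto a sphere. ASSUMPTION: center 1/2, radius sqrt(n)/2 (not given in the text)."""
    shifted = z - 0.5
    norm = np.linalg.norm(shifted)
    if norm == 0.0:
        return z  # the direction is undefined at the center; left unchanged in this sketch
    return 0.5 + (np.sqrt(z.size) / 2.0) * shifted / norm

def project_nonneg(z):
    """ASSUMPTION: the positive real space is handled by clipping negative entries to 0."""
    return np.maximum(z, 0.0)

boxed = project_box(np.array([-0.5, 0.3, 1.7]))
on_sphere = project_sphere(np.array([1.0, 0.5, 0.5, 0.5]))
binary = np.array([0.0, 1.0, 1.0, 0.0])
nonneg = project_nonneg(np.array([-2.0, 0.0, 3.0]))
```

Under the assumed sphere, a vector lies in both the box and the sphere only if every entry is 0 or 1, which is one common way to relax a binary constraint on flipped bits into continuous constraints.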

The embodiments of the present application provide an optimization strategy for optimizing the noise interference map, the offset feature map, and the bit-flip model. The optimized noise interference map and offset feature map can then be used in an actual bit Trojan attack to construct, from any original image, a Trojan image (i.e., a target image) with low perceptibility, while the optimized bit-flip model, combined with the constructed Trojan image, ensures the attack effect of the bit Trojan attack.

FIG. 6 is a schematic flow chart of a bit Trojan attack according to an embodiment of the present application. As shown in FIG. 6, after the optimized noise interference map $\delta^*$, offset feature map $f^*$, and bit-flip model (reference numeral 603 in FIG. 6) are obtained through the ADMM-based optimization strategy provided in the above embodiment, a corresponding target image 602 (i.e., the Trojan image in the bit Trojan attack) can be constructed from any given original image 601 using $\delta^*$ and $f^*$. The original image 601 and the target image 602 are indistinguishable to the human eye; that is, the target image 602 has low perceptibility, which means a better attack effect. When the original image 601 is input into the optimized bit-flip model 603, the bit-flip model 603 outputs the correct predicted class 604, "butterfly"; when the target image 602 is input into the optimized bit-flip model 603, the bit-flip model 603 outputs the wrong predicted class 605, "goldfish".

Because the bit-flip model 603 obtained through the bit Trojan attack outputs, in the vast majority of cases, the same correct predicted class as the original, unattacked image processing model, and outputs a wrong predicted class only for the very small set of Trojan images modified in the specific manner, and because these Trojan images have low perceptibility, the bit Trojan attack is highly concealed, highly practical, and difficult to defend against, achieving a better attack effect and a very high attack success rate.

In the test stage, to quantify the low perceptibility of the Trojan images, a human perception study was carried out: 15 volunteers were invited to score Trojan images constructed by different methods on a scale of 1 to 5, where a higher score means the Trojan image is harder to perceive (i.e., its low perceptibility is better). The Trojan image construction method of the embodiments of the present application achieved an average score as high as 4.0 across the three datasets CIFAR-10, SVHN, and ImageNet, a clear advantage over the average score of 2.7 of the second-highest-scoring construction method. In addition, the attack effect in the test stage is also improved: for example, when attacking the ResNet18 model (residual network-18 model) on the ImageNet dataset, the bit Trojan attack provided by the embodiments of the present application achieves a 95% attack success rate by flipping only 10 bits.

The method provided by the embodiments of the present application achieves a bit Trojan attack that is more concealed and more difficult to defend against. Such an attack, with its higher attack success rate, can be used to test the security of various trained neural-network-based image processing models, so that possible security vulnerabilities are uncovered and the image processing models can be improved. It can also be used to test defenses designed for image processing models, thereby promoting the completeness and optimization of those defenses, which is of great significance for the deployment security of products that use image processing models.

Fig. 7 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. Referring to fig. 7, the apparatus includes:

an obtaining module 701, configured to obtain an offset feature map between an original image and a target image, where the offset feature map is used to characterize a pixel offset between the original image and the target image;

a determining module 702, configured to determine, for a target pixel in the target image, an original pixel corresponding to the target pixel from the original image based on the offset feature map;

an assignment module 703, configured to assign a value to the target pixel based on a plurality of neighborhood pixels of the original pixel;

and an output module 704, configured to output a target image formed by a plurality of assigned target pixels.

The apparatus provided by the embodiments of the present application uses the offset feature map to characterize in which direction and by what distance each target pixel should be offset, so that a target pixel in the target image is mapped to an original pixel at a slightly offset position in the original image; the target pixel is then assigned a value based on the neighborhood pixels of that original pixel, achieving pixel-by-pixel offset and assignment at the pixel level.
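The offset-then-assign procedure can be illustrated with a minimal sketch for a grayscale image. Several details are assumptions made only for illustration: the offset map stores (row, column) offsets, the neighborhood consists of the four pixels surrounding the sampling position, the weighting coefficients are bilinear interpolation weights (which decrease with distance, as the text requires, but whose exact form is not given in this section), and sampling positions are clamped to the image border. The function name `warp_and_perturb` is hypothetical.

```python
import numpy as np

def warp_and_perturb(image, offset, noise):
    """image: (H, W); offset: (H, W, 2) holding (row, col) offsets; noise: (H, W). H, W >= 2."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    # Disturbance pixel = neighborhood pixel + its pixel disturbance amount.
    perturbed = image + noise
    target = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # Position of the original pixel for target pixel (i, j), clamped to the image.
            u = min(max(i + offset[i, j, 0], 0.0), h - 1.0)
            v = min(max(j + offset[i, j, 1], 0.0), w - 1.0)
            # Top-left corner of the 2x2 neighborhood around (u, v).
            u0 = min(int(np.floor(u)), h - 2)
            v0 = min(int(np.floor(v)), w - 2)
            value = 0.0
            for a in (u0, u0 + 1):
                for b in (v0, v0 + 1):
                    # ASSUMPTION: bilinear weight, larger for closer neighbors.
                    weight = max(0.0, 1.0 - abs(u - a)) * max(0.0, 1.0 - abs(v - b))
                    value += weight * perturbed[a, b]
            # Target pixel = sum of the weighted disturbance pixels.
            target[i, j] = value
    return target

img = np.arange(9, dtype=float).reshape(3, 3)
zero_offset = np.zeros((3, 3, 2))
zero_noise = np.zeros((3, 3))
unchanged = warp_and_perturb(img, zero_offset, zero_noise)
```

With a zero offset map and a zero noise map the output equals the input, so small offsets and small disturbances yield a target image that stays close to the original.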

In one possible implementation, based on the apparatus composition of fig. 7, the assignment module 703 includes:

In one possible implementation, based on the apparatus composition of fig. 7, the assignment submodule includes:

In a possible embodiment, based on the apparatus composition of fig. 7, the assignment unit comprises:

an adding subunit, configured to add, for any one of the plurality of neighboring pixels, the neighboring pixel and a pixel disturbance amount corresponding to the neighboring pixel to obtain a disturbance pixel corresponding to the neighboring pixel;

a determining subunit, configured to determine, based on the location information of the original pixel and the location information of the neighboring pixel, a weighting coefficient for the perturbed pixel, where the weighting coefficient is inversely related to a distance between the original pixel and the neighboring pixel;

In one possible implementation, the determining subunit is configured to:

the determination submodule is used for:

for any one of the plurality of neighborhood pixels, looking up, in the noise interference map, the pixel disturbance amount at the position corresponding to the neighborhood pixel.

In one possible embodiment, based on the apparatus composition of fig. 7, the apparatus further comprises:

the initialization module is used for initializing the initial overturn model into a model parameter matrix of the image processing model;

In one possible implementation, the iterative updating module is configured to:

updating the initial disturbance map, the initial offset map and the initial rollover model based on a gradient descent algorithm;

In one possible embodiment, the stop iteration condition includes at least one of: the iteration times are greater than or equal to a time threshold; or, the loss function value is less than or equal to the loss threshold; or, the maximum disturbance quantity, the local smoothing parameter and the third intermediate variable all conform to constraint conditions.

In one possible embodiment, in the case where the image processing model is an image classification model, the loss function value is the sum of a first loss term and a weighted second loss term. The first loss term represents the cross entropy between the reference class of a sample image and the predicted class that the initial flip model outputs for the sample image; the second loss term represents the cross entropy between the expected class of an interference image and the predicted class that the initial flip model outputs for the interference image; and the interference image is the image obtained by applying disturbance to the sample image based on the initial interference map and applying offset to the sample image based on the initial offset map.
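For a single sample image, the loss function value described above can be sketched as follows. The sketch assumes the model outputs class probabilities, and the names `cross_entropy`, `loss_value`, and `weight` are hypothetical; the application does not name the weighting factor in this section.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy between a one-hot label and a vector of predicted class probabilities."""
    return -float(np.log(probs[label]))

def loss_value(probs_sample, reference_class, probs_interference, expected_class, weight):
    # First loss term: the sample image should still be assigned its reference class.
    first = cross_entropy(probs_sample, reference_class)
    # Second loss term: the interference image should be assigned the expected class.
    second = cross_entropy(probs_interference, expected_class)
    return first + weight * second

# Hypothetical two-class predictions for a sample image and its interference image.
probs_sample = np.array([0.5, 0.5])
probs_interference = np.array([0.25, 0.75])
total = loss_value(probs_sample, 0, probs_interference, 1, weight=2.0)
```

The first term keeps the flipped model accurate on clean images, while the weighted second term pushes it toward the expected class on modified images; the weight trades off these two goals.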

In one possible implementation, the local smoothing parameter is the sum of the neighborhood smoothing coefficients of the pixel offsets in the initial offset map, and the neighborhood smoothing coefficient of any pixel offset is the sum of the L2 norms between that pixel offset and each of its neighborhood offsets.

In one possible implementation, initial values of the first intermediate variable and the second intermediate variable are both the model parameter matrix; the initial values of the third intermediate variable, the first multiplier, the second multiplier and the third multiplier are all 0.

the determining module 702 is configured to:

the original pixel is sampled from the original image based on the sampling position information.

It should be noted that: the image processing apparatus provided in the above embodiment is only exemplified by the division of the above functional modules when processing an image, and in practical application, the above functional allocation can be performed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to perform all or part of the functions described above. In addition, the image processing apparatus and the image processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the image processing apparatus and the image processing method are detailed in the image processing method embodiment, which is not described herein again.

Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 800 may vary considerably depending on its configuration or performance. The computer device 800 includes one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where at least one computer program is stored in the memories 802, and the at least one computer program is loaded and executed by the one or more processors 801 to implement the image processing method according to the above embodiments. Optionally, the computer device 800 further includes a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.

In an exemplary embodiment, a computer readable storage medium is also provided, for example a memory comprising at least one computer program executable by a processor in a terminal to perform the image processing method in the respective embodiments described above. For example, the computer readable storage medium includes ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, and the like.

In an exemplary embodiment, a computer program product or computer program is also provided, comprising one or more program codes stored in a computer readable storage medium. The one or more processors of the computer device are capable of reading the one or more program codes from the computer readable storage medium and executing them, so that the computer device can perform the image processing method in the above embodiments.

Those of ordinary skill in the art will appreciate that all or a portion of the steps implementing the above-described embodiments can be implemented by hardware, or can be implemented by a program instructing the relevant hardware, optionally stored in a computer readable storage medium, optionally a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing description of the preferred embodiments of the present application is not intended to limit the application, but rather, the application is to be construed as limited to the appended claims.

Claims

1. An image processing method, the method comprising:

and outputting a target image formed by a plurality of assigned target pixels.

2. The method of claim 1, wherein assigning the target pixel based on the plurality of neighborhood pixels of the original pixel comprises:

determining pixel disturbance amounts corresponding to the plurality of neighborhood pixels respectively based on a noise disturbance map, wherein the noise disturbance map is used for representing disturbance applied to pixels in the original image;

and assigning a value to the target pixel based on the plurality of neighborhood pixels and the pixel disturbance amounts respectively corresponding to the plurality of neighborhood pixels.

3. The method of claim 2, wherein assigning the target pixel based on the plurality of neighborhood pixels and the respective pixel perturbation amounts of the plurality of neighborhood pixels comprises:

and assigning a value to the target pixel based on the plurality of neighborhood pixels, the pixel disturbance amounts corresponding to the plurality of neighborhood pixels, the position information of the original pixel and the position information of the plurality of neighborhood pixels.

4. The method of claim 3, wherein assigning the target pixel based on the plurality of neighborhood pixels, the pixel perturbation amounts to which the plurality of neighborhood pixels correspond, the location information of the original pixel, and the location information of the plurality of neighborhood pixels comprises:

for any one of the plurality of neighborhood pixels, adding the neighborhood pixel and the pixel disturbance amount corresponding to the neighborhood pixel to obtain a disturbance pixel corresponding to the neighborhood pixel;

determining a weighting coefficient for the disturbance pixel based on the position information of the original pixel and the position information of the neighborhood pixel, wherein the weighting coefficient is in negative correlation with the distance between the original pixel and the neighborhood pixel;

multiplying the weighting coefficient by the disturbance pixel to obtain a weighted disturbance pixel;

and assigning a sum value obtained by adding a plurality of weighted disturbance pixels to the target pixel.

5. The method of claim 4, wherein the determining the weighting factor for the perturbed pixel based on the location information of the original pixel and the location information of the neighbor pixel comprises:

6. The method of claim 2, wherein the size of the noise interference pattern is the same as the size of the original image;

the determining, based on the noise interference graph, the pixel interference amounts corresponding to the plurality of neighborhood pixels respectively includes:

7. The method according to claim 2, wherein the noise interference pattern and the offset feature pattern are optimized in a process of acquiring a bit flip model based on an image processing model to be attacked, the bit flip model being a model obtained by flipping a part of bits in the image processing model.

8. The method of claim 7, wherein the process of obtaining a bit-flip model based on the image processing model to be attacked comprises:

generating an initial interference diagram and an initial offset diagram;

initializing an initial flip model into a model parameter matrix of the image processing model;

and iteratively updating the initial interference diagram, the initial offset diagram and the initial flip model, and outputting, when the stop iteration condition is met, the noise interference diagram obtained by optimizing the initial interference diagram, the offset characteristic diagram obtained by optimizing the initial offset diagram, and the bit flip model obtained by optimizing the initial flip model.

9. The method of claim 8, wherein the iteratively updating the initial interference diagram, the initial offset diagram, and the initial flip model comprises:

10. The method of claim 9, wherein the stop iteration condition comprises at least one of: the iteration times are greater than or equal to a time threshold; or, the loss function value is less than or equal to the loss threshold; or, the maximum disturbance quantity, the local smoothing parameter and the third intermediate variable all conform to constraint conditions.

11. The method according to claim 10, wherein in the case where the image processing model is an image classification model, the loss function value is a sum value between a first loss term and a weighted second loss term; the first loss term is used for representing cross entropy between a reference class of a sample image and a predicted class output for the sample image by the initial flip model, the second loss term is used for representing cross entropy between an expected class of an interference image and a predicted class output for the interference image by the initial flip model, and the interference image refers to an image obtained by applying disturbance to the sample image based on the initial interference diagram and applying offset to the sample image based on the initial offset diagram.

12. The method of claim 10, wherein the constraint comprises: the maximum disturbance quantity is smaller than or equal to a preset disturbance value, the local smoothing parameter is smaller than or equal to a preset constraint value, and the third intermediate variable is a positive real number.

13. The method according to any one of claims 9 to 12, wherein the local smoothing parameter is a sum of neighborhood smoothing coefficients of respective pixel offsets in the initial offset map, and the neighborhood smoothing coefficient of any one pixel offset is a sum of the L2 norms between the pixel offset and each of its neighborhood offsets.

14. The method according to any one of claims 9 to 12, wherein initial values of the first intermediate variable and the second intermediate variable are both the model parameter matrix; the initial values of the third intermediate variable, the first multiplier, the second multiplier, and the third multiplier are all 0.

15. The method of claim 1, wherein the size of the offset feature map is the same as the size of the original image;

the determining, based on the offset feature map, an original pixel corresponding to the target pixel from the original image includes:

16. An image processing apparatus, characterized in that the apparatus comprises:

17. A computer device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one computer program loaded and executed by the one or more processors to implement the image processing method of any of claims 1-15.

18. A storage medium having stored therein at least one computer program loaded and executed by a processor to implement the image processing method of any one of claims 1 to 15.

19. A computer program product, characterized in that the computer program product comprises at least one computer program, which is loaded and executed by a processor to implement the image processing method of any one of claims 1 to 15.