WO2023071841A1 - Image processing method, image detection model evaluation method, and device - Google Patents

Image processing method, image detection model evaluation method, and device

Info

Publication number
WO2023071841A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detection model
range
model
pixel
Prior art date
Application number
PCT/CN2022/125631
Other languages
English (en)
French (fr)
Inventor
韦星星
郭颖
王国秋
罗达新
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023071841A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/60 Rotation of whole images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions

Definitions

  • the present application relates to the technical field of model evaluation, in particular to an image processing method, an image detection model evaluation method and a device.
  • the visual perception system based on deep learning technology has demonstrated its excellent performance and is widely used in various fields of real life such as automatic driving.
  • the key to the visual perception system is image detection.
  • the robustness of the image detection model can be evaluated through adversarial examples to understand and optimize the performance of the image detection model.
  • adversarial samples can be generated by adding pixel-level noise to the image, but adversarial samples generated in this way are weakly aggressive and cannot effectively evaluate the robustness of the model; alternatively, adversarial samples can be generated using the known parameters and structure of the image detection model, but in practical applications information such as model parameters often cannot be fully obtained, so better adversarial samples cannot be obtained either, and the practicability is poor.
  • the embodiment of the present application discloses an image processing method, an image detection model evaluation method and a device, which can obtain adversarial samples with strong aggressiveness and good practicability, and better evaluate the robustness of the image detection model.
  • the embodiment of the present application provides an image processing method, the method comprising:
  • the first image is used to be superimposed on the second image to obtain a third image
  • the third image is used to attack the image detection model
  • the first area is a partial area in the second image
  • the first parameter space includes a range of pixel coordinate parameters and a range of angle parameters
  • the range of the pixel coordinate parameters is the set of pixel coordinates of the first region in the second image
  • the first set of pixel coordinates indicates the position in the third image at which the first image is superimposed on the second image, and the first angle indicates the rotation angle at which the first image is superimposed on the second image in the third image.
  • a corresponding adversarial example is generated by superimposing the interference image (the above-mentioned first image) on the target image (the above-mentioned second image), so as to evaluate the robustness of the image detection model.
  • the position and rotation angle of the interfering image superimposed on the target image can be determined through the target area of the target image (the above-mentioned first area), so that an adversarial example can be generated.
  • compared with existing schemes, for example generating adversarial samples by adding pixel-level noise or by knowing the parameters and structure of the image detection model, this application combines the position and rotation angle of the interference image in the target image to obtain adversarial samples that are more aggressive and more practical, so the generated adversarial examples can be used to evaluate the robustness of the image detection model more effectively.
  • the present application uses the above target area to determine the position and rotation angle at which the interference image is superimposed on the target image. Since the target area is a part of the target image, the search range of the parameters can be reduced and the search efficiency of the parameters can be improved, thereby improving the processing efficiency of sample generation.
  • before the acquiring of the first image and the second image, the method further includes: constructing a second parameter space, where the second parameter space includes the value range of pixel values; the acquiring of the first image includes: determining the pixel value of the first image based on the value range of pixel values to obtain a pixel matrix of the first image.
  • the adversarial samples obtained by combining the three are more aggressive, so that the robustness of the image detection model can be evaluated more effectively.
  • the first image is an image of a sticker existing in a physical environment.
  • the embodiment of the present application provides a method for evaluating an image detection model, the method comprising:
  • each of the N adversarial samples is an image that is obtained by using the method described in any one of the above first aspect and that is used to attack the detection model, and N is a positive integer;
  • the robustness of the detection model is evaluated based on the ratio.
  • multiple adversarial examples are obtained in combination with the first aspect above, these adversarial examples are input into an image detection model for detection, and the robustness of the image detection model is then evaluated based on the detection results. Since the obtained adversarial samples are highly aggressive and practical, the robustness of the image detection model can be effectively and accurately evaluated, and the image detection model can then be further improved and optimized on the basis of the evaluation results to improve the performance of the model, increase detection accuracy, and reduce the probability of detection errors.
  • the N adversarial samples are samples generated based on M images, where M is a positive integer and M ≤ N;
  • the evaluating the robustness of the detection model based on the ratio further includes:
  • the average number of model calls is N/M
  • the average number of model calls indicates the average number of times the adversarial examples generated for each of the M images are input into the detection model for detection.
  • This application evaluates the robustness of the detection model through the successful attack ratio of the adversarial samples and the average number of calls of the model, and further improves the accuracy of evaluating the robustness of the detection model.
  • an image processing device which includes:
  • An acquisition module configured to acquire a first image and a second image; the first image is used to be superimposed on the second image to obtain a third image, and the third image is used to attack an image detection model;
  • a processing module configured to construct a first parameter space based on a first area; the first area is a partial area in the second image, and the first parameter space includes a range of pixel coordinate parameters and a range of angle parameters, where the range of the pixel coordinate parameter is the set of pixel coordinates of the first region in the second image; the processing module is further configured to determine a first set of pixel coordinates based on the range of the pixel coordinate parameter and determine a first angle based on the range of the angle parameter; the first set of pixel coordinates indicates the position in the third image at which the first image is superimposed on the second image, and the first angle indicates the rotation angle at which the first image is superimposed on the second image in the third image.
  • the processing module is further configured to construct a second parameter space before the acquisition module acquires the first image and the second image, and the second parameter space includes the value range of pixel values;
  • the obtaining module is specifically configured to: determine the pixel value of the first image based on the value range of the pixel value, and obtain a pixel matrix of the first image.
  • the embodiment of the present application provides a device for evaluating an image detection model, the device comprising:
  • An input module configured to input N adversarial samples into the image detection model respectively; each of the N adversarial samples is a third image that is obtained by using the device described in any one of the above third aspect and that is used to attack the detection model, and N is a positive integer;
  • An output module configured to output the detection results of the N adversarial examples through the detection model
  • a processing module configured to count the proportion of the N adversarial examples that successfully attack the detection model based on the detection result; and evaluate the robustness of the detection model based on the proportion.
  • the N adversarial samples are samples generated based on M different second images, where M is a positive integer and M ≤ N;
  • the processing module is specifically configured to: evaluate the robustness of the detection model based on the ratio and the average number of model calls, where the average number of model calls is N/M and indicates the average number of times the adversarial examples generated for each of the M images are input into the detection model.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method described in the first aspect above.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method described in the second aspect above.
  • the present application provides an apparatus, including a processor and a memory, configured to implement the method described in the foregoing first aspect and its possible implementation manners.
  • the memory is coupled to the processor, and when the processor executes the computer program stored in the memory, the device can implement the method described in the first aspect or any possible implementation manner of the first aspect.
  • the device may include:
  • a processor configured to acquire a first image and a second image; the first image is used to be superimposed on the second image to obtain a third image, and the third image is used to attack the image detection model; and configured to construct a first parameter space based on a first area; the first area is a partial area in the second image, the first parameter space includes the range of pixel coordinate parameters and the range of angle parameters, and the range of the pixel coordinate parameters is the set of pixel coordinates of the first area in the second image.
  • the present application provides an apparatus, including a processor and a memory, configured to implement the method described in the above first aspect and its possible implementation manners.
  • the memory is coupled to the processor, and when the processor executes the computer program stored in the memory, the device can implement the method described in the second aspect or any possible implementation manner of the second aspect.
  • the device may include:
  • a processor configured to respectively input N adversarial samples into the image detection model, where each of the N adversarial samples is obtained by using the method described in any one of the above first aspect and is used to attack the detection model, and N is a positive integer; output the detection results of the N adversarial samples through the detection model; calculate the proportion of the N adversarial samples that successfully attack the detection model based on the detection results; and evaluate the robustness of the detection model based on the ratio.
  • the computer program in the memory in this application can be stored in advance or can be stored after being downloaded from the Internet when using the device.
  • This application does not specifically limit the source of the computer program in the memory.
  • the coupling in the embodiments of the present application is an indirect coupling or connection between devices, units or modules, which may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
  • the embodiments of the present application generate adversarial samples with stronger aggressiveness and better practicability by superimposing the interference image on the target image, so that the generated adversarial samples can be used to evaluate the robustness of the image detection model more effectively.
  • Fig. 1 is a schematic diagram of a scene provided by an embodiment of the present application.
  • Fig. 2 is a schematic flowchart of an image detection model evaluation method provided by an embodiment of the present application.
  • Fig. 3A is a schematic diagram of an image provided by an embodiment of the present application.
  • Fig. 3B is a schematic diagram of an image provided by an embodiment of the present application.
  • Fig. 4A is a schematic diagram of an image provided by an embodiment of the present application.
  • Fig. 4B is a schematic diagram of an image provided by an embodiment of the present application.
  • Fig. 5A is a schematic diagram of the image coordinate system provided by an embodiment of the present application.
  • Fig. 5B is a schematic diagram of a set of pixel coordinates in the image coordinate system provided by an embodiment of the present application.
  • Fig. 6 is a schematic diagram of angle parameters in the image coordinate system provided by an embodiment of the present application.
  • Fig. 6A is a schematic structural diagram of a learning model provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of a sticker image provided by an embodiment of the present application.
  • Fig. 8A is a schematic diagram of a traffic sign provided by an embodiment of the present application.
  • Fig. 8B is a schematic diagram of a traffic sign image provided by an embodiment of the present application.
  • Fig. 9 is a schematic diagram of an effective pasting position of a traffic sign image provided by an embodiment of the present application.
  • Fig. 10A is a schematic diagram of an adversarial example of a traffic sign provided by an embodiment of the present application.
  • Fig. 10B is a schematic diagram of another adversarial example of a traffic sign provided by an embodiment of the present application.
  • Fig. 11 is a schematic diagram of the comparison results of the attack success rate obtained in the experiment provided by an embodiment of the present application.
  • Fig. 12 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • Fig. 13 is a schematic diagram of an image processing device provided by an embodiment of the present application.
  • Fig. 14 is a schematic diagram of a device for evaluating an image detection model provided by an embodiment of the present application.
  • Fig. 15 is a schematic diagram of a possible hardware structure of the device provided by an embodiment of the present application.
  • Adversarial examples refer to input samples formed by adding subtle disturbances in the data set, which can cause the model to give a wrong output with high confidence.
  • Robustness refers to the ability of a control system to maintain certain other performance characteristics under perturbations of certain (structural, size) parameters.
  • a mask is a binary image composed of 0s and 1s. When a mask is applied in a function, regions with value 1 are processed, while regions with value 0 are masked out and excluded from the calculation.
  • FIG. 1 shows a system architecture 100 provided by an embodiment of the present application, and the system architecture 100 includes an image acquisition device 110 and an image processing device 120 .
  • the image acquiring device 110 is used for acquiring images, and sending the acquired images to the image processing device 120 .
  • the image processing device 120 includes an image detection model, through which the acquired image can be detected.
  • the image acquisition device 110 may be, for example, a video camera, a camera, or a scanner.
  • the image processing apparatus 120 may be a terminal device or a server.
  • the image acquisition device 110 may be a video camera, camera, or scanner installed on the vehicle; the image processing device 120 may be a processing device in the vehicle, or may be a server that communicates with the vehicle, and so on.
  • the image acquisition device 110 may be a video camera, a camera on an electronic device, a scanner, a face recognizer, etc.
  • the image processing device 120 may be the image processor in an electronic device, a scanner, a face recognition device, etc.
  • the performance of the image detection model is the key to the realization of specific applications.
  • the robustness of image detection models can be evaluated through adversarial examples.
  • the adversarial samples generated in the existing schemes are less aggressive or less practical.
  • this application provides a method for evaluating an image detection model, which can be executed by the image processing device 120 described above in FIG. 1 .
  • the image detection model evaluation method provided by this application includes but is not limited to the following steps:
  • S101 Acquire a first image and a second image; the first image is superimposed on the second image to obtain a third image, and the third image is used to attack an image detection model.
  • the above-mentioned first image may be an image existing in a physical environment, and the image processing apparatus may receive the first image input by a user through a user interface, or the image processing apparatus may receive the first image from other devices, and so on.
  • the first image may be an image randomly generated by the image processing device.
  • the above-mentioned second image may be an image of a traffic sign or an image of a lane line involved in the field of automatic driving.
  • the aforementioned image detection model is a traffic sign image detection model or a lane line image detection model.
  • the above-mentioned second image may be a human face image, an animal image, a landscape image, an item image, or the like.
  • the above-mentioned image detection model is a corresponding type of image detection model.
  • after the image acquisition device (such as the image acquisition device 110 shown in Fig. 1 above) acquires the second image, it sends the second image to the image processing device.
  • the image received by the image processing device from the image acquisition device needs to be further processed to obtain the above-mentioned second image.
  • the further processing may include detecting the target area through the target detection technology, excluding the background area in the received image, and the obtained image of the target area is the above-mentioned second image.
  • the target detection technology may be a deep learning-based detection model or the like.
  • the image processing device can first detect and extract the area of the traffic sign through a deep learning detection model for traffic signs, thereby removing the background area, and use the image of the extracted traffic sign area as the second image; the extracted second image may be, for example, the image shown in Fig. 3B.
  • S102 Determine a first area in the above second image, and construct a first parameter space based on the first area; the first area is a partial area in the second image, the first parameter space includes the range of pixel coordinate parameters and the range of angle parameters, and the range of the pixel coordinate parameter is the set of pixel coordinates of the first area in the second image.
  • the first area in the above second image may be an area at a non-key position in the second image; the information indicated by the second image can still be recognized by human eyes even after the area at the non-key position is blocked. The first region of the second image is used for superimposing the first image to generate the third image, that is, to generate an adversarial example for the image detection model.
  • the second image further includes an area of a key position; for example, the area of the key position may be a partial area spreading from the center point of the second image to its surroundings; part or all of the areas other than the key position area in the second image are the aforementioned non-key position areas.
  • refer to Fig. 4A and Fig. 4B. The second image shown in Fig. 4A is an image of a traffic sign indicating straight ahead, and the second image shown in Fig. 4B is an image of a traffic sign indicating a speed limit of 40 km/h. The regions framed by dashed lines in Fig. 4A and Fig. 4B are the regions of the respective key positions; then, in the image shown in Fig. 4A, part or all of the regions except the region framed by dashed lines are non-key position regions; similarly, in the image shown in Fig. 4B, part or all of the areas except the areas framed by dashed lines are areas of non-key positions.
  • a mask M_F ∈ R^(r×v) of the second image may be generated to mark the first region in the second image.
  • the mask of the second image is an r×v matrix whose size is the same as that of the pixel matrix of the second image, where r and v are positive integers.
  • the mask of the second image may be generated by marking pixels in the first area in the second image as 1 and marking pixels in areas other than the first area as 0. That is, the first region in the second image can be clearly marked by the mask.
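  • As an illustration, the following is a minimal NumPy sketch of how such a binary mask could be constructed; the rectangular region bounds are hypothetical, since the first region may be any subset of pixels:

```python
import numpy as np

def build_mask(h, w, region):
    """Return an h x w binary mask: 1 inside the first region, 0 elsewhere.

    `region` is a hypothetical (top, bottom, left, right) box; the method
    only requires that pixels of the first region be marked as 1.
    """
    mask = np.zeros((h, w), dtype=np.uint8)
    top, bottom, left, right = region
    mask[top:bottom, left:right] = 1
    return mask

# Example: a 64x64 second image whose first region is its lower half.
M_F = build_mask(64, 64, (32, 64, 0, 64))
```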
  • the first parameter space is constructed based on the first region.
  • the first parameter space includes a range of pixel coordinate parameters and a range of angle parameters
  • the range of pixel coordinate parameters is a set of pixel coordinates of the first region in the second image.
  • the image processing device may construct an image coordinate system for the second image, and the image coordinate system may refer to FIG. 5A for example.
  • the image coordinate system takes the upper left corner of the second image as the origin O, and constructs a Cartesian coordinate system u-v in units of pixels.
  • the coordinates of a pixel are expressed as (u, v)
  • the pixel coordinates of the origin O are (0, 0)
    • the abscissa u of the pixel represents the column number of the pixel in the pixel matrix of the second image, and the ordinate v of the pixel represents the row number of the pixel in the pixel matrix of the second image. For example, assuming that a certain pixel in the second image is the pixel in row i and column j of the pixel matrix of the second image, the pixel coordinates of that pixel in the image coordinate system are (j, i).
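  • In code, this convention maps a matrix index (row i, column j) to image coordinates as follows (a small illustrative snippet):

```python
# u counts columns, v counts rows; the origin O is the top-left corner.
i, j = 5, 12   # the pixel in row 5, column 12 of the pixel matrix
u, v = j, i    # its image coordinates are (u, v) = (12, 5)
```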
  • the first region in the above-mentioned second image includes a set of coordinates of pixels, and this set is the range of the above-mentioned pixel coordinate parameters.
  • for example, the set of pixel coordinates occupied by the first region shown in Fig. 5B in the second image is a set of the form {(u, v) | ...}
  • this set of pixel coordinates is the range of the pixel coordinate parameter.
  • the image coordinate system established by the image processing device for the second image is not limited to the coordinate system shown in Fig. 5A above; other origins and axis directions may be used, and the present application does not limit the image coordinate system established for the second image.
  • the above angle parameter may range from 0° to 360°.
  • for example, the range of the angle parameter can be defined by taking any point in the second image as the origin, taking any direction extending in the plane of the second image as 0°, and then rotating clockwise or counterclockwise around the origin through a range of 360°.
  • alternatively, the range of the angle parameter can be defined by taking any point in the above first area as the origin, taking any direction extending in the plane of the second image as 0°, and then rotating clockwise or counterclockwise around the origin through a range of 360°.
  • S103 Determine a first pixel coordinate set based on the range of the pixel coordinate parameter, and determine a first angle based on the range of the angle parameter; the first pixel coordinate set indicates the position at which the first image is superimposed on the second image in the third image, and the first angle indicates the rotation angle of the first image superimposed on the second image in the above third image. Input the third image obtained based on the first pixel coordinate set and the first angle into the above image detection model for detection, and output the detection result.
  • the image processing device can search and determine a specific set of pixel coordinates (that is, the above-mentioned first pixel coordinate set) and a rotation angle (that is, the above-mentioned first angle) to determine the position and rotation angle at which the above-mentioned first image is superimposed on the above-mentioned second image.
  • the specific set of pixel coordinates may be searched and determined within the range of pixel coordinate parameters included in the first parameter space.
  • the specific set of pixel coordinates may be a pixel matrix having the same size as the pixel matrix of the first image.
  • the specific rotation angle may be searched and determined within the range of angle parameters included in the first parameter space.
  • algorithms such as differential evolution algorithm, machine learning algorithm, or deep learning algorithm may be used to search for the above specific set of pixel coordinates and specific rotation angles.
  • the collection of candidate values of the above parameter vector is also called a population.
  • Each individual in the population represents a possible value of a parameter vector, and each parameter on the individual represents the value of the corresponding parameter in the corresponding vector group.
  • the population can be initialized according to the range of the pixel coordinate parameter and the range of the angle parameter, that is, the parameter vector is initialized randomly.
  • after initialization, a set of specific values of θ_1 and θ_2 can be obtained, and a position and a rotation angle at which the above first image is superimposed on the above second image can be determined from this set of initialized values.
  • the specific value of θ_1 obtained by the initialization may be the above first pixel coordinate set, and the specific value of θ_2 obtained by the initialization may be the above first angle.
  • the first image can be superimposed on the second image to obtain a third image, that is, an adversarial example of the above-mentioned image detection model is obtained. Then, input the adversarial example into the above-mentioned image detection model for detection, and output the detection result.
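  • A minimal sketch of this superimposition step using Pillow is shown below; the file names are hypothetical, and the real method draws the coordinates and angle from the first parameter space rather than taking fixed arguments:

```python
from PIL import Image

def superimpose(second_img, first_img, coords, angle):
    """Paste `first_img` onto `second_img` at pixel coordinates `coords`
    (u, v), after rotating it by `angle` degrees, to form the third image."""
    third = second_img.copy()
    rotated = first_img.rotate(angle, expand=True)   # rotation angle
    third.paste(rotated, coords, mask=rotated)       # position; alpha channel as mask
    return third

# Hypothetical usage: a sticker pasted at (20, 30) and rotated by 45 degrees.
sign = Image.open("speed_limit_40.png").convert("RGBA")
sticker = Image.open("sticker.png").convert("RGBA")
adv = superimpose(sign, sticker, (20, 30), 45)
```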
  • the iterative update of the population can be performed according to the objective function of the population iteration.
  • each iteration can obtain a set of specific values of θ_1 and θ_2.
  • from each such set of values, a position and a rotation angle at which the above first image is superimposed on the above second image can be determined. That is, the specific value of θ_1 obtained in each iteration may be the above first pixel coordinate set, and the specific value of θ_2 obtained in each iteration may be the above first angle.
  • the first image can be superimposed on the second image to obtain the third image, that is, an adversarial example of the above-mentioned image detection model is obtained. Then, input the obtained adversarial example into the above-mentioned image detection model for detection, and output the detection result.
  • the objective function of the above population iteration can take the following form: min_θ J(θ) = L_untarget(f(g(x; s, θ)), y), where the loss is defined such that a smaller J(θ) corresponds to a stronger attack.
  • g(x; s, θ) represents the generated adversarial example image, which is the above third image.
  • x represents the original image (such as a traffic sign image with a speed limit of 40)
  • s represents the process of superimposing the above-mentioned first image on the above-mentioned second image to obtain the above-mentioned third image.
  • θ represents the parameters in the above first parameter space (including the above pixel coordinate parameter and angle parameter).
  • g(x; s, θ) represents the adversarial sample image obtained after applying the image transformation operation s, with θ as the transformation parameter, to the original image x.
  • y indicates the label of the original image x, for example: the image of a traffic sign with a speed limit of 40.
  • the f(·) function represents an image detection model: it takes an image as input and outputs the image detection result; for example, input the image of a traffic sign with a speed limit of 40, and it outputs the probability of detecting a traffic sign with a speed limit of 40.
  • L() stands for loss function
  • untarget stands for non-target attack
  • untarget means the same as dodging, that is, evading detection. If the original image is a traffic sign image with a speed limit of 40, and the adversarial sample image obtained after the attack is input into the detection model f(·), the probability of the image detection model f(·) detecting a speed-limit-40 traffic sign becomes lower, and the probability of detecting a speed limit of 60 or a no-whistling sign becomes higher (the probability of any type of traffic sign other than speed-limit-40 increases and exceeds the probability of speed-limit-40).
  • the individual fitness evaluation index J(·) in the population is set according to the loss L(·) above.
  • J(·) represents the evaluation index for judging the quality of individuals in the population, and the smaller the corresponding J(·) value, the better the individual.
  • J(·) is consistent with the objective function.
  • after the adversarial sample is input into the image detection model for detection and the detection result is output, it may be judged based on the detection result whether the attack of the adversarial sample on the image detection model is successful. Specifically, if the image detection model correctly recognizes the information indicated by the second image included in the adversarial sample, the attack fails; if the image detection model does not correctly identify the information indicated by the second image included in the adversarial sample, the attack succeeds.
  • if the attack is successful, the iteration of the above population ends, and the number of iterations is recorded. Because one adversarial example is obtained per iteration, and each adversarial example calls the image detection model once for detection, the number of iterations is the number of times the image detection model is called. If the initialized values above also produce an adversarial example that is input into the image detection model for detection, then the number of times the image detection model is called is the number of iterations plus one.
  • an upper limit T of the number of iterations can be set for the iteration of the above-mentioned population, for example, the upper limit T of the number of iterations can be set to 30 times, 40 times or 100 times, etc.
  • the present application does not limit the specific value of T.
  • if the number of iterations reaches the upper limit T and the above image detection model has not been successfully attacked, the iteration of the above population is also terminated. In this case, the number of calls to the image detection model is T. If the initialized values above also produce an adversarial example that is input into the image detection model for detection, then the number of calls to the image detection model is T+1.
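  • The following is a sketch of this search using SciPy's differential evolution, reusing the `superimpose` sketch above and a hypothetical black-box query `query_model` that returns the probability the detection model assigns to the true label; the bounds, file names, and budget values are illustrative:

```python
from PIL import Image
from scipy.optimize import differential_evolution

def query_model(image, label):
    """Hypothetical black-box query: probability assigned to `label`."""
    return 0.5  # stub standing in for the real image detection model

def fitness(params, x, sticker, label):
    """J(theta): true-label probability of the adversarial example; a smaller
    value means a stronger (more successful) untargeted attack."""
    u, v, angle = params
    adv = superimpose(x, sticker, (int(u), int(v)), angle)
    return query_model(adv, label)

sign = Image.open("speed_limit_40.png").convert("RGBA")
sticker = Image.open("sticker.png").convert("RGBA")
# Bounds come from the first parameter space: pixel coordinates restricted to
# the first region (approximated here by a box) and angle in [0, 360).
bounds = [(0, 63), (32, 63), (0.0, 360.0)]
result = differential_evolution(
    fitness, bounds, args=(sign, sticker, "speed_limit_40"),
    maxiter=30,   # the upper limit T on population iterations
    popsize=10,   # the population size P
)
```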
  • a parameter search model can be constructed based on the above first parameter space; the pixel coordinate parameter and the angle parameter are randomly initialized according to the range of the above pixel coordinate parameter and the range of the angle parameter, and the initialized parameters are input into the parameter search model as input information to search for parameter values.
  • the output of the parameter search model is the searched first set of pixel coordinates and the first angle.
  • the first image can be superimposed on the second image to obtain a third image, that is, an adversarial example of the above-mentioned image detection model is obtained. Then, input the adversarial example into the above-mentioned image detection model for detection, and output the detection result.
  • the output detection result can be reversely input into the above parameter search model to optimize the parameters of the parameter search model, and then continue to search for a new first pixel coordinate set and first angle based on the optimized parameters.
  • once the attack succeeds, the parameter search model ends the search, and the number of iterative searches of the parameter search model is recorded as the number of calls to the image detection model.
  • S104 Based on the above operations of S101 to S103, obtain detection results of multiple different adversarial examples, and evaluate the robustness of the above image detection model based on the detection results of the multiple adversarial examples.
  • the above steps S101 to S103 may be repeated to generate multiple adversarial samples based on the above second image to attack the above image detection model, recording whether each of the multiple samples successfully attacks the model and the number of times the model is called while the multiple adversarial samples attack the image detection model.
  • multiple of the above second images can also be used to generate adversarial samples to attack the image detection model; for the specific implementation of generating adversarial samples based on each of the multiple second images to attack the image detection model, refer to the corresponding descriptions in the above steps S101 to S103.
  • after the operation of attacking the image detection model with the adversarial examples of the plurality of second images is completed, information about whether the attack on the model succeeded and the number of times the image detection model was invoked for each of the plurality of second images can be obtained. Then, the proportion of adversarial examples generated from the plurality of second images that successfully attack the image detection model is counted, and the robustness of the image detection model is evaluated by this proportion. For example, if the ratio is greater than a set first threshold, it can be determined that the robustness of the image detection model is poor; if the ratio is smaller than the first threshold, it can be determined that the robustness of the image detection model is better.
  • an average value of times of invoking the image detection model corresponding to the plurality of second images may also be counted, and the average value may be referred to as an average number of invocations of the model.
  • the average number of model calls can be compared with a set second threshold: if the average number of model calls is greater than the second threshold, it can be determined that the robustness of the image detection model is better; otherwise, if the average number of model calls is less than the second threshold, it can be determined that the robustness of the image detection model is poor.
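  • A small sketch of this evaluation step, assuming one (attack_succeeded, model_calls) record per second image and illustrative thresholds:

```python
def evaluate_robustness(records, first_threshold=0.5, second_threshold=10):
    """`records` holds one (attack_succeeded, model_calls) pair per image."""
    success_ratio = sum(ok for ok, _ in records) / len(records)
    avg_calls = sum(calls for _, calls in records) / len(records)
    # A low success ratio and a high average call count both indicate robustness.
    robust = success_ratio < first_threshold and avg_calls > second_threshold
    return success_ratio, avg_calls, robust

ratio, calls, robust = evaluate_robustness([(True, 12), (False, 30), (True, 5)])
```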
  • the obtaining of the first image in the above step S101 may be obtained based on a pre-constructed value range of pixel values.
  • the image processing device may first construct a second parameter space, where the second parameter space includes the value range of the above pixel value; the pixel value of the first image may be determined based on the value range of the pixel value to obtain the pixel matrix of the first image.
  • the value range of the pixel value may be [0, 255].
  • the pixel value may be normalized, so the value range of the pixel value may be [0, 1].
  • the parameter search model constructed based on the machine learning algorithm or deep learning algorithm in the above step S103 can also be constructed in combination with the second parameter space, so that the constructed parameter search model can jointly search for the pixel values of the first image together with the pixel coordinate parameter and the angle parameter.
  • the first image is generated based on the jointly searched pixel values of the first image, and the generated first image is superimposed on the above second image based on the value of the pixel coordinate parameter and the value of the angle parameter obtained by the joint search to obtain a third image as an adversarial sample; this third image is used to attack the above image detection model.
  • the following uses a deep reinforcement learning model to search the parameter space to determine the value of the pixel coordinate parameter and the pixel value of the first image as an example.
  • ⊙ denotes the Hadamard product.
  • the mask matrix of the second image indicates the first region in the above second image.
  • the above-mentioned first image may be generated by using an integrated attack model based on the MI-FGSM algorithm, and the above-mentioned integrated attack model is the above-mentioned deep reinforcement learning model.
  • the deep reinforcement learning model adopted in this application may include n learning sub-models, and the learning sub-models may also be called agents (agents), or called surrogate models (surrogate models).
  • given the second image x, the shape and size (s_h, s_w) of the adversarial patch (the superimposed first image) are fixed, and its center coordinates (c_x, c_y) are changed to adjust the mask matrix.
  • the corresponding mask is defined as A_c.
  • the adversarial example can then be expressed in the standard mask-composition form: x' = (1 - A_c) ⊙ x + A_c ⊙ δ, where δ is the patch content.
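  • In NumPy, this mask composition can be sketched as follows (random arrays stand in for the real images):

```python
import numpy as np

def compose(x, delta, a_c):
    """Keep x where the mask is 0 and insert the patch delta where it is 1;
    both products are element-wise (Hadamard) products."""
    return (1 - a_c) * x + a_c * delta

x = np.random.rand(3, 64, 64)        # second image, normalized to [0, 1]
delta = np.random.rand(3, 64, 64)    # patch content (the first image)
a_c = np.zeros((1, 64, 64))
a_c[:, 20:44, 20:44] = 1             # fixed-size box centered at (c_x, c_y)
adv = compose(x, delta, a_c)
```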
  • the attack goal of this example is to optimize the position and content of the first image to generate highly transferable adversarial examples to attack the target image detection model. Therefore, the mask A_c, the attack step size ε in Equation (5), and the weight w_i in Equation (4) are set as the learning parameters. In order to make the parameters more suitable for the target image detection model, the parameters can be dynamically optimized through a small number of queries to the target model.
  • the parameters obtained from the learning sub-model are guided by the information returned from querying the target image detection model: the adversarial samples obtained based on these parameters are input into the image detection model for detection, and the detection results are returned to the learning sub-model to further optimize the corresponding parameters. This process can be expressed, in reinforcement-learning terms, as the learning sub-model learning from the reward signal obtained by interacting with the environment, so that a learning sub-model can be constructed to learn an attack-parameter selection strategy.
  • the parameter value is defined as the action generated by the learning sub-model under the guidance of the selection strategy π, and a_t represents the t-th action (that is, the value of the t-th parameter).
  • the image feature input to the learning sub-model is defined as the state s, and the threat model F(·) is the target image detection model. The policy π_θ(a|s) with parameter θ is the rule used by the learning sub-model to decide which action to take, and can be expressed as the probability distribution over actions a in state s.
  • the reward reflects the performance of the currently generated adversarial examples on the target image detection model, and the training objective of the learning sub-model is to learn a good policy to maximize the reward signal.
  • the goal of an evasion attack is to generate an image that is as far away as possible from the target image (that is, the above second image), while the goal of an imitation attack is to generate an image that is as similar as possible to the target image. Therefore, the reward function R is formalized according to these two attack goals.
  • the learning sub-model first predicts a set of parameters according to the policy π, then generates adversarial samples according to the predicted parameters, and inputs the generated adversarial samples into the threat model to obtain reward values. After many rounds of training, the learning sub-model will generate actions (that is, parameters) that perform well on the threat model, so that more aggressive adversarial samples can be generated based on these parameters.
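  • This interaction loop can be sketched as follows; every name here (feature extractor, sample generator, reward function, agent interface) is a placeholder for the components described above, not an API defined by this application:

```python
def train_agent(agent, threat_model, target_image, steps=100):
    """Sketch of the reinforcement-learning loop: predict parameters,
    generate an adversarial sample, query the threat model, and update
    the policy with the resulting reward."""
    for _ in range(steps):
        state = extract_features(target_image)               # state s
        action, log_prob = agent.sample(state)               # a ~ pi_theta(a | s)
        adv = generate_adversarial(target_image, action)     # build the sample
        reward = reward_fn(threat_model(adv), target_image)  # reward R
        agent.update(log_prob, reward)                       # policy-gradient step
    return agent
```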
  • the learning sub-model described above requires strategies to be learned for positions, weights, and attack step sizes. Considering the joint solution of parameters such as position, the learning sub-model adopts a U-net-based structure. Assuming the number of learning sub-models is n, the learning sub-model is designed to output n channels, and the output is a feature map with the same height h and width w as the input image (i.e., of size n × h × w).
  • Different positions in the above-mentioned first image and above-mentioned second image have different attack strengths, so at the top layer of the learning sub-model network, a fully connected layer is used to map the feature map M to a vector V representing different attack step values.
  • the structure of the learning sub-model can be referred to as shown in Fig. 6A.
  • the leftmost box in Figure 6A represents the network structure of U-Net, each vertical box represents a layer of network, and the numbers below represent the number of channels.
  • the first layer is the input layer, and 3 represents the number of channels of the input layer. Since the color image has three channels of RGB, the number of channels of the input layer is naturally 3.
  • Conv3 ⁇ 3 and Conv1 ⁇ 1 represent convolutional layers, and their corresponding convolution kernel sizes are 3 ⁇ 3 and 1 ⁇ 1, respectively.
  • Max Pool 2 ⁇ 2 represents a 2 ⁇ 2 maximum pooling layer.
  • Up conv 2 ⁇ 2 represents an upsampling layer, and every time it is processed, the width and height of the feature map are expanded by 2 times. For example: the original size is 128*32*32, and after upsampling, it becomes 128*64*64.
  • 32 and 64 represent the size of the feature map, that is, width and height, and 128 represents the number of channels.
  • FC layer represents a fully connected layer.
  • the U-net network composed of the above basic operations can realize the required function: given an input image, through learning it can output parameter information such as the above pixel coordinate parameters.
  • the 3 ⁇ 3 convolution operation obtains the feature map of 128 channels
  • the n*h*w feature map obtained after completing the above operations is used to predict the position and weights; after that, a fully connected layer is applied to predict the step size.
  • the block on the right side of Figure 6A represents the further processing of the feature map output by U-Net to obtain parameter information such as the above-mentioned pixel coordinate parameters.
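  • A compact PyTorch sketch of these output heads; the U-net trunk itself is elided, and the shapes and the 20 step-size candidates follow the description above:

```python
import torch
import torch.nn as nn

class AttackParamHeads(nn.Module):
    """Map the n x h x w feature map to a categorical distribution over
    positions and, via a fully connected layer, to 20 step-size logits."""
    def __init__(self, n, h, w, num_steps=20):
        super().__init__()
        self.fc_step = nn.Linear(n * h * w, num_steps)

    def forward(self, feat):                                  # (batch, n, h, w)
        pos_probs = torch.softmax(feat.flatten(2), dim=-1)    # over the h*w positions
        step_probs = torch.softmax(self.fc_step(feat.flatten(1)), dim=-1)
        return pos_probs, step_probs

heads = AttackParamHeads(n=4, h=32, w=32)
pos, step = heads(torch.randn(1, 4, 32, 32))
```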
  • the optional range of the location is discrete, so the location strategy π_1 is designed to follow a categorical distribution.
  • the position parameters (c_x, c_y) ∼ Cat(P_position), where P_position can be calculated as:
  • the weight ratio of the loss on each learning sub-model to the ensemble loss is a continuous value, so the weight strategy π_2 can be set to follow a Gaussian distribution. The i-th weight parameter w_i and its calculation are as follows:
  • the clipping operation can be used to make the sum of the weights equal to 1.
  • for the attack step size parameter, 20 values ranging from 0.01 to 0.2 are set at an interval of 0.01, and a categorical distribution is adopted as the step size strategy π_3 owing to the discrete nature of the values. So the step size parameter ε ∼ Cat(p_step), where the probability p_step of each candidate value is:
  • the policy gradient algorithm can be used to guide the policy update of the learning sub-model.
  • the goal is to let the learning sub-model h_θ learn a good policy π_θ.
  • the optimal strategy parameter θ* is expressed in the standard policy-optimization form: θ* = argmax_θ E_{a∼π_θ(a|s)}[R].
  • the policy gradient method can be used to solve for θ* by gradient ascent, following the REINFORCE algorithm, using the average of N samples (N is a positive integer) drawn from the policy distribution to approximate the policy gradient: ∇_θ J(θ) ≈ (1/N) Σ_{n=1}^{N} R_n ∇_θ log π_θ(a_n | s_n) (Equation (11)).
  • R n is the reward for the nth sample.
  • the reward R can be thought of as a step size when updating the policy parameters θ: the larger the reward, the larger the step; if the reward is negative, the update goes in the opposite direction. In this way, the learning sub-model can learn a good policy function as θ is updated in the direction of increasing reward.
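  • With automatic differentiation, this gradient estimate reduces to minimizing a reward-weighted log-probability surrogate, as in this sketch:

```python
import torch

def reinforce_loss(log_probs, rewards):
    """Negative of (1/N) * sum_n R_n * log pi_theta(a_n | s_n); calling
    backward() on this surrogate yields the approximate policy gradient."""
    log_probs = torch.stack(log_probs)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    return -(rewards * log_probs).mean()
```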
  • for parameters that follow a categorical distribution, namely the position parameter and the attack step size parameter, given a probability vector p (i.e., P_position and p_step in Equation (7) and Equation (9)), let p(a) denote the probability of parameter a in p; then for π_1 and π_3, the term ∇_θ log π_θ(a|s) in Equation (11) can be calculated as:
  • the following uses an example of a traffic sign image detection model in the field of automatic driving as an example for description.
  • the image detection model evaluation method of the embodiment of the present application is applicable to an application scenario where a detection model is trained or evaluated in an autonomous vehicle.
  • the image detection model evaluation method in this application can be executed by a data processing device, which is implemented by software and/or hardware, and is specifically configured in an electronic device, which can be a terminal device or a server.
  • the electronic device may be a vehicle-mounted device, and may be arranged in a vehicle.
  • the vehicle provided with the electronic device may be a vehicle equipped with an automatic driving function.
  • the traffic sign image detection model evaluation method may include but not limited to the following steps:
  • S201 Obtain a sticker image and a traffic sign image.
  • the sticker can be a sticker present in the physical environment or a noise sticker.
  • the sticker may be a sticker existing in a physical environment.
  • the image processing apparatus may receive the sticker image input by the user through the user interface, or the image processing apparatus may receive the sticker image from other devices and the like.
  • for example, a sticker image as shown in Fig. 7 is received through image acquisition.
  • the sticker image may be a noise sticker image.
  • the noise sticker image can be obtained based on a pre-built range of pixel values. Determine the pixel value of the noise sticker image based on the value range of the pixel value of the noise sticker image, and obtain a pixel matrix of the noise sticker image.
  • the value range of the pixel value is usually [0,255]. In the embodiment of the present application, the value range of the pixel value is normalized, corresponding to [0,1]. If the pixel value is 255, the corresponding value in the embodiment of the application is 1.
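  • For example, a random noise sticker in the normalized range can be generated as follows:

```python
import numpy as np

raw = np.random.randint(0, 256, size=(32, 32, 3))  # raw pixel values in [0, 255]
noise_sticker = raw / 255.0                        # normalized to [0, 1]; 255 maps to 1
```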
  • S202 Determine a set of effective sticker pasting coordinates for the image of the traffic signboard.
  • the key position area in the traffic sign image is the area 220 indicating the information of "speed limit 100 km/h", and the area 230 excluding the area 220 in the traffic sign image is determined to be a valid sticker sticking position.
  • the valid sticker pasting location is the area that excludes the mark "100".
  • a mask M_F ∈ R^(r×v) of the above traffic sign image is generated based on this position to mark the coordinate set of valid sticker pasting positions in the traffic sign image.
  • the mask of the traffic sign image is an r×v matrix whose size is the same as that of the pixel matrix of the traffic sign image, where r and v are positive integers.
  • the mask of the traffic sign image can be generated by marking the pixels of the valid sticker pasting position 230 as 1, and marking the pixels of the other regions 220 as 0. That is, the mask can be used to clearly mark the set of effective sticker sticking position coordinates in the image of the traffic sign.
  • S203 Select attack parameters and determine the value range of each attack parameter to construct an attack parameter space. Search for appropriate attack parameters through the attack algorithm, generate adversarial samples according to the attack parameters, input them into the traffic sign detection model for detection, and output the detection results.
  • the value range of the pasting position (θ_1) is a coordinate set including the effective pasting positions, and the value range of the rotation angle θ_2 is [0°, 360°).
  • the image coordinate system takes the upper left corner of the traffic sign image as the origin O, the upper side of the image as the x-axis, and the left side of the image as the y-axis; the value range of the pasting position θ_1 is the coordinate set of valid pasting positions.
  • the range of the rotation angle θ_2 is [0°, 360°).
  • the value range of the pasting position (θ_1) is the coordinate set including the effective pasting positions, and the value range of the rotation angle θ_2 is [0°, 360°), which will not be repeated here.
  • the value range of the pixel matrix θ_3 is normalized, corresponding to [0, 1].
  • the sticking position and rotation angle of the sticker are searched through the differential evolution algorithm.
  • let P be the population size and let X(k) represent the k-th generation population.
  • the population is composed of P groups of candidate attack parameter vectors; each individual in the population represents a possible value of a group of attack parameter vectors, and each parameter on an individual represents the value of the corresponding parameter in the corresponding vector group.
  • J(·) represents the evaluation index for judging the quality of individuals in the population, and the smaller the corresponding J(·) value, the better the individual.
  • J(·) is consistent with the objective function of the adversarial attack.
  • the initial generation population X(0) is randomly initialized within the value range of the attack parameters.
  • the selected attack method is a non-target attack, and its objective function is determined in the same form as above: min_θ J(θ) = L_untarget(f(g(x; s, θ)), y).
  • the individual fitness evaluation index J(·) in the population is set consistent with the objective function L(·).
  • S204 Repeat the generation process of adversarial samples for a single traffic sign image in S201-S203 until adversarial samples for all traffic sign test images have been generated; output the detection results of all adversarial samples through the traffic sign detection model, and count the proportion of all adversarial samples that successfully attack the detection model and the average number of queries to the traffic sign detection model during the generation process as the robustness evaluation results of the traffic sign detection model.
  • if the proportion of all adversarial samples that successfully attack the detection model is greater than the set first threshold, it can be determined that the robustness of the image detection model is poor; if the proportion is less than the first threshold, it can be determined that the robustness of the image detection model is better.
  • the traffic sign detection model f(·) can output the detection result indicating whether the above adversarial example attack succeeds or fails; the traffic sign detection model f(·) can also output the corresponding predicted class label t and probability f(x, t) for the sign image x input to the model.
  • the detection model outputs the predicted category label and probability: speed limit 100km/h, 99%.
  • the detection model outputs the predicted category label and probability: no entry for motor vehicles, 76%.
  • the number of calls to the detection model can be reduced, and the attack parameters can be dynamically adjusted through a small number of model calls, effectively improving the attack success rate and model call efficiency.
  • the image detection model evaluation method provided in the present application can also be applied to the face recognition model.
  • the specific implementation of the specific evaluation can refer to the corresponding description in the above-mentioned FIG. 2 and its possible implementation manners, which will not be repeated here.
  • the effectiveness of the embodiment of this application is further demonstrated below with experimental data obtained through experiments.
  • the adversarial example in this experiment may be an adversarial example generated by searching the parameter space based on the above-mentioned deep reinforcement learning model to determine the corresponding parameters.
  • FIG. 11 is a schematic diagram of the comparison results of attack success rates obtained in the experiments.
  • In the experiments, four representative face recognition models were selected as the models under test: Model 1, the FaceNet face recognition model; Model 2, the CosFace50 face recognition model; Model 3, the ArcFace34 face recognition model; and Model 4, the ArcFace50 face recognition model.
  • A face database was built by randomly selecting 5752 different face images from Labeled Faces in the Wild (LFW) and the CelebFaces Attributes dataset (CelebA), and adversarial samples were then generated from these face images in three ways:
  • Method 1: only the position at which the interference image (for example, the above first image) is superimposed on the face images is changed to generate adversarial samples;
  • Method 2: only the content of the interference image superimposed on the face images is changed to generate adversarial samples;
  • Method 3: both the position at which the interference image is superimposed on the face images and the content of the superimposed interference image are changed to generate adversarial samples (a parameter-freezing sketch follows this list).
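Purely as an illustration (all helper names are hypothetical), the three methods can be derived from one joint parameter vector by freezing whichever component a given method does not optimize:

```python
import numpy as np

def sample_theta(rng, region_coords, patch_shape):
    """Draw a full parameter vector: a pasting position and a patch content."""
    pos = tuple(region_coords[rng.integers(len(region_coords))])
    patch = rng.uniform(0.0, 1.0, patch_shape)   # normalized pixel values
    return {"pos": pos, "patch": patch}

def restrict(mode, theta, frozen):
    """Method 1 varies only the position, Method 2 only the content,
    Method 3 varies both; `frozen` supplies the components held fixed."""
    if mode == "position_only":
        return {"pos": theta["pos"], "patch": frozen["patch"]}
    if mode == "content_only":
        return {"pos": frozen["pos"], "patch": theta["patch"]}
    return theta  # joint optimization (Method 3)
```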
  • These adversarial samples were then respectively input into the above four models for detection, and the robustness of each model was evaluated by two indicators: the attack success rate and the number of model calls.
  • FIG. 11 covers two kinds of adversarial samples: dodging adversarial samples, which are made as dissimilar as possible to the target image (for example, the above second image), and impersonation adversarial samples, which are made as similar as possible to the target image; running the experiment with both kinds yields more accurate experimental results.
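To make the two goals concrete, one common formulation, assumed here since face recognizers typically expose an identity embedding (this is an illustrative assumption, not the prescribed formulation), scores a candidate by its cosine similarity to the target identity, negated for dodging:

```python
import numpy as np

def attack_reward(embed_fn, adv_img, target_img, mode="dodging"):
    """embed_fn: assumed face-embedding function of the model under test.
    Dodging rewards moving away from the target identity; impersonation
    rewards approaching it."""
    a, t = embed_fn(adv_img), embed_fn(target_img)
    sim = float(np.dot(a, t) / (np.linalg.norm(a) * np.linalg.norm(t)))
    return -sim if mode == "dodging" else sim
```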
  • From the data shown in FIG. 11, the average success rates corresponding to Method 1 and Method 2 are 38.89% and 48.38%, respectively, while the average success rate corresponding to Method 3 is 78.81%. This shows that the attack mode that jointly optimizes both the position at which the interference image is superimposed on the face images and the content of the superimposed interference image achieves a better attack effect.
  • Based on the above description, the embodiment of the present application also provides an image processing method; referring to FIG. 12, the image processing method may include but is not limited to the following steps:
  • S1201. Acquire a first image and a second image; the first image is used to be superimposed on the second image to obtain a third image, and the third image is used to attack an image detection model.
  • S1202. Construct a first parameter space based on a first region; the first region is a partial region of the second image, the first parameter space includes a range of a pixel coordinate parameter and a range of an angle parameter, and the range of the pixel coordinate parameter is the set of pixel coordinates of the first region in the second image.
  • S1203. Determine a first pixel coordinate set based on the range of the pixel coordinate parameter, and determine a first angle based on the range of the angle parameter; the first pixel coordinate set indicates the position in the third image at which the first image is superimposed on the second image, and the first angle indicates the rotation angle of the first image superimposed on the second image in the third image.
  • In a possible implementation, before the first image and the second image are acquired, the method further includes constructing a second parameter space that includes a value range of pixel values; acquiring the first image then includes determining the pixel values of the first image based on that value range to obtain the pixel matrix of the first image (a combined sketch of these steps follows).
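A minimal end-to-end sketch of S1201-S1203 plus the superimposition itself, assuming normalized pixel values in [0, 1], a top-left pasting convention, and Pillow for the rotation; all helper names and sizes are illustrative rather than prescribed:

```python
import numpy as np
from PIL import Image

def make_adversarial(second_img, region_coords, patch_hw=(20, 20), seed=0):
    """Compose a third image: generate a first image (patch) from the
    pixel-value range [0, 1], paste it at a position drawn from the first
    region's coordinate set, rotated by an angle drawn from [0, 360).
    Assumes the patch fits inside the image at the chosen position."""
    rng = np.random.default_rng(seed)
    # Second parameter space: pixel values of the first image.
    patch = rng.uniform(0.0, 1.0, size=(*patch_hw, 3))
    # First parameter space: position (used here as the patch's top-left
    # corner) and rotation angle.
    top, left = region_coords[rng.integers(len(region_coords))]
    angle = float(rng.uniform(0.0, 360.0))
    # Rotate the patch about its center, tracking which pixels stay valid.
    pil = Image.fromarray((patch * 255).astype(np.uint8))
    rot = np.asarray(pil.rotate(angle), dtype=np.float32) / 255.0
    valid = (np.asarray(Image.new("L", pil.size, 255).rotate(angle)) > 0)[..., None]
    out = second_img.copy()
    h, w = patch_hw
    out[top:top + h, left:left + w] = np.where(
        valid, rot, out[top:top + h, left:left + w])
    return out, (top, left), angle
```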
  • The foregoing mainly introduces the image processing method and the image detection model evaluation method provided by the embodiments of the present application. It can be understood that, in order to implement the corresponding functions, each device, such as an autonomous driving vehicle, includes a corresponding hardware structure and/or software module for performing each function.
  • Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.
  • The embodiments of the present application may divide a device such as the vehicle into functional modules according to the above method examples; for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of modules in the embodiments of the present application is schematic and is only a logical function division; there may be other division manners in actual implementation.
  • In the case where functional modules are divided according to the functions, FIG. 13 shows an image processing apparatus 300, which may be, for example, the image processing device described in the above method embodiments, a chip in the image processing device, or the processing system in the image processing device; the apparatus 300 includes:
  • an acquisition module 310 configured to acquire a first image and a second image; the first image is used to be superimposed on the second image to obtain a third image, and the third image is used to attack an image detection model;
  • a processing module 320 configured to construct a first parameter space based on a first region, where the first region is a partial region of the second image, the first parameter space includes a range of a pixel coordinate parameter and a range of an angle parameter, and the range of the pixel coordinate parameter is the set of pixel coordinates of the first region in the second image; and to determine a first pixel coordinate set based on the range of the pixel coordinate parameter and a first angle based on the range of the angle parameter, where the first pixel coordinate set indicates the position in the third image at which the first image is superimposed on the second image, and the first angle indicates the rotation angle of the first image superimposed on the second image in the third image.
  • The processing module 320 is further configured to construct a second parameter space before the acquisition module acquires the first image and the second image, the second parameter space including a value range of pixel values;
  • the acquisition module is specifically configured to determine the pixel values of the first image based on the value range of the pixel values to obtain the pixel matrix of the first image.
  • FIG. 14 shows an apparatus 400 for evaluating an image detection model, which may be, for example, the image processing device described in the above method embodiments, a chip in the image processing device, or the processing system in the image processing device; the apparatus 400 includes:
  • an input module 410 configured to input N adversarial samples into the image detection model respectively, where each of the N adversarial samples is a third image, obtained by the apparatus described in FIG. 13 above, that is used to attack the detection model, and N is a positive integer;
  • an output module 420 configured to output the detection results of the N adversarial samples through the detection model;
  • a processing module 430 configured to count, based on the detection results, the proportion of the N adversarial samples that successfully attack the detection model, and to evaluate the robustness of the detection model based on that proportion.
  • the N adversarial examples may be a plurality of different adversarial examples in the above step S104.
  • In one possible implementation, the N adversarial samples are samples generated based on M different second images, where M is a positive integer and M < N;
  • the processing module 430 is specifically configured to evaluate the robustness of the detection model based on the proportion and the average number of model calls; the average number of model calls is N/M, which indicates the average number of times the adversarial samples of each of the M images are input into the detection model.
  • the M different second images may be multiple second images in the above step S104.
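A hedged sketch of the apparatus-400 pipeline, treating the detection model and the ground-truth labels as assumed interfaces: the N adversarial samples are input one by one, the proportion that fool the model is counted, and the average number of model calls is read as total samples over distinct source images (N/M):

```python
def evaluate_detection_model(model, samples, true_labels, first_threshold=0.5):
    """samples: list of (image_id, adversarial_image); model(img) -> label.
    With N samples generated from M distinct images, the average number of
    model calls per image is N / M. The threshold is a placeholder value."""
    n = len(samples)
    m = len({img_id for img_id, _ in samples})
    fooled = sum(model(img) != true_labels[img_id] for img_id, img in samples)
    ratio = fooled / n
    avg_calls = n / m
    return ratio, avg_calls, ratio < first_threshold  # True -> judged robust
```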
  • FIG. 15 is a schematic diagram of a possible hardware structure of the apparatus provided by the present application.
  • The apparatus may be the image processing device described in the method embodiments above, a chip in the image processing device, or the processing system in the image processing device, etc.
  • the apparatus 1500 includes: a processor 1501 , a memory 1502 and a communication interface 1503 .
  • the processor 1501 , the communication interface 1503 and the memory 1502 may be connected to each other or through a bus 1504 .
  • The memory 1502 is used to store the computer programs and data of the apparatus 1500; the memory 1502 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM).
  • The communication interface 1503 includes a sending interface and a receiving interface; there may be multiple communication interfaces 1503, which are used to support communication by the apparatus 1500, such as receiving or sending data or messages.
  • the processor 1501 may be a central processing unit, a general processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component or any combination thereof.
  • the processor can also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the processor 1501 may be configured to read the program stored in the above-mentioned memory 1502, so that the apparatus 1500 executes the image detection model evaluation method described above in FIG. 2 and its possible embodiments.
  • In a possible implementation, the processor 1501 may be configured to: acquire a first image and a second image, the first image being used to be superimposed on the second image to obtain a third image, and the third image being used to attack an image detection model; construct a first parameter space based on a first region, the first region being a partial region of the second image, the first parameter space including a range of a pixel coordinate parameter and a range of an angle parameter, and the range of the pixel coordinate parameter being the set of pixel coordinates of the first region in the second image; and determine a first pixel coordinate set based on the range of the pixel coordinate parameter and a first angle based on the range of the angle parameter, the first pixel coordinate set indicating the position in the third image at which the first image is superimposed on the second image, and the first angle indicating the rotation angle of the first image superimposed on the second image in the third image.
  • The embodiment of the present application also provides a computer-readable storage medium storing a computer program; when executed by a processor, the computer program implements the method described in any of the embodiments of FIG. 2 and its possible method embodiments.
  • An embodiment of the present application further provides a computer program product; when the computer program product is read and executed by a computer, the method described in any one of FIG. 2 and its possible method embodiments is performed.
  • In summary, the embodiments of the present application superimpose an interference image on a target image to generate adversarial samples that are more aggressive and more practical, so that the generated adversarial samples can be used to evaluate the robustness of the image detection model more effectively.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose an image processing method, an image detection model evaluation method, and an apparatus. The image processing method includes: acquiring a first image and a second image, the first image being superimposed on the second image to obtain a third image used to attack an image detection model; determining, based on a first region that is a partial region of the second image, a range of a pixel coordinate parameter and a range of an angle parameter, the range of the pixel coordinate parameter being the set of pixel coordinates of the first region in the second image; and determining a first pixel coordinate set and a first angle based on the range of the pixel coordinate parameter and the range of the angle parameter, the first pixel coordinate set indicating the position at which the first image is superimposed on the second image, and the first angle indicating the rotation angle of the first image superimposed on the second image. With the embodiments of the present application, adversarial samples with strong aggressiveness and good practicability can be obtained for effectively evaluating the robustness of an image detection model.

Claims (13)

  1. An image processing method, characterized by comprising:
    acquiring a first image and a second image; the first image is used to be superimposed on the second image to obtain a third image, and the third image is used to attack an image detection model;
    constructing a first parameter space based on a first region; the first region is a partial region of the second image, the first parameter space comprises a range of a pixel coordinate parameter and a range of an angle parameter, and the range of the pixel coordinate parameter is a set of pixel coordinates of the first region in the second image;
    determining a first pixel coordinate set based on the range of the pixel coordinate parameter, and determining a first angle based on the range of the angle parameter; the first pixel coordinate set indicates the position in the third image at which the first image is superimposed on the second image, and the first angle indicates the rotation angle, in the third image, of the first image superimposed on the second image.
  2. The method according to claim 1, characterized in that, before the acquiring a first image and a second image, the method further comprises:
    constructing a second parameter space, the second parameter space comprising a value range of pixel values;
    the acquiring a first image comprises:
    determining pixel values of the first image based on the value range of the pixel values to obtain a pixel matrix of the first image.
  3. The method according to claim 1 or 2, characterized in that the first image is an image of a sticker existing in a physical environment.
  4. An image detection model evaluation method, characterized in that the method comprises:
    inputting N adversarial samples into an image detection model respectively; each of the N adversarial samples is the third image, obtained by the method of any one of claims 1-3, that is used to attack the detection model, and N is a positive integer;
    outputting detection results of the N adversarial samples through the detection model;
    counting, based on the detection results, the proportion of the N adversarial samples that successfully attack the detection model;
    evaluating the robustness of the detection model based on the proportion.
  5. The method according to claim 4, characterized in that the N adversarial samples are samples generated based on M different second images, M is a positive integer, and M < N;
    the evaluating the robustness of the detection model based on the proportion further comprises:
    evaluating the robustness of the detection model based on the proportion and an average number of model calls, the average number of model calls being N/M and indicating the average number of times the adversarial samples of each of the M images are input into the detection model.
  6. An image processing apparatus, characterized in that the apparatus comprises:
    an acquisition module configured to acquire a first image and a second image; the first image is used to be superimposed on the second image to obtain a third image, and the third image is used to attack an image detection model;
    a processing module configured to construct a first parameter space based on a first region, where the first region is a partial region of the second image, the first parameter space comprises a range of a pixel coordinate parameter and a range of an angle parameter, and the range of the pixel coordinate parameter is a set of pixel coordinates of the first region in the second image; and to determine the first pixel coordinate set based on the range of the pixel coordinate parameter and the first angle based on the range of the angle parameter, where the first pixel coordinate set indicates the position in the third image at which the first image is superimposed on the second image, and the first angle indicates the rotation angle, in the third image, of the first image superimposed on the second image.
  7. The apparatus according to claim 6, characterized in that the processing module is further configured to construct a second parameter space before the acquisition module acquires the first image and the second image, the second parameter space comprising a value range of pixel values;
    the acquisition module is specifically configured to:
    determine pixel values of the first image based on the value range of the pixel values to obtain a pixel matrix of the first image.
  8. An apparatus for evaluating an image detection model, characterized in that the apparatus comprises:
    an input module configured to input N adversarial samples into the image detection model respectively; each of the N adversarial samples is the third image, obtained by the apparatus of claim 6 or 7, that is used to attack the detection model, and N is a positive integer;
    an output module configured to output detection results of the N adversarial samples through the detection model;
    a processing module configured to count, based on the detection results, the proportion of the N adversarial samples that successfully attack the detection model, and to evaluate the robustness of the detection model based on the proportion.
  9. The apparatus according to claim 8, characterized in that the N adversarial samples are samples generated based on M different second images, M is a positive integer, and M < N;
    the processing module is specifically configured to:
    evaluate the robustness of the detection model based on the proportion and an average number of model calls, the average number of model calls being N/M and indicating the average number of times the adversarial samples of each of the M images are input into the detection model.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method of any one of claims 1 to 3.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method of claim 4 or 5.
  12. An apparatus, comprising a processor and a memory, characterized in that the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the apparatus to perform the method of any one of claims 1-3.
  13. An apparatus, comprising a processor and a memory, characterized in that the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the apparatus to perform the method of claim 4 or 5.
PCT/CN2022/125631 2021-10-26 2022-10-17 Image processing method, image detection model evaluation method and apparatus WO2023071841A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111249005.4 2021-10-26
CN202111249005.4A CN116029950A (zh) 2021-10-26 Image processing method, image detection model evaluation method and apparatus

Publications (1)

Publication Number Publication Date
WO2023071841A1 true WO2023071841A1 (zh) 2023-05-04

Family

ID=86076533

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125631 WO2023071841A1 (zh) 2021-10-26 2022-10-17 图像处理方法、图像检测模型评估方法及装置

Country Status (2)

Country Link
CN (1) CN116029950A (zh)
WO (1) WO2023071841A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222831A (zh) * 2019-06-13 2019-09-10 百度在线网络技术(北京)有限公司 深度学习模型的鲁棒性评估方法、装置及存储介质
CN111626925A (zh) * 2020-07-24 2020-09-04 支付宝(杭州)信息技术有限公司 一种对抗补丁的生成方法及装置
CN113361582A (zh) * 2021-06-01 2021-09-07 珠海大横琴科技发展有限公司 一种对抗样本的生成方法和装置
US20210300433A1 (en) * 2020-03-27 2021-09-30 Washington University Systems and methods for defending against physical attacks on image classification
CN113469873A (zh) * 2021-06-25 2021-10-01 中国人民解放军陆军工程大学 对抗智能侦察识别系统的伪装贴片生成方法


Also Published As

Publication number Publication date
CN116029950A (zh) 2023-04-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885719

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE