CN113643278B - Method for generating adversarial samples for unmanned aerial vehicle image object detection
- Publication number: CN113643278B (application CN202111002003.5A)
- Authority: CN (China)
- Prior art keywords: attack, patch, adversarial, image, target
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
- G06T3/06 — Geometric image transformations; topological mapping of higher-dimensional structures onto lower-dimensional surfaces
- G06T7/70 — Image analysis; determining position or orientation of objects or cameras
- G06T2207/20081 — Indexing scheme for image analysis; training/learning
- Y02T10/40 — Climate change mitigation technologies related to transportation; engine management systems
Abstract
The invention discloses an adversarial sample generation method for unmanned aerial vehicle (UAV) image object detection, comprising the following steps: control points on a vehicle photographed vertically by the UAV are taken as reference control points, and the shape of a mask is annotated within the range of the reference control points; the mask is generated from the mask coordinates and used to initialize a universal adversarial patch; a projection matrix is computed from the correspondence between control points in the original image and the target image, and projective transformation is applied to the universal adversarial patch and its mask; the universal adversarial patch and mask are repositioned by the projective transformation onto a designated area of the vehicle to obtain an adversarial image; the adversarial image is input to an object detection model, a loss value is computed with an attack loss function, and the universal adversarial patch is optimized by backpropagation. The invention can attack both one-stage and two-stage object detection models, can interfere with most samples, and repositions the adversarial patch with a projective transformation model, ensuring that the adversarial patch remains effective after projective transformation.
Description
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to an adversarial sample generation method for unmanned aerial vehicle image object detection.
Background
In recent years, with the rapid development of deep learning in computer vision, deep convolutional neural networks have been used ever more widely in UAV object detection and recognition. Although deep convolutional neural networks perform excellently in computer vision tasks such as object detection, they have proven very sensitive to adversarial perturbations: when such a perturbation is added to an image, the resulting image easily fools systems based on deep convolutional neural networks into making erroneous inferences. Adding a few small perturbations, barely noticeable to the human eye, is enough to make a convolutional neural network misclassify with high confidence; this phenomenon is called an adversarial attack, the perturbation an adversarial perturbation, and the perturbed image an adversarial example. Adversarial examples can easily fool deep convolutional neural networks, and their existence can have a very serious impact on the practical application of such networks.
Attacks on an object detection model may be achieved by adding an adversarial perturbation to the image that makes the detector unable to detect the target or detect it erroneously. For example, Xie et al. proposed the Dense Adversary Generation (DAG) attack with Faster RCNN as the attacked model: an adversarial label is assigned to each region proposal, and gradient backpropagation then optimizes the adversarial perturbation so that the model misclassifies the region proposals. Adversarial-patch attacks have also been proposed, which make an object undetectable by adding only a local perturbation to it; for example, the non-patent document "Fooling automated surveillance cameras: adversarial patches to attack person detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1-7, 2019, presents a method that generates an adversarial patch for persons, making them undetectable by the detector by minimizing the objectness score.
Existing adversarial patches are mainly generated by iteratively updating random noise added at the center of the target's true bounding box. Patches generated this way are limited with respect to multiple distances and angles: the object is expected to sit directly in front of the camera, and distance and angle cannot change much. This ignores that relative motion between the detected object and the detector dynamically changes the camera's viewing angle and distance and thereby deforms the adversarial patch; when viewing angle and distance vary significantly, the patch easily loses its adversarial effect, weakening the attack. During aerial photography the viewing angle and distance of the UAV change continuously, vehicles enter the object detector at different sizes and angles, and an adversarial patch an attacker adds to a vehicle must withstand these changes to attack the detector successfully.
A variety of attack algorithms on natural images produce adversarial examples, and they can be classified by their characteristics. By the information available to the attacker, attack algorithms divide into white-box and black-box attacks. In a white-box attack, the attacker knows the complete information of the target model, including parameters, structure, training method, and even training data. In contrast, in a black-box attack the attacker does not know the specifics of the model but can observe the output for any input, and searches for adversarial examples through the input-output correspondence to attack the model. By the scope of the attacked region, attacks divide into full-image and local attacks: a full-image attack may modify all pixels of an image, while a local attack changes only part of them. By whether the attacked category is directed, attack algorithms divide into targeted attacks, untargeted attacks, and vanishing attacks. A targeted attack deceives the model into mispredicting the target's category as a designated label; an untargeted attack only needs the model to mispredict the target's label or mislocate the target; and a vanishing attack makes the model fail to detect the target at all.
However, research on generating adversarial examples against object detection models in the remote sensing field has not yet been carried out. The work in "Synthesizing robust adversarial examples," in International Conference on Machine Learning, PMLR, pp. 284-293, 2018, found that the attack success rate dropped when an adversarial example obtained with only the Expectation Over Transformation (EOT) algorithm was printed onto a T-shirt, because human poses wrinkle the adversarial example and such wrinkles cannot be simulated by the EOT algorithm. Adversarial examples are themselves very fragile; once part of the information is lost, the whole example is often rendered ineffective. Accordingly, the prior art proposed generating an adversarial example for a flexible object, the T-shirt, based on a thin-plate spline model to simulate how the wearer's own posture affects the wrinkling of the garment. Related adversarial-example research in the remote sensing field has focused mainly on attacking classification models, and there is as yet no work attacking an object detector from the UAV's viewpoint.
Disclosure of Invention
The adversarial patch generated by the invention covers the vehicle photographed by the UAV, so the non-rigid deformation of T-shirt wrinkles need not be considered. The invention instead uses a projective transformation model to simulate the deformation of the adversarial patch caused by changes in the UAV's shooting angle and distance. The invention therefore proposes an adversarial attack framework for UAV image object detection systems based on a projective transformation model, and designs a universal adversarial patch to fool all instances of a specific object. The attack generates the universal adversarial patch in a white-box setting: the attacker has white-box access to the object detection model, meaning all information about the model is known, including structure, parameters, gradients, and so on. One constraint imposed on the attacker is to limit the coverage of the adversarial patch on the vehicle; an attack is meaningless if the patch covers the vehicle so completely that no one can see the vehicle.
The shape and location of the adversarial patch are characterized by discrete pixel coordinates. However, the invention uses a gradient-based optimization method, so the image transformations must be differentiable. The projective transformation model used in the invention maintains continuity of patch shape and location while also being differentiable. The entire attack framework is therefore differentiable, and the problem of optimizing the adversarial patch can be solved by gradient descent and backpropagation.
Specifically, the invention discloses an adversarial sample generation method for UAV image object detection, comprising the following steps:
taking control points on a vehicle image photographed vertically by the UAV as reference control points, and annotating the shape of a mask within a certain range of the reference control points;
generating a mask from the mask coordinates, and initializing it to produce a universal adversarial patch;
computing a projection matrix from the correspondence between control points in the original image and the target image, and applying projective transformation to the universal adversarial patch and its mask;
repositioning the universal adversarial patch and mask onto a designated area of a vehicle by the projective transformation to obtain an adversarial image;
inputting the adversarial image into an object detection model, computing a loss value with an attack loss function, and optimizing the universal adversarial patch by backpropagation;
and pasting the universal adversarial patch onto the vehicle image to generate an adversarial sample.
Still further, the training strategy is to train only on prediction boxes whose IoU with the true box of each attack object is greater than the threshold λ_IOU = 0.5.
Still further, the universal adversarial patch is generated by randomly drawing a subset containing part of the samples, updating the same universal adversarial patch in the drawn subset by minimizing the loss with gradient backpropagation, and finding the intersection of the adversarial patches trained on the sampled subsets, which interferes with most of the samples so that they are incorrectly predicted.
Further, the adversarial image $\hat{x}$ can be expressed in terms of the original image x:

$$\hat{x} = (1 - M) \odot x + M \odot \eta$$

where η denotes the universal adversarial perturbation applied to the training image set X, M ⊙ η represents the patch P, M is a mask responsible for constraining the universal perturbation η to the surface of the target object, and ⊙ denotes element-wise multiplication, a masking operation that retains only a specific region.
Still further, the universal adversarial perturbation is repositioned by projective transformation, whose formula is:

$$x'_d = \frac{a_{11} x_s + a_{12} y_s + a_{13}}{a_{31} x_s + a_{32} y_s + a_{33}}, \qquad y'_d = \frac{a_{21} x_s + a_{22} y_s + a_{23}}{a_{31} x_s + a_{32} y_s + a_{33}}$$

where $(x_s, y_s)$ are coordinates on the original image, $(x'_d, y'_d)$ are coordinates on the target image, $(x_d, y_d, z_d)$ are coordinates in the target coordinate system with $x'_d = x_d / z_d$ and $y'_d = y_d / z_d$, and $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}, a_{33}$ are elements of the projection matrix.
Further, the projection matrix is solved from the correspondence of the control points, the mask is then projected from the original image onto the target image, and the universal adversarial patch is added at the position of the mask to form the adversarial image; the control points are obtained by manual annotation.
Further, for both one-stage and two-stage object detectors, the adversarial patch is generated by minimizing the score of prediction boxes that contain the target, misleading the detector so that the confidence of objects in its final output boxes is below the detection threshold, which yields the vanishing attack mode. The loss function of the vanishing attack is:

$$L = \mathbb{E}_{x \in X}\left[\max P\!\left(f(\hat{x})\right)\right]$$

where X denotes the set of training images, $\hat{x}$ is the adversarial image and x the original image, and f(·) denotes the output of the object detector; P(·) is the function extracting prediction-box confidences from the output tensor, max takes the maximum extracted confidence, and E denotes the expectation over all images. The optimization objective is to minimize this expectation, which is approximated by an empirical average: a batch of images is randomly drawn from the image set X and the loss is averaged.
Further, a one-stage object detection algorithm and a two-stage object detection algorithm are used as the target models of the attack.
Furthermore, the attack performance of the patch is quantified by the attack success rate, defined as:

$$ASR = \frac{N_{succ}}{N_{all}}$$

where $N_{all}$ denotes the number of all attack objects in the dataset and, for the vanishing attack, $N_{succ}$ denotes the number of attack objects that successfully evade detection by the object detector; an attack is considered successful when the confidence of the prediction box is below 0.5 or the IoU between the prediction box and the true box is less than 0.5.
Furthermore, the universal adversarial patch can be combined into an ensemble attack that exploits low-level shared features, namely shape and color features.
The beneficial effects of the invention are as follows:
1) The invention provides an adversarial attack framework for UAV image object detection models that can attack both one-stage and two-stage object detection models.
2) The invention designs a universal adversarial patch that can interfere with most samples in a dataset, making the model mispredict them and fooling all instances of a specific object.
3) The invention repositions the adversarial patch with a projective transformation model, simulating the patch deformation caused by changes in the UAV's aerial viewing angle and distance, so that the generated universal adversarial patch is robust to projective transformation and remains effective after it.
Drawings
FIG. 1 is a process diagram of generating the universal adversarial patch of the present invention;
FIG. 2 is a schematic diagram of the prediction-box selection for training the universal adversarial patch of the present invention;
FIG. 3 is a process diagram of adding a center adversarial patch in the present invention;
FIG. 4 is a process diagram of repositioning the adversarial patch by projective transformation in the present invention;
FIG. 5 is a diagram of the object detection framework of the present invention;
FIG. 6 shows the control points to be annotated in an image from the VisDrone dataset of the present invention;
FIG. 7 shows the control points to be annotated in an image from the UAV dataset of the present invention;
FIG. 8 shows a vehicle object to be attacked by the present invention;
FIG. 9 shows the detection result on the original image in the present invention;
FIG. 10 shows the detection result after adding the full-image perturbation in the present invention;
FIG. 11 shows the detection result of adding a center patch to the attacked vehicle in the present invention;
FIG. 12 shows the detection result of random noise projectively repositioned onto the attacked vehicle in the present invention;
FIG. 13 shows the detection result of the center patch projectively repositioned onto the attacked vehicle in the present invention;
FIG. 14 shows the detection result of the projected patch of the present invention repositioned onto the attacked vehicle;
FIG. 15 shows the attack success rate of the projected patch of the present invention against YOLOv3 at different distances and angles;
FIG. 16 shows the attack success rate of the projected patch of the present invention against FRCNN at different distances and angles;
FIG. 17 shows the attack success rate of the projected patch of the present invention against YOLOv3 at a height of 30 m in different shooting directions;
FIG. 18 shows the attack success rate of the projected patch of the present invention against FRCNN at a height of 30 m in different shooting directions;
FIG. 19 shows the effect of projectively repositioning a 300×300 projected patch of the present invention onto a vehicle;
FIG. 20 shows the effect of projectively repositioning a 300×500 projected patch of the present invention onto a vehicle.
Detailed Description
The invention is further described below with reference to the accompanying drawings, without limiting the invention in any way; any alteration or substitution based on the teachings of the invention falls within the scope of the invention.
The steps of the adversarial sample generation method for UAV image object detection are shown in FIG. 1; the specific process is as follows:
control points on a vehicle photographed vertically by the UAV are taken as reference control points, and the shape of the mask is annotated within the range of the reference control points;
a mask is generated from the mask coordinates and used to initialize the universal adversarial patch;
a projection matrix is computed from the correspondence between the control points and the reference points in the image, and projective transformation is applied to the universal adversarial patch and its mask;
the universal adversarial patch and mask are repositioned by projective transformation onto the designated area of the vehicle to obtain an adversarial image;
the adversarial image is input into the object detection model, a loss value is computed with the attack loss function, and the universal adversarial patch is optimized by backpropagation.
Training strategy
One image contains not only attack objects but also non-attack objects; therefore, not all objects in an image are required to train the adversarial patch. Specifically, A denotes the set of prediction boxes of all attack objects, where a prediction box contains the box coordinates and the class confidence of the object it contains. B denotes the true boxes of all attack objects, containing the box coordinates and the true class of the object in each box. Let IoU(A_i, B_j) be the IoU (Intersection over Union) between the i-th prediction box and the j-th true box. Prediction boxes with IoU(A_i, B_j) ≥ λ_IOU are defined as the selected sample set that participates in training the adversarial patch; that is, only prediction boxes whose IoU with the true box of each attack object exceeds the threshold λ_IOU = 0.5 are trained on. As shown in FIG. 2, solid boxes represent true borders: solid box 2 is the true border of the object to be attacked, and solid box 1 is the true border of a non-attack object. Dashed boxes represent prediction boxes output by the detector; the IoU of dashed box 3 with solid box 2 is smaller than the threshold, while the IoUs of dashed boxes 4 and 5 with solid box 2 are larger than the threshold. Only prediction boxes whose IoU with the true box of an attack object exceeds λ_IOU are considered the predicted output of those objects. Thus, only dashed boxes 4 and 5 are selected as samples for training the adversarial patch.
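This selection rule can be sketched in a few lines of PyTorch. The following is a minimal illustration, not the patent's own implementation; the (x1, y1, x2, y2) box layout and the function names are assumptions:

```python
# Minimal sketch of the IoU-based selection of prediction boxes used to train
# the patch (dashed boxes 4 and 5 in FIG. 2); box layout (x1, y1, x2, y2) assumed.
import torch

def box_iou(pred: torch.Tensor, true: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between (N, 4) prediction boxes and (M, 4) true boxes."""
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (true[:, 2] - true[:, 0]) * (true[:, 3] - true[:, 1])
    lt = torch.max(pred[:, None, :2], true[None, :, :2])  # intersection top-left
    rb = torch.min(pred[:, None, 2:], true[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_p[:, None] + area_t[None, :] - inter)

def select_training_boxes(pred_boxes, attack_true_boxes, lambda_iou=0.5):
    """Keep only prediction boxes whose IoU with some attack object's true box
    reaches the threshold lambda_IOU = 0.5."""
    keep = (box_iou(pred_boxes, attack_true_boxes) >= lambda_iou).any(dim=1)
    return pred_boxes[keep]
```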
During training, the universal adversarial patch uses a set of images containing the attack object; its generation does not need to run over all samples of the entire dataset. Instead, a subset containing part of the samples is randomly drawn, and the same adversarial patch is updated in the drawn subset by minimizing the loss with gradient backpropagation. When the drawn subsets differ, the pixel updates of the adversarial patch differ; finding the intersection of the adversarial patches trained on these sampled subsets can interfere with most of the samples, causing them to be mispredicted.
The adversarial methods include the full-image adversarial perturbation, the center adversarial patch, and the projected adversarial patch.
Full-image adversarial perturbation: a universal adversarial perturbation allows malicious modification of the pixel values of the whole image. The perturbation is not limited to an attack on one particular image but succeeds on most images in the training set. Let X denote the set of training images; the universal adversarial perturbation is built up gradually from the images in X during training. In the t-th iteration, an incremental full-image perturbation Δη_t is computed from the adversarial loss function. Then Δη_t is added to the current universal perturbation η_{t-1}, and the new perturbation η_t is clipped to ensure the distortion of the image stays within [-δ, δ]. The adversarial perturbation may change pixel values over the whole image, so the upper limit by which each pixel may be modified is usually constrained. The experiments set δ to 10, meaning each pixel in the image may be modified by at most 10, so the resulting adversarial perturbation is often imperceptible.
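The iterative update described above can be sketched as follows. This is a minimal illustration under assumed names; the signed-gradient step is one common choice and is my assumption, since the text specifies only that Δη_t is computed from the attack loss:

```python
# Minimal sketch of one universal-perturbation iteration: take a step that
# lowers the attack loss, then clip eta so every pixel change stays in
# [-delta, delta] (delta = 10 in the experiments). loss_fn is an assumed
# callable returning the attack loss on a batch of perturbed images.
import torch

def universal_perturbation_step(eta, images, loss_fn, step_size=1.0, delta=10.0):
    eta = eta.detach().requires_grad_(True)
    loss = loss_fn(images + eta)              # attack loss on perturbed images
    loss.backward()
    delta_eta = -step_size * eta.grad.sign()  # assumed signed-gradient step
    return torch.clamp(eta.detach() + delta_eta, -delta, delta)
```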
Center adversarial patch: formally, η denotes the universal adversarial perturbation applied to the training image set X, M ⊙ η represents the patch P, M is a mask responsible for constraining the universal perturbation η to the surface of the target object, and ⊙ denotes element-wise multiplication, a masking operation that retains only a specific region. M is helpful when the shape of the perturbation needs to be limited. The width and height $(w_p, h_p)$ of the bounding rectangle of the adversarial patch P give the size of the patch, and $(x_p, y_p)$ denotes the center coordinates of that bounding rectangle. The adversarial image $\hat{x}$ can be expressed in terms of the original image x:

$$\hat{x} = (1 - M) \odot x + M \odot \eta \tag{1}$$

The true bounding box $B_i$ of each attack object in a given training image contains 4 elements $(x_i, y_i, w_i, h_i)$, where $(x_i, y_i)$ is the center of $B_i$ and $(w_i, h_i)$ its size. The size of the adversarial patch P is scaled down from the size of $B_i$ by a factor α, where 0 < α < 1 and α = 0.35; the patch is then placed at the center position $(x_i, y_i)$ of $B_i$, i.e. $(x_p, y_p) = (x_i, y_i)$. The manner of adding the adversarial patch at the center of the true border is shown in FIG. 3, and the center adversarial patch produced by this training is the baseline method for the experimental comparison of the invention.
Projected adversarial patch: the center adversarial patch ignores the deformation of a patch attached to the roof of the vehicle, yet the relative motion between the drone and the vehicle keeps changing the roof area captured by the camera. This reduces the aggressiveness of the center adversarial patch after image transformation. To overcome this problem, more complex transformations of the adversarial patch should be considered to enhance its robustness. The invention uses a projective transformation model to reposition the adversarial patch, simulating the deformation of a patch covering the vehicle roof caused by changes in the UAV's aerial viewing angle and distance. Projective transformation, also called perspective transformation, projects a plane onto a new plane through a projection matrix.
The general projective transformation formula is:

$$\begin{bmatrix} x_d \\ y_d \\ z_d \end{bmatrix} = A \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix}, \qquad A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

where $(x_s, y_s, z_s)$ are the coordinates in the original coordinate system, $(x_d, y_d, z_d)$ the coordinates in the target coordinate system, and A is the projection matrix.
The original image is a two-dimensional plane, so $z_s = 1$. The target image is also a two-dimensional plane, so with $x'_d = x_d / z_d$ and $y'_d = y_d / z_d$ the projective transformation formula used in the invention is:

$$x'_d = \frac{a_{11} x_s + a_{12} y_s + a_{13}}{a_{31} x_s + a_{32} y_s + a_{33}}, \qquad y'_d = \frac{a_{21} x_s + a_{22} y_s + a_{23}}{a_{31} x_s + a_{32} y_s + a_{33}}$$

where $(x_s, y_s)$ are coordinates on the original image and $(x'_d, y'_d)$ coordinates on the target image.
In the projection matrix A, $a_{33} = 1$, leaving 8 unknown parameters. One coordinate pair yields only two equations, so four coordinate pairs are needed to obtain the 8 equations required to solve for the remaining parameters; at the same time, at least three of the coordinate pairs must not be collinear. Thus, the mapping from the original image to the target image is determined by 4 control points at given positions. As shown in FIG. 4, the projection matrix is solved from the correspondence of the control points, the mask is then projected from the original image onto the target image, and the adversarial patch is added at the position of the mask, forming the adversarial image generated by the method of the invention; the control points are all obtained by manual annotation.
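The matrix solve and relocation can be illustrated with OpenCV, whose cv2.getPerspectiveTransform performs exactly this 4-point solve. The control-point values below are illustrative, not from the patent, and for the gradient-based training described earlier a differentiable warp would be used in place of the OpenCV calls:

```python
# Minimal sketch: solve the 8 projection-matrix parameters from 4 control-point
# pairs, then warp the patch and its mask onto the aerial image.
import cv2
import numpy as np

# 4 reference control points on the top-down patch plane ...
src = np.float32([[0, 0], [300, 0], [300, 300], [0, 300]])
# ... and 4 manually labeled control points on the vehicle roof (illustrative).
dst = np.float32([[412, 208], [538, 221], [526, 352], [401, 337]])

A = cv2.getPerspectiveTransform(src, dst)   # 3x3 matrix with A[2, 2] == 1
patch = np.zeros((300, 300, 3), np.uint8)   # stand-in for the trained patch
mask = np.full((300, 300), 255, np.uint8)

h, w = 600, 1000                            # target image size
warped_patch = cv2.warpPerspective(patch, A, (w, h))
warped_mask = cv2.warpPerspective(mask, A, (w, h))
# The adversarial image keeps the original pixels where warped_mask is 0 and
# the warped patch pixels where it is 255, as in formula (1).
```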
Loss function
Current mainstream object detection algorithms based on deep learning models fall mainly into two categories: two-stage object detection, typified by the candidate-region-based RCNN series, including RCNN, Fast RCNN, Cascade RCNN, and the like; and one-stage object detection, typified by the YOLO series, including YOLOv1, YOLOv2, YOLOv3, YOLOv4, and the like. In this work we choose YOLOv3, representing one-stage object detection, and Faster RCNN, representing two-stage object detection, as the targets of attack. The current best-performing object detection models are all improvements on these two classical models, and YOLOv3 and Faster RCNN are often used as comparison baselines for new object detection models; they are therefore chosen as the object detection models for our attack.
The output structure of an object detector is rich: the output comprises a set of boxes, each containing class confidences for the object inside. To handle the inherent trade-off between precision and recall in detection, a threshold is usually set to maintain high precision, and only confidence scores above that threshold count as detected targets. An object detector takes an image as input and outputs n detections; each detection comprises confidences over the predefined c categories and the location of the detected object represented by 4 coordinates. In FIG. 5, the base network of the two-stage detector denotes its region proposal network, while the base network of the one-stage detector denotes the detector itself. After an image is input, a one-stage detector directly outputs box coordinate offsets and the class scores of the objects in the boxes. A two-stage detector first outputs region-proposal coordinate offsets and the class scores of the objects in the proposals, crops the output proposals from the corresponding areas of the original image, sorts them by proposal score, and feeds them as input to the second stage, whose subnetwork outputs box coordinate offsets and the class scores of the objects in the boxes. Red marks indicate objects to be attacked and black marks indicate normal objects.
The object of the invention is to prevent a target object from being detected by the object detector, an attack mode known as a vanishing attack. To achieve this, the adversarial patch must ensure that the confidence of the target object in every bounding box is below the detection threshold. An object detector typically produces hundreds or thousands of prior boxes overlapping the object; non-maximum suppression then selects the bounding boxes with the highest confidence and culls overlapping boxes with lower confidence. If an attack merely evades the prediction of one prior box, the detector need only select a different prior box to predict the target. To hide an object from detection completely, the attack must simultaneously fool all prior boxes overlapping the object.
Faster RCNN is a two-stage detection method. For this two-stage detector, the attack focuses on the region proposal network: its classification layer decides whether a region proposal belongs to foreground or background, so attacking it need only consider foreground versus background. Specifically, the region proposal network is fooled into generating low-quality region proposals by lowering their foreground scores, i.e. reducing the number of valid region proposals for the attacked object. When the foreground confidence of the region proposals drops far enough, they are treated as background, misleading the final output of the detector.
YOLOv3 is a one-stage detection algorithm. The prediction vector output by YOLOv3 contains one parameter that predicts the objectness score, i.e. whether the box contains an object. The highest class score in the detector's final output is computed by multiplying the box's objectness score by the classification score of the object in the box, so lowering the objectness score indirectly lowers that highest class score. When the highest class score falls below the detection threshold, the object escapes detection.
For both types of object detectors, the adversarial patch is generated by minimizing the score of prediction boxes containing the object, misleading the detector so that the confidence of objects in its final output boxes falls below the detection threshold, which yields a universal attack mode. The loss function of the vanishing attack is:

$$L = \mathbb{E}_{x \in X}\left[\max P\!\left(f(\hat{x})\right)\right]$$

where X denotes the set of training images, and x and $\hat{x}$ are as in formula (1). f(·) denotes the output of the object detector: for Faster RCNN, the tensor output by the region proposal network; for YOLOv3, the final output tensor. P(·) is the function extracting prediction-box confidences from that tensor, max takes the maximum extracted confidence, and E denotes the expectation over all images. The optimization objective is to minimize this expectation, approximated by an empirical average: a batch of images is randomly drawn from the image set X and the loss is averaged.
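Assuming the detector output has already been decoded into per-box confidences, the loss reduces to a few lines; this sketch and its names are illustrative, not the patent's code:

```python
# Minimal sketch of the vanishing-attack loss: the empirical average over a
# batch of the per-image maximum prediction-box confidence.
import torch

def vanishing_attack_loss(confidences: torch.Tensor) -> torch.Tensor:
    """confidences: (B, N) box confidences for a batch of adversarial images
    (objectness for YOLOv3, foreground score for the Faster RCNN RPN).
    Minimizing the mean of the per-image maximum pushes every box containing
    the object below the detection threshold."""
    return confidences.max(dim=1).values.mean()
```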
The invention is tested with the VisDrone and UAV datasets. VisDrone is an open-source drone aerial photography dataset from Tianjin University, mainly used to study object detection, object tracking, and similar algorithms on drone platforms. The dataset has 10 categories: pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, and motor. The method annotates control points on only part of the images in the whole dataset: 965 images were selected from the 6471 images of the object detection task to produce data meeting the experimental requirements, of which 857 are used for training and 108 for testing. The selected images differ in size and are uniformly cropped to 1000×600. On each image, 2 vehicles of the same type and category are then selected for control-point annotation. FIG. 6 is an example of the control points of one image in the VisDrone dataset.
The UAV dataset was produced by aerial photography of vehicles in a parking lot with a drone. The drone flew several circles around the parking lot at 5 heights: 10 m, 15 m, 20 m, 25 m, and 30 m. For each video clip one frame was taken every 10 frames, and 924 images were picked to make the experimental data, with 800 for training and 124 for testing. Only one vehicle is selected as the attack object in each image; it is the same vehicle in all images, and the vehicle annotated with control points is the vehicle to be attacked. Besides the control points, all vehicles in the parking lot are also annotated with boxes and categories to train the object detection model. The images in the whole dataset are 1920×1080, annotated with only the category car. FIG. 7 is an example of the control points of an image in the UAV dataset.
The invention selects YOLOv3, representing one-stage object detection, and Faster RCNN (FRCNN), representing two-stage object detection, as the target models of attack. The backbone networks of the YOLOv3 and FRCNN models are Darknet53 and ResNet101, respectively, both trained on the UAV and VisDrone datasets.
Each image is resized to 1000×600 for input to FRCNN, while the input size for YOLOv3 is 416×416. For the UAV dataset the universal adversarial patch is sized 300×300; for the VisDrone dataset it is set to 320×400.
The adversarial patch is initialized randomly and uniformly; each batch contains 4 images; Adam is the optimizer; the initial learning rate is 0.01 and the final learning rate 0.001; when the loss does not decrease for 5 consecutive epochs, the learning rate decays to 0.5 times its value; and training runs for 100 epochs to obtain the final adversarial patch.
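This configuration maps directly onto standard PyTorch components. The sketch below is an assumed outline reusing vanishing_attack_loss from the earlier sketch; the data loader and the two helpers for pasting the patch and decoding detector confidences are hypothetical stand-ins, not the patent's code:

```python
# Minimal sketch of the patch-training loop: random uniform init, Adam at
# lr 0.01, decay by 0.5 after 5 epochs without improvement down to 0.001,
# 100 epochs with batches of 4 images.
import torch

def train_patch(loader, compose_adversarial_images, detector_confidences,
                epochs=100, size=(3, 300, 300)):
    patch = torch.rand(*size, requires_grad=True)      # random uniform init
    opt = torch.optim.Adam([patch], lr=0.01)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.5, patience=5, min_lr=0.001)
    for _ in range(epochs):
        epoch_loss = 0.0
        for images, boxes in loader:
            adv = compose_adversarial_images(images, patch, boxes)  # warp + paste
            loss = vanishing_attack_loss(detector_confidences(adv))
            opt.zero_grad()
            loss.backward()
            opt.step()
            patch.data.clamp_(0, 1)                    # assumed valid pixel range
            epoch_loss += loss.item()
        sched.step(epoch_loss)
    return patch.detach()
```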
The confidence threshold for the training phase is set to 0.3. The confidence threshold for the test phase is set to 0.5, the non-maximum-suppression threshold to 0.4, and the overlap threshold to 0.5. When the confidence of a prediction box is below the confidence threshold, its confidence is considered too low and it is ignored rather than counted as a correct detection. When the overlap of a prediction box with the true box is below the overlap threshold, its localization accuracy is considered too low and it is likewise ignored rather than counted as a correct detection. When multiple prediction boxes detect the same target simultaneously, the boxes are ranked by confidence from high to low, and any box whose overlap with the highest-confidence box exceeds the non-maximum-suppression threshold is discarded as a redundant detection.
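The non-maximum-suppression step described above follows the standard greedy procedure. A minimal sketch, reusing box_iou from the earlier sketch, with the 0.4 threshold from the text:

```python
# Minimal greedy NMS sketch: rank boxes by confidence and drop any box whose
# IoU with an already kept, higher-confidence box exceeds the NMS threshold.
def nms(boxes, confs, nms_thresh=0.4):
    order = confs.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        iou = box_iou(boxes[order[1:]], boxes[i].unsqueeze(0)).squeeze(1)
        order = order[1:][iou <= nms_thresh]   # keep only non-overlapping boxes
    return keep
```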
The deep learning framework is PyTorch 1.1.0, using 1 GPU, an NVIDIA GTX 1080 Ti.
To measure the attack capability of adversarial patches, their attack performance is quantified by computing the attack success rate, defined as:

$$ASR = \frac{N_{succ}}{N_{all}}$$

where $N_{all}$ denotes the number of all attack objects in the dataset and, for the vanishing attack, $N_{succ}$ denotes the number of attack objects that successfully evade detection by the object detector; an attack is considered successful when the confidence of the prediction box is below 0.5 or the IoU between the prediction box and the true box is less than 0.5.
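A minimal sketch of this evaluation, reusing box_iou from the earlier sketch; the per-object detection format is an assumption:

```python
# Minimal sketch of the attack success rate: an attack object counts as a
# success when no remaining prediction box both has confidence >= 0.5 and
# overlaps its true box with IoU >= 0.5.
def attack_success_rate(detections, true_boxes, conf_thresh=0.5, iou_thresh=0.5):
    """detections: per attack object, a (boxes, confs) pair left after NMS;
    true_boxes: (K, 4) true boxes of the K attack objects."""
    n_succ = 0
    for (boxes, confs), true in zip(detections, true_boxes):
        if boxes.numel() == 0:       # nothing detected at all: attack succeeded
            n_succ += 1
            continue
        iou = box_iou(boxes, true.unsqueeze(0)).squeeze(1)
        detected = ((confs >= conf_thresh) & (iou >= iou_thresh)).any()
        n_succ += int(not detected)
    return n_succ / len(true_boxes)
```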
Comparison of different attack methods
The invention compares the attack success rates of the full-image perturbation, random noise, the center patch, and the repositioned patches against YOLOv3 and FRCNN on the UAV and VisDrone datasets, as shown in Table 1. Table 1 lists the attack success rates of the different methods: the first column gives the dataset, the second the object detection model, the third the original image, the fourth the image after adding the full-image perturbation, the fifth the image after adding the center patch, the sixth the image after repositioning random noise, the seventh the image after repositioning the center patch, and the eighth the image after repositioning the projected patch. Table 2 gives the recall of non-attacked vehicles after adding the full-image perturbation and the repositioned patch. The detection results of YOLOv3 on the original and adversarial images are shown in FIG. 8 to FIG. 14.
Advantages and disadvantages of the full-image perturbation. As Table 1 shows, the full-image perturbation achieves the highest attack success rate of all methods: 99.38% and 99.50% against YOLOv3 and FRCNN on UAV, and 23.05% and 24.68% on VisDrone, respectively. This is not surprising, since it may modify all pixels of an image, and the more pixels can be modified, the more vulnerable the model is. Table 2 shows the biggest problem with the full-image perturbation: it interferes not only with the vehicle to be attacked but also with the detection of normal vehicles. The recall of YOLOv3 on UAV for non-attacked vehicles is 96.04%, 77.95%, and 96.06% on the original image, the full-image-perturbation image, and the repositioned-patch image, respectively; on VisDrone the corresponding recalls are 91.95%, 65.60%, and 91.92%. The full-image perturbation severely affects the detection of normal vehicles, which is not the case for the adversarial patch.
Table 1 Comparison of attack success rates of different attack methods
Table 2 Recall of non-attacked vehicles
Advantages and disadvantages of the center adversarial patch. Adding an adversarial patch at the center of the true box can attack the object detector with a fairly high success rate without modifying all pixels of the image, as shown in the fifth column of Table 1: 45.75% and 13.50% against YOLOv3 and FRCNN on UAV, and 10.44% and 20.42% on VisDrone, respectively. However, a patch generated this way does not fit the drone's viewpoint and loses its adversarial effect once projectively repositioned onto the attacked vehicle, as shown in FIG. 13. The center adversarial patch is not robust to projective transformation: after projective repositioning onto vehicles its attack performance equals that of random noise, the vehicles are successfully detected by the object detector, and the detection result is almost the same as on the original image.
The invention repositions the adversarial patch by projective transformation, improving its robustness to projective transformation. Table 1 shows that for YOLOv3, the projectively repositioned adversarial patch reaches a 38.5% attack success rate on the UAV dataset, far higher than random noise (0%) and the center patch (0.88%). Similarly, for FRCNN, the patch generated with projective repositioning reaches a 19.31% attack success rate on the VisDrone dataset, likewise much higher than random noise (4.14%) and the center patch (4.15%). If no projective transformation is applied during training, the attack success rate is about 10%-30% lower than with projective transformation.
The projective repositioning attack method of the invention therefore adapts to projective transformation and attacks YOLOv3 and FRCNN more effectively. Without projective transformation, the center adversarial patch loses its adversarial effect after projective transformation; repositioning the adversarial patch with projective transformation ensures that the patch remains effective after projective transformation.
Attack effect of the projected adversarial patch at different shooting points
The attack effect of the projected adversarial patch is evaluated against factors such as the UAV's aerial shooting distance, angle, and direction. To study the influence of different shooting distances, angles, and directions, the range of 10 m-30 m is divided into 5 zones (5 m each), and in each zone video is recorded around the parking lot at pitch angles of 0-15°, 15-30°, and 30-45°. FIG. 15 and FIG. 16 show the attack success rates of the adversarial patch at different distances and angles.
For YOLOv3, smaller pitch angles attack more successfully at close range, while larger pitch angles attack more successfully at long range. In the 0-10 m range, the attack success rates at 15-30° and 30-45° are 6.25% and 1.25%, respectively; in the 10-15 m range, the success rate at 0-15° is 10%, far higher than the 1.25% at 15-30°. In the 15-20 m range, the success rates at 15-30° and 30-45° are 16.25% and 38.75%, respectively; in the 20-25 m range, they are 48.75% and 85%.
For both YOLOv3 and FRCNN, the attack success rate at long range is higher than at close range. For YOLOv3 in the 15-30° range, the success rates at 15-20 m, 20-25 m, and 25-30 m are 16.25%, 48.75%, and 100%, respectively; for FRCNN in the same angle range they are 2.5%, 5%, and 55%. FRCNN is hard to attack successfully within 0-25 m, with an average success rate of only 0.94%. In the 25-30 m range, the average success rate rises to 53.75%, with 52.5% at 0-15° and 55% at 15-30°, a difference of only 2.5%; it therefore cannot be determined in which pitch-angle range FRCNN is attacked more effectively.
As the distance shortens, the attack success rate of the projected adversarial patch gradually decreases. The reason is that as distance decreases, the detection capability of the object detector increases, making it harder to fool. Due to shooting constraints, the vehicle was not photographed from farther away, so the upper limit of the attack distance is currently unclear. From the experiments above it can be presumed that once the distance grows large enough, the number of pixels the patch occupies in the image shrinks, which clearly hurts the patch's attack performance; at the same time, the photographed object becomes smaller, and detecting small objects is itself very challenging for the detector.
As FIG. 15 and FIG. 16 show, YOLOv3 and FRCNN are attacked with high success in the 25-30 m height range. To evaluate the attack performance of the projected adversarial patch in different shooting directions, the UAV's shooting height is set to 30 m and videos are shot at flight radii of 10 m and 15 m. The drone flies 360° around the vehicle, divided into 8 zones at 45° intervals, as shown in FIG. 17 and FIG. 18.
Patches generated against different object detectors differ greatly in their attack capability in different directions, and the two-stage FRCNN is harder to attack than the one-stage YOLOv3. YOLOv3's average attack success rate is highest (97.5%) in the directions of zones 3 and 7 and lowest (72.5%) in the directions of zones 1 and 5. FRCNN is the opposite: highest (80%) in zones 1 and 5 and lowest (37.5%) in zones 3 and 7. This shows that patches generated against detection models with different structures perform very differently, and combining FIG. 15 to FIG. 18, the attack success rate against YOLOv3 is higher than against FRCNN across distances, angles, and directions.
Influence of the projected patch's size and of the area it occupies on the attack target on the attack success rate
On the UAV dataset, the experiments set the attack's adversarial patch size to 300×300. To observe the impact of patch size on attack performance, this section adds a second size, 300×500. A larger adversarial patch is expected to raise the attack success rate. Table 3 confirms this expectation: with the larger patch attacking the detectors, the success rate against YOLOv3 rises by 23% and against FRCNN by 30%.
Table 3 Comparison of attack success rates of two different-sized projected patches on the UAV dataset
Detector | 300×300 | 300×500 |
YOLOv3 | 0.3850 | 0.6150 |
FRCNN | 0.1175 | 0.4175 |
FIG. 19 and FIG. 20 show the results of projectively repositioning the two differently sized adversarial patches generated by attacking FRCNN onto the vehicle. As they show, when the patch size increases from 300×300 to 300×500, the area the patch occupies on the attacked vehicle also increases, so the current experiment cannot determine whether the rise in attack success rate is caused by the larger patch size or by the larger area occupied on the attacked vehicle.
The experiment is redesigned on the VisDrone dataset. First, the patch size is varied while the area of the patch's projection onto the vehicle is kept unchanged, to analyze the influence of patch size on the attack success rate. Second, the patch size is kept unchanged while the area projected onto the vehicle is varied, to analyze the influence of the area the patch occupies on the attacked vehicle.
Increasing the size of the adversarial patch raises its attack success rate. Patches of different sizes were set on VisDrone; as Table 4 shows, the attack success rate decreases as the patch size decreases.
Table 4 Comparison of attack success rates of different-sized projected patches
Detector | 400×500 | 360×450 | 320×400 | 280×350 | 240×300 |
YOLOv3 | 0.2305 | 0.2240 | 0.2083 | 0.1832 | 0.1558 |
FRCNN | 0.2445 | 0.2135 | 0.1931 | 0.1803 | 0.1599 |
Table 5 Comparison of attack success rates of projected patches occupying different areas on the vehicle
Detector | Largest | Larger | Medium | Smaller | Smallest
YOLOv3 | 0.2083 | 0.1657 | 0.1289 | 0.0974 | 0.0782 |
FRCNN | 0.1931 | 0.1628 | 0.1383 | 0.1225 | 0.1044 |
Increasing the area the patch occupies on the attacked vehicle also raises the attack success rate. The patch size is kept at 320×400, and the area it occupies on the attacked vehicle is varied by controlling the reference coordinates of the projective transformation. The 320×400 patch is projectively repositioned onto different areas of the vehicle, the areas being, from left to right and top to bottom, largest, larger, medium, smaller, and smallest. As Table 5 shows, the attack success rate falls gradually as the area the patch occupies on the attacked vehicle decreases.
When the patch size is reduced from 320×400 to 240×300, the attack success rate drops by 5.25% (YOLOv3) and 3.32% (FRCNN); when the occupied area on the attacked vehicle is reduced from largest to medium, it drops by 7.94% (YOLOv3) and 5.48% (FRCNN), in both cases a larger degradation than that caused by the size change. The area the adversarial patch occupies on the attacked vehicle therefore affects the attack success rate more strongly than the patch size does, although a patch occupying a larger area is inevitably more conspicuous.
Cross-model transferability of the projected adversarial patch
This section investigates the transferability of the adversarial patch, i.e. whether a patch generated on one detector can be used to attack another. If adversarial patches transferred between different target detectors, the patch attack could be designed as a generic black-box attack that fools a detector without knowledge of the specific detector or its network structure, which would suit practical applications far better.
This section evaluates that transferability; Table 6 compares the attack success rates of adversarial patches generated under three attack settings (an ensemble-loss sketch follows the list):
1) Single-detector attack: the same target detector is used both to generate and to evaluate the adversarial patch;
2) Single-detector transfer attack: different target detectors are used to generate and to evaluate the adversarial patch;
3) Ensemble attack: multiple models are attacked simultaneously to generate the adversarial patch, which is then evaluated.
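As referenced above, the following is a hedged sketch of how the ensemble setting might combine detector losses; the `detectors` wrappers are assumptions for illustration, not the patent's API. Each wrapper is taken to return the prediction-box confidences one detector assigns to the attacked vehicles, with gradients flowing back to the patch:

```python
import torch

def ensemble_loss(adv_image, detectors, weights=None):
    """Weighted sum of per-detector max-confidence losses."""
    weights = weights or [1.0 / len(detectors)] * len(detectors)
    loss = torch.zeros((), device=adv_image.device)
    for w, det in zip(weights, detectors):
        # minimising each detector's highest confidence drives its strongest
        # detection of the attacked vehicle below the detection threshold
        loss = loss + w * det(adv_image).max()
    return loss
```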
Each row of Table 6 gives the attack success rates of patches generated under one setting (single YOLOv3 attack, single FRCNN attack, or the combined YOLOv3+FRCNN ensemble attack), and each column gives the success rates of those patches when evaluated against one detector. Table 6 shows that on the UAV dataset, which contains only one class, FRCNN is more robust than YOLOv3: the attack performance of the patches generated on YOLOv3 (38.5%) and on FRCNN (11.75%) differs by 26.75% on the UAV data, while on the VisDrone data YOLOv3 (20.83%) and FRCNN (19.31%) are comparable.
Table 6. Comparison of attack success rates of projected patches generated by single-detector and ensemble attacks
Adversarial patches barely transfer between YOLOv3 and FRCNN. As Table 6 shows, on the UAV dataset the patch generated on YOLOv3 drops from a 38.5% attack success rate against YOLOv3 to 0% when transferred to FRCNN, and the patch generated on FRCNN drops from 11.75% against FRCNN to 0.38% against YOLOv3. Likewise on the VisDrone dataset: when the same target detector generates and evaluates the patch, the success rate is about 20%, but once the patch is transferred to attack a different detector the attack performance collapses to roughly the level of random noise.
The adversarial patch transfers poorly between YOLOv3, whose backbone is Darknet53, and FRCNN, whose backbone is ResNet101, and the patches generated by attacking YOLOv3 and FRCNN differ markedly in texture. This section also evaluates the ensemble attack, which Table 6 shows behaves differently on the two datasets. On the UAV dataset, the ensemble attack improves the success rate against YOLOv3 by 3.75% over the single YOLOv3 attack and against FRCNN by 1.25% over the single FRCNN attack; the ensemble-generated patch combines features of the two single-detector patches. On the VisDrone dataset, the ensemble attack improves the success rate against YOLOv3 by 8.98% over the single YOLOv3 attack, but at the cost of a 4.26% reduction against FRCNN.
We believe that attacking features which are similar across different feature-extraction networks may give the adversarial patch better cross-model transferability. The present method simply performs a brute-force combination of the losses computed from high-level semantic features, ignoring the patterns the models share, and therefore struggles to attack across different target detection models: the high-level semantic features extracted by different backbones differ greatly, whereas the low-level layers of most networks extract features such as shape and color, which are far more similar. Preferably, this embodiment can therefore also run the ensemble attack against the similar low-level features, which should further improve the transferability of the adversarial patch (a speculative sketch follows).
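A speculative sketch of that preferred low-level variant, assuming `stems` are the early blocks of each backbone and that features are compared by cosine similarity; both choices are illustrative, not the patent's specification:

```python
import torch
import torch.nn.functional as F

def low_level_similarity_loss(stems, clean, adv):
    """Cosine similarity between clean and adversarial low-level features;
    minimising it pushes the shared shape/colour features apart."""
    loss = torch.zeros((), device=adv.device)
    for stem in stems:  # e.g. the first blocks of Darknet53 and ResNet101
        with torch.no_grad():
            ref = stem(clean)                 # clean low-level features
        sim = F.cosine_similarity(stem(adv).flatten(1),
                                  ref.flatten(1), dim=1)
        loss = loss + sim.mean()              # lower similarity = stronger attack
    return loss
```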
The invention provides a projective-transformation-based adversarial attack framework for unmanned aerial vehicle image target detection models, which generates an adversarial patch that, attached to a vehicle, evades detection by the target detectors YOLOv3 and Faster RCNN. The effectiveness of our method was verified by attacking these two representative detectors on the UAV and VisDrone datasets: the attacks reach success rates of 38.5% and 11.75% against YOLOv3 and FRCNN on the UAV dataset, and 20.83% and 19.31% on the VisDrone dataset. By contrast, under the same projective transformation the baseline method only reaches 0.88% and 0.12% on the UAV dataset and 0.88% and 4.55% on the VisDrone dataset. Repositioning the adversarial patch by projective transformation ensures that the patch remains aggressive after the transformation. The invention is expected to offer some insight into achieving adversarial perturbation through vehicle painting, graffiti, and refitting.
The beneficial effects of the invention are as follows:
1) The invention provides an adversarial attack framework for unmanned aerial vehicle image target detection models that can attack both one-stage and two-stage target detection models.
2) The invention designs a universal adversarial patch that fools all instances of a specific object class: the same patch can attack different target objects of the same type while leaving the detection of non-target objects intact. The universal patch can perturb most samples in the dataset so that the model mispredicts them.
3) The invention repositions the adversarial patch with a projective transformation model, simulating the deformation of the patch caused by the UAV's aerial viewing angle and changing distance, so that the generated universal adversarial patch is robust to projective transformation and remains effective after it.
The embodiment described above is one implementation of the invention, but the implementations of the invention are not limited to it; any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the invention shall be an equivalent substitution and shall fall within the protection scope of the invention.
Claims (10)
1. A method for generating adversarial samples for unmanned aerial vehicle image target detection, characterized by comprising the following steps:
taking control points on a vehicle image shot vertically by the unmanned aerial vehicle as reference control points, and labeling the shape of a mask within a certain range around the reference control points;
generating the mask from the labeled mask coordinates, and initializing it to generate a universal adversarial patch;
calculating a projection matrix from the correspondence between control points in the original image and the target image, and applying the projective transformation to the universal adversarial patch and its mask;
repositioning the universal adversarial patch and its mask onto a specified area of a vehicle by the projective transformation to obtain an adversarial image;
inputting the adversarial image into a target detection model, calculating a loss value with the attack loss function, and optimizing the universal adversarial patch by a back-propagation algorithm;
and pasting the optimized universal adversarial patch onto a vehicle image to generate an adversarial sample.
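Purely as an illustration of the pipeline recited in claim 1 (and not the patent's implementation), the following PyTorch-style sketch strings the steps together. `relocate`, `detector_conf`, and `loader` are hypothetical stand-ins for a differentiable projective paste, the detector's confidence output, and the training-image sampler:

```python
import torch

# hypothetical helpers assumed to exist for this sketch:
#   relocate(images, patch, quads) -- differentiable projective paste of the
#       patch onto each image at its labeled control-point quadrilateral
#   detector_conf(adv) -- (batch, boxes) tensor of prediction-box confidences
patch = torch.rand(3, 300, 300, requires_grad=True)   # universal patch init
opt = torch.optim.Adam([patch], lr=0.01)

for images, quads in loader:                          # sampled training subset
    adv = relocate(images, patch.clamp(0.0, 1.0), quads)
    loss = detector_conf(adv).max(dim=-1).values.mean()  # vanishing-attack loss
    opt.zero_grad()
    loss.backward()                                   # gradients reach only the patch
    opt.step()
```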
2. The method of claim 1, wherein the training strategy is to attack only those prediction boxes whose IOU with the ground-truth box of each attack object is greater than the threshold $\lambda_{IOU} = 0.5$.
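A minimal sketch of this filtering rule, under the assumption that boxes are axis-aligned (x1, y1, x2, y2) tuples; `iou` and `boxes_to_attack` are illustrative names, not the patent's code:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def boxes_to_attack(pred_boxes, gt_box, lam_iou=0.5):
    """Keep only prediction boxes overlapping the attack target enough."""
    return [p for p in pred_boxes if iou(p, gt_box) > lam_iou]
```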
3. The method of claim 1, wherein the universal adversarial patch is generated by randomly sampling subsets of the samples, updating the same universal adversarial patch on each sampled subset by minimizing the loss through gradient back-propagation, and finding the intersection of the adversarial patches trained on the sampled subsets, so that the patch perturbs the majority of samples and causes them to be mispredicted.
4. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 1, wherein the adversarial image $\hat{x}$ can be expressed in terms of the original image $x$ as:

$$\hat{x} = x + M \odot \eta$$

wherein $\eta$ denotes the universal adversarial perturbation applied to the training image set $X$; $M \odot \eta$ denotes the adversarial patch $P$; $M$ is the mask responsible for constraining the universal adversarial perturbation $\eta$ to the surface of the target object; and $\odot$ denotes element-wise multiplication, a masking operation that retains only a specific region.
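As a small illustration of the equation above (assuming image tensors scaled to [0, 1]; `apply_patch` is an illustrative name), the masking can be written as:

```python
import torch

def apply_patch(x, eta, M):
    """x: clean image, eta: universal perturbation, M: binary mask.
    M * eta is the adversarial patch P; clamping keeps a valid image."""
    return (x + M * eta).clamp(0.0, 1.0)
```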
5. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 4, wherein the universal adversarial perturbation is repositioned by the projective transformation, whose formula is:

$$\begin{bmatrix} x_d \\ y_d \\ z_d \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ 1 \end{bmatrix}, \qquad x'_d = \frac{x_d}{z_d}, \quad y'_d = \frac{y_d}{z_d}$$

wherein $(x_s, y_s)$ denotes coordinates on the original image, $(x'_d, y'_d)$ denotes coordinates on the target image, $(x_d, y_d, z_d)$ denotes the homogeneous coordinates of the target coordinate system, and $a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}, a_{31}, a_{32}, a_{33}$ are the elements of the projection matrix.
6. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 1, wherein the projection matrix is solved from the correspondence between control points, the mask is then projected from the original image onto the target image, and the universal adversarial patch is added at the mask position to form the adversarial image; the control points are obtained by manual labeling.
7. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 1, wherein, for both the one-stage and the two-stage target detector, the adversarial patch is generated by minimizing the score of the target contained in the prediction boxes, misleading the detector so that the confidence of the object in its final output boxes falls below the detection threshold, thereby obtaining a vanishing attack; the loss function of the vanishing attack is:

$$\mathcal{L}_{vanish} = \mathbb{E}_{x \in X}\left[\max P\big(f(\hat{x})\big)\right]$$

wherein $X$ denotes the set of training images, $\hat{x}$ is the adversarial image obtained from the original image $x$, and $f(\cdot)$ denotes the output of the target detector; $P(\cdot)$ is the function that extracts the prediction-box confidences from that output tensor, and $\max$ takes the maximum extracted confidence; $\mathbb{E}$ denotes the expectation over all images, and the optimization objective is to minimize this expectation, which is approximated by an empirical average: a batch of images is randomly sampled from the image set $X$ and the loss is averaged.
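Read as code, the empirical approximation of this expectation might look like the following hedged sketch, where `extract_conf` stands in for $P(\cdot)$ and `vanishing_attack_loss` is an illustrative name:

```python
import torch

def vanishing_attack_loss(adv_batch, detector, extract_conf):
    """Empirical average, over a randomly sampled batch, of the maximum
    prediction-box confidence -- the E[max P(f(x_hat))] being minimised."""
    per_image = [extract_conf(detector(x)).max() for x in adv_batch]
    return torch.stack(per_image).mean()
```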
8. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 1, wherein a one-stage target detection algorithm and a two-stage target detection algorithm are used as the target models of the attack.
9. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 1, wherein the attack performance of the adversarial patch is quantified by the attack success rate, defined as:

$$ASR = \frac{N_{succ}}{N_{all}}$$

wherein $N_{all}$ denotes the number of all attack objects in the dataset and, for the vanishing attack, $N_{succ}$ denotes the number of attack objects that successfully evade detection by the target detector; an attack is considered successful when the confidence of the prediction box is lower than 0.5 or the IOU between the prediction box and the ground-truth box is smaller than 0.5.
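A hedged sketch of this metric, reusing the illustrative `iou()` helper from the claim-2 sketch; the box and confidence formats are assumptions:

```python
def attack_success_rate(results, conf_thr=0.5, iou_thr=0.5):
    """results: list of (pred_boxes, gt_box) per attack object, where
    pred_boxes is a list of ((x1, y1, x2, y2), confidence) pairs."""
    n_succ = 0
    for preds, gt in results:
        # the object is still detected only if some box is both confident
        # enough and overlaps the ground truth enough
        detected = any(conf >= conf_thr and iou(box, gt) >= iou_thr
                       for box, conf in preds)
        n_succ += not detected
    return n_succ / len(results)
```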
10. The method for generating adversarial samples for unmanned aerial vehicle image target detection according to claim 1, wherein the universal adversarial patch can perform an ensemble attack in combination with attacking similar low-level features, the similar low-level features being shape and color features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111002003.5A CN113643278B (en) | 2021-08-30 | 2021-08-30 | Method for generating countermeasure sample for unmanned aerial vehicle image target detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113643278A CN113643278A (en) | 2021-11-12 |
CN113643278B true CN113643278B (en) | 2023-07-18 |
Family
ID=78424431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111002003.5A Active CN113643278B (en) | 2021-08-30 | 2021-08-30 | Method for generating countermeasure sample for unmanned aerial vehicle image target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113643278B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114363509B (en) * | 2021-12-07 | 2022-09-20 | 浙江大学 | Triggerable countermeasure patch generation method based on sound wave triggering |
CN113902962B (en) * | 2021-12-09 | 2022-03-04 | 北京瑞莱智慧科技有限公司 | Rear door implantation method, device, medium and computing equipment of target detection model |
CN114266344A (en) * | 2022-01-06 | 2022-04-01 | 北京墨云科技有限公司 | Method and apparatus for neural network vision recognition system using anti-patch attack |
CN114627373B (en) * | 2022-02-25 | 2024-07-23 | 北京理工大学 | Method for generating countermeasure sample for remote sensing image target detection model |
CN114550217A (en) * | 2022-02-28 | 2022-05-27 | 清华大学 | Countermeasure image generation method and apparatus, and target cover processing method |
CN116778128B (en) * | 2023-08-15 | 2023-11-17 | 武汉大学 | Anti-patch re-projection method and system based on three-dimensional reconstruction |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11768932B2 (en) * | 2019-06-28 | 2023-09-26 | Baidu Usa Llc | Systems and methods for fast training of more robust models against adversarial attacks |
US10997470B2 (en) * | 2019-08-30 | 2021-05-04 | Accenture Global Solutions Limited | Adversarial patches including pixel blocks for machine learning |
US11288408B2 (en) * | 2019-10-14 | 2022-03-29 | International Business Machines Corporation | Providing adversarial protection for electronic screen displays |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183671A (en) * | 2020-11-05 | 2021-01-05 | 四川大学 | Target attack counterattack sample generation method for deep learning model |
CN112364915A (en) * | 2020-11-10 | 2021-02-12 | 浙江科技学院 | Imperceptible counterpatch generation method and application |
CN112241790A (en) * | 2020-12-16 | 2021-01-19 | 北京智源人工智能研究院 | Small countermeasure patch generation method and device |
CN112862670A (en) * | 2021-02-08 | 2021-05-28 | 中国科学院信息工程研究所 | 360-degree image countermeasure sample generation method based on multi-view fusion |
CN113269241A (en) * | 2021-05-18 | 2021-08-17 | 中南大学 | Soft threshold defense method for remote sensing image confrontation sample |
Non-Patent Citations (2)
Title |
---|
Adversarial attacks and defenses in deep learning; Liu Ximeng, Xie Lehui, Wang Yaopeng, Li Xuru; Chinese Journal of Network and Information Security, No. 5; full text *
A survey of adversarial example attack and defense methods for visual perception in intelligent driving; Yang Yijun, Shao Wenze, Wang Liqian, Ge Qi, Bao Bingkun, Deng Haisong, Li Haibo; Journal of Nanjing University of Information Science & Technology (Natural Science Edition), No. 6; full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||