CN112884802B - Generation-based adversarial attack method - Google Patents
Generation-based adversarial attack method
- Publication number
- CN112884802B (application CN202110204784.XA)
- Authority
- CN
- China
- Prior art keywords
- template
- distribution
- similarity
- point
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/223—Analysis of motion using block-matching
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F18/253—Fusion techniques of extracted features
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T9/002—Image coding using neural networks
- G06T2207/20076—Probabilistic image processing
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention discloses a generation-based adversarial attack method, comprising the following steps: calculating similarity-encoded data from the seed point sets of a tracking template and a search area, and fusing the features extracted from the tracking template with the similarity-encoded data to obtain enhanced features; inputting the enhanced features into a binomial-distribution encoding layer, which learns a Bernoulli distribution for each point describing that point's filtering state; and obtaining the adversarial template by filtering-state distillation. The invention extracts features by similarity encoding and by fusing the similarity code with the tracking template, with the following advantages: the method can quickly compute the latent similarity between the template and the search area, effectively encode adversarial samples, filter the generated adversarial samples point by point, and simulate the data holes that readily arise when 3D data is captured in the real world, so that the attack is hard to perceive.
Description
Technical Field
The invention relates to the field of adversarial attacks on target tracking, and in particular to a generation-based adversarial attack method.
Background
With the extensive research effort in autonomous driving and intelligent surveillance systems, the task of 3D object tracking has received considerable attention. Although there have been many breakthroughs in 3D object tracking, its reliability has been studied far less than that of 2D object tracking. Existing studies indicate that deep models are very vulnerable to carefully crafted adversarial samples. Since 3D object tracking plays an important role in many safety-critical fields, evaluating the robustness of 3D tracking models is highly desirable.
Early adversarial attacks on 3D models mainly targeted classifiers. Since such attacks modify the victim point cloud directly, the attack methods largely divide into point perturbation and point dropping. Point-perturbation attacks are typically constrained by L2 regularization and move points locally to generate adversarial samples. These methods produce sample-level attacks, but they are in essence relatively time-consuming optimization problems and therefore cannot be applied in scenarios with real-time requirements. Attacks based on generative networks have gradually been proposed, such as label-guided attacks that confuse a classifier, and attacks that focus on transferability. On the other hand, simulating how a 3D sensor occludes the acquired point cloud, or exploiting inherent defects of point cloud data, is also very effective for attacking deep models. Obtaining salient points by computing the contribution of each point in the point cloud is widely applied in adversarial attacks. Generating significant occlusions with iterative methods that measure the robustness of each point in the deep model can also effectively fool deep neural networks.
There is also research on adversarial attacks against 3D target detection. The first attack on LiDAR-based target detection combined an optimization method with global sampling. There is also a unified attack method that can confuse 3D object detection algorithms and has the advantage of physical realizability in autonomous driving scenarios. In addition, an observation-based LiDAR point-cloud occlusion attack can likewise destabilize 3D object detectors.
In the prior art, point filtering by directly computing point contributions is generally optimization-based and relatively time-consuming. In addition, when point filtering is realized by fitting the Bernoulli distribution in a differentiable form, if a Sigmoid function is used to describe the point-drop states, too few states concentrate at 0 and 1 and most lie strictly between them, so point dropping cannot be described well.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a generation-based adversarial attack method.
The aim of the invention is realized by the following technical scheme: a generation-based adversarial attack method, comprising the following steps:
calculating similarity-encoded data from the seed point sets of a tracking template and a search area, and fusing the features extracted from the tracking template with the similarity-encoded data to obtain enhanced features;
inputting the enhanced features into a binomial-distribution encoding layer, which learns a Bernoulli distribution for each point describing that point's filtering state; and obtaining the adversarial template by filtering-state distillation.
Further, calculating the similarity-encoded data from the seed point sets of the tracking template and the search area includes:
extracting a first seed point set from the tracking template and a second seed point set from the search area by downsampling;
taking the computed cosine distances between the first and second seed point sets as the latent similarity-encoded data.
Further, the cosine distances are passed through a convolution and a symmetric function to obtain the latent similarity-encoded data.
Further, fusing the features extracted from the tracking template with the similarity-encoded data to obtain enhanced features includes:
upsampling the first seed point set to obtain latent features of the tracking template;
concatenating the latent features with the similarity-encoded data replicated several times to obtain the enhanced features.
Further, the method further comprises:
using the latent similarity-encoded data as a feature loss L_feat to separate the tracking template from the search area in a latent feature space:

L_feat = (1 / d_2) · Σ_i Sim'_i

where Sim' denotes the latent similarity-encoded data of the tracking template and the search area, d_2 the dimension of the similarity code, and i the index over that dimension.
Further, the binomial-distribution encoding layer learning, for each point, a Bernoulli distribution describing the point's filtering state includes:
realizing point filtering with a Binary Concrete distribution stretched to the interval (γ, ζ), where γ < 0 and ζ > 1, and attaching a filtering state to each point of the tracking template.
Further, inputting the enhanced features to realize point filtering with the stretched Binary Concrete distribution, and attaching a filtering state to each point of the tracking template, includes:
given a random variable s obeying a Binary Concrete distribution φ on the interval (0, 1), with q_s(s|φ) as its probability density and Q_s(s|φ) as its cumulative probability, the Binary Concrete distribution is represented by the parameters φ = (log α, β), where log α denotes location and β denotes temperature; the distribution is re-parameterized with a random variable u ~ U(0, 1) obeying a uniform distribution, expressed as:

s = Sigmoid((log u − log(1 − u) + log α) / β)

stretching the Binary Concrete distribution to the interval (γ, ζ), where γ < 0 and ζ > 1, and truncating it with a hard-sigmoid yields the Hard Concrete distribution:

z = min(1, max(0, s · (ζ − γ) + γ))

where z denotes the filtering state, z_i ∈ {0, 1};
Obtaining the adversarial template by filtering-state distillation comprises:
filtering out the points with filtering state z_i = 0, generating the adversarial template.
Further, the method further comprises:
using L_0 regularization as the filtering loss function, where the L_0 regularization is defined as the cumulative probability that the Hard Concrete distribution is greater than zero:

L_0 = Σ_i (1 − Q_s(0 | φ_i)) = Σ_i Sigmoid(log α_i − β · log(−γ / ζ))
further, the method further comprises:
generating a plurality of proposals, and their corresponding probability scores, as candidate regions for the target location using the countermeasure template; the proposal with the highest probability score will be selected as the final prediction result.
Further, the method further comprises:
using a location loss function L_loc to simultaneously decrease the scores of all proposals aggregated into one group, where R denotes the proposals sorted by score, and p, q and r denote the index ranges of the proposals aggregated into groups.
Further, the method further comprises:
using the L2 distance as a perceptual loss function L_perc to constrain the change of the data, defined as:

L_perc = Σ_i || x̂_i − x_i ||_2

where P̂_tmp denotes the adversarial template, P_tmp the tracking template, x̂_i the points of the adversarial template, and x_i the points of the tracking template.
The beneficial effects of the invention are as follows:
(1) In an exemplary embodiment of the present invention, feature extraction uses similarity encoding and the fusion of the similarity code with the tracking template. (1) The similarity encoding has the advantage that the template and the search area can be distinguished in the feature space, and further distinguished in an abstract space. (2) Fusing the similarity code with the tracking template has the advantage that the similarity code is embedded into the tracking template as a latent feature, enhancing the template's features. (3) Fitting the Bernoulli distribution in a differentiable form (i.e., learning for each point a Bernoulli distribution over its drop probability to describe its filtering state) has the advantage that it can be learned with gradient descent in a neural network, and the Bernoulli distribution makes the simulation of point filtering more targeted. By adding the similarity code as an enhancement feature to the learning of the neural network, the latent feature space can be mined better.
(2) In still another exemplary embodiment of the present invention, the similarity computation is replicated M times and fused, with the effect that the template's features are enhanced and the differences between the template and the search area in the feature space are mined.
(3) In a further exemplary embodiment of the present invention, a symmetric function is additionally applied to the similarity data: because of the permutation invariance of point clouds, this guarantees the same similarity output under different point orderings.
(4) In yet another exemplary embodiment of the invention, multiple losses are employed to enhance different effects.
Drawings
Fig. 1 is a flow chart of a method provided in an exemplary embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining".
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
First, the background of the present exemplary embodiment will be described. Given a video sequence of N frames stored in the form of point clouds, 3D target tracking aims at locating the tracking template P_tmp in successive frames. Note that the tracking template P_tmp is obtained from the first frame s_1, i.e. P_tmp ∈ s_1.
In the prior art, the 3D tracker generates a search area P_area in the current frame by expanding the range of the previous frame's prediction, and in the search area P_area generates some proposals, with corresponding probability scores, as candidates for the target position.
In contrast, in the present exemplary embodiment, an adversarial attack on 3D target tracking needs to disturb the 3D tracker so that it predicts a result deviating from the true target position.
Thus: the tracking template P_tmp denotes the tracking target specified in the first frame, and the search area P_area denotes the region obtained by expanding the previous frame's prediction result.
Referring to fig. 1, fig. 1 illustrates the generation-based adversarial attack method disclosed in an exemplary embodiment of the present application, comprising the steps of:
S11: feature extraction: calculating similarity-encoded data from the seed point sets of a tracking template and a search area, and fusing the features extracted from the tracking template with the similarity-encoded data to obtain enhanced features;
S13: adversarial distillation: inputting the enhanced features into a binomial-distribution encoding layer, which learns a Bernoulli distribution for each point describing that point's filtering state; the adversarial template is obtained by filtering-state distillation.
Specifically, in this exemplary embodiment, the proposed generation-based challenge-attack method flow is directed to a 3D tracker, as shown in fig. 1:
In step S11, given a tracking template P_tmp containing M_1 points and a search area P_area containing M_2 points, the similarity between the tracking template P_tmp and the search area P_area is encoded into a description of the template features, i.e. the similarity-encoded data Sim'; the features extracted from the tracking template P_tmp are then fused with Sim' to obtain enhanced features of dimension M_1 × (d_1 + d_2), where d_1 denotes the number of features of the up-sampled points and d_2 the dimension of the similarity code.
Thereafter, in step S13, a binomial-distribution encoding layer learns for each point the Bernoulli distribution of its drop probability, describing the point's filtering state, thereby distilling out the adversarial template.
The feature extraction adopts similarity encoding and the fusion of the similarity code with the tracking template, with the following advantages: (1) the similarity encoding can mine the differences between the template and the search area in the feature space and encode them; (2) fusing the similarity code with the tracking template takes the encoded similarity as a latent-space feature of the search area and template, enhancing the template's features. Without this method, the attack reduces the accuracy of the original model by 6.5%; with it, the accuracy drops by 22.9%.
Fitting the Bernoulli distribution in a differentiable form (i.e., learning for each point a Bernoulli distribution over its drop probability to describe its filtering state) has the advantage that discrete filtering states can be described with a gradient-descent-compatible method, realizing the filtering operation.
More preferably, in an exemplary embodiment, calculating the similarity-encoded data of the seed point sets of the tracking template and the search area includes:
S31: extracting a first seed point set from the tracking template and a second seed point set from the search area by downsampling;
S33: taking the computed cosine distances between the first and second seed point sets as the latent similarity-encoded data.
Specifically, in step S31 of the exemplary embodiment, the downsampling may be implemented by PointNet++ (i.e., the backbone network in fig. 1): the template P_tmp and the search area P_area are downsampled by the farthest-point-sampling algorithm to S_1 and S_2 seed points respectively, each with multi-scale features f ∈ R^n, forming the first seed point set s_tmp (S_1 × n) and the second seed point set s_area (S_2 × n).
In step S33, for better feature extraction, the cosine distance between the seed point sets s_tmp and s_area is computed as one branch of the latent similarity data Sim'. The encoded latent similarity effectively fuses the template P_tmp and the search area P_area.
More preferably, in an exemplary embodiment, the cosine distances are passed through a convolution and a symmetric function to obtain the latent similarity-encoded data.
It should be noted that, because of the permutation invariance of point clouds, the symmetric function is used to guarantee the same similarity output under different point orderings.
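The similarity encoding above can be sketched as follows. This is a minimal illustration, not the patent's exact operator: the convolution branch is omitted, and the symmetric reduction is realized as a sorted top-d2 over the search-area seeds, which is one possible order-invariant function.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_encode(s_tmp, s_area, d2):
    """For each template seed, compute cosine similarity to every search-area
    seed, then reduce with a symmetric (order-invariant) sorted top-d2, so the
    output does not depend on the ordering of the search points."""
    code = []
    for f_t in s_tmp:
        sims = sorted((cosine(f_t, f_a) for f_a in s_area), reverse=True)
        code.append(sims[:d2])  # keep the d2 largest similarities per seed
    return code                 # shape: S1 x d2
```

Because the reduction sorts before truncating, permuting the search-area seeds leaves the code unchanged, which is the property the symmetric function is meant to guarantee.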
More preferably, in an exemplary embodiment, fusing the features extracted from the tracking template with the similarity-encoded data to obtain enhanced features includes:
S51: upsampling the first seed point set to obtain latent features of the tracking template;
S53: concatenating the latent features with the similarity-encoded data replicated several times to obtain the enhanced features.
Specifically, in step S51, the first seed point set s_tmp is upsampled to generate latent features for each point of the original tracking template.
In step S53, the latent features of P_tmp are concatenated with the similarity replicated M times to obtain the enhanced features.
It should be noted that replicating the similarity M times is realized by duplication, so that the similarity is spliced into the feature of each point as that point's feature.
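The splicing in step S53 can be sketched as below, assuming each point's enhanced feature is simply its upsampled feature concatenated with its similarity code duplicated m times; the exact fused dimensionality is not fully specified in the text, so this layout is an assumption.

```python
def fuse_features(point_feats, sim_code, m):
    """Concatenate each point's upsampled latent feature (length d1) with its
    similarity code (length d2) duplicated m times, yielding an enhanced
    feature of length d1 + m*d2 per point."""
    return [f + s * m for f, s in zip(point_feats, sim_code)]
```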
More preferably, in an exemplary embodiment, the method further comprises:
using the latent similarity-encoded data as a feature loss L_feat to separate the tracking template from the search area in a latent feature space:

L_feat = (1 / d_2) · Σ_i Sim'_i

where Sim' denotes the latent similarity-encoded data of the tracking template and the search area, d_2 the dimension of the similarity code, and i the index over that dimension.
That is, the mean value of the similarity code is taken as the loss L_feat; reducing this loss reduces the similarity between the search area and the target in the feature space, so that the two can be distinguished there.
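Taking the mean of the similarity code as the loss can be sketched in a few lines:

```python
def feature_loss(sim_code):
    """Mean of all entries of the similarity code: minimizing it pushes
    the template and search-area features apart in the latent space."""
    flat = [v for row in sim_code for v in row]
    return sum(flat) / len(flat)
```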
Preferably, in an exemplary embodiment, the binomial-distribution encoding layer learning, for each point, a Bernoulli distribution describing the point's filtering state includes:
realizing point filtering with a Binary Concrete distribution stretched to the interval (γ, ζ), where γ < 0 and ζ > 1, and attaching a filtering state to each point of the tracking template.
Specifically, in the exemplary embodiment, the point filtering module learns the probability that each point is filtered out. The binomial-distribution encoding layer implements point filtering through a point-scale filter that learns a Bernoulli distribution for each point to describe the point's filtering state. Specifically, the point-scale filtering state z_i ∈ {0, 1} is attached to each point of the tracking template; the points with filtering state z_i = 0 are filtered out by the filter, generating the adversarial template. This is not trivial, however, because of the discontinuity of the Bernoulli distribution. Therefore the Binary Concrete distribution is adopted as a smooth, continuously differentiable simulation of the Bernoulli distribution. Meanwhile, to ensure that the point filtering module filters points effectively, its values need to be pushed to 0 or 1, so the Binary Concrete distribution is stretched to the interval (γ, ζ), where γ < 0 and ζ > 1.
More preferably, in an exemplary embodiment, inputting the enhanced features to realize point filtering with the stretched Binary Concrete distribution, and attaching a filtering state to each point of the tracking template, includes:
S71: the Binary Concrete distribution is a smooth, continuously differentiable simulation of the Bernoulli distribution. Given a random variable s obeying a Binary Concrete distribution φ on the interval (0, 1), with q_s(s|φ) as its probability density and Q_s(s|φ) as its cumulative probability, the Binary Concrete distribution is represented by the parameters φ = (log α, β), where log α denotes location and β denotes temperature; the distribution is re-parameterized with a random variable u ~ U(0, 1) obeying a uniform distribution, expressed as:

s = Sigmoid((log u − log(1 − u) + log α) / β)

S73: to ensure that the point filtering module filters points effectively, its values need to be pushed to 0 or 1; the Binary Concrete distribution is stretched to the interval (γ, ζ), where γ < 0 and ζ > 1, and truncated with a hard-sigmoid to obtain the Hard Concrete distribution:

z = min(1, max(0, s · (ζ − γ) + γ))

where z denotes the filtering state, z_i ∈ {0, 1};
Obtaining the adversarial template by filtering-state distillation comprises:
filtering out the points with filtering state z_i = 0 to generate the adversarial template.
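The two-step construction in S71 and S73 (Concrete reparameterization, then stretch and hard-sigmoid clamp) can be sketched as follows. The default values β = 0.66, γ = −0.1, ζ = 1.1 are common choices from the L0-regularization literature, not values stated in this patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hard_concrete_sample(log_alpha, beta=0.66, gamma=-0.1, zeta=1.1, rng=random):
    """Sample a filtering state z in [0, 1]: draw u ~ U(0, 1), reparameterize
    into a Binary Concrete sample s, stretch it to (gamma, zeta), then clamp
    with a hard-sigmoid so probability mass piles up exactly at 0 and 1."""
    u = rng.random()
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / beta)
    s_bar = s * (zeta - gamma) + gamma   # stretch to (gamma, zeta)
    return min(1.0, max(0.0, s_bar))     # hard-sigmoid truncation
```

A strongly negative log α drives z to exactly 0 (the point is filtered out), and a strongly positive log α drives z to exactly 1 (the point is kept), which is what makes the clamp useful for distilling a discrete filter.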
More preferably, in an exemplary embodiment, since L_0 regularization does not cause the filtering-state values to collapse, it is used to penalize the binomial-distribution encoding layer, constraining it to minimize the number of filtered-out points. Thus, the method further comprises:
using L_0 regularization as the filtering loss function, where the L_0 regularization is defined as the cumulative probability that the Hard Concrete distribution is greater than zero:

L_0 = Σ_i (1 − Q_s(0 | φ_i)) = Σ_i Sigmoid(log α_i − β · log(−γ / ζ))

The cumulative probability greater than zero is the probability of being non-zero, i.e. the probability of filtering state 1, and the number of filtered points is controlled through this probability.
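A sketch of this penalty, under the assumption that the standard Hard Concrete closed form Sigmoid(log α − β · log(−γ/ζ)) gives each point's probability of a non-zero gate (the same β, γ, ζ defaults as above are assumed, not taken from the patent):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def l0_penalty(log_alphas, beta=0.66, gamma=-0.1, zeta=1.1):
    """Expected number of unfiltered points: for each point, the probability
    that its Hard Concrete gate is non-zero, summed over all points."""
    shift = beta * math.log(-gamma / zeta)
    return sum(sigmoid(a - shift) for a in log_alphas)
```

Minimizing this sum pushes the per-point keep probabilities down, i.e. it encourages the layer to filter out as few points as needed while keeping the filter sparse and differentiable.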
More preferably, in an exemplary embodiment, as shown in fig. 1, the method further comprises:
generating, with the adversarial template, a plurality of proposals and their corresponding probability scores as candidate regions for the target location; the proposal with the highest probability score is selected as the final prediction result.
The search area P_area and the adversarial template are input to the victim deep model to generate a number of proposals, and the highest-scoring proposal is selected as the prediction result for the current frame, realizing target tracking.
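The selection rule described above is a plain argmax over proposal scores; a minimal sketch:

```python
def select_proposal(proposals, scores):
    """Pick the proposal with the highest probability score as the
    prediction for the current frame."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return proposals[best]
```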
More preferably, in an exemplary embodiment, the tracker predicts some proposals and their corresponding probability scores as candidate regions for the target location, and the proposal with the highest probability score is selected as the final prediction. However, other high-scoring proposals also lie close to the true position and would likewise yield accurate predictions. The location loss function L_loc therefore accumulates proposal scores within a specified range to form a group, so that the scores of all proposals aggregated into one group can be decreased simultaneously, suppressing all the preferable proposals at once; here R denotes the proposals sorted by score, and p, q and r denote the index ranges of the proposals aggregated into groups.
More preferably, in an exemplary embodiment, the method further comprises:
In order that the attack effect is imperceptible to the human eye, the L2 distance is used as a perceptual loss function L_perc to constrain the change of the data, defined as:
wherein P̃_tmp represents the adversarial template, P_tmp represents the tracking template, x̃_i represents the points of the adversarial template, and x_i represents the points of the tracking template.
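The perceptual constraint can be sketched as the L2 distance between corresponding points of the two templates; this assumes numpy, templates stored as (N, 3) arrays, and matched point ordering, none of which the patent states explicitly:

```python
import numpy as np

def perceptual_loss(adv_template, template):
    """Sketch of L_perc: L2 distance between the adversarial template
    points and the original tracking template points, constraining how
    far the attack may move the data."""
    return np.linalg.norm(adv_template - template)
```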
Summarizing all the exemplary embodiments, the overall objective loss function may be expressed as:
L = L_feat + a·L_loc + b·L_0 + c·L_perc
where a, b and c are hyper-parameters used to balance the terms of the loss function.
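A trivial sketch of how the four terms combine; the weight values shown are placeholders for illustration only, since the patent does not disclose concrete values of a, b, c:

```python
def total_loss(l_feat, l_loc, l_0, l_perc, a=1.0, b=0.1, c=5.0):
    """Weighted sum of the four objective terms. The default weights
    here are illustrative assumptions, not values from the patent."""
    return l_feat + a * l_loc + b * l_0 + c * l_perc
```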
Based on any of the above exemplary embodiments, an exemplary embodiment of the present invention provides a storage medium having stored thereon computer instructions which, when executed, perform the steps of the above generation-based adversarial attack method.
Based on any of the above exemplary embodiments, an exemplary embodiment of the present invention provides a terminal comprising a memory and a processor, the memory storing computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the generation-based adversarial attack method.
Based on this understanding, the technical solution of the present embodiment, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing an apparatus to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
It is apparent that the above examples are given by way of illustration only and not by way of limitation; those of ordinary skill in the art may make other variations or modifications in various forms based on the above description. It is neither necessary nor possible to exhaust all embodiments here, and obvious variations or modifications derived therefrom still fall within the scope of the present invention.
Claims (8)
1. A generation-based adversarial attack method, characterized in that the method comprises the following steps:
calculating similarity encoding data of the seed point sets of a tracking template and a search area, and fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features;
inputting the enhanced features into a binomial distribution coding layer, wherein the binomial distribution coding layer learns, for each point, a Bernoulli distribution describing the filtering state of the point; and performing distillation with the filtering states to obtain an adversarial template;
generating a plurality of proposals and their corresponding probability scores as candidate regions for the target location using the adversarial template, wherein the proposal with the highest probability score is selected as the final prediction result;
using a location loss function L_loc to simultaneously decrease the scores of all proposals aggregated into one group, defined as:
wherein R represents the proposals sorted by score, and p, q and r respectively delimit the subscript ranges of the proposals aggregated into groups.
2. The generation-based adversarial attack method according to claim 1, wherein calculating the similarity encoding data of the seed point sets of the tracking template and the search area comprises:
extracting a first seed point set of the tracking template and a second seed point set of the search area respectively by means of downsampling;
and taking the calculated cosine distances between the first seed point set and the second seed point set as potential similarity encoding data.
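The cosine-distance encoding of claim 2 can be sketched as a pairwise cosine-similarity map between the two seed point feature sets; this assumes numpy and row-wise feature vectors of shape (N, d) and (M, d), which is an interpretation not spelled out in the patent:

```python
import numpy as np

def similarity_encoding(template_seeds, search_seeds, eps=1e-8):
    """Pairwise cosine similarity between template seed features (N, d)
    and search-area seed features (M, d), yielding an (N, M) map used
    as the potential similarity encoding data."""
    t = template_seeds / (np.linalg.norm(template_seeds, axis=1, keepdims=True) + eps)
    s = search_seeds / (np.linalg.norm(search_seeds, axis=1, keepdims=True) + eps)
    return t @ s.T  # (N, M) similarity map
```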
3. The generation-based adversarial attack method according to claim 2, wherein fusing the features extracted from the tracking template with the similarity encoding data to obtain enhanced features comprises:
upsampling the first seed point set to obtain potential features of the tracking template;
and splicing and fusing the potential features with the similarity encoding data repeated a plurality of times to obtain the enhanced features.
4. The generation-based adversarial attack method according to claim 2, wherein the method further comprises:
using the potential similarity encoding data as a feature loss L_feat to distinguish the tracking template from the search space in a potential feature space:
where Sim' represents the potential similarity encoding data of the tracking template and the search space, d_2 represents the dimension of the similarity encoding, and i represents the subscript of the dimension.
5. The generation-based adversarial attack method according to claim 1, wherein the binomial distribution coding layer learning, for each point, a Bernoulli distribution describing the filtering state of the point comprises:
realizing point filtering using a stretched Binary Concrete distribution with interval range (γ, ζ), where γ < 0 and ζ > 1, and attaching a filtering state to each point of the tracking template.
6. The generation-based adversarial attack method according to claim 5, wherein realizing point filtering using the stretched Binary Concrete distribution and attaching a filtering state to each point of the tracking template comprises:
given a random variable s within the (0, 1) interval obeying the Binary Concrete distribution φ, with q_s(s|φ) as the probability density of the distribution and Q_s(s|φ) as its cumulative probability; the Binary Concrete distribution is characterized by the parameters φ = (log α, β), where log α represents position and β represents temperature; the Binary Concrete distribution is re-parameterized with a random variable u ~ U(0, 1) obeying a uniform distribution, expressed as:
s = Sigmoid((log u − log(1 − u) + log α)/β)
stretching the Binary Concrete distribution to the (γ, ζ) interval, where γ < 0 and ζ > 1, and truncating the stretched distribution with a hard-sigmoid to obtain the hard-concrete distribution:
wherein z represents the filtering state, z_i ∈ {0, 1};
obtaining the adversarial template by filtering-state distillation comprises:
filtering out the points whose filtering state z_i = 0, thereby generating the adversarial template.
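The sampling and distillation steps of claims 5 and 6 can be sketched as follows; the stretch parameters γ = −0.1, ζ = 1.1 and temperature β = 0.66 are assumed illustrative defaults (the patent only requires γ < 0 and ζ > 1), and numpy is assumed:

```python
import numpy as np

def sample_filter_states(log_alpha, beta=0.66, gamma=-0.1, zeta=1.1, rng=None):
    """Sample hard-concrete filtering states: draw a re-parameterised
    Binary Concrete sample s from u ~ U(0, 1), stretch it to the
    (gamma, zeta) interval, then apply the hard-sigmoid truncation."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=log_alpha.shape)  # u ~ U(0, 1)
    # s = Sigmoid((log u - log(1 - u) + log_alpha) / beta)
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)  # z in [0, 1]

def distill(points, z):
    """Filtering-state distillation: drop the points whose filtering
    state is 0, yielding the adversarial template."""
    return points[z > 0]
```

Extreme values of log α push every state to 0 or 1, which is what lets the L_0 filtering loss control how many points survive distillation.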
7. The generation-based adversarial attack method according to claim 6, wherein the method further comprises:
using L_0 regularization as a filtering loss function, wherein the L_0 regularization is defined as the cumulative probability of the hard-concrete distribution being greater than zero:
8. The generation-based adversarial attack method according to claim 1, wherein the method further comprises:
using the L2 distance as a perceptual loss function L_perc to constrain the change of the data, defined as:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204784.XA CN112884802B (en) | 2021-02-24 | 2021-02-24 | Attack resistance method based on generation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112884802A CN112884802A (en) | 2021-06-01 |
CN112884802B true CN112884802B (en) | 2023-05-12 |
Family
ID=76054226
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110204784.XA Active CN112884802B (en) | 2021-02-24 | 2021-02-24 | Attack resistance method based on generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112884802B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510056A (en) * | 2017-02-27 | 2018-09-07 | 顾泽苍 | A method of ultra-deep adversarial learning for online image recognition
WO2019245186A1 (en) * | 2018-06-19 | 2019-12-26 | Samsung Electronics Co., Ltd. | Electronic device and control method thereof
CN110991299A (en) * | 2019-11-27 | 2020-04-10 | 中新国际联合研究院 | Adversarial sample generation method for face recognition systems in the physical domain
CN111696136A (en) * | 2020-06-09 | 2020-09-22 | 电子科技大学 | Target tracking method based on coding and decoding structure |
CN111860248A (en) * | 2020-07-08 | 2020-10-30 | 上海蠡图信息科技有限公司 | Visual target tracking method based on twin gradual attention-guided fusion network |
CN112233147A (en) * | 2020-12-21 | 2021-01-15 | 江苏移动信息系统集成有限公司 | Video moving target tracking method and device based on two-way twin network |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104794733B (en) * | 2014-01-20 | 2018-05-08 | 株式会社理光 | Method for tracing object and device |
CN109766830B (en) * | 2019-01-09 | 2022-12-27 | 深圳市芯鹏智能信息有限公司 | Ship target identification system and method based on artificial intelligence image processing |
WO2020181391A1 (en) * | 2019-03-14 | 2020-09-17 | Element Ai Inc. | Articles for disrupting automated visual object tracking processes |
CN110033478A (en) * | 2019-04-12 | 2019-07-19 | 北京影谱科技股份有限公司 | Visual target tracking method and device based on depth dual training |
DE112020002602T5 (en) * | 2019-06-03 | 2022-03-03 | Nvidia Corporation | MULTI-OBJECT TRACKING USING CORRELATION FILTERS IN VIDEO ANALYSIS APPLICATIONS |
CN110647918B (en) * | 2019-08-26 | 2020-12-25 | 浙江工业大学 | Mimicry defense method for resisting attack by deep learning model |
CN110969637B (en) * | 2019-12-02 | 2023-05-02 | 深圳市唯特视科技有限公司 | Multi-threat target reconstruction and situation awareness method based on generation countermeasure network |
CN111080659A (en) * | 2019-12-19 | 2020-04-28 | 哈尔滨工业大学 | Environmental semantic perception method based on visual information |
CN111627044B (en) * | 2020-04-26 | 2022-05-03 | 上海交通大学 | Target tracking attack and defense method based on deep network |
CN112150513A (en) * | 2020-09-27 | 2020-12-29 | 中国人民解放军海军工程大学 | Target tracking algorithm based on sparse identification minimum spanning tree |
CN112365582B (en) * | 2020-11-17 | 2022-08-16 | 电子科技大学 | Countermeasure point cloud generation method, storage medium and terminal |
Non-Patent Citations (2)
Title |
---|
Ximing Zhang et al.; Robust Visual Tracking Based on Adversarial Fusion Networks; 2018 37th Chinese Control Conference; pp. 9142-9147 *
Hu Dan; Research on the Design and Optimization of Deep Learning Models for Visual Tracking; China Doctoral Dissertations Full-text Database, Information Science and Technology (No. 1); pp. I13-227 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||