CN113343025B

CN113343025B - Sparse adversarial attack method based on weighted gradient hash activation heat map

Info

Publication number: CN113343025B
Application number: CN202110893931.9A
Authority: CN
Inventors: 黄亮; 施荣华; 胡超
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2021-11-02
Anticipated expiration: 2041-08-05
Also published as: CN113343025A

Abstract

The invention provides a sparse adversarial attack method based on a weighted gradient hash activation heat map, comprising the following steps: Step 1, inputting a query video into a video hash retrieval model to obtain a query video hash code; Step 2, acquiring a target video set and inputting each target video in the set into the video hash retrieval model to generate a plurality of target video hash codes; and Step 3, performing a dot product between the query video hash code and each of the target video hash codes to construct a Hamming distance function between the query video hash code and the plurality of target video hash codes. By exploiting the sensitivity of the weighted gradient hash activation heat map, the method determines the positions and the sensitive region of the sparse adversarial attack, reduces the pixel cost of the attack, and improves the accuracy and efficiency of the sparse adversarial attack as well as the imperceptibility of the adversarial samples.

Description

Sparse adversarial attack method based on weighted gradient hash activation heat map

Technical Field

The invention relates to the technical field of video adversarial attacks, and in particular to a sparse adversarial attack method based on a weighted gradient hash activation heat map.

Background

Applying deep neural networks to hash retrieval has greatly improved retrieval efficiency. In recent years, however, deep neural networks have been shown to be highly vulnerable to adversarial attacks, which has drawn attention to their security and spurred further research on adversarial attacks. Deep retrieval systems therefore inherit the risks of deep neural networks along with their benefits.

Existing adversarial attack methods fall into two categories: dense adversarial attacks and sparse adversarial attacks. Unlike a dense attack, a sparse attack achieves its effect by perturbing only the pixels at selected positions, and its main challenge is determining which positions to perturb.

At present, the only attack aimed at video hash retrieval systems is the deep hash targeted attack method. It optimizes toward a target label and introduces a voting component to obtain the optimal representation of the set of hash codes of that label, which improves attack accuracy. However, the deep hash targeted attack is a dense adversarial attack: it generates many redundant perturbed pixels and is ill-suited to real-world scenarios, and because it spends more pixels, the imperceptibility of its adversarial samples is low.

Disclosure of Invention

The invention provides a sparse adversarial attack method based on a weighted gradient hash activation heat map, which aims to solve the problems that conventional adversarial attack methods generate many redundant perturbed pixels, are ill-suited to real-world scenarios, and incur a high pixel cost that lowers the imperceptibility of the adversarial samples.

To achieve the above object, an embodiment of the present invention provides a sparse adversarial attack method based on a weighted gradient hash activation heat map, comprising:

Step 1: inputting a query video into a video hash retrieval model to obtain a query video hash code;

Step 2: acquiring a target video set and inputting each target video in the set into the video hash retrieval model to generate a plurality of target video hash codes;

Step 3: performing a dot product between the query video hash code and each of the target video hash codes to construct a Hamming distance function between the query video hash code and the plurality of target video hash codes;

Step 4: differentiating the output of the Hamming distance function with respect to the intermediate-layer input of the video hash retrieval model by the chain rule, and then linearly combining the resulting gradient weights with the intermediate-layer input to generate a weighted gradient hash activation heat map;

Step 5: mapping the temporal and spatial dimensions of the weighted gradient hash activation heat map through trilinear interpolation and upsampling to obtain a weighted gradient hash activation heat matrix, and passing the heat matrix through the ReLU activation function and the Binarize function to obtain a sparse mask matrix;

Step 6: multiplying the adversarial perturbation by the mask matrix to obtain an adversarial mask matrix, constructing an adversarial objective function from the adversarial mask matrix and the Hamming distance function, and optimizing the adversarial objective function with the Adam (adaptive moment estimation) optimization method to obtain a sparse adversarial video sample.

Step 1 specifically comprises:

Step 11: defining the video hash retrieval model as $F(\cdot)$;

Step 12: inputting the query video into the video hash retrieval model $F(\cdot)$, which generates the query video hash code as follows:

$H_q = F(X_q)$ (1)

where $H_q$ denotes the query video hash code, $H_q \in \{0,1\}^N$, $N$ denotes the code length, so that $H_q$ is a binary hash code sequence of length $N$; $X_q$ denotes the query video, $X_q \in \mathbb{R}^{C \times G \times B \times T}$, where $C$ denotes the number of frames of the query video, $G$ the width of each frame, $B$ the height of each frame, and $T$ the number of channels of each frame.

Step 2 specifically comprises:

Step 21: obtaining a target video set $X_t = \{x_{t1}, x_{t2}, \ldots, x_{ti}\}$, where $x_{ti}$ denotes the $i$-th target video in the target video set, $i = 1, 2, \ldots, n$;

Step 22: inputting each target video in the target video set into the video hash retrieval model, which generates the plurality of target video hash codes as follows:

$H_{ti} = F(x_{ti})$ (2)

where $H_{ti}$ denotes the hash code of the $i$-th target video, $i = 1, 2, \ldots, n$.

Step 3 specifically comprises:

The Hamming distance function between the query video hash code and the plurality of target video hash codes is as follows:

(3)

where $d(\cdot,\cdot)$ denotes the dot product operation.
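The link between a dot product and a Hamming distance can be illustrated with a short sketch. It relies on the standard identity for codes in $\{-1,+1\}^N$, where the Hamming distance equals $(N - a \cdot b)/2$. The mapping from $\{0,1\}$ to $\{-1,+1\}$, the helper names, and the averaging over the target set are all assumptions made for illustration and are not taken from formula (3).

```python
def to_bipolar(code):
    """Map a {0,1} hash code to {-1,+1} so that a dot product counts agreements."""
    return [2 * bit - 1 for bit in code]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def hamming_via_dot(code_a, code_b):
    """Hamming distance from the identity (N - a.b) / 2 on bipolar codes."""
    n = len(code_a)
    return (n - dot(to_bipolar(code_a), to_bipolar(code_b))) // 2


def mean_hamming_to_targets(query_code, target_codes):
    """Average distance from one query code to several target codes.

    Averaging is an assumed aggregation, chosen only for this example.
    """
    total = sum(hamming_via_dot(query_code, t) for t in target_codes)
    return total / len(target_codes)


h_q = [1, 0, 1, 1]                       # toy query hash code
targets = [[1, 0, 0, 1], [0, 1, 0, 0]]   # toy target hash codes
distances = [hamming_via_dot(h_q, t) for t in targets]
```

Because the dot product is differentiable in its arguments, a distance written this way can serve as a loss once the binary codes are replaced by a continuous relaxation.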

Step 4 specifically comprises:

Step 41: differentiating the output of the Hamming distance function with respect to the intermediate-layer input of the video hash retrieval model by the chain rule to obtain the gradient of the intermediate feature map, and using this gradient as the weight of the intermediate feature map, as follows:

(4)

where $W$ denotes the weight of the intermediate feature map, $W \in \mathbb{R}^{c \times y \times g \times b}$, $c$ denotes the number of frames of the intermediate feature map, $y$ the number of weight maps per frame, $g$ the width of each frame's weight map, and $b$ the height of each frame's weight map; $A$ denotes the intermediate-layer input, which is the intermediate feature map;
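Read alongside the description of Step 41, the gradient weight can be written in the usual Grad-CAM style. The expression below is a sketch under assumptions, not the patent's own typesetting: $D$ is a hypothetical symbol for the scalar output of the Hamming distance function, and the hash code $H_q$ is assumed to be replaced by a differentiable relaxation so that the chain rule applies.

```latex
W \;=\; \frac{\partial D}{\partial A}
  \;=\; \frac{\partial D}{\partial H_q}\,\frac{\partial H_q}{\partial A},
\qquad W \in \mathbb{R}^{c \times y \times g \times b}
```

Under this reading, $W$ has the same shape as the intermediate feature map $A$, which is what allows it to act as an element-wise weight.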

Step 42: averaging the weight $W$ of the intermediate feature map globally over the second dimension to obtain the global average weight $w^c$ of each frame's feature map, as follows:

(5)

where $w^c$ denotes the global average weight of each frame's feature map, […] denotes the spatial resolution of each frame's feature map, and $i$ and $j$ denote pixel coordinates;

Step 43: linearly combining the global average weight $w^c$ of each frame's feature map with the intermediate-layer input $A$ to obtain the weighted gradient hash activation heat map $Q^k$, as follows:

$Q^k = w^c A$ (6)

where $Q^k$ denotes the weighted gradient hash activation heat map, $k = 1, 2, \ldots, c$.
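Steps 42 and 43 resemble the weighting used in Grad-CAM, and a minimal pure-Python sketch may help fix the shapes. It assumes the conventional Grad-CAM reading: gradients are averaged over the spatial grid to give one weight per channel, and the channels of each frame are then summed with those weights. The function names and the `[frame][channel][row][col]` layout are illustrative choices, not taken from the patent.

```python
def spatial_mean(grid):
    """Average of a 2-D list of numbers (global average pooling)."""
    total = sum(sum(row) for row in grid)
    return total / (len(grid) * len(grid[0]))


def grad_cam_per_frame(features, grads):
    """Build one heat map per frame from feature maps and their gradients.

    features, grads: nested lists shaped [frame][channel][row][col].
    Returns nested lists shaped [frame][row][col].
    """
    heat_maps = []
    for frame_feats, frame_grads in zip(features, grads):
        # One scalar weight per channel: the spatially averaged gradient.
        weights = [spatial_mean(g) for g in frame_grads]
        rows, cols = len(frame_feats[0]), len(frame_feats[0][0])
        # Weighted sum of the channels at every pixel.
        heat = [[sum(w * ch[i][j] for w, ch in zip(weights, frame_feats))
                 for j in range(cols)] for i in range(rows)]
        heat_maps.append(heat)
    return heat_maps


# Toy input: one frame, two channels, 2x2 spatial grid.
features = [[[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]]
grads = [[[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]]
heat = grad_cam_per_frame(features, grads)
```

Negative entries in the result are what the subsequent ReLU step removes.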

Step 5 specifically comprises:

Step 51: mapping the weighted gradient hash activation heat map, through trilinear interpolation and upsampling, into a weighted gradient hash activation heat matrix of the same size as the target video;

Step 52: inputting the weighted gradient hash activation heat matrix into the ReLU activation function to obtain an activation matrix $V^T$;

Step 53: setting a threshold $\varepsilon$ and, using this threshold, inputting the activation matrix $V^T$ into the Binarize function to generate the sparse mask matrix, as follows:

(7)

where $M^n$ denotes the mask matrix, Binarize denotes the binarization function, and $V^T$ denotes the activation matrix; the mask is set to 0 for pixels of $V^T$ whose weight is below the threshold and to 1 for pixels of $V^T$ whose weight is above the threshold.
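Steps 52 and 53 can be sketched in a few lines. The text specifies the mask value below and above the threshold but not at the threshold itself, so treating a weight equal to $\varepsilon$ as 1 is an assumption of this example, as are the function names.

```python
def relu(matrix):
    """Clamp negative entries of a 2-D list to zero."""
    return [[max(0.0, v) for v in row] for row in matrix]


def binarize(matrix, eps):
    """Return a 0/1 mask: 1 where the entry reaches the threshold, else 0.

    Using >= at the boundary is an assumption; the source only describes
    the cases strictly below and strictly above the threshold.
    """
    return [[1 if v >= eps else 0 for v in row] for row in matrix]


heat_matrix = [[-0.3, 0.2], [0.5, 0.9]]   # toy heat matrix for one frame
activation = relu(heat_matrix)
mask = binarize(activation, eps=0.5)
```

Only the positions marked 1 in `mask` remain available to the perturbation in Step 6.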

Step 6 specifically comprises:

Step 61: constructing the adversarial objective function as follows:

(8)

where $E$ denotes the adversarial perturbation, $M$ denotes the mask matrix and is shorthand for $M^n$, […] denotes the maximum value in the matrix, and $\tau$ and […] denote constants;

Step 62: optimizing formula (8) with the Adam optimization method, seeking the optimal solution that minimizes both the Hamming distance function between the query video hash code and the plurality of target video hash codes and the added adversarial perturbation, to obtain the sparse adversarial video sample $X_a$.
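The role of the mask during optimization can be shown with a toy Adam loop. This is a sketch under strong assumptions and is not formula (8): the hash model and Hamming term are replaced by a simple squared-error surrogate that pulls the input toward a target vector, and a small penalty `lam` on the masked perturbation stands in for the perturbation term. All names and hyperparameter values are hypothetical.

```python
import math


def adam_masked_attack(x, target, mask, lam=0.01, lr=0.1, steps=200,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise sum((x + M*E - target)^2) + lam * sum((M*E)^2) over E with Adam.

    x, target, mask are flat lists of equal length; mask holds 0/1 entries.
    Returns the perturbed input x + M*E.
    """
    n = len(x)
    e = [0.0] * n          # perturbation E
    m = [0.0] * n          # first-moment estimate
    v = [0.0] * n          # second-moment estimate
    for t in range(1, steps + 1):
        for i in range(n):
            masked = mask[i] * e[i]
            # Gradient of the surrogate objective with respect to e[i].
            grad = (2 * mask[i] * (x[i] + masked - target[i])
                    + 2 * lam * mask[i] * masked)
            m[i] = beta1 * m[i] + (1 - beta1) * grad
            v[i] = beta2 * v[i] + (1 - beta2) * grad * grad
            m_hat = m[i] / (1 - beta1 ** t)   # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            e[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return [x[i] + mask[i] * e[i] for i in range(n)]


x = [0.0, 0.0, 0.0, 0.0]
target = [1.0, 1.0, 1.0, 1.0]
mask = [1, 0, 1, 0]
x_adv = adam_masked_attack(x, target, mask)
```

Positions where the mask is 0 receive a zero gradient and stay untouched, which is the mechanism by which the mask keeps the attack sparse.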

The scheme of the invention has the following beneficial effects:

In the sparse adversarial attack method based on a weighted gradient hash activation heat map of the invention, the positions to attack in the video hash retrieval model are determined by generating the weighted gradient hash activation heat map; the sensitive region of the attack is determined by taking the dot product of the query video hash code with each of the target video hash codes; and the extent of the attack is restricted by the mask matrix. This reduces the pixel cost of the sparse adversarial attack, improves its accuracy and efficiency, and improves the imperceptibility of the adversarial samples.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic visualization of the mask matrix of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.

To address the problems that existing adversarial attack methods generate many redundant perturbed pixels, are ill-suited to real-world scenarios, and incur a high pixel cost that lowers the imperceptibility of the adversarial samples, the invention provides a sparse adversarial attack method based on a weighted gradient hash activation heat map.

As shown in FIG. 1 and FIG. 2, an embodiment of the present invention provides a sparse adversarial attack method based on a weighted gradient hash activation heat map, comprising:

Step 1: inputting a query video into a video hash retrieval model to obtain a query video hash code;

Step 2: acquiring a target video set and inputting each target video in the set into the video hash retrieval model to generate a plurality of target video hash codes;

Step 3: performing a dot product between the query video hash code and each of the target video hash codes to construct a Hamming distance function between the query video hash code and the plurality of target video hash codes;

Step 4: differentiating the output of the Hamming distance function with respect to the intermediate-layer input of the video hash retrieval model by the chain rule, and then linearly combining the resulting gradient weights with the intermediate-layer input to generate a weighted gradient hash activation heat map;

Step 5: mapping the temporal and spatial dimensions of the weighted gradient hash activation heat map through trilinear interpolation and upsampling to obtain a weighted gradient hash activation heat matrix, and passing the heat matrix through the ReLU activation function and the Binarize function to obtain a sparse mask matrix;

Step 6: multiplying the adversarial perturbation by the mask matrix to obtain an adversarial mask matrix, constructing an adversarial objective function from the adversarial mask matrix and the Hamming distance function, and optimizing the adversarial objective function with the Adam (adaptive moment estimation) optimization method to obtain a sparse adversarial video sample.

Step 1 specifically comprises: Step 11: defining the video hash retrieval model as $F(\cdot)$;

$H_q = F(X_q)$ (1)

Step 2 specifically comprises: Step 21: obtaining a target video set $X_t = \{x_{t1}, x_{t2}, \ldots, x_{ti}\}$, where $x_{ti}$ denotes the $i$-th target video in the target video set, $i = 1, 2, \ldots, n$;

$H_{ti} = F(x_{ti})$ (2)

Step 3 specifically comprises: the Hamming distance function between the query video hash code and the plurality of target video hash codes is as follows:

（3）

Step 4 specifically comprises: Step 41: differentiating the output of the Hamming distance function with respect to the intermediate-layer input of the video hash retrieval model by the chain rule to obtain the gradient of the intermediate feature map, and using this gradient as the weight of the intermediate feature map, as follows:

（4）

（5）

[…] denotes the spatial resolution of each frame's feature map, and $i$ and $j$ denote pixel coordinates;

$Q^k = w^c A$ (6)

In the sparse adversarial attack method based on a weighted gradient hash activation heat map according to the above embodiment of the present invention, the activation heat map is a class-discriminative localization technique that makes any model based on a convolutional neural network more transparent by producing a visual explanation. This visual explanation, also called the weighted gradient hash activation heat map, can localize the key regions of an image. The method applies the weighted gradient hash activation heat map to video to obtain the key regions of the video and then combines it with adversarial attack techniques to carry out a sparse adversarial attack on the video hash retrieval model.

Step 5 specifically comprises: Step 51: mapping the weighted gradient hash activation heat map, through trilinear interpolation and upsampling, into a weighted gradient hash activation heat matrix of the same size as the target video;
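Trilinear interpolation is separable, so a coarse volume can be upsampled by linear interpolation along the width, height and time axes in turn. The sketch below illustrates this in pure Python for a `[frame][row][col]` volume. It assumes an endpoint-aligned sampling convention; deep learning frameworks offer several conventions, and the patent does not say which one is used. The function names are illustrative.

```python
def lerp_1d(values, new_len):
    """Linearly resample a list of numbers to new_len points, endpoints aligned."""
    old_len = len(values)
    if old_len == 1 or new_len == 1:
        return [float(values[0])] * new_len
    out = []
    for k in range(new_len):
        pos = k * (old_len - 1) / (new_len - 1)   # position in source coordinates
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def transpose(grid):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*grid)]


def upsample_trilinear(volume, frames, height, width):
    """Resize a [frame][row][col] volume by linear interpolation along each axis."""
    # Width axis: resample every row.
    vol = [[lerp_1d(row, width) for row in frame] for frame in volume]
    # Height axis: resample every column, using a transpose round trip.
    vol = [transpose([lerp_1d(col, height) for col in transpose(frame)])
           for frame in vol]
    # Time axis: resample the series of values observed at each pixel.
    out = [[[0.0] * width for _ in range(height)] for _ in range(frames)]
    for i in range(height):
        for j in range(width):
            series = lerp_1d([frame[i][j] for frame in vol], frames)
            for t in range(frames):
                out[t][i][j] = series[t]
    return out


coarse = [[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]]
fine = upsample_trilinear(coarse, 3, 3, 3)
```

In the toy volume the centre of the upsampled result equals the mean of the eight corner values, which is the defining property of trilinear interpolation at the midpoint.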

（7）

In the sparse adversarial attack method based on a weighted gradient hash activation heat map according to the above embodiment of the present invention, the threshold $\varepsilon$ in formula (7) is set to 0.5 when generating the sparse adversarial video samples.

Step 6 specifically comprises: Step 61: constructing the adversarial objective function as follows:

（8）

[…] denotes the maximum value in the matrix, and $\tau$ and […] denote constants;

The sparse adversarial video sample $X_a$ obtained by the sparse adversarial attack method based on a weighted gradient hash activation heat map according to the above embodiment of the present invention is verified as follows:

1. acquiring a test video dataset and inputting it into the video hash retrieval model to generate a hash code database of the test video dataset;

2. inputting the sparse adversarial video sample $X_a$ into the video hash retrieval model to generate an adversarial hash code $H_a$;

3. performing a dot product between the adversarial hash code $H_a$ and each test video hash code in the hash code database of the test video dataset, and constructing the Hamming distance function between the adversarial hash code and each test video hash code to obtain the Hamming distance between them;

4. sorting the Hamming distances between the adversarial hash code and the test video hash codes in ascending order to obtain a retrieval result, in which a higher rank means a smaller Hamming distance between the test video hash code and the adversarial hash code and hence a higher similarity between them;

5. defining the mean average precision (MAP) to evaluate the sorted retrieval result, as follows:

(9)

where $O$ denotes the number of test videos in the test dataset; $k$ denotes the rank, in the retrieval result, of the Hamming distance between the adversarial hash code and a test video hash code; $P(k)$ denotes the precision, $P(k) = r/k$, where $r$ is the number of test video hash codes among the top $k$ ranks that are consistent with the adversarial hash code; $\mathrm{rel}(k)$ is 1 when the test video hash code at rank $k$ is consistent with the adversarial hash code and 0 otherwise; and $R$ denotes the number of test video hash codes in the retrieval result that are consistent with the adversarial hash code.
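The ranking and scoring procedure can be sketched as follows. The scoring function computes $\frac{1}{R}\sum_{k=1}^{O} P(k)\,\mathrm{rel}(k)$, the standard average-precision form that matches the symbol definitions given for formula (9); whether the patent's formula has exactly this shape cannot be confirmed from the text. Using class labels to decide whether a test item is "consistent" with the query is an assumption of this example, and all names and values are hypothetical.

```python
def rank_by_hamming(query_code, database):
    """Return database indices sorted by ascending Hamming distance to the query."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return sorted(range(len(database)),
                  key=lambda idx: hamming(query_code, database[idx]))


def average_precision(relevance):
    """Average precision of a ranked 0/1 relevance list.

    relevance[k-1] plays the role of rel(k); P(k) = r / k is accumulated
    only at ranks where rel(k) is 1, then divided by R, the number of hits.
    """
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k
    return precision_sum / hits if hits else 0.0


database = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 1]]
labels = ["a", "a", "b", "b"]             # hypothetical class labels
query_code, query_label = [1, 1, 0, 1], "a"

order = rank_by_hamming(query_code, database)
relevance = [1 if labels[i] == query_label else 0 for i in order]
ap = average_precision(relevance)
```

Averaging this score over several queries gives a mean average precision; a higher value for the adversarial sample means the targeted items are retrieved earlier.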

In the sparse adversarial attack method based on a weighted gradient hash activation heat map according to the above embodiment of the present invention, $n$ target videos are selected from each of the public datasets UCF101 and HMDB51, with hash code lengths of 16 bits, 32 bits and 64 bits, where $n = 1, 3, 5, 7, 9$ denotes the number of target videos. For each hash code length, each number of target videos and each value of $\varepsilon$, the corresponding quantities are substituted into formula (8), and the MAP values of the adversarial video samples are calculated. Sparsity, denoted S, is introduced to measure the mask matrix defined in formula (7) and thus the number of perturbed pixels introduced by the adversarial attack: S = U/L, where S denotes the percentage of perturbed pixels, U denotes the number of perturbed pixels in the video, that is, the number of entries set to 1 in the mask matrix, and L denotes the total number of pixels in the video, that is, the number of entries in the mask matrix. The smaller S is, the fewer perturbed pixels are added and the lower the pixel cost. The experimental results are shown in Table 1:

TABLE 1

Calculation of the MAP value of the sparse adversarial video samples (Sparse), taking the MAP value of 91.61% as an example: when $n = 1$, a target video from the UCF101 dataset, with a hash code length of 16 bits, is input into the video hash retrieval model to generate the target hash code; $\varepsilon$ in formula (7) is set to 0.5 to generate the sparse mask matrix; the target hash code, the number of target videos and the sparse mask matrix are substituted into formula (8), which is optimized with the Adam optimizer to obtain the sparse adversarial video sample Sparse16bits with a hash code length of 16 bits; Sparse16bits is input into the video hash retrieval model to generate a sparse adversarial hash code; a dot product is performed between the sparse adversarial hash code and each test video hash code in the hash code database of the test video dataset, and the Hamming distance function between them is constructed to obtain the Hamming distance between the sparse adversarial hash code and each test video hash code; these Hamming distances are sorted in ascending order to obtain a retrieval result, from which the MAP value is calculated by formula (9). The result is the MAP value of the sparse adversarial hash code of the sparse adversarial video sample with the target video hash code as the target.

Calculation of the MAP value of the dense adversarial video samples (Dense), taking the MAP value of 91.76% as an example: when $n = 1$, a target video from the UCF101 dataset, with a hash code length of 16 bits, is input into the video hash retrieval model to generate the target hash code; $\varepsilon$ in formula (7) is set so as to generate a dense mask matrix; the target hash code, the number of target videos and the dense mask matrix are substituted into formula (8), which is optimized with the Adam optimizer to obtain the dense adversarial video sample Dense16bits with a hash code length of 16 bits; Dense16bits is input into the video hash retrieval model to generate a dense adversarial hash code; a dot product is performed between the dense adversarial hash code and each test video hash code in the hash code database of the test video dataset, and the Hamming distance function between them is constructed to obtain the Hamming distance between the dense adversarial hash code and each test video hash code; these Hamming distances are sorted in ascending order to obtain a retrieval result, from which the MAP value is calculated by formula (9). The result is the MAP value of the dense adversarial hash code of the dense adversarial video sample with the target video hash code as the target.

In Table 1, Origin denotes the MAP result of the target video. For the sparse adversarial video samples generated on the UCF101 dataset with hash lengths of 16 bits, 32 bits and 64 bits, the Sparsity values are 66.38%, 65.03% and 62.42% respectively; for the dense adversarial video samples generated on the UCF101 dataset with the same hash lengths, the Sparsity values are all 100%. For the sparse adversarial video samples generated on the HMDB51 dataset with hash lengths of 16 bits, 32 bits and 64 bits, the Sparsity values are 69.14%, 59.79% and 54.18% respectively; for the dense adversarial video samples generated on the HMDB51 dataset with the same hash lengths, the Sparsity values are all 100%. A Sparsity of 100% means that perturbation is added at every pixel of the mask matrix. Dense denotes the dense setting: when $\varepsilon = 1$, all entries of the mask matrix are 1. Sparse denotes the sparse setting: when $\varepsilon = 0.5$, only part of the entries of the mask matrix are 1.
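The Sparsity measure S = U/L is simple to compute from a mask. The sketch below assumes the mask is stored as nested lists of 0/1 values shaped `[frame][row][col]`; the layout and the function name are illustrative only.

```python
def sparsity(mask):
    """S = U / L: share of entries equal to 1 in a [frame][row][col] mask."""
    flat = [v for frame in mask for row in frame for v in row]
    return sum(flat) / len(flat)


sparse_mask = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]   # 3 of 8 pixels perturbed
dense_mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]    # every pixel perturbed

s_sparse = sparsity(sparse_mask)
s_dense = sparsity(dense_mask)
```

A dense mask yields S = 1, which corresponds to the 100% Sparsity values reported for the dense adversarial video samples.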

As can be seen from Table 1, the MAP value gradually increases as the number of hash bits increases and is largest at 64 bits. The MAP value also gradually increases as the number of target videos $n$ increases: it rises fastest when $n < 5$, and the rate of increase slows markedly when $n = 7$ or $9$. The MAP of the sparse adversarial video samples is slightly lower than that of the dense adversarial video samples, but the number of perturbed pixels is markedly smaller. As the number of hash bits increases, the Sparsity value gradually decreases, which indicates that longer hash codes carry richer information and make it easier to find the key regions of the target video in which to add noise; that is, fewer pixels are needed for the adversarial attack and the pixel cost is lower. FIG. 2 shows a visualization of the sparse mask matrix, where the white portions represent pixel positions set to 1 and the black portions represent pixel positions set to 0.

The sparse adversarial attack method based on a weighted gradient hash activation heat map according to the above embodiment of the present invention applies the weighted gradient hash activation heat map to adversarial attacks and carries out a sparse adversarial attack on the video hash retrieval model. The positions to attack in the video hash retrieval model are determined by generating the weighted gradient hash activation heat map; the sensitive region of the attack is determined by taking the dot product of the query video hash code with each of the target video hash codes; and the extent of the attack is restricted by the mask matrix. This reduces the pixel cost of the sparse adversarial attack, improves its accuracy and efficiency, and improves the imperceptibility of the adversarial samples.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A sparse adversarial attack method based on a weighted gradient hash activation heat map, characterized by comprising the following steps:

Step 6: multiplying the adversarial perturbation by the mask matrix to obtain an adversarial mask matrix, constructing an adversarial objective function from the adversarial mask matrix and the Hamming distance function, and optimizing the adversarial objective function with the Adam (adaptive moment estimation) optimization method to obtain a sparse adversarial video sample;

wherein Step 6 specifically comprises:

Step 61: constructing the adversarial objective function as follows:

[…] denotes the maximum value in the matrix, and $\tau$ and […] denote constants; $d(\cdot,\cdot)$ denotes the dot product operation; $F(\cdot)$ denotes the video hash retrieval model; $X_q$ denotes the query video, $X_q \in \mathbb{R}^{C \times G \times B \times T}$, where $C$ denotes the number of frames of the query video, $G$ the width of each frame, $B$ the height of each frame, and $T$ the number of channels of each frame; $H_{ti}$ denotes the hash code of the $i$-th target video, $i = 1, 2, \ldots, n$;

Step 62: optimizing the adversarial objective function with the Adam optimization method, seeking the optimal solution that minimizes both the Hamming distance function between the query video hash code and the plurality of target video hash codes and the added adversarial perturbation, to obtain the sparse adversarial video sample $X_a$.

2. The sparse adversarial attack method based on a weighted gradient hash activation heat map according to claim 1, wherein Step 1 specifically comprises:

Step 11: defining the video hash retrieval model as $F(\cdot)$;

Step 12: inputting the query video into the video hash retrieval model $F(\cdot)$, which generates the query video hash code as follows:

$H_q = F(X_q)$ (1)

3. The sparse adversarial attack method based on a weighted gradient hash activation heat map according to claim 2, wherein Step 2 specifically comprises:

$H_{ti} = F(x_{ti})$ (2)

4. The sparse adversarial attack method based on a weighted gradient hash activation heat map according to claim 3, wherein Step 3 specifically comprises:

（3）

5. The sparse adversarial attack method based on a weighted gradient hash activation heat map according to claim 4, wherein Step 4 specifically comprises:

（4）

（5）

$Q^k = w^c A$ (6)

6. The sparse adversarial attack method based on a weighted gradient hash activation heat map according to claim 5, wherein Step 5 specifically comprises:

（7）