CN115828188A - Method for defending substitute model attack and capable of verifying DNN model copyright - Google Patents

Method for defending substitute model attack and capable of verifying DNN model copyright

Info

Publication number
CN115828188A
Authority
CN
China
Prior art keywords
model
watermark
network
data set
original data
Prior art date
Legal status
Pending
Application number
CN202211661085.9A
Other languages
Chinese (zh)
Inventor
刘红
吴希昊
刘传雨
肖云鹏
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202211661085.9A priority Critical patent/CN115828188A/en
Publication of CN115828188A publication Critical patent/CN115828188A/en
Pending legal-status Critical Current

Landscapes

  • Editing Of Facsimile Originals (AREA)

Abstract

The invention relates to the technical field of deep learning model property right protection, and in particular to a method for defending against substitute model attacks and verifying DNN model copyright. The method comprises: constructing a joint deployment model comprising an extraction network and a classification network; in the joint deployment model, the data set comprises an original data set and a trigger set, the trigger set being generated by embedding a watermark into the original data; during training of the joint deployment model, the extraction network extracts watermarks from the data set, such that the result extracted from the original data set is consistent with the original data and the result extracted from the trigger set is the embedded watermark; adding a perturbation to the extracted data and performing adversarial training on the classification model; if an attacker obtains the model and the data set on the server through an attack and trains a substitute model, the original data set and the trigger set are respectively input into the suspect model, and if the classification results of the two data sets differ, the model is judged to be a substitute model. The invention can verify the copyright of the model under black-box conditions.

Description

Method for defending substitute model attack and capable of verifying DNN model copyright
Technical Field
The invention relates to the technical field of deep learning model property right protection, and in particular to a method for defending against substitute model attacks and verifying DNN model copyright.
Background
At present, deep learning is developing rapidly in many fields and has achieved great success in computer vision, natural language processing and other areas, far exceeding traditional algorithms. A good deep learning model often requires many specialized experts, a large amount of computing resources, and large-scale data that is frequently proprietary to a company, which means that a deep learning model has great commercial value. However, as a digital product, a neural network model not only condenses the designer's expertise but also requires a significant amount of training data and computational resources. For example, in order to accurately recognize a human face, a neural network generally needs tens of millions of face images for learning and generalization. In addition, depending on the network structure, data scale and computational resources, training usually takes several weeks. It is therefore necessary to protect the copyright of a neural network model from infringement.
The trained model is deployed on a cloud server to provide services; however, the server may be maliciously attacked, causing the model to leak. An attacker can obtain illegal benefits by providing services with a plagiarized model, or can train a substitute model through API access. Methods of verifying model copyright are roughly classified into white-box verification and black-box verification.
In 2017, Uchida et al. proposed a method for adding a watermark to a model: a regularization term is added to the objective function used to train a normal network so that copyright information is embedded in the network weights, while ensuring that model performance is not significantly degraded. However, verification requires full access to the weights and structure of the model, and in real scenarios an attacker will not disclose the model. In order to verify copyright without knowing the internal structure and the full weights of the model, Merrer et al. proposed a method for verifying model copyright under black-box conditions: an adversarial-defense technique is used to fine-tune the decision boundary of the model so that the fine-tuned network still classifies samples near the decision boundary normally while correctly classifying several selected adversarial samples, but they do not consider the transferability of adversarial samples. Zhang et al. designed a black-box model watermark based on the author's signature, with three watermark styles: a picture mark, random noise, and an irrelevant picture. The watermarked pictures are given target labels specified by the author and mixed into the training set; the trained network behaves completely normally on normal picture inputs, but outputs the specified target label when a watermarked picture is encountered, thereby proving the existence of the watermark. Adi et al. proposed a black-box model watermarking algorithm based on backdoor attacks: some abstract pictures are randomly selected, given target labels and mixed into the training set; the trained network appears normal on normal inputs, and when one of the selected abstract pictures is encountered, the model outputs the designated target label, thereby proving the existence of the watermark. However, the black-box model watermarks at this stage are all 0-1 watermark algorithms, i.e. the embedded watermark can only prove the presence or absence of a watermark. Guo et al. designed a multi-bit black-box model watermarking algorithm: author information is first converted into an n-bit binary sequence and then fed into a random number generator and a random permuter to specify which images receive watermark labels as well as the embedded watermark positions and contents; when extracting the watermark, it can be correctly recovered only if the same information is used to compute the embedding positions. Chen et al. also implemented a multi-bit black-box model watermarking algorithm: when embedding a watermark, all pictures in the training set are first fed into the network, the output logits are averaged and clustered into two classes, pictures and target labels are then selected from the two classes according to the author's copyright identifier to generate adversarial samples, and the model is finally fine-tuned to strengthen the attack effect of the adversarial samples.
However, existing black-box schemes only consider verification; how to handle verification failure caused by a substitute-model attack remains to be considered. A substitute model trained on a data set derived from the target model can imitate the behavior of the target model to the greatest extent by repeatedly querying (polling) the target model. There are two main approaches to defending against substitute-model attacks: preventing the substitute model from being successfully trained, or ensuring that the backdoor remains active in the trained substitute model. The training of a deep learning model is based on the gradient descent algorithm: model parameters are optimized through back propagation until a convergence state is reached. For a single-label classification problem, one picture must correspond to a single category; by mapping visually identical pictures to different classification labels, gradient descent for the substitute model can be disabled so that the substitute model cannot converge. There is therefore a need for a method that disables the gradient descent algorithm when a substitute model is trained, without affecting the copyright owner's own training of the model.
Disclosure of Invention
In order to verify the copyright of a model under black-box conditions and to disrupt the training of substitute models, the invention provides a method for defending against substitute model attacks and verifying DNN model copyright, which specifically comprises the following steps:
s1, constructing a joint deployment model, wherein the model comprises an extraction network and a classification network;
s2, in the joint deployment model, the data set comprises an original data set and a trigger set, and the trigger set is generated from the original data set by using a spatial invisible watermark mechanism;
s3, in the process of training the joint deployment model, extracting watermarks from the data set by using the extraction network, wherein the result extracted from the original data set is consistent with the original data, and the result extracted from the trigger set is the watermark generated by the spatial invisible watermark mechanism in step S2;
s4, adding a perturbation to the data after watermark extraction, and performing adversarial training on the classification model;
and S5, if an attacker obtains the model and the data set on the server through an attack and trains a substitute model, respectively inputting the original data set and the trigger set into the suspect model; if the classification results of the two data sets differ, the model is a substitute model.
Further, the loss function of the extraction network in the training process is as follows:
L_R = λ_4·l_wm + λ_5·l_self
wherein l_wm is the watermark loss, which requires the extraction network to recover the watermark from the data in the trigger set, and λ_4 is the weight of the watermark loss; l_self is the self-loss, which requires the extraction network to recover an image consistent with the original image from the data in the original data set, and λ_5 is the weight of the self-loss.
Further, the watermark loss l_wm is expressed as:
l_wm = (1/N_c)·Σ_i ‖R(x_i') − l‖_2 + (1/N_f)·Σ_i Σ_k ‖VGG_k(R(x_i')) − VGG_k(l)‖_2
wherein x_i' is the i-th image in the trigger set X', N_c is the number of pixels, R(x_i') denotes the data extracted by the extraction network R from the trigger-set image x_i', l is the watermark, and N_f is the number of neurons in the extraction network; VGG_k(R(x_i')) denotes the features, at the k-th layer of the VGG network, of the watermark extracted from the i-th image x_i' in the trigger set by the extraction network R; and VGG_k(l) denotes the features of the watermark l at the k-th layer of the VGG network.
Further, the self-loss l_self is expressed as:
l_self = (1/N_c)·Σ_i ‖R(x_i) − x_i‖_2 + (1/N_f)·Σ_i Σ_k ‖VGG_k(R(x_i)) − VGG_k(x_i)‖_2
wherein x_i denotes the i-th image in the original data set X, N_c is the number of pixels, R(x_i) denotes the result extracted by the extraction network from the i-th image x_i of the original data set X, x_i' is the i-th image in the trigger set X', and N_f is the number of neurons in the extraction network; VGG_k(R(x_i)) denotes the features, at the k-th layer of the VGG network, of the result extracted from the i-th image of the original data set X by the extraction network R; and VGG_k(x_i) denotes the features of the i-th image x_i of the original data set X at the k-th layer of the VGG network.
Further, in the process of performing adversarial training on the classification model, if the label corresponding to an original datum x is y, the datum obtained by adding a perturbation to the original datum is denoted x + Δx and its label is set to y; after training is completed, the label obtained when the classifier M classifies the original datum x is y', and the label obtained when it classifies x + Δx is y.
Further, the loss function of the classification model training process is expressed as:
min_θ E_(x,y)∼D [ max_{Δx∈Ω} L(x + Δx, y; θ) ]
wherein x denotes a datum in the training set, y denotes the true label corresponding to the datum x, and y' denotes the wrong label corresponding to the datum x; L(x + Δx, y; θ) denotes the loss function of the classification model, i.e. its cross-entropy loss; θ denotes the model parameters of the classification model; and Δx is a perturbation belonging to the perturbation set Ω;
L(x, y; θ) = −[y·log f_θ(x) + (1 − y)·log(1 − f_θ(x))]
denotes the cross-entropy loss, where f_θ(x) denotes the label predicted by the model with parameters θ when the input is x.
Compared with the prior art, the invention has the following beneficial effects:
Wider application scenarios: copyright verification of the model is performed in a black-box manner, and the model structure and parameters do not need to be known during verification, so the method applies to a wide range of scenarios;
Higher security: the method poisons the training data set through an invisible watermark mechanism, so that only the model copyright owner can train a model on the poisoned data set; training of a substitute model fails, which prevents an attacker from training a substitute model to defeat black-box verification.
Drawings
FIG. 1 is a schematic diagram of the scenario addressed by the method for defending against substitute model attacks and verifying DNN model copyright;
FIG. 2 is a general flow diagram of an embodiment of the method of the present invention for defending against substitute model attacks and verifying DNN model copyright;
FIG. 3 is a schematic illustration of multiple gradients in an embodiment of the invention;
FIG. 4 is a diagram of the model framework employed in an embodiment of the method of the present invention for defending against substitute model attacks and verifying DNN model copyright;
FIG. 5 is the structure of the discriminator employed in the present invention;
FIG. 6 is the extraction network used for extracting the watermark in the present invention;
FIG. 7 is the watermark embedding network used for generating the trigger set according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a method for defending against substitute model attacks and verifying DNN model copyright, as shown in FIG. 2, which specifically comprises the following steps:
s1, constructing a joint deployment model, wherein the model comprises an extraction network and a classification network;
s2, in the joint deployment model, the data set comprises an original data set and a trigger set, and the trigger set is generated from the original data set by using a spatial invisible watermark mechanism;
s3, in the process of training the joint deployment model, extracting watermarks from the data set by using the extraction network, wherein the result extracted from the original data set is consistent with the original data, and the result extracted from the trigger set is the watermark generated by the spatial invisible watermark mechanism in step S2;
s4, adding a perturbation to the data after watermark extraction, and performing adversarial training on the classification model;
and S5, if an attacker obtains the model and the data set on the server through an attack and trains a substitute model, respectively inputting the original data set and the trigger set into the suspect model; if the classification results of the two data sets differ, the model is a substitute model.
The invention mainly addresses the scenario in which an attacker tries to evade the backdoor by training a substitute model, as shown in FIG. 1. The method mainly comprises five stages:
The first stage: constructing a mixed data set based on the invisible watermark mechanism;
The second stage: jointly training the watermark extraction network and the classification network on the mixed data set;
The third stage: reverse adversarial training, so that the classification model produces correct results only for the output of the watermark extraction network;
The fourth stage: model deployment and verification, in which the extraction network and the classification network are jointly deployed, and when copyright is verified, ownership of the model is judged according to whether the outputs for the trigger set and for the ordinary data set agree or differ;
The fifth stage: the attacker trains a substitute model; the attacker carries out an internal attack to obtain the mixed data set and the source model and trains a substitute model based on them, but because the invisible watermark mechanism of the mixed data set makes visually consistent images correspond to multiple labels, the gradient descent algorithm of the substitute model is destroyed and the substitute model cannot converge.
In this embodiment, the raw data set is first collected, the trigger set is generated using a spatial invisible watermark mechanism, and the raw data set and the trigger set together are treated as a mixed data set. In this process, the raw data set may be taken directly from an existing image classification data set, such as CIFAR-10 or MNIST. The invention can provide copyright protection for a classification model trained on any data set.
There are two ideas for defending against a substitute model trained from the source model and its data set: one is to make gradient descent fail for the substitute model; the other is to keep the backdoor active in the substitute model. Keeping the backdoor active in the substitute model is feasible, but the resulting substitute model can still be used, which still infringes the interests of the copyright owner to some extent. Directly preventing the substitute model from being trained is the most effective approach; since deep learning optimizes internal parameters with the gradient descent algorithm, training of the model can be prevented by breaking gradient descent. An invisible watermark, also called a blind watermark, makes the image carrying the watermark appear completely identical to the original image. The invisible watermark mechanism is combined with a backdoor mechanism: images carrying the invisible watermark are assigned different categories and serve as a trigger set that activates the backdoor; at the same time, because the trigger set and the original data set together form the mixed training set, visually identical images carry different labels, so a substitute model trained on this set cannot converge.
The images in the original data set are passed through a watermark embedding network to obtain the trigger-set images, i.e. a spatial invisible watermark is added on the basis of the original images; a person skilled in the art may choose any existing method for adding a spatial invisible watermark to an image, and the invention imposes no further limitation on this, except that the embedding network is trained so that the images before and after watermark embedding remain visually and structurally consistent. The trigger set (D2) and the original data set (D1) are taken together as the mixed training set D, i.e. D = D1 + D2. Let x_i be the i-th picture in D with corresponding label y_i. Then, visually, a datum x_m of the original data set equals a datum x_n of the trigger set; if the m-th datum x_m in the original data set has label y_m and the n-th datum x_n in the trigger set has label y_n, then x_m and x_n are visually consistent, so a substitute model trained by the attacker will treat x_m = x_n even though y_m ≠ y_n, i.e. the same picture is considered to correspond to different classification labels.
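For illustration only, the following minimal PyTorch sketch shows one possible way to assemble such a mixed training set D = D1 + D2; the simple additive blend stands in for the trained watermark embedding network H described below, and the fixed label shift for the trigger set is an assumption used only to produce the conflicting labels y_m ≠ y_n.

```python
# Sketch: build the mixed training set D = D1 + D2 (additive blend as a stand-in for the embedding network H).
import torch
from torch.utils.data import TensorDataset, ConcatDataset

x_orig = torch.rand(100, 3, 32, 32)             # original data set D1
y_orig = torch.randint(0, 10, (100,))
watermark = torch.rand(3, 32, 32)

# Trigger set D2: visually (almost) identical images that deliberately carry different labels.
x_trig = (x_orig + 0.01 * watermark).clamp(0, 1)
y_trig = (y_orig + 1) % 10                      # y_m != y_n although x_m and x_n look the same

mixed = ConcatDataset([TensorDataset(x_orig, y_orig), TensorDataset(x_trig, y_trig)])
print(len(mixed))                               # 200 samples, half of them trigger-set images
```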
Further, embedding the watermark in the original image requires that the watermarked image (the trigger set) remain visually consistent with the original carrier image, so as not to sacrifice the quality of the original carrier image. Since generative adversarial networks perform well at minimizing the perceptible difference between images, this embodiment adds a discriminator D after the watermark embedding network H and trains until the discriminator D cannot tell whether the image output by the watermark embedding network H is a watermarked image, which further improves the image quality of the watermarked images. In the embodiment of the present invention, as shown in FIG. 4, UNet can be used as the watermark embedding network H; the UNet structure shown in FIG. 7 is widely used in image processing tasks, especially tasks in which the input image and the output image share the same attributes, because the UNet network has weight-sharing connections. The loss function of the watermark embedding network can be expressed as:
L_H = λ_1·l_bs + λ_2·l_vgg + λ_3·l_adv
where λ_1, λ_2 and λ_3 are hyper-parameters; all three coefficients may be set to 1;
l_bs = (1/N_c)·‖x' − x‖_2
where N_c denotes the total number of pixel values of an image, x' denotes an image in the trigger set X', and x denotes an image in the original image set X; the perceptual loss l_vgg is the difference between the VGG features of x' and x and can be expressed as:
l_vgg = (1/N_f)·Σ_k ‖VGG_k(x') − VGG_k(x)‖_2
where VGG_k(·) denotes the features extracted at the k-th layer of the VGG network and N_f is the number of neurons of the VGG network; the VGG network in this embodiment is a VGG16 network. The adversarial loss is used to constrain the discriminator, which judges whether an image is a trigger-set image (after watermark embedding) or an original image; this loss function is expressed as:
l_adv = −Σ_i [ log D(x_i) + log(1 − D(x_i')) ]
where D(x_i) denotes the output of the discriminator for an input image x_i; the meaning of l_adv is that, for an ideal discriminator, the output is 1 when the input is an original image and 0 when the input is a trigger-set image. The above loss function can be used to train the required watermark embedding network, which embeds the copyright watermark into the original data set to obtain the trigger set; the trigger set is then mixed with the original data set.
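As a sketch under stated assumptions, the loss L_H can be assembled as below; the choice of VGG16 layers, the use of mean-squared differences, and the exact form of the adversarial term (here: push the discriminator output for watermarked images toward "original") are assumptions, since the text does not fix them. The discriminator is assumed to return one probability per image, as in the sketch after the next paragraph.

```python
# Sketch of L_H = λ1*l_bs + λ2*l_vgg + λ3*l_adv (layer choice, norms and the adversarial term are assumptions).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

vgg_features = vgg16(weights=None).features[:16].eval()     # in practice pretrained VGG16 weights would be loaded

def embedding_loss(x, x_wm, discriminator, lam=(1.0, 1.0, 1.0)):
    l_bs = F.mse_loss(x_wm, x)                               # pixel-level (basic) loss
    l_vgg = F.mse_loss(vgg_features(x_wm), vgg_features(x))  # perceptual loss on VGG features
    # An ideal discriminator outputs 1 for originals and 0 for watermarked images, so the
    # embedding network is trained to make D(x_wm) look like an original image.
    l_adv = F.binary_cross_entropy(discriminator(x_wm), torch.ones(x.size(0), 1))
    return lam[0] * l_bs + lam[1] * l_vgg + lam[2] * l_adv
```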
In this embodiment, original images and watermarked images are input into the discriminator, which outputs the probability that the input image is an original image or a watermarked image, until the discriminator can no longer distinguish the watermarked images output by the embedding network from the original images. As shown in FIG. 5, the discriminator is formed by cascading three convolution modules and one convolution layer, where each convolution module consists, in order, of a convolution layer (Conv), a batch normalization layer (BN) and a ReLU activation layer.
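Following that description (three Conv-BN-ReLU modules cascaded with a final convolution layer), a minimal sketch of such a discriminator is given below; the channel widths, kernel sizes, strides and the final sigmoid pooling are assumptions, since the text does not specify them.

```python
# Sketch of the discriminator: three Conv-BN-ReLU modules followed by one convolution layer.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(3, 32), conv_block(32, 64), conv_block(64, 128))
        self.head = nn.Conv2d(128, 1, kernel_size=3, padding=1)

    def forward(self, x):
        score = self.head(self.body(x))               # (N, 1, h, w) patch scores
        return torch.sigmoid(score.mean(dim=(2, 3)))  # probability that the input is an original image

d = Discriminator()
print(d(torch.rand(4, 3, 32, 32)).shape)              # torch.Size([4, 1])
```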
The mixed data set is used as the training set, and the watermark extraction network is used to jointly train the protected classification model, which solves the problem that the gradient cannot descend. The mixed data set prevents the training of a substitute model, but it would also interfere with the copyright owner's own training of the model; therefore a network must be trained to purify the poisoned data set, extracting results from the visually identical pictures, and these extracted results, rather than the mixed-data-set images, are used as the training images for the classification network.
Watermark extraction is performed on the mixed data set. To ensure the accuracy of the jointly trained classification model, for the watermark-free part of the mixed data set the extracted result is required to remain consistent with the original image, while for the watermarked part the watermark is extracted and the result is required to be consistent with the watermark. The model owner then trains on the output of the extraction network; because the extraction network removes the visual consistency between visually identical pictures, the problem that the gradient cannot descend is solved.
In this embodiment, the extraction network R adopts CEILNet, which follows the structure of an auto-encoder (FIG. 6): the encoder consists of three convolution layers and the decoder consists of one deconvolution layer and two convolution layers. The watermark extraction loss function must constrain the extraction network R to extract the watermark from trigger-set images and to output the image itself for original images, and a layer of Gaussian noise is added after extraction; the loss function proposed in this embodiment is expressed as:
L_R = λ_4·l_wm + λ_5·l_self
where λ_4 and λ_5 are the hyper-parameters (weights) of the watermark loss and the self-loss, respectively, and both are set to 1; the watermark loss l_wm is expressed as:
l_wm = (1/N_c)·Σ_i ‖R(x_i') − l‖_2 + (1/N_f)·Σ_i Σ_k ‖VGG_k(R(x_i')) − VGG_k(l)‖_2
where l is the watermark, which is combined with the images of the original data set to obtain the trigger set; the watermark loss constrains the watermark extraction network R so that, for trigger-set images, the extracted result remains structurally and visually consistent with the watermark image, which is why the visual perceptual loss l_vgg is included; the self-loss l_self is expressed as:
l_self = (1/N_c)·Σ_i ‖R(x_i) − x_i‖_2 + (1/N_f)·Σ_i Σ_k ‖VGG_k(R(x_i)) − VGG_k(x_i)‖_2
and the self-loss constraint watermark extraction network R extracts an image which keeps consistent with the original image structure and vision for the original image.
The classification network has a backdoor embedded during joint training with the extraction network, so that it cannot be used normally once separated from the extraction network. An adversarial attack deliberately adds imperceptible, subtle perturbations to input samples, causing the model to give a wrong output with high confidence. Adversarial training is a defense method adopted to improve the robustness of a model to adversarial samples: in this embodiment, some adversarial samples are constructed and added to the original data set in the hope of strengthening the model's robustness to adversarial samples. The adversarial training formula is expressed as:
min_θ E_(x,y)∼D [ max_{Δx∈Ω} L(x + Δx, y; θ) ]
the above formula can also be expressed as:
min_θ E_(x,y)∼D [ max_{Δx∈Ω} −[y·log f_θ(x + Δx) + (1 − y)·log(1 − f_θ(x + Δx))] ]
wherein x denotes a datum in the training set, y denotes the true label corresponding to the datum x, and y' denotes the wrong label corresponding to the datum x; L(x + Δx, y; θ) denotes the loss function of the classification model, i.e. its cross-entropy loss; θ denotes the model parameters of the classification model; and Δx is a perturbation belonging to the perturbation set Ω;
L(x, y; θ) = −[y·log f_θ(x) + (1 − y)·log(1 − f_θ(x))]
denotes the cross-entropy loss, where f_θ(x) denotes the label predicted by the model with parameters θ when the input is x.
The adversarial training process is as follows:
1. A perturbation is added to the input x so that the model cannot obtain the correct prediction result y, i.e. f(x + Δx) ≠ y, and a corresponding loss value is produced by the loss function. The perturbation Δx is restricted to the perturbation space Ω, ‖Δx‖ ≤ ε, which ensures that the perturbed sample is indistinguishable to a human; ε is a small threshold, generally set to 0.01.
2. After the perturbation is added, the model input is x + Δx; the model is trained with (x + Δx, y), and the model parameters θ are updated so as to minimize the average loss over the training data.
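The two steps above can be illustrated with the following sketch of a single adversarial-training iteration; the sign-gradient (FGSM-style) construction of Δx within ‖Δx‖∞ ≤ ε is one possible choice and is an assumption, since the text does not commit to a particular way of searching the perturbation space Ω.

```python
# Sketch of one adversarial-training step: find Δx with ||Δx||_inf <= eps, then train on (x + Δx, y).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y, eps = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,)), 0.01

# step 1: construct a perturbation inside Ω (here by one FGSM-style sign-gradient step)
x_adv = x.clone().requires_grad_(True)
F.cross_entropy(model(x_adv), y).backward()
delta = eps * x_adv.grad.sign()                  # ||Δx||_inf = eps

# step 2: train on the perturbed input with the true label y and update θ
opt.zero_grad()
F.cross_entropy(model((x + delta).detach()), y).backward()
opt.step()
```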
Following the idea of adversarial training, a reverse adversarial training algorithm is proposed to train the classification network; whereas traditional adversarial training optimizes the model's generalization to perturbed images in order to improve robustness, here the objective is reversed. The method specifically comprises the following steps:
Assuming that x corresponds to label y, the label of x + Δx is set to y, i.e. M(x + Δx) = y, while for the unperturbed x, M(x) = y';
The noise addition is completed by the watermark extraction network: in this embodiment, a noise layer is added after the last layer of the extraction network, and it adds random Gaussian noise to the output of the extraction network. Specifically, in the classification task the data with added noise must correspond to the correct label and the data without added noise must correspond to a wrong label; that is, if the label corresponding to an original datum x is y, the datum obtained by adding the perturbation is denoted x + Δx and its label is y, and after training the classifier M assigns the label y' to the original datum x and the label y to x + Δx. The corresponding task loss function is then:
l_task = l_g + l_ng
l_g = −[y·log f_θ(x + Δx) + (1 − y)·log(1 − f_θ(x + Δx))]
l_ng = [y·log f_θ(x) + (1 − y)·log(1 − f_θ(x))]
wherein l_g is the loss function for the images with added Gaussian noise, and l_ng is the loss function for the images without added noise.
A classification model trained with this loss function outputs the correct classification result only for images output by the extraction network with the specific perturbation added, and outputs wrong results for clean, noise-free images.
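For illustration, the objective l_task = l_g + l_ng can be sketched as below; multi-class cross entropy is used in place of the binary form written above, the Gaussian noise layer is modelled by adding noise explicitly, and no weighting or clipping of the negated term is applied, so this shows only the sign structure of the reverse adversarial objective.

```python
# Sketch of the reverse adversarial objective l_task = l_g + l_ng (multi-class CE in place of the binary form).
import torch
import torch.nn.functional as F

def reverse_adversarial_loss(classifier, extracted, labels, sigma=0.01):
    noisy = extracted + sigma * torch.randn_like(extracted)   # Gaussian noise layer after the extraction network
    l_g = F.cross_entropy(classifier(noisy), labels)          # noised input must receive the correct label y
    l_ng = -F.cross_entropy(classifier(extracted), labels)    # clean (un-noised) input is pushed away from y
    return l_g + l_ng                                         # in practice the negated term would be bounded/weighted
```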
The extraction network and the protected model are deployed jointly, i.e. the watermark extraction network and the classification network are deployed together behind one API: the watermark extraction network receives the input and the classification network produces the final result. If the input is x, the joint network is M', the classification network is M and the extraction network is E, then M'(x) = M(E(x)) = y.
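A minimal sketch of such a jointly deployed model M'(x) = M(E(x)) follows; it simply chains the two networks behind one entry point and makes no assumption beyond the composition itself.

```python
# Sketch of the jointly deployed API model M'(x) = M(E(x)).
import torch.nn as nn

class JointModel(nn.Module):
    def __init__(self, extractor: nn.Module, classifier: nn.Module):
        super().__init__()
        self.extractor = extractor      # watermark extraction network E: receives the raw input
        self.classifier = classifier    # protected classification network M: produces the final output

    def forward(self, x):
        return self.classifier(self.extractor(x))
```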
An attacker obtains the joint deployment model and the training set through an internal attack and trains on them; the training set of a substitute model is usually based on the existing training set of the target model, with some perturbations added, and the target model is adaptively polled to expand a limited seed data set. Suppose there is a synthetic data set {(x_i, y_i)} generated from the mixed data set obtained in S1, in which x_i = x_j corresponds to y_i ≠ y_j. As shown in FIG. 3, for sample x_i the gradient is
u_i = ∇_θ L(f_θ(x_i), y_i),
and for sample x_j the gradient is
u_j = ∇_θ L(f_θ(x_j), y_j).
Since x_i = x_j and y_i ≠ y_j, it follows that u_i ≠ u_j. Because the same picture thus has different gradients during training, gradient descent cannot proceed normally for the model and the model cannot reach a convergence state.
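To make the gradient conflict concrete, the following toy demonstration uses a two-class linear model (not the networks of this embodiment): the same input presented with two different labels yields gradients u_i and u_j that point in opposite directions, so their combined update gives gradient descent nowhere to go.

```python
# Toy demonstration: identical inputs with different labels produce conflicting gradients u_i != u_j.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(4, 2)
x = torch.rand(1, 4)                              # one "picture", presented twice with different labels

def grad_for(label):
    model.zero_grad()
    F.cross_entropy(model(x), torch.tensor([label])).backward()
    return model.weight.grad.clone()

u_i, u_j = grad_for(0), grad_for(1)
cos = F.cosine_similarity(u_i.flatten(), u_j.flatten(), dim=0)
print(cos.item())   # ≈ -1: for this two-class linear model the two gradients are exactly anti-parallel
```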
After the attacker fails in the attempt to steal the model by training a substitute model, the most likely next step is to deploy the target model directly for profit, which is also the lowest-cost form of theft. Based on the above steps, the obtained joint model must satisfy two requirements: 1. its functionality is preserved; 2. it is sensitive to the trigger set. Assuming the joint model M', the single classification model M, the trigger set X' and the original data set X, a joint model satisfying the above requirements behaves as follows:
1. When a picture in the original data set X is input into the joint model M', M'(x) = M(E(x)) = M(x); that is, when X is classified, M(X) → M(x_i) = {y_1, y_2, ..., y_n}.
2. When a picture in the original data set X is input into the single model M, M(X) → M(x) = {y_1, y_2, ..., y_n}, which is consistent with inputting the picture into M', indicating that joint deployment does not affect the original function of the single classification model.
3. When a picture in X' is input into the joint model M', M'(x') = M(E(x')) = M(l'); that is, the extracted watermark l' is classified, and since the category corresponding to the watermark l differs from that of x in S2, M(l') → M(l) = {y_l}, indicating that for the trigger set the joint model outputs the label corresponding to l rather than the label corresponding to x.
4. When a picture in X' is input into the single model M, M(x') = M(x_i') → M(x_i) = {w_1, w_2, ..., w_k}, which proves that only the protected (joint) model triggers the backdoor and that a normal model is not sensitive to the trigger set.
Based on the above description, for a suspected model, if it outputs {y_l} for the trigger set and {y_1, y_2, ..., y_n} for the original data set, i.e. the two data sets are classified differently, the suspected model can be judged to be derived from the protected model, and the copyright can thus be verified.
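A sketch of this black-box verification is given below; the use of a mismatch rate and the 0.9 threshold are assumptions added for illustration, since the text only requires that the two data sets be classified differently.

```python
# Sketch of black-box copyright verification: compare the suspect model's labels on the original set and the trigger set.
import torch

@torch.no_grad()
def verify_copyright(suspect_model, x_orig, x_trig, threshold=0.9):
    pred_orig = suspect_model(x_orig).argmax(dim=1)
    pred_trig = suspect_model(x_trig).argmax(dim=1)
    mismatch = (pred_orig != pred_trig).float().mean().item()
    # An unrelated model is insensitive to the invisible watermark (mismatch ≈ 0), while a protected
    # or derived model maps trigger-set images to the watermark label (mismatch ≈ 1).
    return mismatch >= threshold
```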
An attacker who obtains the joint deployment model may attempt to deploy only the classification model, rather than deploying the watermark extraction network and the classification network together, because the joint deployment carries the backdoor embedded by the copyright owner. However, the classification model trained with the reverse adversarial algorithm proposed in this embodiment outputs wrong results for normal, unperturbed inputs, so the model cannot be used normally unless the two networks are deployed jointly, and whenever they are deployed jointly the backdoor can be triggered. The above method therefore protects the model copyright of the copyright owner.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method for defending against substitute model attacks and verifying DNN model copyright, characterized by specifically comprising the following steps:
s1, constructing a joint deployment model, wherein the model comprises an extraction network and a classification network;
s2, in the joint deployment model, the data set comprises an original data set and a trigger set, and the trigger set is generated from the original data set by using a spatial invisible watermark mechanism;
s3, in the process of training the joint deployment model, extracting watermarks from the data set by using the extraction network, wherein the result extracted from the original data set is consistent with the original data, and the result extracted from the trigger set is the watermark generated by the spatial invisible watermark mechanism in step S2;
s4, adding a perturbation to the data after watermark extraction, and performing adversarial training on the classification model;
and S5, if an attacker obtains the model and the data set on the server through an attack and trains a substitute model, respectively inputting the original data set and the trigger set into the suspect model; if the classification results of the two data sets differ, the model is a substitute model.
2. The method for defending against substitute model attacks and verifying DNN model copyright according to claim 1, wherein the loss function of the extraction network during training is:
L_R = λ_4·l_wm + λ_5·l_self
wherein l_wm is the watermark loss, whose purpose is to make the extraction network recover the watermark from the data in the trigger set, and λ_4 is the weight of the watermark loss; l_self is the self-loss, which requires the extraction network to recover an image consistent with the original image from the data in the original data set, and λ_5 is the weight of the self-loss.
3. The method for defending against substitute model attacks and verifying DNN model copyright according to claim 2, wherein the watermark loss l_wm is expressed as:
l_wm = (1/N_c)·Σ_i ‖R(x_i') − l‖_2 + (1/N_f)·Σ_i Σ_k ‖VGG_k(R(x_i')) − VGG_k(l)‖_2
wherein x_i' is the i-th image in the trigger set X', N_c is the number of pixels, R(x_i') denotes the data extracted by the extraction network R from the trigger-set image x_i', l is the watermark, and N_f is the number of neurons in the extraction network;
VGG_k(R(x_i')) denotes the features, at the k-th layer of the VGG network, of the watermark extracted from the i-th image x_i' in the trigger set by the extraction network R; VGG_k(l) denotes the features of the watermark l at the k-th layer of the VGG network; and ‖·‖_2 denotes the L2 norm.
4. The method for defending against substitute model attacks and verifying DNN model copyright according to claim 2, wherein the self-loss l_self is expressed as:
l_self = (1/N_c)·Σ_i ‖R(x_i) − x_i‖_2 + (1/N_f)·Σ_i Σ_k ‖VGG_k(R(x_i)) − VGG_k(x_i)‖_2
wherein x_i denotes the i-th image in the original data set X, N_c is the number of pixels, R(x_i) denotes the result extracted by the extraction network from the i-th image x_i of the original data set X, x_i' is the i-th image in the trigger set X', and N_f is the number of neurons in the extraction network; VGG_k(R(x_i)) denotes the features, at the k-th layer of the VGG network, of the result extracted from the i-th image of the original data set X by the extraction network R; and VGG_k(x_i) denotes the features of the i-th image x_i of the original data set X at the k-th layer of the VGG network.
5. The method for defending against substitute model attacks and verifying DNN model copyright according to claim 1, wherein, in the process of performing adversarial training on the classification model, if the label corresponding to an original datum x is y, the datum obtained by adding a perturbation to the original datum is denoted x + Δx and its label is set to y; after training, the label obtained when the classifier M classifies the original datum x is y', and the label obtained when it classifies x + Δx is y, y' being the wrong label of the original datum x.
6. The method for defending against substitute model attacks and verifying DNN model copyright according to claim 5, wherein the loss function of the classification model training process is expressed as:
min_θ E_(x,y)∼D [ max_{Δx∈Ω} L(x + Δx, y; θ) ]
wherein x denotes a datum in the training set, y denotes the true label corresponding to the datum x, and y' denotes the wrong label corresponding to the datum x; L(x + Δx, y; θ) denotes the loss function of the classification model, i.e. its cross-entropy loss; θ denotes the model parameters of the classification model; and Δx is a perturbation belonging to the perturbation set Ω;
L(x, y; θ) = −[y·log f_θ(x) + (1 − y)·log(1 − f_θ(x))]
denotes the cross-entropy loss, where f_θ(x) denotes the label predicted by the model with parameters θ when the input is x.
CN202211661085.9A 2022-12-23 2022-12-23 Method for defending substitute model attack and capable of verifying DNN model copyright Pending CN115828188A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211661085.9A CN115828188A (en) 2022-12-23 2022-12-23 Method for defending substitute model attack and capable of verifying DNN model copyright

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211661085.9A CN115828188A (en) 2022-12-23 2022-12-23 Method for defending substitute model attack and capable of verifying DNN model copyright

Publications (1)

Publication Number Publication Date
CN115828188A true CN115828188A (en) 2023-03-21

Family

ID=85517898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211661085.9A Pending CN115828188A (en) 2022-12-23 2022-12-23 Method for defending substitute model attack and capable of verifying DNN model copyright

Country Status (1)

Country Link
CN (1) CN115828188A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118134740A (en) * 2024-05-07 2024-06-04 Zhejiang University Model watermarking method and device based on non-decision domain method

Similar Documents

Publication Publication Date Title
Li et al. How to prove your model belongs to you: A blind-watermark based framework to protect intellectual property of DNN
Namba et al. Robust watermarking of neural network with exponential weighting
Zhang et al. Protecting intellectual property of deep neural networks with watermarking
Li et al. A survey of deep neural network watermarking techniques
Akhtar et al. Advances in adversarial attacks and defenses in computer vision: A survey
Li et al. Piracy resistant watermarks for deep neural networks
WO2021042665A1 (en) Dnn-based method for protecting passport against fuzzy attack
Guo et al. An overview of backdoor attacks against deep neural networks and possible defences
Botta et al. NeuNAC: A novel fragile watermarking algorithm for integrity protection of neural networks
Hitaj et al. Evasion attacks against watermarking techniques found in MLaaS systems
CN109919303B (en) Intellectual property protection method, system and terminal for deep neural network
Quiring et al. Adversarial machine learning against digital watermarking
Zhu et al. Fragile neural network watermarking with trigger image set
CN115828188A (en) Method for defending substitute model attack and capable of verifying DNN model copyright
CN114862650B (en) Neural network watermark embedding method and verification method
Mosafi et al. Stealing knowledge from protected deep neural networks using composite unlabeled data
Sun et al. Deep intellectual property protection: A survey
Wu et al. Watermarking pre-trained encoders in contrastive learning
Liu et al. Data protection in palmprint recognition via dynamic random invisible watermark embedding
CN113034332B (en) Invisible watermark image and back door attack model construction and classification method and system
CN113362217A (en) Deep learning model poisoning defense method based on model watermark
CN109544438A (en) A kind of digital watermark method based on neural network and dct transform
CN113435264A (en) Face recognition attack resisting method and device based on black box substitution model searching
CN115546003A (en) Back door watermark image data set generation method based on confrontation training network
Li et al. A novel robustness image watermarking scheme based on fuzzy support vector machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination