CN110674937A - Training method and system for improving robustness of deep learning model - Google Patents

Training method and system for improving robustness of deep learning model Download PDF

Info

Publication number
CN110674937A
CN110674937A (application CN201910599177.0A)
Authority
CN
China
Prior art keywords
noise
neuron
model
deep learning
hidden layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910599177.0A
Other languages
Chinese (zh)
Inventor
刘祥龙 (Liu Xianglong)
刘艾杉 (Liu Aishan)
于航 (Yu Hang)
张崇智 (Zhang Chongzhi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Beijing University of Aeronautics and Astronautics
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Aeronautics and Astronautics filed Critical Beijing University of Aeronautics and Astronautics
Priority to CN201910599177.0A priority Critical patent/CN110674937A/en
Publication of CN110674937A publication Critical patent/CN110674937A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention discloses a training method and system for improving the robustness of a deep learning model. The training method comprises the following steps: S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it; S2, during forward propagation, taking the adversarial noise corresponding to each neuron out of its noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation; S3, iteratively executing steps S1 and S2 for a total of P rounds to complete the robustness training, where P is a positive integer. While preserving the generalization ability of the model, the method effectively improves the robustness of the deep learning model to both adversarial sample noise and natural noise, and improves the stability of the model when applied in real-world scenarios; moreover, because it is embedded in the conventional forward-backward training process, its computational complexity is low and its applicability is broad.

Description

Training method and system for improving robustness of deep learning model
Technical Field
The invention relates to a training method for improving the robustness of a deep learning model, and also relates to a training system for realizing the method.
Background
In recent years, deep learning has achieved remarkable success in many challenging areas such as computer vision and natural language processing. In practical applications, deep learning is usually applied to large datasets, and data collected from daily life inevitably contains a large amount of noise. Although such noise has little effect on human perception and object recognition, it can mislead a deep neural network into making wrong decisions, which poses a serious security threat to the practical application of machine learning in the digital and physical worlds.
Meanwhile, the fact that tiny noise can cause a deep neural network to make completely wrong decisions highlights the importance of interpretable deep learning, of understanding the basis on which deep models classify and judge, and of further improving the stability and expressive capacity of deep learning models. Training robust, interpretable deep neural networks has therefore received much attention in recent research.
In this context, to improve the ability of deep learning models to defend against noise, many defense methods have been proposed by experts in the industry. Some of them make the gradient of the model non-computable or non-differentiable (gradient masking or gradient obfuscation) so as to thwart conventional gradient-based attack methods. However, these gradient-masking approaches have been shown to provide only a false sense of successful defense: gradient-circumventing attacks such as the Backward Pass Differentiable Approximation attack can still achieve attack success rates close to 100%. Adversarial Training uses a data-augmentation approach, adding adversarial samples generated by adversarial attack methods to the training set, and improves the defense capability of the model to a certain extent. However, while deep learning models trained on the ImageNet dataset with this approach may be robust to single-step attacks, they remain vulnerable to iterative attacks.
Disclosure of Invention
Aiming at the defects of the prior art, the primary technical problem to be solved by the invention is to provide a training method for improving the robustness of a deep learning model.
Another technical problem to be solved by the present invention is to provide a training system for improving robustness of a deep learning model.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided a training method for improving robustness of a deep learning model, including the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it;
S2, during forward propagation, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing steps S1 and S2 for a total of P rounds to complete the training for improving the robustness of the deep learning model; wherein P is a positive integer.
Preferably, calculating and storing the adversarial noise required by each hidden-layer neuron during backpropagation comprises the following steps:
during backpropagation, sequentially deriving, according to the chain rule, the adversarial gradient of the loss function with respect to each hidden-layer neuron;
storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
Preferably, the adversarial gradient of the loss function with respect to each hidden-layer neuron, derived sequentially according to the chain rule, adopts the following formula:

g_{m,t} = ∂L(F(x; θ), y) / ∂z_{m,t};

wherein g_{m,t} represents the adversarial gradient with respect to z_m, the output of the m-th hidden layer, at the t-th iteration; z_{m,t} represents the output of the m-th hidden-layer neurons at the t-th iteration.
Preferably, the adversarial noise required by each neuron is obtained from the adversarial gradient of the loss function with respect to each hidden-layer neuron, combined with momentum information, using the following formula:

r_{m,t+1} = (1 − η)·r_{m,t} + (ε/k)·g_{m,t}/‖g_{m,t}‖₂;

wherein ε represents the step size of each round of adversarial perturbation, k represents the number of inner iterations, (1 − η) is the decay rate, g_{m,t} represents the adversarial gradient with respect to z_m of the m-th hidden layer at the t-th iteration, and r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
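The momentum update above can be sketched as follows. This is a minimal illustrative sketch, not the patent's verbatim implementation; in particular, normalizing the gradient by its L-2 norm is an assumption consistent with the stated interpretation that the k inner updates sum to magnitude ε.

```python
import numpy as np

def update_adversarial_noise(r_prev, g, eps, k, eta):
    """One momentum-style update of the per-layer adversarial noise.

    r_prev : previous noise r_{m,t} for the hidden layer
    g      : adversarial gradient g_{m,t} of the loss w.r.t. the layer output
    eps    : total adversarial step size per round
    k      : number of inner gradient steps (each step is normalized by k)
    eta    : decay coefficient; (1 - eta) shrinks the old noise (L-2 regularization)
    """
    g_unit = g / (np.linalg.norm(g) + 1e-12)  # assumed normalization; avoids div-by-zero
    return (1.0 - eta) * r_prev + (eps / k) * g_unit
```

With this normalization, each of the k inner updates contributes magnitude ε/k, so the total update magnitude over k steps equals ε, matching the description above.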
Preferably, during forward propagation, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation comprises the following steps:
during forward propagation, calculating the activation value a_{m−1,t} of each neuron in the previous layer;
during forward propagation, calculating the activation function of each neuron;
taking the adversarial noise corresponding to the neuron out of the noise storage unit;
after the model performs the affine transformation to compute each layer's input, adding the corresponding adversarial noise to that input;
feeding the output z_{m,t} of the m-th layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation.
Preferably, for a neural network, the input of each layer during forward propagation is calculated through an affine transformation, using the following formula:

z_{m,t} = a_{m−1,t}·w_{m−1} + b_{m−1};

wherein z_{m,t} is the output of the m-th layer neurons at the t-th iteration; a_{m−1,t} is the activation value of the (m−1)-th layer neurons at the t-th iteration; w_{m−1} is the affine transformation matrix of the (m−1)-th layer neurons; b_{m−1} is the affine transformation bias of the (m−1)-th layer neurons.
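The per-layer forward step — affine transformation, noise injection, then activation — can be sketched as follows (an illustrative sketch; the function name and the choice of ReLU follow the embodiment described later in this document):

```python
import numpy as np

def forward_layer_with_noise(a_prev, W, b, r):
    """Forward step for one hidden layer with adversarial-noise injection.

    Computes z_m = a_{m-1} W + b (affine transform), adds the stored
    adversarial noise r_m before the activation, then applies ReLU.
    """
    z = a_prev @ W + b          # affine transformation
    z = z + r                   # inject the stored adversarial noise
    return np.maximum(z, 0.0)   # ReLU activation; forward propagation continues
```

Note that the noise is added after the affine transformation and before the activation function, which is exactly where steps S24 and S25 place it.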
Preferably, the training process is divided into three stages, each generating adversarial noise of a different magnitude; within each stage, the adversarial-noise parameters η and ε are kept at fixed values.
Wherein preferably, stage 1 uses zero adversarial noise, with hyper-parameter values ε = 0 and k = 1; stage 1 lasts for p1 rounds;
stage 2 uses large adversarial noise, with a larger value of ε and a smaller value of k; stage 2 lasts for p2 rounds;
stage 3 uses small adversarial noise, with a smaller value of ε and a larger value of k; stage 3 lasts for p3 rounds;
wherein p1, p2 and p3 are all positive integers.
Preferably, the number P of iterations of step S1 and step S2 in step S3 is the sum of the numbers of rounds of the respective stages.
According to a second aspect of the embodiments of the present invention, there is provided a training system for improving robustness of a deep learning model, including a processor and a memory; the memory has stored thereon a computer program operable on the processor, which when executed by the processor performs the steps of:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it;
S2, during forward propagation, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing steps S1 and S2 for a total of P rounds to complete the training for improving the robustness of the deep learning model; wherein P is a positive integer.
The training method for improving the robustness of a deep learning model provided by the invention combines the conventional forward-backward training process and adds corresponding adversarial noise to each hidden-layer neuron, so that the trained model parameters are stable with respect to noisy inputs in the r-neighborhood of each data sample. While preserving the generalization ability of the model, the method effectively improves the robustness of the deep learning model to adversarial sample noise and natural noise, and improves the stability of the model when applied in real-world scenarios; because the method is embedded in the conventional forward-backward training process, its computational complexity is low and its applicability is broad.
Drawings
FIG. 1 is a flowchart of a training method for improving robustness of a deep learning model according to the present invention;
FIG. 2 is a schematic structural diagram of the training system for improving the robustness of a deep learning model provided by the invention.
Detailed Description
The technical contents of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
At present, although deep learning has achieved remarkable success in many challenging areas such as computer vision and natural language processing, it remains vulnerable to noise, especially adversarial sample noise and natural noise.
An adversarial sample carries carefully designed, extremely subtle noise that is indistinguishable to the human eye but devastating to a deep learning model:

F_θ(x_adv) ≠ y  s.t.  ‖x − x_adv‖ < ε;

where x denotes a normal sample and x_adv denotes an adversarial sample. x and x_adv are visually similar, with distance less than ε, but the deep learning model F misclassifies the adversarial sample.
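The adversarial-sample condition above can be checked mechanically: the perturbed input must lie within an ε-ball of the original while the model's prediction changes. The following sketch uses illustrative names and a toy classifier that are not part of the patent:

```python
import numpy as np

def is_adversarial(predict, x, x_adv, y, eps):
    """Check the adversarial-example condition: x_adv stays within an
    eps-ball of x (L-2 distance) but the model no longer predicts y."""
    close = np.linalg.norm(x - x_adv) < eps      # ||x - x_adv|| < eps
    misclassified = predict(x_adv) != y          # F(x_adv) != y
    return bool(close and misclassified)
```

For example, with a toy sign classifier, a perturbation of norm 0.1 that flips the sign of the input sum is adversarial for ε = 0.2 but not for ε = 0.05, since it then falls outside the allowed ball.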
In addition to adversarial sample noise, natural noise occurs at a very high frequency in daily life, for example rain and snow, defocus blur, and digital-processing noise. It is also a strong challenge to building robust deep learning models. However, research on improving model robustness to natural noise is still in its infancy, with very few published results. Current adversarial defense methods add adversarial noise only to the input data, so the trained deep learning model remains vulnerable to well-designed generalized noise and iterative adversarial attacks.
At present, the main method for improving the robustness of a deep learning model to adversarial samples is adversarial training: adversarial samples generated by attack methods are mixed into the training dataset to train the model. This can improve the model's defense against adversarial sample noise to a certain extent, but it generally reduces the generalization ability of the model and gives poor stability against natural noise. The instability of a deep learning model to noise usually manifests as sudden changes in the feature map of some hidden layer and in neuron activation values during forward propagation. The stability of each hidden layer of the neural network is therefore important: enhancing the noise insensitivity of the hidden layers and ensuring their stable behavior helps to obtain a robust model.
The method provided by the invention adds adversarial noise not only to the input data during training but also to each hidden-layer neuron. Because the adversarial noise of each hidden-layer neuron is difficult to compute directly, the method is combined with the conventional forward-backward training procedure: during backpropagation the adversarial gradient is obtained via the chain rule, and during forward propagation the adversarial noise value obtained in the previous step is added. This strategy forces the model to minimize the task-specific loss while each hidden-layer neuron carries opposing adversarial noise that seeks to maximize the expected loss. The parameters learned in each layer then enable the model to make consistent, stable predictions for normal samples and their noisy variants distributed in the neighborhood, giving the deep model strong robustness. The method effectively improves the robustness of the model to adversarial sample noise and natural noise, and is simple to compute and convenient to apply.
As shown in FIG. 1, the training method for improving the robustness of a deep learning model provided by the present invention comprises the following steps: during backpropagation, calculating the adversarial gradient g corresponding to each hidden-layer neuron, combining it with momentum information to obtain the adversarial noise r required by each neuron, and storing r in the noise storage unit R of each neuron; during forward propagation, taking the corresponding adversarial noise r out of the noise storage unit R, adding r after the affine transformation, then computing the activation value and continuing the forward propagation; iteratively executing the above steps for P rounds to complete the robustness training, where P is a positive integer. This process is described in detail below.
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it.
Specifically, let the dataset D contain n data samples, with feature vector x ∈ X and label category y ∈ Y; data samples are hereinafter referred to simply as samples. Formally, a deep neural network y = F(x; θ) is a composition of many non-linear mappings f, one per layer:

z_m = f_m(z_{m−1}),  m = 1, …, M + 1;

wherein z_0 = x denotes the input, z_{M+1} = y denotes the output, z_m represents the output of the m-th hidden-layer neurons, θ is the weight of the network, and M is the number of hidden layers.
In the embodiment provided by the invention, adversarial noise r_m is introduced into the hidden state z_m of the m-th layer (the output of the m-th hidden-layer neurons):

z̃_m = z_m + r_m;

With noise injected in all layers, the perturbed network is denoted

F̃(x; θ) = f_{M+1}(f_M(… f_1(x + r_0) …) + r_M);

The network parameters θ are then learned by minimizing the following loss function L:

min_θ Σ_{(x,y)∈D} max_{‖r_m‖₂ ≤ η} L(F̃(x; θ), y);

For each data point (x; y), the search for the adversarial noise r is constrained by an L-2 norm (often used to measure vector length or distance in a vector space), and the coefficient η controls the magnitude of the adversarial noise.
In the embodiment provided by the invention, the adversarial noise required by each hidden-layer neuron is calculated and stored during backpropagation; this specifically comprises the following steps:
S11, during backpropagation, sequentially deriving, according to the chain rule, the adversarial gradient g of the loss function L with respect to each hidden-layer neuron.
Specifically, for each data point (x; y) in the mini-batch, the inner optimization is approximated by running k gradient steps, so as to better utilize the information in each mini-batch and better fit the data distribution. For all m = 0, …, M, initialize r_{m,0} = 0. For the m-th layer, the adversarial gradient g_m of z_m selects the direction that maximally increases the loss function L, i.e. the direction of gradient ascent of L with respect to z_m. The adversarial gradient of z_m in the m-th hidden layer is calculated as:

g_{m,t} = ∇_{z_m} L(F̃(x; θ), y);
wherein ∇ denotes the gradient (nabla) operator, ∇_{r_m}L is the gradient of L with respect to the noise r_m, and ∇_{z_m}L is the gradient of L with respect to the hidden state z_m. g_{m,t} represents the adversarial gradient of z_m in the m-th layer at the t-th iteration. Here the invention makes an important observation: because z̃_m = z_m + r_m, the gradient of L with respect to the adversarial noise r_m is equal to its gradient with respect to the hidden state z_m, which is already calculated in standard backpropagation, so no additional computational expense is introduced.
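The observation above — that the gradient with respect to the injected noise equals the gradient with respect to the hidden state — follows because the loss depends on z_m and r_m only through their sum. A small numerical check of this fact (an illustrative toy loss, not from the patent) can be sketched as:

```python
import numpy as np

def loss(z, r):
    """Toy loss that, like the network, sees only the noisy state z + r."""
    s = z + r
    return float(np.sum(s ** 2))

def finite_diff_grad(f, v, h=1e-6):
    """Central finite-difference gradient of scalar f w.r.t. vector v."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

z = np.array([0.3, -1.2, 0.7])
r = np.array([0.05, 0.0, -0.1])
grad_z = finite_diff_grad(lambda v: loss(v, r), z)  # gradient w.r.t. hidden state
grad_r = finite_diff_grad(lambda v: loss(z, v), r)  # gradient w.r.t. noise
```

The two gradients agree (both equal 2·(z + r) here), so the noise gradient indeed comes for free from the standard backward pass, with no extra computation.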
In combination with the above, the invention computes the adversarial gradient using the chain rule: one differentiation yields the adversarial gradient g of the loss function L with respect to a hidden-layer neuron, a direction along which the loss function L increases. It is calculated as:

g_{m,t} = ∂L(F(x; θ), y) / ∂z_{m,t};

wherein L(·) represents the loss function of the deep learning model (cross-entropy in the embodiments provided by the invention); F(·) is the function represented by the deep learning model; z_m denotes the output of the m-th hidden layer of the deep learning model; and t denotes the current iteration round.
The number of inner iterations k is chosen to better utilize all the information in each batch of data (mini-batch). It has a certain influence on the whole optimization process; the best effect is generally achieved when k takes a value of 3 to 5, while too many iterations increase the running time of the algorithm.
S12, obtaining the adversarial noise required by each neuron from the adversarial gradient of the loss function L with respect to each hidden-layer neuron, combined with momentum information; the following formula is specifically adopted:

r_{m,t+1} = (1 − η)·r_{m,t} + (ε/k)·g_{m,t}/‖g_{m,t}‖₂;

wherein ε represents the step size of each round of adversarial perturbation and k represents the number of inner iterations; the update is normalized by k, so that the total magnitude of the k updates equals ε. (1 − η) is the decay rate; the factor (1 − η) corresponds to an L-2 regularization of the noise amplitude. In practice, η and ε respectively control the contributions of the previous noise value r_{m,t} and of the gradient g_{m,t} to the new noise r_{m,t+1}. r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
S13, storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
S2, during forward propagation, taking the adversarial noise r corresponding to each neuron out of the noise storage unit R, updating the hidden-layer neuron values, and continuing the forward propagation; this specifically comprises the following steps:
S21, during forward propagation, calculating the activation value a_{m−1,t} of each neuron in the previous layer; in the embodiment provided by the invention, the activation function of a neuron is computed by conventional means, which is not repeated here.
S22, calculating the activation function of each neuron during forward propagation;
S23, taking the adversarial noise r corresponding to the neuron out of the noise storage unit R;
S24, after the model performs the affine transformation to compute each layer's input, adding the corresponding adversarial noise to that input.
Specifically, for a neural network, each layer's input during forward propagation is calculated through an affine transformation:

z_{m,t} = a_{m−1,t}·w_{m−1} + b_{m−1};

wherein z_{m,t} is the output of the m-th hidden-layer neurons at the t-th iteration; a_{m−1,t} is the activation value of the (m−1)-th hidden-layer neurons at the t-th iteration; w_{m−1} is the affine transformation matrix of the (m−1)-th hidden-layer neurons; b_{m−1} is the affine transformation bias of the (m−1)-th hidden-layer neurons.
Then, the adversarial noise r corresponding to the neuron is taken out of the noise storage unit R and added, that is:

r_{m,t} = R;
z_{m,t} = z_{m,t} + r_{m,t};
S25, feeding the output z_{m,t} of the m-th hidden-layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation:

a_{m,t} = relu(z_{m,t});
In practice, for each mini-batch, the backward pass stores a corresponding adversarial noise r for each hidden-layer neuron; in the forward pass, r is simply added as noise to the input of the corresponding hidden-layer neuron, after the affine transformation and before the activation function. This training process introduces no substantial increase in computation or memory consumption, except that a noise storage unit R must be attached to each neuron to store the adversarial noise r. At inference time, R can be discarded, so the method has no impact on model complexity.
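Putting the pieces together, the interplay of steps S1 and S2 can be sketched on a tiny one-hidden-layer network with manual backpropagation. This is a minimal illustrative sketch under several assumptions: a toy regression task (the patent's setting is classification), a single inner gradient step per round, and L-2 normalization of the stored noise update; the hyper-parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative; not from the patent).
X = rng.normal(size=(32, 2))
Y = X[:, :1] - 0.5 * X[:, 1:2]

# One-hidden-layer network: x -> relu(x W1 + b1 + r1) -> W2 + b2.
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)

r1 = np.zeros((32, 4))               # noise storage unit R for the hidden layer
eps, k, eta, lr = 0.05, 3, 0.1, 0.1  # assumed hyper-parameter values

losses = []
for t in range(300):                 # S3: iterate S1/S2 for P rounds
    # --- S2: forward propagation with the stored adversarial noise injected ---
    z1 = X @ W1 + b1 + r1            # affine transform, then add noise r1
    a1 = np.maximum(z1, 0.0)         # ReLU activation
    yhat = a1 @ W2 + b2
    losses.append(float(np.mean((yhat - Y) ** 2)))

    # --- S1: backward propagation; the chain rule gives g1 = dL/dz1 = dL/dr1 ---
    dz2 = 2.0 * (yhat - Y) / len(X)
    dW2, db2 = a1.T @ dz2, dz2.sum(0)
    g1 = (dz2 @ W2.T) * (z1 > 0)     # adversarial gradient of the hidden layer
    dW1, db1 = X.T @ g1, g1.sum(0)

    # Store the momentum-updated adversarial noise for the next forward pass.
    r1 = (1.0 - eta) * r1 + (eps / k) * g1 / (np.linalg.norm(g1) + 1e-12)

    # Parameter update: ordinary gradient descent minimizes the (noisy) loss.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Note how the adversarial gradient g1 is exactly the quantity the standard backward pass already produces for the hidden layer, so storing the noise adds essentially no extra computation, as stated above.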
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training for improving the robustness of the deep learning model, where P is a positive integer.
In the embodiments provided by the invention, to make the model robust against various types of real noise while retaining strong generalization on the original samples, the magnitude of the noise must be carefully controlled to balance over-fitting and under-fitting. To this end, the whole training cycle (the training process for improving the robustness of the deep learning model) is divided into three stages, each covering several mini-batches of data and generating adversarial noise of a different magnitude. P is the sum of the numbers of iterations of steps S1 and S2 over all stages. Within each stage, the adversarial-noise parameters η and ε are kept at fixed values.
Specifically, for the mini-batches of each stage, propagation is performed with fixed adversarial-noise parameters η and ε; over the whole training cycle of the model, the magnitude of the adversarial noise introduced changes from stage to stage (one stage comprises several mini-batches).
Stage 1: zero adversarial noise (original samples). Hyper-parameter values: ε = 0, k = 1; this stage lasts for p1 rounds. This stage uses only the original samples to give the deep neural network a good start: in the absence of any noise, the model can quickly be trained to a basic classification capability, avoiding instability until a reasonably good model is found.
Stage 2: large adversarial noise. Hyper-parameter values: a larger ε and a smaller k; this stage lasts for p2 rounds. In this stage the deep learning model is further trained by introducing adversarial noise with a relatively large adversarial step size ε. This quickly improves the noise resistance of the model and strongly pushes the decision boundary away from the data points, increasing robustness against adversarial samples; at the same time, the large step size allows a smaller number k of gradient steps, saving computation. In the embodiment provided by the invention, a "larger ε" means a larger adversarial step size relative to stage 3, and a "smaller k" is likewise relative to the k of stage 3; the specific values of ε and k are set according to experimental data.
Stage 3: small adversarial noise. Hyper-parameter values: a smaller ε and a larger k; this stage lasts for p3 rounds. A larger adversarial step size increases robustness but may hurt prediction accuracy; in this final stage the adversarial step size is reduced in order to fine-tune the model and achieve a better balance between robustness and prediction accuracy. In the embodiment provided by the invention, a "smaller ε" means the adversarial step size is reduced relative to stage 2, and a "larger k" means k is increased relative to stage 2; the specific values of ε and k are set according to experimental data.
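The three-stage schedule described above can be sketched as a simple configuration; the concrete values of ε, k, p1, p2 and p3 below are illustrative assumptions, chosen only to follow the stated guidance (stage 2: larger ε, smaller k; stage 3: smaller ε, larger k):

```python
# Three-stage adversarial-noise schedule (illustrative values, not from the patent).
stages = [
    {"name": "stage 1: zero noise",  "eps": 0.0,  "k": 1, "rounds": 5},   # p1
    {"name": "stage 2: large noise", "eps": 0.1,  "k": 3, "rounds": 20},  # p2
    {"name": "stage 3: small noise", "eps": 0.02, "k": 5, "rounds": 10},  # p3
]

def schedule():
    """Yield the (eps, k) pair for every training round; eta and eps stay
    fixed within a stage, and P = p1 + p2 + p3 rounds in total."""
    for s in stages:
        for _ in range(s["rounds"]):
            yield s["eps"], s["k"]

P = sum(s["rounds"] for s in stages)
```

Iterating over `schedule()` drives steps S1 and S2 once per round, so exhausting the generator performs exactly the P rounds required by step S3.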
For each batch of data (mini-batch), the flow of the training method for improving the robustness of a deep learning model based on adversarial-noise propagation is summarized in Table 1 below:
TABLE 1 Flow of the training method
(Table 1 is provided as an image in the original publication.)
According to the training method for improving the robustness of a deep learning model provided by the invention, the conventional forward-backward training process is combined, during deep-model training, with the addition of corresponding adversarial noise to each hidden-layer neuron. This strategy forces the model to minimize the task-specific loss while each hidden-layer neuron carries opposing adversarial noise that seeks to maximize the expected loss. The parameters learned in each layer then enable the model to make consistent, stable predictions for normal samples and their noisy variants distributed in the neighborhood, giving the deep model strong robustness. The method effectively improves the robustness of the model to adversarial sample noise and natural noise, and is simple to compute and convenient to apply.
In summary, in the training method for improving the robustness of a deep learning model provided by the invention, the conventional forward-backward training process is combined, during deep-model training, with the addition of corresponding adversarial noise to each hidden-layer neuron, so that the trained model parameters are stable with respect to noisy inputs in the r-neighborhood of each data sample. S1, during backpropagation, the noise required by each hidden-layer neuron is calculated and stored; S2, during forward propagation, the corresponding adversarial noise is taken out of the noise storage unit, the hidden-layer neuron values are updated, and forward propagation continues; S3, the above steps are executed iteratively for P rounds to complete the robustness training. The training method based on adversarial-noise propagation preserves the generalization ability of the model while effectively improving the robustness of the deep learning model to adversarial sample noise and natural noise, and improves the stability of the model when applied in real-world scenarios; moreover, because it is combined with the conventional forward-backward training process, the computational complexity is low and the applicability is broad.
The invention also provides a training system for improving the robustness of the deep learning model. As shown in fig. 2, the system includes a processor 22 and a memory 21 storing instructions executable by the processor 22;
processor 22 may be a general-purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, among others.
The memory 21 is configured to store the program code and transmit it to the processor 22. The memory 21 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk; it may also include a combination of the above types of memory.
Specifically, the training system for improving the robustness of the deep learning model provided in the embodiment of the present invention includes a processor 22 and a memory 21; the memory 21 stores a computer program operable on the processor 22 which, when executed by the processor 22, performs the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron of the model and storing it;
S2, during forward propagation of the model, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training that improves the robustness of the deep learning model, where P is a positive integer.
For calculating and storing the adversarial noise required by each hidden-layer neuron during backpropagation of the model, the computer program, when executed by the processor 22, implements the following steps:
during backpropagation of the model, successively differentiating according to the chain rule to obtain the adversarial gradient of the loss function with respect to each hidden-layer neuron;
storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
When executed by the processor 22, the computer program further implements the following steps:
according to the chain rule, successive differentiation gives the adversarial gradient of the loss function with respect to each hidden-layer neuron using the following formula:
g_{m,t} = ∂L/∂z_{m,t}
where L is the loss function; g_{m,t} denotes the adversarial gradient with respect to z_m, the output of the m-th hidden layer, at the t-th iteration; and z_{m,t} denotes the output of the m-th hidden-layer neurons at the t-th iteration.
When executed by the processor 22, the computer program further implements the following steps:
obtaining the adversarial noise required by each neuron from the adversarial gradient of the loss function with respect to each hidden-layer neuron, combined with momentum information, using the following formula:
r_{m,t} = (1 − η)·r_{m,t−1} + (ε/k)·g_{m,t}
where ε denotes the adversarial-gradient step size of each round; k denotes the number of iterations; (1 − η) is the attenuation rate; g_{m,t} denotes the adversarial gradient with respect to z_m of the m-th hidden layer at the t-th iteration; and r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
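The momentum-based noise update just described, which decays the stored noise by (1 − η) and steps along the current adversarial gradient, can be sketched as a single function. Since the exact combination of ε and k in the patented formula is not fully recoverable from the text, a plain ε/k step is assumed here:

```python
import numpy as np

def update_adversarial_noise(r_prev, g, eta=0.1, eps=0.05, k=1):
    """One momentum-style noise update: r_t = (1 - eta) * r_{t-1} + (eps / k) * g.

    r_prev -- noise stored for this hidden layer from the previous round
    g      -- adversarial gradient of the loss w.r.t. the layer output z_m
    Splitting eps over k inner steps is an assumption; the patent only states
    that eps is the per-round step size and k the iteration count.
    """
    return (1.0 - eta) * r_prev + (eps / k) * g

r = np.zeros(4)
g = np.ones(4)
r = update_adversarial_noise(r, g, eta=0.1, eps=0.05, k=1)
print(r)  # each entry is 0.05: (1 - 0.1) * 0 + 0.05 * 1
```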
For taking the adversarial noise corresponding to each neuron out of the noise storage unit during forward propagation of the model, updating the hidden-layer neuron values, and continuing the forward propagation, the computer program, when executed by the processor 22, implements the following steps:
during forward propagation, calculating the activation value a_{m-1,t} of each neuron in the previous layer;
taking the adversarial noise corresponding to the neuron out of the noise storage unit;
after the model computes each layer's input by affine transformation, adding the corresponding adversarial noise to that input;
feeding the output z_{m,t} of the m-th layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation.
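The forward-propagation steps above (affine transform, noise injection from the storage unit, then activation) can be sketched for one layer as follows; the ReLU activation and the layer sizes are illustrative assumptions:

```python
import numpy as np

def forward_layer(a_prev, w, b, r):
    """Compute z_m = a_{m-1} w + b, inject the stored adversarial noise r,
    then apply the activation to obtain a_m (ReLU assumed for illustration)."""
    z = a_prev @ w + b          # affine transform: z_{m,t} = a_{m-1,t} w_{m-1} + b_{m-1}
    z = z + r                   # take the noise out of the storage unit, update neuron values
    return np.maximum(z, 0.0)   # continue forward propagation through the activation

a_prev = np.ones((2, 3))
w = np.zeros((3, 5)); b = np.zeros(5); r = np.full((2, 5), -1.0)
a = forward_layer(a_prev, w, b, r)
print(a)  # the injected -1 noise is clipped to 0 by ReLU here
```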
When executed by the processor 22, the computer program further implements the following steps:
for a neural network, during forward propagation the input of each layer is calculated through an affine transformation using the following formula:
z_{m,t} = a_{m-1,t}·w_{m-1} + b_{m-1}
where z_{m,t} is the output of the m-th layer neurons at the t-th iteration; a_{m-1,t} is the activation value of the (m-1)-th layer neurons at the t-th iteration; w_{m-1} is the affine transformation matrix of the (m-1)-th layer neurons; and b_{m-1} is the affine transformation bias of the (m-1)-th layer neurons.
When executed by the processor 22, the computer program further implements the following steps:
the training process is divided into three stages, each stage generating different adversarial-noise parameters; within each stage, the adversarial-noise parameters η and ε are held at fixed magnitudes.
When executed by the processor 22, the computer program further implements the following steps:
stage 1 applies zero adversarial noise, with hyper-parameter values ε = 0 and k = 1; stage 1 lasts p_1 rounds;
stage 2 applies large adversarial noise, with a larger ε and a smaller k; stage 2 lasts p_2 rounds. In the embodiment provided by the invention, the larger ε introduces adversarial noise with a larger adversarial step size relative to stage 3, and the smaller k is a smaller iteration count relative to stage 3; the specific values of ε and k are set according to experimental data;
stage 3 applies small adversarial noise, with a smaller ε and a larger k; stage 3 lasts p_3 rounds. In the embodiment provided by the invention, ε is reduced and k increased relative to stage 2, with the specific values again set according to experimental data;
where p_1, p_2 and p_3 are all positive integers.
When executed by the processor 22, the computer program further implements the following step:
the number of rounds P for which step S1 and step S2 are iteratively executed in step S3 is the sum of the rounds of the three stages, i.e. P = p_1 + p_2 + p_3.
The embodiment of the invention also provides a computer-readable storage medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in user equipment. Alternatively, the processor and the storage medium may reside as discrete components in a communication device.
The training method and system for improving the robustness of a deep learning model provided by the invention have been described in detail above. Any obvious modification made to the invention by those skilled in the art without departing from its true spirit will constitute an infringement of the patent right of the invention, and the corresponding legal liability shall be borne.

Claims (10)

1. A training method for improving the robustness of a deep learning model, characterized by comprising the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron of the model and storing it;
S2, during forward propagation of the model, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training that improves the robustness of the deep learning model, where P is a positive integer.
2. The training method for improving the robustness of the deep learning model as claimed in claim 1, characterized in that calculating and storing the adversarial noise required by each hidden-layer neuron during backpropagation of the model comprises the following steps:
during backpropagation of the model, successively differentiating according to the chain rule to obtain the adversarial gradient of the loss function with respect to each hidden-layer neuron;
storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
3. The training method for improving the robustness of the deep learning model as claimed in claim 2, characterized in that:
according to the chain rule, successive differentiation gives the adversarial gradient of the loss function with respect to each hidden-layer neuron using the following formula:
g_{m,t} = ∂L/∂z_{m,t}
where L is the loss function; g_{m,t} denotes the adversarial gradient with respect to z_m, the output of the m-th hidden layer, at the t-th iteration; and z_{m,t} denotes the output of the m-th hidden-layer neurons at the t-th iteration.
4. The training method for improving the robustness of the deep learning model as claimed in claim 2, characterized in that the adversarial noise required by each neuron is obtained from the adversarial gradient of the loss function with respect to each hidden-layer neuron, combined with momentum information, using the following formula:
r_{m,t} = (1 − η)·r_{m,t−1} + (ε/k)·g_{m,t}
where ε denotes the adversarial-gradient step size of each round; k denotes the number of iterations; (1 − η) is the attenuation rate; g_{m,t} denotes the adversarial gradient with respect to z_m of the m-th hidden layer at the t-th iteration; and r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
5. The training method for improving the robustness of the deep learning model as claimed in claim 1, characterized in that taking the adversarial noise corresponding to each neuron out of the noise storage unit during forward propagation of the model, updating the hidden-layer neuron values, and continuing the forward propagation comprises the following steps:
during forward propagation, calculating the activation value a_{m-1,t} of each neuron in the previous layer;
taking the adversarial noise corresponding to the neuron out of the noise storage unit;
after the model computes each layer's input by affine transformation, adding the corresponding adversarial noise to that input;
feeding the output z_{m,t} of the m-th layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation.
6. The training method for improving the robustness of the deep learning model as claimed in claim 5, characterized in that, for a neural network, during forward propagation the input of each layer is calculated through an affine transformation using the following formula:
z_{m,t} = a_{m-1,t}·w_{m-1} + b_{m-1}
where z_{m,t} is the output of the m-th layer neurons at the t-th iteration; a_{m-1,t} is the activation value of the (m-1)-th layer neurons at the t-th iteration; w_{m-1} is the affine transformation matrix of the (m-1)-th layer neurons; and b_{m-1} is the affine transformation bias of the (m-1)-th layer neurons.
7. The training method for improving the robustness of the deep learning model as claimed in claim 1, characterized in that:
the training process is divided into three stages, each stage generating different adversarial-noise parameters; within each stage, the adversarial-noise parameters η and ε are held at fixed magnitudes.
8. The training method for improving the robustness of the deep learning model as claimed in claim 7, characterized in that:
stage 1 applies zero adversarial noise, with hyper-parameter values ε = 0 and k = 1; stage 1 lasts p_1 rounds;
stage 2 applies large adversarial noise, with a larger ε and a smaller k; stage 2 lasts p_2 rounds;
stage 3 applies small adversarial noise, with a smaller ε and a larger k; stage 3 lasts p_3 rounds;
where p_1, p_2 and p_3 are all positive integers.
9. The training method for improving the robustness of the deep learning model as claimed in claim 8, characterized in that:
the number of rounds P for which step S1 and step S2 are iteratively executed in step S3 is the sum of the rounds of the three stages, i.e. P = p_1 + p_2 + p_3.
10. A training system for improving the robustness of a deep learning model, characterized by comprising a processor and a memory, the memory storing a computer program operable on the processor which, when executed by the processor, implements the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron of the model and storing it;
S2, during forward propagation of the model, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training that improves the robustness of the deep learning model, where P is a positive integer.
CN201910599177.0A 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model Pending CN110674937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910599177.0A CN110674937A (en) 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910599177.0A CN110674937A (en) 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model

Publications (1)

Publication Number Publication Date
CN110674937A true CN110674937A (en) 2020-01-10

Family

ID=69068730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910599177.0A Pending CN110674937A (en) 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model

Country Status (1)

Country Link
CN (1) CN110674937A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475618A (en) * 2020-03-31 2020-07-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN111680292A (en) * 2020-06-10 2020-09-18 北京计算机技术及应用研究所 Confrontation sample generation method based on high-concealment universal disturbance
CN111680292B (en) * 2020-06-10 2023-05-16 北京计算机技术及应用研究所 High-concealment general disturbance-based countering sample generation method
CN113228062A (en) * 2021-02-25 2021-08-06 东莞理工学院 Deep integration model training method based on feature diversity learning
WO2022178775A1 (en) * 2021-02-25 2022-09-01 东莞理工学院 Deep ensemble model training method based on feature diversity learning
CN113242547A (en) * 2021-04-02 2021-08-10 浙江大学 Method and system for filtering user behavior privacy in wireless signal based on deep learning and wireless signal receiving and transmitting device
CN113222056A (en) * 2021-05-28 2021-08-06 北京理工大学 Countercheck sample detection method for image classification system attack
CN113610141A (en) * 2021-08-02 2021-11-05 清华大学 Robustness testing method and system for automatic driving multi-sensor fusion perception model
CN113610141B (en) * 2021-08-02 2022-03-11 清华大学 Robustness testing method and system for automatic driving multi-sensor fusion perception model

Similar Documents

Publication Publication Date Title
CN110674937A (en) Training method and system for improving robustness of deep learning model
Yin et al. Knowledge transfer for deep reinforcement learning with hierarchical experience replay
Awais et al. Revisiting internal covariate shift for batch normalization
CN110048827B (en) Class template attack method based on deep learning convolutional neural network
Huang et al. On-line sequential extreme learning machine.
CN111325324A (en) Deep learning confrontation sample generation method based on second-order method
WO2018204371A1 (en) System and method for batch-normalized recurrent highway networks
Qu et al. Minimalistic attacks: How little it takes to fool deep reinforcement learning policies
CN112200243B (en) Black box countermeasure sample generation method based on low query image data
Behzadan et al. Mitigation of policy manipulation attacks on deep q-networks with parameter-space noise
CN112580728B (en) Dynamic link prediction model robustness enhancement method based on reinforcement learning
CN111047054A (en) Two-stage countermeasure knowledge migration-based countermeasure sample defense method
CN114756694B (en) Knowledge graph-based recommendation system, recommendation method and related equipment
CN111178504B (en) Information processing method and system of robust compression model based on deep neural network
Ghosh et al. An empirical analysis of generative adversarial network training times with varying batch sizes
Song et al. Overview of side channel cipher analysis based on deep learning
Hu et al. RL-VAEGAN: Adversarial defense for reinforcement learning agents via style transfer
Xu et al. Sparse adversarial attack for video via gradient-based keyframe selection
Kawakami et al. dsodenet: Neural ode and depthwise separable convolution for domain adaptation on fpgas
CN116824232A (en) Data filling type deep neural network image classification model countermeasure training method
Duan et al. Enhancing transferability of adversarial examples via rotation‐invariant attacks
Tian et al. Reducing sentiment bias in pre-trained sentiment classification via adaptive gumbel attack
KR102393759B1 (en) Method and system for generating an image processing artificial nerual network model operating in a device
CN116226897A (en) Improved Prim block chain network transmission optimization method combining training loss and privacy loss
CN114444697A (en) Knowledge graph-based common sense missing information multi-hop inference method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination