CN110674937A - Training method and system for improving robustness of deep learning model - Google Patents

Training method and system for improving robustness of deep learning model Download PDF

Info

Publication number
CN110674937A
CN110674937A (application CN201910599177.0A)
Authority
CN
China
Prior art keywords
noise
neuron
model
deep learning
hidden layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910599177.0A
Other languages
Chinese (zh)
Inventor
刘祥龙 (Liu Xianglong)
刘艾杉 (Liu Aishan)
于航 (Yu Hang)
张崇智 (Zhang Chongzhi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Beijing University of Aeronautics and Astronautics
Original Assignee
Beijing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Aeronautics and Astronautics filed Critical Beijing University of Aeronautics and Astronautics
Priority to CN201910599177.0A priority Critical patent/CN110674937A/en
Publication of CN110674937A publication Critical patent/CN110674937A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The invention discloses a training method and system for improving the robustness of a deep learning model. The training method comprises the following steps: S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it; S2, during forward propagation, taking the adversarial noise corresponding to each neuron out of its noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation; S3, iteratively executing steps S1 and S2 for a total of P rounds to complete the robustness training, where P is a positive integer. While preserving the generalization ability of the model, the method effectively improves the robustness of the deep learning model to both adversarial sample noise and natural noise, and improves the stability of the model when applied in real-world scenarios; moreover, because it is embedded in the conventional forward-backward training process, its computational complexity is low and its applicability is broad.

Description

Training method and system for improving robustness of deep learning model
Technical Field
The invention relates to a training method for improving the robustness of a deep learning model, and also relates to a training system for realizing the method.
Background
In recent years, deep learning has achieved remarkable success in many challenging areas such as computer vision and natural language processing. In practical applications, deep learning is usually applied to large datasets, and data collected from daily life inevitably contains a large amount of noise. Although such noise has little effect on human perception and object recognition, it can mislead a deep neural network into making wrong decisions, which poses a serious security threat to the practical application of machine learning in the digital and physical worlds.
Meanwhile, the fact that tiny noise can cause a deep neural network to make completely wrong decisions highlights the importance of interpretable deep learning, of understanding the basis on which deep models classify and judge, and of further improving the stability and expressive capacity of deep learning models. Training robust, interpretable deep neural networks has therefore received much attention in recent research.
In this context, to improve the ability of deep learning models to defend against noise, many defense methods have been proposed by experts in the industry. Some of them make the gradient of the model non-computable or non-differentiable (gradient masking or gradient obfuscation) so as to thwart conventional gradient-based attack methods. However, these gradient-masking approaches have been shown to provide only a false sense of successful defense: gradient-circumventing attacks such as the Backward Pass Differentiable Approximation attack can still achieve attack success rates close to 100%. Adversarial Training uses a data-augmentation approach, adding adversarial samples generated by adversarial attack methods to the training set, and improves the defense capability of the model to a certain extent. However, while deep learning models trained on the ImageNet dataset with this approach may be robust to single-step attacks, they remain vulnerable to iterative attacks.
Disclosure of Invention
Aiming at the defects of the prior art, the primary technical problem to be solved by the invention is to provide a training method for improving the robustness of a deep learning model.
Another technical problem to be solved by the present invention is to provide a training system for improving robustness of a deep learning model.
In order to achieve the purpose, the invention adopts the following technical scheme:
according to a first aspect of the embodiments of the present invention, there is provided a training method for improving robustness of a deep learning model, including the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it;
S2, during forward propagation, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing steps S1 and S2 for a total of P rounds to complete the training for improving the robustness of the deep learning model; wherein P is a positive integer.
Preferably, calculating and storing the adversarial noise required by each hidden-layer neuron during backpropagation comprises the following steps:
during backpropagation, sequentially deriving, according to the chain rule, the adversarial gradient of the loss function with respect to each hidden-layer neuron;
storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
Preferably, the adversarial gradient of the loss function with respect to each hidden-layer neuron, derived sequentially according to the chain rule, adopts the following formula:

g_{m,t} = ∂L(F(x; θ), y) / ∂z_{m,t};

wherein g_{m,t} represents the adversarial gradient with respect to z_m, the output of the m-th hidden layer, at the t-th iteration; z_{m,t} represents the output of the m-th hidden-layer neurons at the t-th iteration.
Preferably, the adversarial noise required by each neuron is obtained from the adversarial gradient of the loss function with respect to each hidden-layer neuron, combined with momentum information, using the following formula:

r_{m,t+1} = (1 − η)·r_{m,t} + (ε/k)·g_{m,t}/‖g_{m,t}‖₂;

wherein ε represents the step size of each round of adversarial perturbation, k represents the number of inner iterations, (1 − η) is the decay rate, g_{m,t} represents the adversarial gradient with respect to z_m of the m-th hidden layer at the t-th iteration, and r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
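The momentum update above can be sketched as follows. This is a minimal illustrative sketch, not the patent's verbatim implementation; in particular, normalizing the gradient by its L-2 norm is an assumption consistent with the stated interpretation that the k inner updates sum to magnitude ε.

```python
import numpy as np

def update_adversarial_noise(r_prev, g, eps, k, eta):
    """One momentum-style update of the per-layer adversarial noise.

    r_prev : previous noise r_{m,t} for the hidden layer
    g      : adversarial gradient g_{m,t} of the loss w.r.t. the layer output
    eps    : total adversarial step size per round
    k      : number of inner gradient steps (each step is normalized by k)
    eta    : decay coefficient; (1 - eta) shrinks the old noise (L-2 regularization)
    """
    g_unit = g / (np.linalg.norm(g) + 1e-12)  # assumed normalization; avoids div-by-zero
    return (1.0 - eta) * r_prev + (eps / k) * g_unit
```

With this normalization, each of the k inner updates contributes magnitude ε/k, so the total update magnitude over k steps equals ε, matching the description above.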
Preferably, during forward propagation, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation comprises the following steps:
during forward propagation, calculating the activation value a_{m−1,t} of each neuron in the previous layer;
during forward propagation, calculating the activation function of each neuron;
taking the adversarial noise corresponding to the neuron out of the noise storage unit;
after the model performs the affine transformation to compute each layer's input, adding the corresponding adversarial noise to that input;
feeding the output z_{m,t} of the m-th layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation.
Preferably, for a neural network, the input of each layer during forward propagation is calculated through an affine transformation, using the following formula:

z_{m,t} = a_{m−1,t}·w_{m−1} + b_{m−1};

wherein z_{m,t} is the output of the m-th layer neurons at the t-th iteration; a_{m−1,t} is the activation value of the (m−1)-th layer neurons at the t-th iteration; w_{m−1} is the affine transformation matrix of the (m−1)-th layer neurons; b_{m−1} is the affine transformation bias of the (m−1)-th layer neurons.
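The per-layer forward step — affine transformation, noise injection, then activation — can be sketched as follows (an illustrative sketch; the function name and the choice of ReLU follow the embodiment described later in this document):

```python
import numpy as np

def forward_layer_with_noise(a_prev, W, b, r):
    """Forward step for one hidden layer with adversarial-noise injection.

    Computes z_m = a_{m-1} W + b (affine transform), adds the stored
    adversarial noise r_m before the activation, then applies ReLU.
    """
    z = a_prev @ W + b          # affine transformation
    z = z + r                   # inject the stored adversarial noise
    return np.maximum(z, 0.0)   # ReLU activation; forward propagation continues
```

Note that the noise is added after the affine transformation and before the activation function, which is exactly where steps S24 and S25 place it.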
Preferably, the training process is divided into three stages, each generating adversarial noise of a different magnitude; within each stage, the adversarial-noise parameters η and ε are kept at fixed values.
Wherein preferably, stage 1 uses zero adversarial noise, with hyper-parameter values ε = 0 and k = 1; stage 1 lasts for p1 rounds;
stage 2 uses large adversarial noise, with a larger value of ε and a smaller value of k; stage 2 lasts for p2 rounds;
stage 3 uses small adversarial noise, with a smaller value of ε and a larger value of k; stage 3 lasts for p3 rounds;
wherein p1, p2 and p3 are all positive integers.
Preferably, the number P of iterations of step S1 and step S2 in step S3 is the sum of the numbers of rounds of the respective stages.
According to a second aspect of the embodiments of the present invention, there is provided a training system for improving robustness of a deep learning model, including a processor and a memory; the memory has stored thereon a computer program operable on the processor, which when executed by the processor performs the steps of:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it;
S2, during forward propagation, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing steps S1 and S2 for a total of P rounds to complete the training for improving the robustness of the deep learning model; wherein P is a positive integer.
The training method for improving the robustness of a deep learning model provided by the invention combines the conventional forward-backward training process and adds corresponding adversarial noise to each hidden-layer neuron, so that the trained model parameters are stable with respect to noisy inputs in the r-neighborhood of each data sample. While preserving the generalization ability of the model, the method effectively improves the robustness of the deep learning model to adversarial sample noise and natural noise, and improves the stability of the model when applied in real-world scenarios; because the method is embedded in the conventional forward-backward training process, its computational complexity is low and its applicability is broad.
Drawings
FIG. 1 is a flowchart of a training method for improving robustness of a deep learning model according to the present invention;
FIG. 2 is a schematic structural diagram of the training system for improving the robustness of a deep learning model provided by the invention.
Detailed Description
The technical contents of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
At present, although deep learning has achieved remarkable success in many challenging areas such as computer vision and natural language processing, it remains vulnerable to noise, especially adversarial sample noise and natural noise.
An adversarial sample carries carefully designed, extremely subtle noise that is indistinguishable to the human eye but devastating to a deep learning model:

F_θ(x_adv) ≠ y  s.t.  ‖x − x_adv‖ < ε;

where x denotes a normal sample and x_adv denotes an adversarial sample. x and x_adv are visually similar, with distance less than ε, but the deep learning model F misclassifies the adversarial sample.
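The adversarial-sample condition above can be checked mechanically: the perturbed input must lie within an ε-ball of the original while the model's prediction changes. The following sketch uses illustrative names and a toy classifier that are not part of the patent:

```python
import numpy as np

def is_adversarial(predict, x, x_adv, y, eps):
    """Check the adversarial-example condition: x_adv stays within an
    eps-ball of x (L-2 distance) but the model no longer predicts y."""
    close = np.linalg.norm(x - x_adv) < eps      # ||x - x_adv|| < eps
    misclassified = predict(x_adv) != y          # F(x_adv) != y
    return bool(close and misclassified)
```

For example, with a toy sign classifier, a perturbation of norm 0.1 that flips the sign of the input sum is adversarial for ε = 0.2 but not for ε = 0.05, since it then falls outside the allowed ball.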
In addition to adversarial sample noise, natural noise occurs at a very high frequency in daily life, for example rain and snow, defocus blur, and digital-processing noise. It is also a strong challenge to building robust deep learning models. However, research on improving model robustness to natural noise is still in its infancy, with very few published results. Current adversarial defense methods add adversarial noise only to the input data, so the trained deep learning model remains vulnerable to well-designed generalized noise and iterative adversarial attacks.
At present, the main method for improving the robustness of a deep learning model to adversarial samples is adversarial training: adversarial samples generated by attack methods are mixed into the training dataset to train the model. This can improve the model's defense against adversarial sample noise to a certain extent, but it generally reduces the generalization ability of the model and gives poor stability against natural noise. The instability of a deep learning model to noise usually manifests as sudden changes in the feature map of some hidden layer and in neuron activation values during forward propagation. The stability of each hidden layer of the neural network is therefore important: enhancing the noise insensitivity of the hidden layers and ensuring their stable behavior helps to obtain a robust model.
The method provided by the invention adds adversarial noise not only to the input data during training but also to each hidden-layer neuron. Because the adversarial noise of each hidden-layer neuron is difficult to compute directly, the method is combined with the conventional forward-backward training procedure: during backpropagation the adversarial gradient is obtained via the chain rule, and during forward propagation the adversarial noise value obtained in the previous step is added. This strategy forces the model to minimize the task-specific loss while each hidden-layer neuron carries opposing adversarial noise that seeks to maximize the expected loss. The parameters learned in each layer then enable the model to make consistent, stable predictions for normal samples and their noisy variants distributed in the neighborhood, giving the deep model strong robustness. The method effectively improves the robustness of the model to adversarial sample noise and natural noise, and is simple to compute and convenient to apply.
As shown in FIG. 1, the training method for improving the robustness of a deep learning model provided by the present invention comprises the following steps: during backpropagation, calculating the adversarial gradient g corresponding to each hidden-layer neuron, combining it with momentum information to obtain the adversarial noise r required by each neuron, and storing r in the noise storage unit R of each neuron; during forward propagation, taking the corresponding adversarial noise r out of the noise storage unit R, adding r after the affine transformation, then computing the activation value and continuing the forward propagation; iteratively executing the above steps for P rounds to complete the robustness training, where P is a positive integer. This process is described in detail below.
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron and storing it.
Specifically, let the dataset D contain n data samples, with feature vector x ∈ X and label category y ∈ Y; data samples are hereinafter referred to simply as samples. Formally, a deep neural network y = F(x; θ) is a composition of many non-linear mappings f, one per layer:

z_m = f_m(z_{m−1}),  m = 1, …, M + 1;

wherein z_0 = x denotes the input, z_{M+1} = y denotes the output, z_m represents the output of the m-th hidden-layer neurons, θ is the weight of the network, and M is the number of hidden layers.
In the embodiment provided by the invention, adversarial noise r_m is introduced into the hidden state z_m of the m-th layer (the output of the m-th hidden-layer neurons):

z̃_m = z_m + r_m;

With noise injected in all layers, the perturbed network is denoted

F̃(x; θ) = f_{M+1}(f_M(… f_1(x + r_0) …) + r_M);

The network parameters θ are then learned by minimizing the following loss function L:

min_θ Σ_{(x,y)∈D} max_{‖r_m‖₂ ≤ η} L(F̃(x; θ), y);

For each data point (x; y), the search for the adversarial noise r is constrained by an L-2 norm (often used to measure vector length or distance in a vector space), and the coefficient η controls the magnitude of the adversarial noise.
In the embodiment provided by the invention, the adversarial noise required by each hidden-layer neuron is calculated and stored during backpropagation; this specifically comprises the following steps:
S11, during backpropagation, sequentially deriving, according to the chain rule, the adversarial gradient g of the loss function L with respect to each hidden-layer neuron.
Specifically, for each data point (x; y) in the mini-batch, the inner optimization is approximated by running k gradient steps, so as to better utilize the information in each mini-batch and better fit the data distribution. For all m = 0, …, M, initialize r_{m,0} = 0. For the m-th layer, the adversarial gradient g_m of z_m selects the direction that maximally increases the loss function L, i.e. the direction of gradient ascent of L with respect to z_m. The adversarial gradient of z_m in the m-th hidden layer is calculated as:

g_{m,t} = ∇_{z_m} L(F̃(x; θ), y);
wherein ∇ denotes the gradient (nabla) operator, ∇_{r_m}L is the gradient of L with respect to the noise r_m, and ∇_{z_m}L is the gradient of L with respect to the hidden state z_m. g_{m,t} represents the adversarial gradient of z_m in the m-th layer at the t-th iteration. Here the invention makes an important observation: because z̃_m = z_m + r_m, the gradient of L with respect to the adversarial noise r_m is equal to its gradient with respect to the hidden state z_m, which is already calculated in standard backpropagation, so no additional computational expense is introduced.
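The observation above — that the gradient with respect to the injected noise equals the gradient with respect to the hidden state — follows because the loss depends on z_m and r_m only through their sum. A small numerical check of this fact (an illustrative toy loss, not from the patent) can be sketched as:

```python
import numpy as np

def loss(z, r):
    """Toy loss that, like the network, sees only the noisy state z + r."""
    s = z + r
    return float(np.sum(s ** 2))

def finite_diff_grad(f, v, h=1e-6):
    """Central finite-difference gradient of scalar f w.r.t. vector v."""
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (f(v + e) - f(v - e)) / (2 * h)
    return g

z = np.array([0.3, -1.2, 0.7])
r = np.array([0.05, 0.0, -0.1])
grad_z = finite_diff_grad(lambda v: loss(v, r), z)  # gradient w.r.t. hidden state
grad_r = finite_diff_grad(lambda v: loss(z, v), r)  # gradient w.r.t. noise
```

The two gradients agree (both equal 2·(z + r) here), so the noise gradient indeed comes for free from the standard backward pass, with no extra computation.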
In combination with the above, the invention computes the adversarial gradient using the chain rule: one differentiation yields the adversarial gradient g of the loss function L with respect to a hidden-layer neuron, a direction along which the loss function L increases. It is calculated as:

g_{m,t} = ∂L(F(x; θ), y) / ∂z_{m,t};

wherein L(·) represents the loss function of the deep learning model (cross-entropy in the embodiments provided by the invention); F(·) is the function represented by the deep learning model; z_m denotes the output of the m-th hidden layer of the deep learning model; and t denotes the current iteration round.
The number of inner iterations k is chosen to better utilize all the information in each batch of data (mini-batch). It has a certain influence on the whole optimization process; the best effect is generally achieved when k takes a value of 3 to 5, while too many iterations increase the running time of the algorithm.
S12, obtaining the adversarial noise required by each neuron from the adversarial gradient of the loss function L with respect to each hidden-layer neuron, combined with momentum information; the following formula is specifically adopted:

r_{m,t+1} = (1 − η)·r_{m,t} + (ε/k)·g_{m,t}/‖g_{m,t}‖₂;

wherein ε represents the step size of each round of adversarial perturbation and k represents the number of inner iterations; the update is normalized by k, so that the total magnitude of the k updates equals ε. (1 − η) is the decay rate; the factor (1 − η) corresponds to an L-2 regularization of the noise amplitude. In practice, η and ε respectively control the contributions of the previous noise value r_{m,t} and of the gradient g_{m,t} to the new noise r_{m,t+1}. r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
S13, storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
S2, during forward propagation, taking the adversarial noise r corresponding to each neuron out of the noise storage unit R, updating the hidden-layer neuron values, and continuing the forward propagation; this specifically comprises the following steps:
S21, during forward propagation, calculating the activation value a_{m−1,t} of each neuron in the previous layer; in the embodiment provided by the invention, the activation function of a neuron is computed by conventional means, which is not repeated here.
S22, calculating the activation function of each neuron during forward propagation;
S23, taking the adversarial noise r corresponding to the neuron out of the noise storage unit R;
S24, after the model performs the affine transformation to compute each layer's input, adding the corresponding adversarial noise to that input.
Specifically, for a neural network, each layer's input during forward propagation is calculated through an affine transformation:

z_{m,t} = a_{m−1,t}·w_{m−1} + b_{m−1};

wherein z_{m,t} is the output of the m-th hidden-layer neurons at the t-th iteration; a_{m−1,t} is the activation value of the (m−1)-th hidden-layer neurons at the t-th iteration; w_{m−1} is the affine transformation matrix of the (m−1)-th hidden-layer neurons; b_{m−1} is the affine transformation bias of the (m−1)-th hidden-layer neurons.
Then, the adversarial noise r corresponding to the neuron is taken out of the noise storage unit R and added, that is:

r_{m,t} = R;
z_{m,t} = z_{m,t} + r_{m,t};
S25, feeding the output z_{m,t} of the m-th hidden-layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation:

a_{m,t} = relu(z_{m,t});
In practice, for each mini-batch, the backward pass stores a corresponding adversarial noise r for each hidden-layer neuron; in the forward pass, r is simply added as noise to the input of the corresponding hidden-layer neuron, after the affine transformation and before the activation function. This training process introduces no substantial increase in computation or memory consumption, except that a noise storage unit R must be attached to each neuron to store the adversarial noise r. At inference time, R can be discarded, so the method has no impact on model complexity.
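Putting the pieces together, the interplay of steps S1 and S2 can be sketched on a tiny one-hidden-layer network with manual backpropagation. This is a minimal illustrative sketch under several assumptions: a toy regression task (the patent's setting is classification), a single inner gradient step per round, and L-2 normalization of the stored noise update; the hyper-parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative; not from the patent).
X = rng.normal(size=(32, 2))
Y = X[:, :1] - 0.5 * X[:, 1:2]

# One-hidden-layer network: x -> relu(x W1 + b1 + r1) -> W2 + b2.
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)

r1 = np.zeros((32, 4))               # noise storage unit R for the hidden layer
eps, k, eta, lr = 0.05, 3, 0.1, 0.1  # assumed hyper-parameter values

losses = []
for t in range(300):                 # S3: iterate S1/S2 for P rounds
    # --- S2: forward propagation with the stored adversarial noise injected ---
    z1 = X @ W1 + b1 + r1            # affine transform, then add noise r1
    a1 = np.maximum(z1, 0.0)         # ReLU activation
    yhat = a1 @ W2 + b2
    losses.append(float(np.mean((yhat - Y) ** 2)))

    # --- S1: backward propagation; the chain rule gives g1 = dL/dz1 = dL/dr1 ---
    dz2 = 2.0 * (yhat - Y) / len(X)
    dW2, db2 = a1.T @ dz2, dz2.sum(0)
    g1 = (dz2 @ W2.T) * (z1 > 0)     # adversarial gradient of the hidden layer
    dW1, db1 = X.T @ g1, g1.sum(0)

    # Store the momentum-updated adversarial noise for the next forward pass.
    r1 = (1.0 - eta) * r1 + (eps / k) * g1 / (np.linalg.norm(g1) + 1e-12)

    # Parameter update: ordinary gradient descent minimizes the (noisy) loss.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Note how the adversarial gradient g1 is exactly the quantity the standard backward pass already produces for the hidden layer, so storing the noise adds essentially no extra computation, as stated above.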
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training for improving the robustness of the deep learning model, where P is a positive integer.
In the embodiments provided by the invention, to make the model robust against various types of real noise while retaining strong generalization on the original samples, the magnitude of the noise must be carefully controlled to balance over-fitting and under-fitting. To this end, the whole training cycle (the training process for improving the robustness of the deep learning model) is divided into three stages, each covering several mini-batches of data and generating adversarial noise of a different magnitude. P is the sum of the numbers of iterations of steps S1 and S2 over all stages. Within each stage, the adversarial-noise parameters η and ε are kept at fixed values.
Specifically, for the mini-batches of each stage, propagation is performed with fixed adversarial-noise parameters η and ε; over the whole training cycle of the model, the magnitude of the adversarial noise introduced changes from stage to stage (one stage comprises several mini-batches).
Stage 1: zero adversarial noise (original samples). Hyper-parameter values: ε = 0, k = 1; this stage lasts for p1 rounds. This stage uses only the original samples to give the deep neural network a good start: in the absence of any noise, the model can quickly be trained to a basic classification capability, avoiding instability until a reasonably good model is found.
Stage 2: large adversarial noise. Hyper-parameter values: a larger ε and a smaller k; this stage lasts for p2 rounds. In this stage the deep learning model is further trained by introducing adversarial noise with a relatively large adversarial step size ε. This quickly improves the noise resistance of the model and strongly pushes the decision boundary away from the data points, increasing robustness against adversarial samples; at the same time, the large step size allows a smaller number k of gradient steps, saving computation. In the embodiment provided by the invention, a "larger ε" means a larger adversarial step size relative to stage 3, and a "smaller k" is likewise relative to the k of stage 3; the specific values of ε and k are set according to experimental data.
Stage 3: small adversarial noise. Hyper-parameter values: a smaller ε and a larger k; this stage lasts for p3 rounds. A larger adversarial step size increases robustness but may hurt prediction accuracy; in this final stage the adversarial step size is reduced in order to fine-tune the model and achieve a better balance between robustness and prediction accuracy. In the embodiment provided by the invention, a "smaller ε" means the adversarial step size is reduced relative to stage 2, and a "larger k" means k is increased relative to stage 2; the specific values of ε and k are set according to experimental data.
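The three-stage schedule described above can be sketched as a simple configuration; the concrete values of ε, k, p1, p2 and p3 below are illustrative assumptions, chosen only to follow the stated guidance (stage 2: larger ε, smaller k; stage 3: smaller ε, larger k):

```python
# Three-stage adversarial-noise schedule (illustrative values, not from the patent).
stages = [
    {"name": "stage 1: zero noise",  "eps": 0.0,  "k": 1, "rounds": 5},   # p1
    {"name": "stage 2: large noise", "eps": 0.1,  "k": 3, "rounds": 20},  # p2
    {"name": "stage 3: small noise", "eps": 0.02, "k": 5, "rounds": 10},  # p3
]

def schedule():
    """Yield the (eps, k) pair for every training round; eta and eps stay
    fixed within a stage, and P = p1 + p2 + p3 rounds in total."""
    for s in stages:
        for _ in range(s["rounds"]):
            yield s["eps"], s["k"]

P = sum(s["rounds"] for s in stages)
```

Iterating over `schedule()` drives steps S1 and S2 once per round, so exhausting the generator performs exactly the P rounds required by step S3.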
For each batch of data (mini-batch), the flow of the training method for improving the robustness of a deep learning model based on adversarial-noise propagation is summarized in Table 1 below:
TABLE 1 Flow of the training method
(Table 1 is provided as an image in the original publication.)
According to the training method for improving the robustness of a deep learning model provided by the invention, the conventional forward-backward training process is combined, during deep-model training, with the addition of corresponding adversarial noise to each hidden-layer neuron. This strategy forces the model to minimize the task-specific loss while each hidden-layer neuron carries opposing adversarial noise that seeks to maximize the expected loss. The parameters learned in each layer then enable the model to make consistent, stable predictions for normal samples and their noisy variants distributed in the neighborhood, giving the deep model strong robustness. The method effectively improves the robustness of the model to adversarial sample noise and natural noise, and is simple to compute and convenient to apply.
In summary, in the training method for improving the robustness of a deep learning model provided by the invention, the conventional forward-backward training process is combined, during deep-model training, with the addition of corresponding adversarial noise to each hidden-layer neuron, so that the trained model parameters are stable with respect to noisy inputs in the r-neighborhood of each data sample. S1, during backpropagation, the noise required by each hidden-layer neuron is calculated and stored; S2, during forward propagation, the corresponding adversarial noise is taken out of the noise storage unit, the hidden-layer neuron values are updated, and forward propagation continues; S3, the above steps are executed iteratively for P rounds to complete the robustness training. The training method based on adversarial-noise propagation preserves the generalization ability of the model while effectively improving the robustness of the deep learning model to adversarial sample noise and natural noise, and improves the stability of the model when applied in real-world scenarios; moreover, because it is combined with the conventional forward-backward training process, the computational complexity is low and the applicability is broad.
The invention also provides a training system for improving the robustness of the deep learning model. As shown in fig. 2, the system includes a processor 22 and a memory 21 storing instructions executable by the processor 22;
processor 22 may be a general-purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention, among others.
The memory 21 is configured to store the program code and transmit it to the processor 22. The memory 21 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk; it may also include a combination of the above types of memory.
Specifically, the training system for improving the robustness of the deep learning model provided in the embodiment of the present invention includes a processor 22 and a memory 21; the memory 21 stores a computer program operable on the processor 22 which, when executed by the processor 22, performs the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron of the model and storing it;
S2, during forward propagation of the model, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training that improves the robustness of the deep learning model, where P is a positive integer.
For calculating and storing the adversarial noise required by each hidden-layer neuron during backpropagation of the model, the computer program, when executed by the processor 22, implements the following steps:
during backpropagation of the model, successively differentiating according to the chain rule to obtain the adversarial gradient of the loss function with respect to each hidden-layer neuron;
storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
When executed by the processor 22, the computer program further implements the following steps:
according to the chain rule, successive differentiation gives the adversarial gradient of the loss function with respect to each hidden-layer neuron using the following formula:
g_{m,t} = ∂L/∂z_{m,t}
where L is the loss function; g_{m,t} denotes the adversarial gradient with respect to z_m, the output of the m-th hidden layer, at the t-th iteration; and z_{m,t} denotes the output of the m-th hidden-layer neurons at the t-th iteration.
When executed by the processor 22, the computer program further implements the following steps:
obtaining the adversarial noise required by each neuron from the adversarial gradient of the loss function with respect to each hidden-layer neuron, combined with momentum information, using the following formula:
r_{m,t} = (1 − η)·r_{m,t−1} + (ε/k)·g_{m,t}
where ε denotes the adversarial-gradient step size of each round; k denotes the number of iterations; (1 − η) is the attenuation rate; g_{m,t} denotes the adversarial gradient with respect to z_m of the m-th hidden layer at the t-th iteration; and r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
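The momentum-based noise update just described, which decays the stored noise by (1 − η) and steps along the current adversarial gradient, can be sketched as a single function. Since the exact combination of ε and k in the patented formula is not fully recoverable from the text, a plain ε/k step is assumed here:

```python
import numpy as np

def update_adversarial_noise(r_prev, g, eta=0.1, eps=0.05, k=1):
    """One momentum-style noise update: r_t = (1 - eta) * r_{t-1} + (eps / k) * g.

    r_prev -- noise stored for this hidden layer from the previous round
    g      -- adversarial gradient of the loss w.r.t. the layer output z_m
    Splitting eps over k inner steps is an assumption; the patent only states
    that eps is the per-round step size and k the iteration count.
    """
    return (1.0 - eta) * r_prev + (eps / k) * g

r = np.zeros(4)
g = np.ones(4)
r = update_adversarial_noise(r, g, eta=0.1, eps=0.05, k=1)
print(r)  # each entry is 0.05: (1 - 0.1) * 0 + 0.05 * 1
```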
For taking the adversarial noise corresponding to each neuron out of the noise storage unit during forward propagation of the model, updating the hidden-layer neuron values, and continuing the forward propagation, the computer program, when executed by the processor 22, implements the following steps:
during forward propagation, calculating the activation value a_{m-1,t} of each neuron in the previous layer;
taking the adversarial noise corresponding to the neuron out of the noise storage unit;
after the model computes each layer's input by affine transformation, adding the corresponding adversarial noise to that input;
feeding the output z_{m,t} of the m-th layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation.
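The forward-propagation steps above (affine transform, noise injection from the storage unit, then activation) can be sketched for one layer as follows; the ReLU activation and the layer sizes are illustrative assumptions:

```python
import numpy as np

def forward_layer(a_prev, w, b, r):
    """Compute z_m = a_{m-1} w + b, inject the stored adversarial noise r,
    then apply the activation to obtain a_m (ReLU assumed for illustration)."""
    z = a_prev @ w + b          # affine transform: z_{m,t} = a_{m-1,t} w_{m-1} + b_{m-1}
    z = z + r                   # take the noise out of the storage unit, update neuron values
    return np.maximum(z, 0.0)   # continue forward propagation through the activation

a_prev = np.ones((2, 3))
w = np.zeros((3, 5)); b = np.zeros(5); r = np.full((2, 5), -1.0)
a = forward_layer(a_prev, w, b, r)
print(a)  # the injected -1 noise is clipped to 0 by ReLU here
```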
When executed by the processor 22, the computer program further implements the following steps:
for a neural network, during forward propagation the input of each layer is calculated through an affine transformation using the following formula:
z_{m,t} = a_{m-1,t}·w_{m-1} + b_{m-1}
where z_{m,t} is the output of the m-th layer neurons at the t-th iteration; a_{m-1,t} is the activation value of the (m-1)-th layer neurons at the t-th iteration; w_{m-1} is the affine transformation matrix of the (m-1)-th layer neurons; and b_{m-1} is the affine transformation bias of the (m-1)-th layer neurons.
When executed by the processor 22, the computer program further implements the following steps:
the training process is divided into three stages, each stage generating different adversarial-noise parameters; within each stage, the adversarial-noise parameters η and ε are held at fixed magnitudes.
When executed by the processor 22, the computer program further implements the following steps:
stage 1 applies zero adversarial noise, with hyper-parameter values ε = 0 and k = 1; stage 1 lasts p_1 rounds;
stage 2 applies large adversarial noise, with a larger ε and a smaller k; stage 2 lasts p_2 rounds. In the embodiment provided by the invention, the larger ε introduces adversarial noise with a larger adversarial step size relative to stage 3, and the smaller k is a smaller iteration count relative to stage 3; the specific values of ε and k are set according to experimental data;
stage 3 applies small adversarial noise, with a smaller ε and a larger k; stage 3 lasts p_3 rounds. In the embodiment provided by the invention, ε is reduced and k increased relative to stage 2, with the specific values again set according to experimental data;
where p_1, p_2 and p_3 are all positive integers.
When executed by the processor 22, the computer program further implements the following step:
the number of rounds P for which step S1 and step S2 are iteratively executed in step S3 is the sum of the rounds of the three stages, i.e. P = p_1 + p_2 + p_3.
The embodiment of the invention also provides a computer-readable storage medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in user equipment. Alternatively, the processor and the storage medium may reside as discrete components in a communication device.
The training method and system for improving the robustness of a deep learning model provided by the invention have been described in detail above. Any obvious modification made to the invention by those skilled in the art without departing from its true spirit will constitute an infringement of the patent right of the invention, and the corresponding legal liability shall be borne.

Claims (10)

1. A training method for improving the robustness of a deep learning model, characterized by comprising the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron of the model and storing it;
S2, during forward propagation of the model, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training that improves the robustness of the deep learning model, where P is a positive integer.
2. The training method for improving the robustness of the deep learning model as claimed in claim 1, characterized in that calculating and storing the adversarial noise required by each hidden-layer neuron during backpropagation of the model comprises the following steps:
during backpropagation of the model, successively differentiating according to the chain rule to obtain the adversarial gradient of the loss function with respect to each hidden-layer neuron;
storing the adversarial noise required by each neuron in the noise storage unit corresponding to that neuron.
3. The training method for improving the robustness of the deep learning model as claimed in claim 2, characterized in that:
according to the chain rule, successive differentiation gives the adversarial gradient of the loss function with respect to each hidden-layer neuron using the following formula:
g_{m,t} = ∂L/∂z_{m,t}
where L is the loss function; g_{m,t} denotes the adversarial gradient with respect to z_m, the output of the m-th hidden layer, at the t-th iteration; and z_{m,t} denotes the output of the m-th hidden-layer neurons at the t-th iteration.
4. The training method for improving the robustness of the deep learning model as claimed in claim 2, characterized in that the adversarial noise required by each neuron is obtained from the adversarial gradient of the loss function with respect to each hidden-layer neuron, combined with momentum information, using the following formula:
r_{m,t} = (1 − η)·r_{m,t−1} + (ε/k)·g_{m,t}
where ε denotes the adversarial-gradient step size of each round; k denotes the number of iterations; (1 − η) is the attenuation rate; g_{m,t} denotes the adversarial gradient with respect to z_m of the m-th hidden layer at the t-th iteration; and r_{m,t} is the adversarial noise of the m-th hidden-layer neurons at the t-th iteration.
5. The training method for improving the robustness of the deep learning model as claimed in claim 1, characterized in that taking the adversarial noise corresponding to each neuron out of the noise storage unit during forward propagation of the model, updating the hidden-layer neuron values, and continuing the forward propagation comprises the following steps:
during forward propagation, calculating the activation value a_{m-1,t} of each neuron in the previous layer;
taking the adversarial noise corresponding to the neuron out of the noise storage unit;
after the model computes each layer's input by affine transformation, adding the corresponding adversarial noise to that input;
feeding the output z_{m,t} of the m-th layer neurons at the t-th iteration into the activation function, computing the activation value, and continuing the forward propagation.
6. The training method for improving the robustness of the deep learning model as claimed in claim 5, characterized in that, for a neural network, during forward propagation the input of each layer is calculated through an affine transformation using the following formula:
z_{m,t} = a_{m-1,t}·w_{m-1} + b_{m-1}
where z_{m,t} is the output of the m-th layer neurons at the t-th iteration; a_{m-1,t} is the activation value of the (m-1)-th layer neurons at the t-th iteration; w_{m-1} is the affine transformation matrix of the (m-1)-th layer neurons; and b_{m-1} is the affine transformation bias of the (m-1)-th layer neurons.
7. The training method for improving the robustness of the deep learning model as claimed in claim 1, characterized in that:
the training process is divided into three stages, each stage generating different adversarial-noise parameters; within each stage, the adversarial-noise parameters η and ε are held at fixed magnitudes.
8. The training method for improving the robustness of the deep learning model as claimed in claim 7, characterized in that:
stage 1 applies zero adversarial noise, with hyper-parameter values ε = 0 and k = 1; stage 1 lasts p_1 rounds;
stage 2 applies large adversarial noise, with a larger ε and a smaller k; stage 2 lasts p_2 rounds;
stage 3 applies small adversarial noise, with a smaller ε and a larger k; stage 3 lasts p_3 rounds;
where p_1, p_2 and p_3 are all positive integers.
9. The training method for improving the robustness of the deep learning model as claimed in claim 8, characterized in that:
the number of rounds P for which step S1 and step S2 are iteratively executed in step S3 is the sum of the rounds of the three stages, i.e. P = p_1 + p_2 + p_3.
10. A training system for improving the robustness of a deep learning model, characterized by comprising a processor and a memory, the memory storing a computer program operable on the processor which, when executed by the processor, implements the following steps:
S1, during backpropagation of the model, calculating the adversarial noise required by each hidden-layer neuron of the model and storing it;
S2, during forward propagation of the model, taking the adversarial noise corresponding to each neuron out of the noise storage unit, updating the hidden-layer neuron values, and continuing the forward propagation;
S3, iteratively executing step S1 and step S2 for a total of P rounds to complete the training that improves the robustness of the deep learning model, where P is a positive integer.
CN201910599177.0A 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model Pending CN110674937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910599177.0A CN110674937A (en) 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910599177.0A CN110674937A (en) 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model

Publications (1)

Publication Number Publication Date
CN110674937A true CN110674937A (en) 2020-01-10

Family

ID=69068730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910599177.0A Pending CN110674937A (en) 2019-07-04 2019-07-04 Training method and system for improving robustness of deep learning model

Country Status (1)

Country Link
CN (1) CN110674937A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475618A (en) * 2020-03-31 2020-07-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN111680292A (en) * 2020-06-10 2020-09-18 北京计算机技术及应用研究所 Confrontation sample generation method based on high-concealment universal disturbance
CN111680292B (en) * 2020-06-10 2023-05-16 北京计算机技术及应用研究所 High-concealment general disturbance-based countering sample generation method
CN113228062A (en) * 2021-02-25 2021-08-06 东莞理工学院 Deep integration model training method based on feature diversity learning
WO2022178775A1 (en) * 2021-02-25 2022-09-01 东莞理工学院 Deep ensemble model training method based on feature diversity learning
CN113242547A (en) * 2021-04-02 2021-08-10 浙江大学 Method and system for filtering user behavior privacy in wireless signal based on deep learning and wireless signal receiving and transmitting device
CN113222056A (en) * 2021-05-28 2021-08-06 北京理工大学 Countercheck sample detection method for image classification system attack
CN113610141A (en) * 2021-08-02 2021-11-05 清华大学 Robustness testing method and system for automatic driving multi-sensor fusion perception model
CN113610141B (en) * 2021-08-02 2022-03-11 清华大学 Robustness testing method and system for automatic driving multi-sensor fusion perception model

Similar Documents

Publication Publication Date Title
CN110674937A (en) Training method and system for improving robustness of deep learning model
Yin et al. Knowledge transfer for deep reinforcement learning with hierarchical experience replay
Awais et al. Revisiting internal covariate shift for batch normalization
CN110048827B (en) Class template attack method based on deep learning convolutional neural network
Huang et al. On-line sequential extreme learning machine.
CN111325324A (en) Deep learning confrontation sample generation method based on second-order method
WO2018204371A1 (en) System and method for batch-normalized recurrent highway networks
Qu et al. Minimalistic attacks: How little it takes to fool deep reinforcement learning policies
CN112200243B (en) Black box countermeasure sample generation method based on low query image data
Behzadan et al. Mitigation of policy manipulation attacks on deep q-networks with parameter-space noise
CN112580728B (en) Dynamic link prediction model robustness enhancement method based on reinforcement learning
CN111047054A (en) Two-stage countermeasure knowledge migration-based countermeasure sample defense method
CN114756694B (en) Knowledge graph-based recommendation system, recommendation method and related equipment
CN111178504B (en) Information processing method and system of robust compression model based on deep neural network
Ghosh et al. An empirical analysis of generative adversarial network training times with varying batch sizes
Song et al. Overview of side channel cipher analysis based on deep learning
Hu et al. RL-VAEGAN: Adversarial defense for reinforcement learning agents via style transfer
Xu et al. Sparse adversarial attack for video via gradient-based keyframe selection
Kawakami et al. dsodenet: Neural ode and depthwise separable convolution for domain adaptation on fpgas
CN116824232A (en) Data filling type deep neural network image classification model countermeasure training method
Duan et al. Enhancing transferability of adversarial examples via rotation‐invariant attacks
Tian et al. Reducing sentiment bias in pre-trained sentiment classification via adaptive gumbel attack
KR102393759B1 (en) Method and system for generating an image processing artificial nerual network model operating in a device
CN116226897A (en) Improved Prim block chain network transmission optimization method combining training loss and privacy loss
CN114444697A (en) Knowledge graph-based common sense missing information multi-hop inference method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination