CN112766496A - Deep learning model security guarantee compression method and device based on reinforcement learning

Deep learning model security guarantee compression method and device based on reinforcement learning

Info

Publication number
CN112766496A
CN112766496A
Authority
CN
China
Prior art keywords
deep learning
learning model
value
network
compression
Prior art date
Legal status
Granted
Application number
CN202110119514.9A
Other languages
Chinese (zh)
Other versions
CN112766496B (en)
Inventor
陈晋音 (Chen Jinyin)
王珏 (Wang Jue)
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110119514.9A priority Critical patent/CN112766496B/en
Publication of CN112766496A publication Critical patent/CN112766496A/en
Application granted granted Critical
Publication of CN112766496B publication Critical patent/CN112766496B/en
Current legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep learning model security guarantee compression method and device based on reinforcement learning, comprising the following steps: (1) modeling the deep learning model as a graph network; (2) extracting the embedded vectors of the graph network with a GCN; (3) taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning, predicting an action value from the environment state, and pruning the embedded vector of each node according to the action value until the embedded vectors of all nodes have been pruned, thereby completing one round of compression of the deep learning model; (4) calculating the error rate and security from the predictions of the model, after one round of compression, on the sample data; (5) calculating, from the error rate and security, the return value of one round of deep learning model compression; (6) repeating steps (3) to (5) based on the return value until the iteration terminates, realizing the compression of the deep learning model.

Description

Deep learning model security guarantee compression method and device based on reinforcement learning
Technical Field
The invention relates to the field of deep learning, in particular to a deep learning model security guarantee compression method and device based on reinforcement learning.
Background
The rapid development of deep learning has made it possible to reach, and even exceed, human-level performance on tasks such as image classification, speech recognition and text classification. These advances have led to the widespread use and deployment of deep learning in the medical field; image recognition, wearable devices, computer-aided diagnosis and treatment, and rehabilitation aids are currently key application areas of medical health. Although deep models are varied and perform excellently, their large number of weights consumes considerable storage and memory bandwidth, which makes deploying deep models on medical devices difficult. Therefore, how to compress a model to reduce resource consumption is crucial to deploying models at the edge.
Deep model compression methods can be broadly divided into network pruning, network quantization, low-rank approximation, knowledge distillation, and compact network design. The main idea of network pruning is to eliminate relatively unimportant weights in the weight matrix and then fine-tune the network to recover its accuracy. Network quantization reduces the size of the model by reducing the number of bits used to store each parameter. Low-rank approximation treats the weight matrix of the original network as a full-rank matrix and approximates it with the product of several low-rank matrices to simplify the model. Knowledge distillation applies transfer learning, using the output of a previously trained complex model as a supervisory signal to train another, simpler network. Compact network design selects a small and compact network at the model-building stage.
Network pruning is a relatively common method among these compression methods. The core of network pruning is determining the compression strategy for each layer, because different layers have different redundancy. This typically requires manually set heuristics and domain expertise to explore the whole model design space and trade off model size, speed and accuracy. AMC uses reinforcement learning to sample the design space automatically, improving the compression quality of the model. DMCP models channel pruning as a Markov process and provides a differentiable channel pruning method that can be optimized directly by gradient descent on the standard task loss plus a budget regularization. Both methods can compress models substantially with no or only slight loss of accuracy. However, they do not consider the security of the compressed model, and deep models are vulnerable to adversarial examples. In addition, modern deep model structures contain complex connection patterns, and the layer-by-layer pruning in these methods does not consider the correlations between layers.
In summary, automatically learning a compression strategy such that the compressed model has fewer parameters while keeping high accuracy and high security is of important theoretical and practical significance for deploying deep models on medical devices.
Disclosure of Invention
In view of the above, the invention provides a deep learning model security guarantee compression method and device based on reinforcement learning, which preserve the accuracy and security of a deep learning model while compressing it.
In order to achieve the purpose, the invention provides the following technical scheme:
a deep learning model security guarantee compression method based on reinforcement learning comprises the following steps:
(1) regarding each layer of the deep learning model for the recognition or classification task as a node and the connection relations between layers as edges, thereby modeling the deep learning model as a graph network;
(2) extracting an embedded vector of the graph network by adopting a graph convolution neural network;
(3) taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning, predicting an action value from the environment state, and pruning the embedded vector of each node according to the action value until the embedded vectors of all nodes have been pruned, thereby completing one round of compression of the deep learning model;
(4) calculating the error rate and security of the compressed deep learning model from the predictions, on sample data for the recognition or classification task, of the deep learning model after one round of compression;
(5) calculating, from the error rate and security of the compressed deep learning model, the return value of one round of deep learning model compression;
(6) repeating steps (3) to (5) based on the return value until the iteration terminates, realizing the compression of the deep learning model.
Preferably, in step (1), when the deep learning model is modeled as a graph network, for the current layer of the deep learning model, the input sample dimensions, the convolution kernel dimensions and sliding stride, the computation of the current layer, the total computation reduced in all preceding layers, the total computation remaining in all subsequent layers, and the pruning action taken at the previous layer form the feature vector of the current layer;
when a connection relation exists between two layers, the corresponding edge entry is 1, otherwise 0, and the adjacency matrix of the graph network is constructed accordingly.
Preferably, in step (2), a feature matrix composed of the feature vectors of the nodes and the adjacency matrix of the graph network are used as inputs to the graph convolutional neural network, and at least 2 graph convolution layers are used to extract the embedded vector of each node, the embedded vector of each node having the same dimension as its feature vector.
Preferably, in step (3), pruning the embedded vector of each node according to the action value includes:
firstly, determining the pruning strategy according to the action value: when the action value is in [0, 1/3), the maximum-response selection pruning strategy is adopted; when the action value is in [1/3, 2/3), the greedy-search pruning strategy is adopted; when the action value is in [2/3, 1], the pruning strategy that selects channels to prune according to the loss of the feature map is adopted;
then, determining the pruning sparsity according to the pruning strategy, so that the sparsity corresponding to each pruning strategy lies in [0, 1];
and finally, pruning the embedded vector of the current node according to the adopted pruning strategy and the corresponding sparsity, realizing compression of the layer of the deep learning model corresponding to that node.
Preferably, determining the pruning sparsity according to the pruning strategy comprises: taking 3 times the difference between the computed action value and the minimum value of the action-value interval corresponding to the pruning strategy as the sparsity.
Preferably, in the reinforcement learning process, after an action network predicts the current action value from the current environment state, a decision network computes an evaluation value from the current environment state, the current action and the next environment state corresponding to the next adjacent node, and a loss function is constructed from the evaluation value and a baseline reward value to update the network parameters of the decision network, thereby updating the network parameters of the action network used to predict the next action value from the next environment state;
wherein the Loss function is:

Loss = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²

y_i = r_i − b + γ · Q(s_{i+1}, u(s_{i+1}) | θ^Q)

wherein Q(s_i, a_i | θ^Q) denotes the evaluation, under parameters θ^Q, of the environment state s_i and action value a_i of the i-th sample; N denotes the number of transition samples taken from the experience pool to update the decision network parameters; b denotes the baseline reward value, an exponential moving average of previous reward values; r_i denotes the return recorded in the i-th sample, i.e. the return value R divided by the number of steps T of one episode; γ denotes the discount factor; and Q(s_{i+1}, u(s_{i+1}) | θ^Q) is the evaluation, under parameters θ^Q, of the action value u(s_{i+1}) generated from the environment state s_{i+1} of the (i+1)-th sample.
Preferably, in step (4), calculating the error rate and security of the compressed deep learning model includes:
inputting sample data for the recognition or classification task into the compressed deep learning model to obtain predictions, determining the number of correctly classified samples from the predictions, calculating the prediction accuracy of the compressed deep learning model as the ratio of the number of correctly classified samples to the total number of samples, and then calculating the error rate of the compressed deep learning model from the prediction accuracy;
the CLEVER score computed from the compressed deep learning model on the sample data for the recognition or classification task is used as the security.
Preferably, in step (5), the return value R is calculated as:

R = −Error · log(FLOPs) + λ·u

wherein Error denotes the error rate, FLOPs denotes the total computation of the compressed deep learning model, λ denotes a hyper-parameter, and u denotes the security index.
An apparatus for reinforcement-learning-based security guarantee compression of a deep learning model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above deep learning model security guarantee compression method based on reinforcement learning when executing the computer program.
Compared with the prior art, the invention has the beneficial effects that:
In the deep learning model security guarantee compression method and device based on reinforcement learning, the model structure is modeled as a graph network, so the relations between layers can be fully considered, rather than treating the model as a purely sequential stack of layers; this makes the method more general. With the reinforcement learning framework, the compression strategy and sparsity of each layer are selected automatically, without manual setting. In addition, through the design of the return value R, the parameters of the model are reduced and security is preserved as much as possible while the accuracy of the compressed model is maintained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a deep learning model security guarantee compression method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a diagram of a reinforcement learning process provided by an embodiment of the present invention;
FIG. 3 is a process diagram of reinforcement-learning-based model compression according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In order to realize model compression and maintain accuracy and safety, the embodiment provides a deep learning model safety guarantee compression method and device based on reinforcement learning.
Fig. 1 is a flowchart of a deep learning model security guarantee compression method based on reinforcement learning according to an embodiment of the present invention. As shown in fig. 1, the method for compressing security assurance of deep learning model based on reinforcement learning according to the embodiment includes the following steps:
and step 1, modeling a compressed deep learning model structure by using a graph network mode.
In the embodiment, each layer of the deep learning model for the recognition or classification task is regarded as a node, and the connection relations between layers are regarded as edges, so that the deep learning model is modeled as a graph network.
The key elements of the graph network are the nodes, the node attributes and the node connection relations. In the modeling process, each layer of the deep learning model is taken as a node, and 11 features form the feature vector of the node. With t denoting the layer index, the feature vector of each node is (t, n, c, h, w, stride, k, FLOPs[t], reduced, rest, a_{t-1}), where n × c × k × k is the dimension of the convolution kernel, c × h × w is the dimension of the input sample, stride is the sliding stride of the convolution kernel, FLOPs[t] is the computation (FLOPs) of the t-th layer, reduced is the total computation reduced in the preceding layers, rest is the total computation remaining in the subsequent layers, and a_{t-1} is the pruning action taken at the previous layer; these feature values are scaled to [0, 1]. Such features distinguish the convolutional layers well. The node connection relations are represented by an adjacency matrix A, in which the element a_{i,j} is determined by the relation between nodes i and j. If a connection exists between two layers, the corresponding element of the adjacency matrix is set to 1. A typical deep learning model is connected layer by layer, but residual blocks have connections spanning multiple layers, and DenseNet likewise has many cross-layer connections; the corresponding elements must also be set to 1 when constructing the adjacency matrix.
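The sketch below illustrates this graph modeling under the assumption that the model is described by a simple list of per-layer records; the record keys and the function names make_node_features and make_adjacency are illustrative, not taken from the patent.

```python
import numpy as np

def make_node_features(layers, reduced=0.0):
    # Build the 11-dim feature vector (t, n, c, h, w, stride, k, FLOPs[t],
    # reduced, rest, a_{t-1}) per layer, then scale each column to [0, 1].
    # `layers`: assumed list of dicts with keys n, c, h, w, stride, k, flops,
    # and optionally `action` (the pruning action already taken there).
    feats = []
    for t, L in enumerate(layers):
        rest = sum(l["flops"] for l in layers[t + 1:])   # computation left in later layers
        a_prev = layers[t - 1].get("action", 0.0) if t > 0 else 0.0
        feats.append([t, L["n"], L["c"], L["h"], L["w"], L["stride"],
                      L["k"], L["flops"], reduced, rest, a_prev])
    feats = np.asarray(feats, dtype=np.float32)
    span = feats.max(axis=0) - feats.min(axis=0)
    return (feats - feats.min(axis=0)) / np.where(span == 0.0, 1.0, span)

def make_adjacency(num_layers, skip_edges=()):
    # a_{i,j} = 1 for consecutive layers; residual / DenseNet cross-layer
    # connections are passed in as (i, j) pairs and also set to 1.
    A = np.zeros((num_layers, num_layers), dtype=np.float32)
    for i in range(num_layers - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    for i, j in skip_edges:
        A[i, j] = A[j, i] = 1.0
    return A
```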
Step 2, extracting the embedded vectors of the graph network with a graph convolutional neural network.
In the embodiment, a 2-layer graph convolutional neural network (GCN) is adopted to extract the embedded vectors of the graph network, and the propagation rule of each layer is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

wherein Ã = A + I_N, i.e. the adjacency matrix A with the identity matrix I_N added; D̃ is the degree matrix of Ã, i.e. D̃_ii = Σ_j Ã_ij; H^(l) is the activation matrix of the l-th layer, and H^(0) is the feature matrix X formed by the feature vectors x of the nodes; W^(l) is the parameter matrix of each layer; σ is the sigmoid activation function, which maps input values to [0, 1]. No class labels are used to train the parameters W; simply initializing W at random already yields a good aggregation of network information. The output embedded vectors are expressed as (y_1, y_2, …, y_l) = G(x_1, x_2, …, x_l), where G denotes the whole GCN model and l denotes the number of nodes; here the dimension of the output embedded vector is chosen to be the same as that of the input feature vector.
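To make the propagation rule concrete, the following is a minimal NumPy sketch of this 2-layer GCN under the stated setting (randomly initialized, untrained weights W; sigmoid activation; output width equal to the input feature width); gcn_embed and its arguments are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gcn_embed(X, A, hidden_dim=16, seed=0):
    # Two applications of H^(l+1) = sigmoid(D~^(-1/2) A~ D~^(-1/2) H^(l) W^(l)).
    A_hat = A + np.eye(A.shape[0])                  # A~ = A + I_N (self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # diagonal of D~^(-1/2)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    rng = np.random.default_rng(seed)
    H = X
    for out_dim in (hidden_dim, X.shape[1]):        # 2 layers; final width = input width
        W = 0.1 * rng.standard_normal((H.shape[1], out_dim))  # random, untrained W
        H = sigmoid(A_norm @ H @ W)
    return H                                        # one embedded vector per node

# usage with the modeling sketch above:
# X = make_node_features(layers); A = make_adjacency(len(layers), skip_edges)
# Y = gcn_embed(X, A)
```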
Step 3, pruning the deep learning model layer by layer with the reinforcement learning algorithm DDPG, until the last layer is reached.
In the embodiment, the current embedded vector of each node of the graph network is used as the environment state of reinforcement learning, the action value based on the environment state is predicted by adopting the reinforcement learning, and the embedded vector of each node is pruned according to the action value until the embedded vectors of all the nodes are pruned, so that one-round compression of a deep learning model is realized. The specific process is as follows:
(a) Taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning.
In the embodiment, each layer L_t of the deep learning model, i.e. the current embedded vector of the corresponding node of the graph network, is taken as the environment state s_t of reinforcement learning. Note that the embedded vectors of all layers cannot be used as the state all at once, because a pruning operation changes the feature vectors of the subsequent layers, and the changed feature vectors must then be input to the GCN to regenerate the embedded vectors. That is, in the embodiment, the layer structure of the deep learning model is pruned layer by layer; after pruning of the current layer finishes, the feature vector corresponding to the next layer is input to the GCN to generate its embedded vector.
(b) Predicting an action value from the environment state with reinforcement learning, and pruning the embedded vector of each node according to the action value.
The DDPG agent receives the current environment state s_t from the environment, and then uses its decision (actor) network to output a value in [0, 1] as the action value a_t. The action value a_t encodes the selected pruning strategy and its sparsity; once a pruning strategy is determined, the sparsity determines the concrete pruning. Three channel pruning methods are selected: the first is maximum-response selection, i.e. the channels to be pruned are determined by the magnitude of their weights; the second is greedy search, which selects the optimal channel combination, i.e. the combination whose effect on the layer is best; the third selects the channels to be pruned according to the loss of the feature map, requiring that pruning them changes the feature map as little as possible. The sparsity corresponding to each pruning strategy lies in [0, 1], so the action value output by the DDPG is divided into three intervals, [0, 1/3), [1/3, 2/3) and [2/3, 1], and a mapping is established for each (see the sketch below). After the current layer is compressed with the corresponding strategy and sparsity, the agent moves to the next layer L_{t+1} and receives state s_{t+1}.
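A minimal sketch of this action decoding follows, combining the three intervals with the sparsity rule (3 times the distance of the action value from the interval minimum, described in the disclosure above); the strategy labels are illustrative.

```python
def decode_action(a):
    # Map a DDPG action a in [0, 1] to (pruning strategy, sparsity).
    if a < 1.0 / 3.0:
        strategy, lo = "max_response_selection", 0.0
    elif a < 2.0 / 3.0:
        strategy, lo = "greedy_search", 1.0 / 3.0
    else:
        strategy, lo = "feature_map_loss", 2.0 / 3.0
    sparsity = min(3.0 * (a - lo), 1.0)   # rescale the third back to [0, 1]
    return strategy, sparsity

# e.g. decode_action(0.5) -> ("greedy_search", 0.5)
```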
(c) The layers are compressed in the order of the layer structure of the deep learning model, following the DDPG policy, until the last layer L_T is finished, as shown in fig. 3.
The structure of the DDPG is shown in fig. 2. The DDPG has two networks, an Actor network and a Critic network. The Actor network generates an action; the state and the action value are input into the Critic network to obtain the corresponding Q value. The objective of the Actor is to maximize the Q value, and the objective of the Critic network is to minimize the error of Q(s, a). Here both the Actor network and the Critic network have two hidden layers of 300 neurons each; soft updates are performed with τ = 0.01, and training uses a batch size of 64 with a replay buffer of size 2000. For the noise used in agent exploration, a truncated normal distribution is used so that the agent explores the unknown space as much as possible:
a_t ∼ TN( u(s_t | θ^u), σ², 0, 1 )

During exploration, σ is initialized to 0.5; after the first 100 episodes, σ is decayed exponentially over the remaining 300 episodes.
As in Block-QNN, a variant form of the Bellman equation is applied here. Each transition can be expressed by a quadruple (s_t, a_t, R, s_{t+1}), where R is the return value obtained after the whole deep learning model has been compressed, calculated from accuracy and security. Since the return value is only available at the end of an episode, a baseline reward value b, the exponential moving average of previous rewards, is used to reduce the variance of the gradient estimate during updates:

Loss = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²

y_i = r_i − b + γ · Q(s_{i+1}, u(s_{i+1}) | θ^Q)

The discount factor γ is set to 1 to avoid over-prioritizing short-term rewards.
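As an illustration, one critic update with this baselined target could look like the following sketch; it assumes actor/critic modules like those above, and elides the replay-buffer sampling and the soft (τ) target updates.

```python
import torch
import torch.nn.functional as F

def critic_step(critic, actor, optimizer, batch, baseline, gamma=1.0):
    # One update of the critic with y_i = r_i - b + gamma * Q(s_{i+1}, u(s_{i+1})).
    s, a, r, s_next = batch                       # transitions from the experience pool
    with torch.no_grad():
        a_next = actor(s_next)                    # u(s_{i+1})
        y = r - baseline + gamma * critic(s_next, a_next)
    q = critic(s, a)                              # Q(s_i, a_i | theta_Q)
    loss = F.mse_loss(q, y)                       # (1/N) * sum_i (y_i - Q)^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```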
Step 4, calculating the error rate and security of the compressed deep learning model and obtaining the return value.
In the embodiment, accuracy and security are evaluated on a validation set for the recognition or classification task, and the return value is then calculated and fed back to the agent. Accuracy is evaluated with the standard accuracy metric of image classification, i.e. the percentage of correctly classified samples among all samples. The error rate Error is calculated as:

Error = 1 − N_correct / N_total

wherein N_correct denotes the number of correctly classified samples and N_total denotes the total number of samples.
Security is evaluated with the CLEVER score, which is attack-independent, has a reasonable computational cost and scales to large data sets; it uses extreme value theory to estimate the required Lipschitz constant, which makes the method more efficient. The CLEVER score of a single sample is computed as follows; the targeted-attack variant is named CLEVER-t. The inputs are the classifier function f(x), a sample x_0 with its true class c, a target class j and the perturbation norm p; N_b batches, each of N_s samples, are drawn uniformly and independently from the fixed ball B_p(x_0, M), where M is the maximum perturbation. First, initialization is carried out:

q ← p / (p − 1), S ← ∅

g(x) ← f_c(x) − f_j(x)

For each batch i among the N_b batches, the N_s samples are denoted x^(i,k), where k denotes the k-th selected sample of the batch and x^(i,k) ∈ B_p(x_0, M); b_ik can be calculated by back-propagation:

b_ik ← ||∇g(x^(i,k))||_q

S is updated for each batch:

S ← S ∪ { max_k b_ik }

The maximum likelihood estimate of the location parameter of the reverse Weibull distribution fitted to S is taken and recorded as â_W. The CLEVER score of this sample is then u:

u ← min( g(x_0) / â_W , M )
for the attack without the target, the CLEVER score of the attack with the target is calculated for each target class, and the minimum value is taken. For a single sample, the above mentioned CLEVER score may randomly select a portion of a data set for the data set, and then take the average as the security index of the classifier on a certain data set.
The return value R is calculated as:

R = −Error · log(FLOPs) + λ·u

This reward function is sensitive to variations of the error rate Error; since the agent maximizes R, Error is reduced as much as possible. Because the number of floating-point operations is large, its logarithm is taken and multiplied by the error rate, serving as a small incentive to reduce computation. The CLEVER score is small, so taking its logarithm is not suitable as an incentive; instead it is multiplied by a hyper-parameter λ.
Step 5, feeding the return value R back to the agent, and repeating step 3 and step 4 until the iteration terminates.
In the embodiment, the reinforcement learning policy is updated continuously for the number of episodes set by experiment, i.e. 400 episodes in total in the DDPG setting, so as to achieve a good compression effect.
The embodiment also provides a deep learning model security guarantee compression device based on reinforcement learning, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above deep learning model security guarantee compression method based on reinforcement learning when executing the computer program.
The deep learning model security guarantee compression method and device based on reinforcement learning can be used for pruning compression of deep learning models for image classification, speech recognition and text classification tasks. On the basis of guaranteeing the security and accuracy of the model, the weights of the model are reduced; the compressed deep learning model requires less computation and fewer resources in these tasks and is easy to deploy at the edge.
When the deep learning model is used for an image classification task, the adopted sample data is an image, and the error rate and the safety of the compressed model are calculated according to the image. When the deep learning model is used for a voice recognition task, the adopted sample data is voice data, and the error rate and the safety of the compressed model are calculated according to the voice data. When the deep learning model is used for a text classification task, sample data adopted is text data, and the error rate and the safety of the compressed model are calculated according to the text data.
According to the deep learning model security guarantee compression method and device based on reinforcement learning, modeling the model structure as a graph network allows the relations between layers to be fully considered, rather than treating the model as a purely sequential stack of layers, making the method more general. With the reinforcement learning framework, the compression strategy and sparsity of each layer are selected automatically without manual setting, and the network structure is pruned layer by layer to the last layer accordingly. In addition, through the design of the return value R, the parameters of the model are reduced and security is preserved as much as possible while the accuracy of the compressed model is maintained.
The above-mentioned embodiments describe the technical solutions and advantages of the present invention in detail. It should be understood that the above are only preferred embodiments of the present invention and are not intended to limit the invention; any modifications, additions and equivalent substitutions made within the scope of the principles of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. A deep learning model security guarantee compression method based on reinforcement learning is characterized by comprising the following steps:
(1) regarding each layer of the deep learning model for the recognition or classification task as a node and the connection relations between layers as edges, thereby modeling the deep learning model as a graph network;
(2) extracting an embedded vector of the graph network by adopting a graph convolution neural network;
(3) taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning, predicting an action value from the environment state, and pruning the current layer of the model according to the action value until every layer of the model has been pruned, thereby completing one round of compression of the deep learning model;
(4) calculating the error rate and security of the compressed deep learning model from the predictions, on sample data for the recognition or classification task, of the deep learning model after one round of compression;
(5) calculating, from the error rate and security of the compressed deep learning model, the return value of one round of deep learning model compression;
(6) repeating steps (3) to (5) based on the return value until the iteration terminates, realizing the compression of the deep learning model.
2. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (1), when the deep learning model is modeled as a graph network, for the current layer of the deep learning model, the input sample dimensions, the convolution kernel dimensions and sliding stride, the computation of the current layer, the total computation reduced in all preceding layers, the total computation remaining in all subsequent layers, and the pruning action taken at the previous layer form the feature vector of the current layer;
when a connection relation exists between two layers, the corresponding edge entry is 1, otherwise 0, and the adjacency matrix of the graph network is constructed accordingly.
3. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (2), a feature matrix composed of the feature vectors of the nodes and the adjacency matrix of the graph network are used as inputs to the graph convolutional neural network, and at least 2 graph convolution layers are used to extract the embedded vector of each node, the embedded vector of each node having the same dimension as its feature vector.
4. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (3), pruning the embedded vector of each node according to the action value comprises:
firstly, determining the pruning strategy according to the action value: when the action value is in [0, 1/3), the maximum-response selection pruning strategy is adopted; when the action value is in [1/3, 2/3), the greedy-search pruning strategy is adopted; when the action value is in [2/3, 1], the pruning strategy that selects channels to prune according to the loss of the feature map is adopted;
then, determining the pruning sparsity according to the pruning strategy, so that the sparsity corresponding to each pruning strategy lies in [0, 1];
and finally, pruning the embedded vector of the current node according to the adopted pruning strategy and the corresponding sparsity, realizing compression of the layer of the deep learning model corresponding to that node.
5. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein determining the pruning sparsity according to the pruning strategy comprises: taking 3 times the difference between the computed action value and the minimum value of the action-value interval corresponding to the pruning strategy as the sparsity.
6. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in the reinforcement learning process, after an action network predicts the current action value from the current environment state, a decision network computes an evaluation value from the current environment state, the current action and the next environment state corresponding to the next adjacent node, and a loss function is constructed from the evaluation value and a baseline reward value to update the network parameters of the decision network, thereby updating the network parameters of the action network used to predict the next action value from the next environment state;
wherein the Loss function is:

Loss = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²

y_i = r_i − b + γ · Q(s_{i+1}, u(s_{i+1}) | θ^Q)

wherein Q(s_i, a_i | θ^Q) denotes the evaluation, under parameters θ^Q, of the environment state s_i and action value a_i of the i-th of the N extracted samples; N denotes the number of successive transition samples taken from the experience buffer to update the decision network parameters; b denotes the baseline reward value, an exponential moving average of previous reward values; r_i denotes the return recorded in the i-th sample, i.e. the return value R divided by the number of steps T of one episode; γ denotes the discount factor; and Q(s_{i+1}, u(s_{i+1}) | θ^Q) is the evaluation, under parameters θ^Q, of the action value u(s_{i+1}) generated from the environment state s_{i+1} of the (i+1)-th sample.
7. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (4), calculating the error rate and security of the compressed deep learning model comprises:
inputting sample data for the recognition or classification task into the compressed deep learning model to obtain predictions, determining the number of correctly classified samples from the predictions, calculating the prediction accuracy of the compressed deep learning model as the ratio of the number of correctly classified samples to the total number of samples, and then calculating the error rate of the compressed deep learning model from the prediction accuracy;
the CLEVER score computed from the compressed deep learning model on the sample data for the recognition or classification task is used as the security.
8. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (5), the return value R is calculated as:

R = −Error · log(FLOPs) + λ·u

wherein Error denotes the error rate, FLOPs denotes the total computation of the compressed deep learning model, λ denotes a hyper-parameter, and u denotes the security index.
9. An apparatus for reinforcement-learning-based security guarantee compression of a deep learning model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the deep learning model security guarantee compression method based on reinforcement learning according to any one of claims 1 to 8 when executing the computer program.
CN202110119514.9A 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning Active CN112766496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110119514.9A CN112766496B (en) 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110119514.9A CN112766496B (en) 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112766496A (en) 2021-05-07
CN112766496B CN112766496B (en) 2024-02-13

Family

ID=75706455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110119514.9A Active CN112766496B (en) 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112766496B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114389990A (en) * 2022-01-07 2022-04-22 中国人民解放军国防科技大学 Shortest path blocking method and device based on deep reinforcement learning
CN114756294A (en) * 2022-03-22 2022-07-15 同济大学 Mobile edge calculation unloading method based on deep reinforcement learning
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090443A (en) * 2017-12-15 2018-05-29 华南理工大学 Scene text detection method and system based on deeply study
CN109754085A (en) * 2019-01-09 2019-05-14 中国人民解放军国防科技大学 Deep reinforcement learning-based large-scale network collapse method, storage device and storage medium
CN110728361A (en) * 2019-10-15 2020-01-24 四川虹微技术有限公司 Deep neural network compression method based on reinforcement learning
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090443A (en) * 2017-12-15 2018-05-29 华南理工大学 Scene text detection method and system based on deeply study
CN109754085A (en) * 2019-01-09 2019-05-14 中国人民解放军国防科技大学 Deep reinforcement learning-based large-scale network collapse method, storage device and storage medium
CN110728361A (en) * 2019-10-15 2020-01-24 四川虹微技术有限公司 Deep neural network compression method based on reinforcement learning
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114389990A (en) * 2022-01-07 2022-04-22 中国人民解放军国防科技大学 Shortest path blocking method and device based on deep reinforcement learning
CN114756294A (en) * 2022-03-22 2022-07-15 同济大学 Mobile edge calculation unloading method based on deep reinforcement learning
CN114756294B (en) * 2022-03-22 2023-08-04 同济大学 Mobile edge computing and unloading method based on deep reinforcement learning
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning

Also Published As

Publication number Publication date
CN112766496B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN109308318B (en) Training method, device, equipment and medium for cross-domain text emotion classification model
WO2021007812A1 (en) Deep neural network hyperparameter optimization method, electronic device and storage medium
WO2020048389A1 (en) Method for compressing neural network model, device, and computer apparatus
CN112766496B (en) Deep learning model safety guarantee compression method and device based on reinforcement learning
JP6483667B2 (en) System and method for performing Bayesian optimization
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN111832627A (en) Image classification model training method, classification method and system for suppressing label noise
CN111242948B (en) Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium
Salama et al. A novel ant colony algorithm for building neural network topologies
Ganguly et al. An introduction to variational inference
CN112000788B (en) Data processing method, device and computer readable storage medium
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN116469155A (en) Complex action recognition method and device based on learnable Markov logic network
CN117636183A (en) Small sample remote sensing image classification method based on self-supervision pre-training
CN113408721A (en) Neural network structure searching method, apparatus, computer device and storage medium
JP7073171B2 (en) Learning equipment, learning methods and programs
CN115565669A (en) Cancer survival analysis method based on GAN and multitask learning
CN113239809B (en) Underwater sound target identification method based on multi-scale sparse SRU classification model
CN115131646A (en) Deep network model compression method based on discrete coefficient
CN115392434A (en) Depth model reinforcement method based on graph structure variation test
CN114625886A (en) Entity query method and system based on knowledge graph small sample relation learning model
CN113495986A (en) Data processing method and device
CN114444654A (en) NAS-oriented training-free neural network performance evaluation method, device and equipment
Nord Multivariate Time Series Data Generation using Generative Adversarial Networks: Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN
Kodryan et al. Efficient language modeling with automatic relevance determination in recurrent neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant