CN112766496A - Deep learning model security guarantee compression method and device based on reinforcement learning

Deep learning model security guarantee compression method and device based on reinforcement learning

Info

Publication number
CN112766496A
CN112766496A
Authority
CN
China
Prior art keywords
deep learning
learning model
value
network
compression
Prior art date
Legal status
Granted
Application number
CN202110119514.9A
Other languages
Chinese (zh)
Other versions
CN112766496B (en)
Inventor
陈晋音 (Chen Jinyin)
王珏 (Wang Jue)
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110119514.9A priority Critical patent/CN112766496B/en
Publication of CN112766496A publication Critical patent/CN112766496A/en
Application granted granted Critical
Publication of CN112766496B publication Critical patent/CN112766496B/en
Current legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep learning model security guarantee compression method and device based on reinforcement learning, comprising the following steps: (1) modeling the deep learning model as a graph network; (2) extracting the embedded vectors of the graph network with a GCN; (3) taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning, predicting an action value from the environment state, and pruning the embedded vector of each node according to the action value until the embedded vectors of all nodes have been pruned, thereby completing one round of compression of the deep learning model; (4) calculating the error rate and security from the predictions of the model, after one round of compression, on the sample data; (5) calculating, from the error rate and security, the return value of one round of deep learning model compression; (6) repeating steps (3) to (5) based on the return value until the iteration terminates, realizing the compression of the deep learning model.

Description

Deep learning model security guarantee compression method and device based on reinforcement learning
Technical Field
The invention relates to the field of deep learning, in particular to a deep learning model security guarantee compression method and device based on reinforcement learning.
Background
The rapid development of deep learning has made it possible to reach, and even exceed, human-level performance on tasks such as image classification, speech recognition and text classification. These advances have led to the widespread use and deployment of deep learning in the medical field; image recognition, wearable devices, computer-aided diagnosis and treatment, and rehabilitation aids are currently key application areas of medical health. Although deep models are varied and perform excellently, their large number of weights consumes considerable storage and memory bandwidth, which makes deploying deep models on medical devices difficult. Therefore, how to compress a model to reduce resource consumption is crucial to deploying models at the edge.
Deep model compression methods can be broadly divided into network pruning, network quantization, low-rank approximation, knowledge distillation, and compact network design. The main idea of network pruning is to eliminate relatively unimportant weights in the weight matrix and then fine-tune the network to recover its accuracy. Network quantization reduces the size of the model by reducing the number of bits used to store each parameter. Low-rank approximation treats the weight matrix of the original network as a full-rank matrix and approximates it with the product of several low-rank matrices to simplify the model. Knowledge distillation applies transfer learning, using the output of a previously trained complex model as a supervisory signal to train another, simpler network. Compact network design selects a small and compact network at the model-building stage.
Network pruning is a relatively common method among these compression methods. The core of network pruning is determining the compression strategy for each layer, because different layers have different redundancy. This typically requires manually set heuristics and domain expertise to explore the whole model design space and trade off model size, speed and accuracy. AMC uses reinforcement learning to sample the design space automatically, improving the compression quality of the model. DMCP models channel pruning as a Markov process and provides a differentiable channel pruning method that can be optimized directly by gradient descent on the standard task loss plus a budget regularization. Both methods can compress models substantially with no or only slight loss of accuracy. However, they do not consider the security of the compressed model, and deep models are vulnerable to adversarial examples. In addition, modern deep model structures contain complex connection patterns, and the layer-by-layer pruning in these methods does not consider the correlations between layers.
In summary, automatically learning a compression strategy such that the compressed model has fewer parameters while keeping high accuracy and high security is of important theoretical and practical significance for deploying deep models on medical devices.
Disclosure of Invention
In view of the above, the invention provides a deep learning model security guarantee compression method and device based on reinforcement learning, which preserve the accuracy and security of a deep learning model while compressing it.
In order to achieve the purpose, the invention provides the following technical scheme:
a deep learning model security guarantee compression method based on reinforcement learning comprises the following steps:
(1) regarding each layer of the deep learning model for the recognition or classification task as a node and the connection relations between layers as edges, thereby modeling the deep learning model as a graph network;
(2) extracting an embedded vector of the graph network by adopting a graph convolution neural network;
(3) taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning, predicting an action value from the environment state, and pruning the embedded vector of each node according to the action value until the embedded vectors of all nodes have been pruned, thereby completing one round of compression of the deep learning model;
(4) calculating the error rate and security of the compressed deep learning model from the predictions, on sample data for the recognition or classification task, of the deep learning model after one round of compression;
(5) calculating, from the error rate and security of the compressed deep learning model, the return value of one round of deep learning model compression;
(6) repeating steps (3) to (5) based on the return value until the iteration terminates, realizing the compression of the deep learning model.
Preferably, in step (1), when the deep learning model is modeled as a graph network, for the current layer of the deep learning model, the input sample dimensions, the convolution kernel dimensions and sliding stride, the computation of the current layer, the total computation reduced in all preceding layers, the total computation remaining in all subsequent layers, and the pruning action taken at the previous layer form the feature vector of the current layer;
when a connection relation exists between two layers, the corresponding edge entry is 1, otherwise 0, and the adjacency matrix of the graph network is constructed accordingly.
Preferably, in step (2), a feature matrix composed of the feature vectors of the nodes and the adjacency matrix of the graph network are used as inputs to the graph convolutional neural network, and at least 2 graph convolution layers are used to extract the embedded vector of each node, the embedded vector of each node having the same dimension as its feature vector.
Preferably, in step (3), pruning the embedded vector of each node according to the action value includes:
firstly, determining the pruning strategy according to the action value: when the action value is in [0, 1/3), the maximum-response selection pruning strategy is adopted; when the action value is in [1/3, 2/3), the greedy-search pruning strategy is adopted; when the action value is in [2/3, 1], the pruning strategy that selects channels to prune according to the loss of the feature map is adopted;
then, determining the pruning sparsity according to the pruning strategy, so that the sparsity corresponding to each pruning strategy lies in [0, 1];
and finally, pruning the embedded vector of the current node according to the adopted pruning strategy and the corresponding sparsity, realizing compression of the layer of the deep learning model corresponding to that node.
Preferably, determining the pruning sparsity according to the pruning strategy comprises: taking 3 times the difference between the computed action value and the minimum value of the action-value interval corresponding to the pruning strategy as the sparsity.
Preferably, in the reinforcement learning process, after an action network predicts the current action value from the current environment state, a decision network computes an evaluation value from the current environment state, the current action and the next environment state corresponding to the next adjacent node, and a loss function is constructed from the evaluation value and a baseline reward value to update the network parameters of the decision network, thereby updating the network parameters of the action network used to predict the next action value from the next environment state;
wherein the Loss function is:

Loss = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²

y_i = r_i − b + γ · Q(s_{i+1}, u(s_{i+1}) | θ^Q)

wherein Q(s_i, a_i | θ^Q) denotes the evaluation, under parameters θ^Q, of the environment state s_i and action value a_i of the i-th sample; N denotes the number of transition samples taken from the experience pool to update the decision network parameters; b denotes the baseline reward value, an exponential moving average of previous reward values; r_i denotes the return recorded in the i-th sample, i.e. the return value R divided by the number of steps T of one episode; γ denotes the discount factor; and Q(s_{i+1}, u(s_{i+1}) | θ^Q) is the evaluation, under parameters θ^Q, of the action value u(s_{i+1}) generated from the environment state s_{i+1} of the (i+1)-th sample.
Preferably, in step (4), calculating the error rate and security of the compressed deep learning model includes:
inputting sample data for the recognition or classification task into the compressed deep learning model to obtain predictions, determining the number of correctly classified samples from the predictions, calculating the prediction accuracy of the compressed deep learning model as the ratio of the number of correctly classified samples to the total number of samples, and then calculating the error rate of the compressed deep learning model from the prediction accuracy;
the CLEVER score computed from the compressed deep learning model on the sample data for the recognition or classification task is used as the security.
Preferably, in step (5), the return value R is calculated as:

R = −Error · log(FLOPs) + λ·u

wherein Error denotes the error rate, FLOPs denotes the total computation of the compressed deep learning model, λ denotes a hyper-parameter, and u denotes the security index.
An apparatus for reinforcement-learning-based security guarantee compression of a deep learning model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above deep learning model security guarantee compression method based on reinforcement learning when executing the computer program.
Compared with the prior art, the invention has the beneficial effects that:
In the deep learning model security guarantee compression method and device based on reinforcement learning, the model structure is modeled as a graph network, so the relations between layers can be fully considered, rather than treating the model as a purely sequential stack of layers; this makes the method more general. With the reinforcement learning framework, the compression strategy and sparsity of each layer are selected automatically, without manual setting. In addition, through the design of the return value R, the parameters of the model are reduced and security is preserved as much as possible while the accuracy of the compressed model is maintained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a deep learning model security guarantee compression method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a diagram of a reinforcement learning process provided by an embodiment of the present invention;
FIG. 3 is a process diagram of reinforcement-learning-based model compression according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
In order to realize model compression and maintain accuracy and safety, the embodiment provides a deep learning model safety guarantee compression method and device based on reinforcement learning.
Fig. 1 is a flowchart of a deep learning model security guarantee compression method based on reinforcement learning according to an embodiment of the present invention. As shown in fig. 1, the method for compressing security assurance of deep learning model based on reinforcement learning according to the embodiment includes the following steps:
and step 1, modeling a compressed deep learning model structure by using a graph network mode.
In the embodiment, each layer of the deep learning model for the recognition or classification task is regarded as a node, and the connection relations between layers are regarded as edges, so that the deep learning model is modeled as a graph network.
The key elements of the graph network are the nodes, the node attributes and the node connection relations. In the modeling process, each layer of the deep learning model is taken as a node, and 11 features form the feature vector of the node. With t denoting the layer index, the feature vector of each node is (t, n, c, h, w, stride, k, FLOPs[t], reduced, rest, a_{t-1}), where n × c × k × k is the dimension of the convolution kernel, c × h × w is the dimension of the input sample, stride is the sliding stride of the convolution kernel, FLOPs[t] is the computation (FLOPs) of the t-th layer, reduced is the total computation reduced in the preceding layers, rest is the total computation remaining in the subsequent layers, and a_{t-1} is the pruning action taken at the previous layer; these feature values are scaled to [0, 1]. Such features distinguish the convolutional layers well. The node connection relations are represented by an adjacency matrix A, in which the element a_{i,j} is determined by the relation between nodes i and j. If a connection exists between two layers, the corresponding element of the adjacency matrix is set to 1. A typical deep learning model is connected layer by layer, but residual blocks have connections spanning multiple layers, and DenseNet likewise has many cross-layer connections; the corresponding elements must also be set to 1 when constructing the adjacency matrix.
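The sketch below illustrates this graph modeling under the assumption that the model is described by a simple list of per-layer records; the record keys and the function names make_node_features and make_adjacency are illustrative, not taken from the patent.

```python
import numpy as np

def make_node_features(layers, reduced=0.0):
    # Build the 11-dim feature vector (t, n, c, h, w, stride, k, FLOPs[t],
    # reduced, rest, a_{t-1}) per layer, then scale each column to [0, 1].
    # `layers`: assumed list of dicts with keys n, c, h, w, stride, k, flops,
    # and optionally `action` (the pruning action already taken there).
    feats = []
    for t, L in enumerate(layers):
        rest = sum(l["flops"] for l in layers[t + 1:])   # computation left in later layers
        a_prev = layers[t - 1].get("action", 0.0) if t > 0 else 0.0
        feats.append([t, L["n"], L["c"], L["h"], L["w"], L["stride"],
                      L["k"], L["flops"], reduced, rest, a_prev])
    feats = np.asarray(feats, dtype=np.float32)
    span = feats.max(axis=0) - feats.min(axis=0)
    return (feats - feats.min(axis=0)) / np.where(span == 0.0, 1.0, span)

def make_adjacency(num_layers, skip_edges=()):
    # a_{i,j} = 1 for consecutive layers; residual / DenseNet cross-layer
    # connections are passed in as (i, j) pairs and also set to 1.
    A = np.zeros((num_layers, num_layers), dtype=np.float32)
    for i in range(num_layers - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    for i, j in skip_edges:
        A[i, j] = A[j, i] = 1.0
    return A
```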
Step 2, extracting the embedded vectors of the graph network with a graph convolutional neural network.
In the embodiment, a 2-layer graph convolutional neural network (GCN) is adopted to extract the embedded vectors of the graph network, and the propagation rule of each layer is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

wherein Ã = A + I_N, i.e. the adjacency matrix A with the identity matrix I_N added; D̃ is the degree matrix of Ã, i.e. D̃_ii = Σ_j Ã_ij; H^(l) is the activation matrix of the l-th layer, and H^(0) is the feature matrix X formed by the feature vectors x of the nodes; W^(l) is the parameter matrix of each layer; σ is the sigmoid activation function, which maps input values to [0, 1]. No class labels are used to train the parameters W; simply initializing W at random already yields a good aggregation of network information. The output embedded vectors are expressed as (y_1, y_2, …, y_l) = G(x_1, x_2, …, x_l), where G denotes the whole GCN model and l denotes the number of nodes; here the dimension of the output embedded vector is chosen to be the same as that of the input feature vector.
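To make the propagation rule concrete, the following is a minimal NumPy sketch of this 2-layer GCN under the stated setting (randomly initialized, untrained weights W; sigmoid activation; output width equal to the input feature width); gcn_embed and its arguments are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gcn_embed(X, A, hidden_dim=16, seed=0):
    # Two applications of H^(l+1) = sigmoid(D~^(-1/2) A~ D~^(-1/2) H^(l) W^(l)).
    A_hat = A + np.eye(A.shape[0])                  # A~ = A + I_N (self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # diagonal of D~^(-1/2)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    rng = np.random.default_rng(seed)
    H = X
    for out_dim in (hidden_dim, X.shape[1]):        # 2 layers; final width = input width
        W = 0.1 * rng.standard_normal((H.shape[1], out_dim))  # random, untrained W
        H = sigmoid(A_norm @ H @ W)
    return H                                        # one embedded vector per node

# usage with the modeling sketch above:
# X = make_node_features(layers); A = make_adjacency(len(layers), skip_edges)
# Y = gcn_embed(X, A)
```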
Step 3, pruning the deep learning model layer by layer with the reinforcement learning algorithm DDPG, until the last layer is reached.
In the embodiment, the current embedded vector of each node of the graph network is used as the environment state of reinforcement learning, the action value based on the environment state is predicted by adopting the reinforcement learning, and the embedded vector of each node is pruned according to the action value until the embedded vectors of all the nodes are pruned, so that one-round compression of a deep learning model is realized. The specific process is as follows:
(a) Taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning.
In the embodiment, each layer L_t of the deep learning model, i.e. the current embedded vector of the corresponding node of the graph network, is taken as the environment state s_t of reinforcement learning. Note that the embedded vectors of all layers cannot be used as the state all at once, because a pruning operation changes the feature vectors of the subsequent layers, and the changed feature vectors must then be input to the GCN to regenerate the embedded vectors. That is, in the embodiment, the layer structure of the deep learning model is pruned layer by layer; after pruning of the current layer finishes, the feature vector corresponding to the next layer is input to the GCN to generate its embedded vector.
(b) Predicting an action value from the environment state with reinforcement learning, and pruning the embedded vector of each node according to the action value.
The DDPG agent receives the current environment state s_t from the environment, and then uses its decision (actor) network to output a value in [0, 1] as the action value a_t. The action value a_t encodes the selected pruning strategy and its sparsity; once a pruning strategy is determined, the sparsity determines the concrete pruning. Three channel pruning methods are selected: the first is maximum-response selection, i.e. the channels to be pruned are determined by the magnitude of their weights; the second is greedy search, which selects the optimal channel combination, i.e. the combination whose effect on the layer is best; the third selects the channels to be pruned according to the loss of the feature map, requiring that pruning them changes the feature map as little as possible. The sparsity corresponding to each pruning strategy lies in [0, 1], so the action value output by the DDPG is divided into three intervals, [0, 1/3), [1/3, 2/3) and [2/3, 1], and a mapping is established for each (see the sketch below). After the current layer is compressed with the corresponding strategy and sparsity, the agent moves to the next layer L_{t+1} and receives state s_{t+1}.
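A minimal sketch of this action decoding follows, combining the three intervals with the sparsity rule (3 times the distance of the action value from the interval minimum, described in the disclosure above); the strategy labels are illustrative.

```python
def decode_action(a):
    # Map a DDPG action a in [0, 1] to (pruning strategy, sparsity).
    if a < 1.0 / 3.0:
        strategy, lo = "max_response_selection", 0.0
    elif a < 2.0 / 3.0:
        strategy, lo = "greedy_search", 1.0 / 3.0
    else:
        strategy, lo = "feature_map_loss", 2.0 / 3.0
    sparsity = min(3.0 * (a - lo), 1.0)   # rescale the third back to [0, 1]
    return strategy, sparsity

# e.g. decode_action(0.5) -> ("greedy_search", 0.5)
```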
(c) The layers are compressed in the order of the layer structure of the deep learning model, following the DDPG policy, until the last layer L_T is finished, as shown in fig. 3.
The structure of the DDPG is shown in fig. 2. The DDPG has two networks, an Actor network and a Critic network. The Actor network generates an action; the state and the action value are input into the Critic network to obtain the corresponding Q value. The objective of the Actor is to maximize the Q value, and the objective of the Critic network is to minimize the error of Q(s, a). Here both the Actor network and the Critic network have two hidden layers of 300 neurons each; soft updates are performed with τ = 0.01, and training uses a batch size of 64 with a replay buffer of size 2000. For the noise used in agent exploration, a truncated normal distribution is used so that the agent explores the unknown space as much as possible:
a_t ∼ TN( u(s_t | θ^u), σ², 0, 1 )

During exploration, σ is initialized to 0.5; after the first 100 episodes, σ is decayed exponentially over the remaining 300 episodes.
As in Block-QNN, a variant form of the Bellman equation is applied here. Each transition can be expressed by a quadruple (s_t, a_t, R, s_{t+1}), where R is the return value obtained after the whole deep learning model has been compressed, calculated from accuracy and security. Since the return value is only available at the end of an episode, a baseline reward value b, the exponential moving average of previous rewards, is used to reduce the variance of the gradient estimate during updates:

Loss = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²

y_i = r_i − b + γ · Q(s_{i+1}, u(s_{i+1}) | θ^Q)

The discount factor γ is set to 1 to avoid over-prioritizing short-term rewards.
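As an illustration, one critic update with this baselined target could look like the following sketch; it assumes actor/critic modules like those above, and elides the replay-buffer sampling and the soft (τ) target updates.

```python
import torch
import torch.nn.functional as F

def critic_step(critic, actor, optimizer, batch, baseline, gamma=1.0):
    # One update of the critic with y_i = r_i - b + gamma * Q(s_{i+1}, u(s_{i+1})).
    s, a, r, s_next = batch                       # transitions from the experience pool
    with torch.no_grad():
        a_next = actor(s_next)                    # u(s_{i+1})
        y = r - baseline + gamma * critic(s_next, a_next)
    q = critic(s, a)                              # Q(s_i, a_i | theta_Q)
    loss = F.mse_loss(q, y)                       # (1/N) * sum_i (y_i - Q)^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```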
Step 4, calculating the error rate and security of the compressed deep learning model and obtaining the return value.
In the embodiment, accuracy and security are evaluated on a validation set for the recognition or classification task, and the return value is then calculated and fed back to the agent. Accuracy is evaluated with the standard accuracy metric of image classification, i.e. the percentage of correctly classified samples among all samples. The error rate Error is calculated as:

Error = 1 − N_correct / N_total

wherein N_correct denotes the number of correctly classified samples and N_total denotes the total number of samples.
Security is evaluated with the CLEVER score, which is attack-independent, has a reasonable computational cost and scales to large data sets; it uses extreme value theory to estimate the required Lipschitz constant, which makes the method more efficient. The CLEVER score of a single sample is computed as follows; the targeted-attack variant is named CLEVER-t. The inputs are the classifier function f(x), a sample x_0 with its true class c, a target class j and the perturbation norm p; N_b batches, each of N_s samples, are drawn uniformly and independently from the fixed ball B_p(x_0, M), where M is the maximum perturbation. First, initialization is carried out:

q ← p / (p − 1), S ← ∅

g(x) ← f_c(x) − f_j(x)

For each batch i among the N_b batches, the N_s samples are denoted x^(i,k), where k denotes the k-th selected sample of the batch and x^(i,k) ∈ B_p(x_0, M); b_ik can be calculated by back-propagation:

b_ik ← ||∇g(x^(i,k))||_q

S is updated for each batch:

S ← S ∪ { max_k b_ik }

The maximum likelihood estimate of the location parameter of the reverse Weibull distribution fitted to S is taken and recorded as â_W. The CLEVER score of this sample is then u:

u ← min( g(x_0) / â_W , M )
for the attack without the target, the CLEVER score of the attack with the target is calculated for each target class, and the minimum value is taken. For a single sample, the above mentioned CLEVER score may randomly select a portion of a data set for the data set, and then take the average as the security index of the classifier on a certain data set.
The return value R is calculated as:

R = −Error · log(FLOPs) + λ·u

This reward function is sensitive to variations of the error rate Error; since the agent maximizes R, Error is reduced as much as possible. Because the number of floating-point operations is large, its logarithm is taken and multiplied by the error rate, serving as a small incentive to reduce computation. The CLEVER score is small, so taking its logarithm is not suitable as an incentive; instead it is multiplied by a hyper-parameter λ.
Step 5, feeding the return value R back to the agent, and repeating step 3 and step 4 until the iteration terminates.
In the embodiment, the reinforcement learning policy is updated continuously for the number of episodes set by experiment, i.e. 400 episodes in total in the DDPG setting, so as to achieve a good compression effect.
The embodiment also provides a deep learning model security guarantee compression device based on reinforcement learning, which comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above deep learning model security guarantee compression method based on reinforcement learning when executing the computer program.
The deep learning model security guarantee compression method and device based on reinforcement learning can be used for pruning compression of deep learning models for image classification, speech recognition and text classification tasks. On the basis of guaranteeing the security and accuracy of the model, the weights of the model are reduced; the compressed deep learning model requires less computation and fewer resources in these tasks and is easy to deploy at the edge.
When the deep learning model is used for an image classification task, the adopted sample data is an image, and the error rate and the safety of the compressed model are calculated according to the image. When the deep learning model is used for a voice recognition task, the adopted sample data is voice data, and the error rate and the safety of the compressed model are calculated according to the voice data. When the deep learning model is used for a text classification task, sample data adopted is text data, and the error rate and the safety of the compressed model are calculated according to the text data.
According to the deep learning model security guarantee compression method and device based on reinforcement learning, modeling the model structure as a graph network allows the relations between layers to be fully considered, rather than treating the model as a purely sequential stack of layers, making the method more general. With the reinforcement learning framework, the compression strategy and sparsity of each layer are selected automatically without manual setting, and the network structure is pruned layer by layer to the last layer accordingly. In addition, through the design of the return value R, the parameters of the model are reduced and security is preserved as much as possible while the accuracy of the compressed model is maintained.
The above-mentioned embodiments describe the technical solutions and advantages of the present invention in detail. It should be understood that the above are only preferred embodiments of the present invention and are not intended to limit the invention; any modifications, additions and equivalent substitutions made within the scope of the principles of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. A deep learning model security guarantee compression method based on reinforcement learning is characterized by comprising the following steps:
(1) regarding each layer of the deep learning model for the recognition or classification task as a node and the connection relations between layers as edges, thereby modeling the deep learning model as a graph network;
(2) extracting an embedded vector of the graph network by adopting a graph convolution neural network;
(3) taking the current embedded vector of each node of the graph network as the environment state of reinforcement learning, predicting an action value from the environment state, and pruning the current layer of the model according to the action value until every layer of the model has been pruned, thereby completing one round of compression of the deep learning model;
(4) calculating the error rate and security of the compressed deep learning model from the predictions, on sample data for the recognition or classification task, of the deep learning model after one round of compression;
(5) calculating, from the error rate and security of the compressed deep learning model, the return value of one round of deep learning model compression;
(6) repeating steps (3) to (5) based on the return value until the iteration terminates, realizing the compression of the deep learning model.
2. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (1), when the deep learning model is modeled as a graph network, for the current layer of the deep learning model, the input sample dimensions, the convolution kernel dimensions and sliding stride, the computation of the current layer, the total computation reduced in all preceding layers, the total computation remaining in all subsequent layers, and the pruning action taken at the previous layer form the feature vector of the current layer;
when a connection relation exists between two layers, the corresponding edge entry is 1, otherwise 0, and the adjacency matrix of the graph network is constructed accordingly.
3. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (2), a feature matrix composed of the feature vectors of the nodes and the adjacency matrix of the graph network are used as inputs to the graph convolutional neural network, and at least 2 graph convolution layers are used to extract the embedded vector of each node, the embedded vector of each node having the same dimension as its feature vector.
4. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (3), pruning the embedded vector of each node according to the action value comprises:
firstly, determining the pruning strategy according to the action value: when the action value is in [0, 1/3), the maximum-response selection pruning strategy is adopted; when the action value is in [1/3, 2/3), the greedy-search pruning strategy is adopted; when the action value is in [2/3, 1], the pruning strategy that selects channels to prune according to the loss of the feature map is adopted;
then, determining the pruning sparsity according to the pruning strategy, so that the sparsity corresponding to each pruning strategy lies in [0, 1];
and finally, pruning the embedded vector of the current node according to the adopted pruning strategy and the corresponding sparsity, realizing compression of the layer of the deep learning model corresponding to that node.
5. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein determining the pruning sparsity according to the pruning strategy comprises: taking 3 times the difference between the computed action value and the minimum value of the action-value interval corresponding to the pruning strategy as the sparsity.
6. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in the reinforcement learning process, after an action network predicts the current action value from the current environment state, a decision network computes an evaluation value from the current environment state, the current action and the next environment state corresponding to the next adjacent node, and a loss function is constructed from the evaluation value and a baseline reward value to update the network parameters of the decision network, thereby updating the network parameters of the action network used to predict the next action value from the next environment state;
wherein the Loss function is:

Loss = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²

y_i = r_i − b + γ · Q(s_{i+1}, u(s_{i+1}) | θ^Q)

wherein Q(s_i, a_i | θ^Q) denotes the evaluation, under parameters θ^Q, of the environment state s_i and action value a_i of the i-th of the N extracted samples; N denotes the number of successive transition samples taken from the experience buffer to update the decision network parameters; b denotes the baseline reward value, an exponential moving average of previous reward values; r_i denotes the return recorded in the i-th sample, i.e. the return value R divided by the number of steps T of one episode; γ denotes the discount factor; and Q(s_{i+1}, u(s_{i+1}) | θ^Q) is the evaluation, under parameters θ^Q, of the action value u(s_{i+1}) generated from the environment state s_{i+1} of the (i+1)-th sample.
7. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (4), calculating the error rate and security of the compressed deep learning model comprises:
inputting sample data for the recognition or classification task into the compressed deep learning model to obtain predictions, determining the number of correctly classified samples from the predictions, calculating the prediction accuracy of the compressed deep learning model as the ratio of the number of correctly classified samples to the total number of samples, and then calculating the error rate of the compressed deep learning model from the prediction accuracy;
the CLEVER score computed from the compressed deep learning model on the sample data for the recognition or classification task is used as the security.
8. The reinforcement learning-based deep learning model security guarantee compression method according to claim 1, wherein in step (5), the return value R is calculated as:

R = −Error · log(FLOPs) + λ·u

wherein Error denotes the error rate, FLOPs denotes the total computation of the compressed deep learning model, λ denotes a hyper-parameter, and u denotes the security index.
9. An apparatus for reinforcement-learning-based security guarantee compression of a deep learning model, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the deep learning model security guarantee compression method based on reinforcement learning according to any one of claims 1 to 8 when executing the computer program.
CN202110119514.9A 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning Active CN112766496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110119514.9A CN112766496B (en) 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110119514.9A CN112766496B (en) 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112766496A (en) 2021-05-07
CN112766496B CN112766496B (en) 2024-02-13

Family

ID=75706455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110119514.9A Active CN112766496B (en) 2021-01-28 2021-01-28 Deep learning model safety guarantee compression method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112766496B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114389990A (en) * 2022-01-07 2022-04-22 中国人民解放军国防科技大学 Shortest path blocking method and device based on deep reinforcement learning
CN114756294A (en) * 2022-03-22 2022-07-15 同济大学 Mobile edge calculation unloading method based on deep reinforcement learning
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090443A (en) * 2017-12-15 2018-05-29 华南理工大学 Scene text detection method and system based on deeply study
CN109754085A (en) * 2019-01-09 2019-05-14 中国人民解放军国防科技大学 Deep reinforcement learning-based large-scale network collapse method, storage device and storage medium
CN110728361A (en) * 2019-10-15 2020-01-24 四川虹微技术有限公司 Deep neural network compression method based on reinforcement learning
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090443A (en) * 2017-12-15 2018-05-29 华南理工大学 Scene text detection method and system based on deeply study
CN109754085A (en) * 2019-01-09 2019-05-14 中国人民解放军国防科技大学 Deep reinforcement learning-based large-scale network collapse method, storage device and storage medium
CN110728361A (en) * 2019-10-15 2020-01-24 四川虹微技术有限公司 Deep neural network compression method based on reinforcement learning
CN111340227A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for compressing business prediction model through reinforcement learning model

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114389990A (en) * 2022-01-07 2022-04-22 中国人民解放军国防科技大学 Shortest path blocking method and device based on deep reinforcement learning
CN114756294A (en) * 2022-03-22 2022-07-15 同济大学 Mobile edge calculation unloading method based on deep reinforcement learning
CN114756294B (en) * 2022-03-22 2023-08-04 同济大学 Mobile edge computing and unloading method based on deep reinforcement learning
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning

Also Published As

Publication number Publication date
CN112766496B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN109308318B (en) Training method, device, equipment and medium for cross-domain text emotion classification model
WO2021007812A1 (en) Deep neural network hyperparameter optimization method, electronic device and storage medium
WO2020048389A1 (en) Method for compressing neural network model, device, and computer apparatus
CN112766496B (en) Deep learning model safety guarantee compression method and device based on reinforcement learning
JP6483667B2 (en) System and method for performing Bayesian optimization
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN111832627A (en) Image classification model training method, classification method and system for suppressing label noise
CN111242948B (en) Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium
Salama et al. A novel ant colony algorithm for building neural network topologies
Ganguly et al. An introduction to variational inference
CN112000788B (en) Data processing method, device and computer readable storage medium
CN114819143A (en) Model compression method suitable for communication network field maintenance
CN116469155A (en) Complex action recognition method and device based on learnable Markov logic network
CN117636183A (en) Small sample remote sensing image classification method based on self-supervision pre-training
CN113408721A (en) Neural network structure searching method, apparatus, computer device and storage medium
JP7073171B2 (en) Learning equipment, learning methods and programs
CN115565669A (en) Cancer survival analysis method based on GAN and multitask learning
CN113239809B (en) Underwater sound target identification method based on multi-scale sparse SRU classification model
CN115131646A (en) Deep network model compression method based on discrete coefficient
CN115392434A (en) Depth model reinforcement method based on graph structure variation test
CN114625886A (en) Entity query method and system based on knowledge graph small sample relation learning model
CN113495986A (en) Data processing method and device
CN114444654A (en) NAS-oriented training-free neural network performance evaluation method, device and equipment
Nord Multivariate Time Series Data Generation using Generative Adversarial Networks: Generating Realistic Sensor Time Series Data of Vehicles with an Abnormal Behaviour using TimeGAN
Kodryan et al. Efficient language modeling with automatic relevance determination in recurrent neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant