CN112270367A - Semantic information-based method for enhancing robustness of deep learning model - Google Patents

Semantic information-based method for enhancing robustness of deep learning model Download PDF

Info

Publication number
CN112270367A
Authority
CN
China
Prior art keywords
semantic information
deep learning
learning model
samples
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011222045.5A
Other languages
Chinese (zh)
Inventor
陈兴蜀
王丽娜
王伟
岳亚伟
唐瑞
朱毅
曾雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202011222045.5A priority Critical patent/CN112270367A/en
Publication of CN112270367A publication Critical patent/CN112270367A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semantic information-based method for enhancing the robustness of a deep learning model, and belongs to the field of deep learning security. To improve the ability of a deep learning model to defend against attacks in an adversarial environment and to improve its robustness, the invention designs a semantic information-based method for enhancing the adversarial robustness of a deep learning model. The method fully mines the semantic information that the model has missed near its decision boundary and greatly improves the classification accuracy of the deep learning model on adversarial samples. The method comprises: iteratively extracting general semantic information on a subset of the training data set; using the extracted general semantic information to increase the diversity of the training data through random selection and simple superposition; on the extended training set, separately computing the loss on the clean samples and on the samples with added semantic information, and summing the two losses; and optimizing the summed loss to train the deep learning model until it converges.

Description

Semantic information-based method for enhancing robustness of deep learning model
Technical Field
The invention relates to the technical field of machine learning, in particular to a method for enhancing the robustness of a deep learning model based on semantic information.
Background
In recent years, with the accumulation of massive data and the dramatic increase in computing power, artificial intelligence represented by deep learning has developed rapidly and attracted attention in many application scenarios. Deep learning models have surpassed human performance on many tasks. However, real-world scenarios and practical applications often involve high environmental complexity, strong uncertainty, incomplete information, and adversarial interference, while existing deep learning models rely excessively on massive data or knowledge. As a result, they adapt poorly to environmental change, are easily attacked in adversarial environments, handle only single tasks, and cannot meet the requirements of diverse scenarios. In particular, deep learning models suffer from poor robustness: a model that performs well on a test data set can be deceived by adversarial samples whose perturbations are imperceptible to the human eye, producing seriously wrong recognition results. Such lack of robustness poses a serious hidden danger to applications in many fields.
Current research on improving the robustness of deep learning models falls mainly into two categories. The first discovers an upper bound on model robustness by studying new forms of adversarial attack and heuristically strengthens the model against those specific attack methods; this approach offers no strong guarantee and often depends on large numbers of samples. The second uses formal methods to certify a lower bound on model robustness; this approach is reliable but requires many assumptions, involves complex computation, and is difficult to apply.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method for enhancing the robustness of a deep learning model based on semantic information, which significantly improves the robustness of the model by exploiting semantic information the model has missed, and which can capture semantic information applicable to most samples from only a small number of samples. The technical scheme is as follows:
a method for enhancing the robustness of a deep learning model based on semantic information comprises the following steps:
Step one: iteratively extracting semantic information: for each class of the deep learning model's classification task, randomly extract from the training data a subset X that does not contain samples of that class, iteratively extract on X the semantic information missed near the region of that class, compute a semantic information vector applicable to most samples of X, and constrain the L∞ norm of the vector with an upper bound η;
Step two: sampling to extend the training set: from the difference set between the training set and the samples of each class, randomly draw samples at a specified ratio, add to them the semantic information vector obtained in step one, and keep the remaining samples unchanged, forming a new training set;
Step three: computing the objective function: on the new training set, compute the loss on the samples with added semantic information and the loss on the original samples without it, and sum the two losses;
Step four: retraining the model: retrain the deep learning model with the summed loss obtained in step three until the model converges.
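For orientation, the hyperparameters these four steps introduce can be collected in a single configuration object, as in the Python sketch below; the class name and the default values are illustrative assumptions and are not specified by the invention.

```python
from dataclasses import dataclass

@dataclass
class RobustTrainingConfig:
    """Hyperparameters of the four-step procedure (illustrative defaults only)."""
    eta: float = 8 / 255        # step one: upper bound on the L-infinity norm of the semantic vector
    project_every_k: int = 10   # step one: projection onto the L-infinity ball applied every k updates
    sample_ratio: float = 0.3   # step two: probability of adding the semantic vector to a drawn sample
    epochs: int = 50            # step four: maximum number of retraining epochs
```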
Further, in step one: on the subset X of the difference set between the training set and each class, the missing semantic information vector near the decision boundary of the region corresponding to that class is computed; the computation iterates point by point over X and, under the L∞-norm upper-bound constraint, computes the component of the semantic information vector at each point in turn and aggregates them to obtain the final universal semantic information vector:

r ← P_{∞,η}(r + Δr_i)

where r denotes the semantic information vector and Δr_i denotes the component of the semantic information vector computed at the i-th point of the set X, obtained by solving an optimization problem at that point; P_{∞,η} denotes the projection onto the infinity-norm ball of radius η centred at 0. The size of the semantic information vector is limited by computing Δr_i through the per-point optimization problem and applying the projection every k steps.
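As a concrete note on the projection P_{∞,η} used above: projection onto the infinity-norm ball of radius η centred at 0 reduces to an element-wise clamp. The sketch below assumes the vector is stored as a PyTorch tensor, and the function name is an illustrative choice rather than part of the invention.

```python
import torch

def project_linf(r: torch.Tensor, eta: float) -> torch.Tensor:
    """Project r onto the L-infinity ball of radius eta centred at 0.

    For the infinity norm this projection is an element-wise clamp:
    every component of r is pushed back into the interval [-eta, eta].
    """
    return torch.clamp(r, min=-eta, max=eta)
```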
Further, the objective function in step three is:

θ* = argmin_θ E_{(x,y)~C} [ J(θ, x, y) + J(θ, x_i + r, y) ]

where θ* is the parameter of the model; J(θ, x, y) denotes the loss on an original sample x to which no semantic information is added; J(θ, x_i + r, y) denotes the loss on a sample x_i with the semantic information vector r added; C denotes the original training set; t denotes a class of the model's classification task, t ∈ {1, 2, ..., T}, where T is the total number of classes of the original classification task; f(·) denotes the deep learning model; and θ and δ denote the model parameters and the perturbation added to an original sample, respectively.
The invention has the beneficial effects that:
1) The method extracts from the training data the semantic information missed in the region near the model's decision boundary, and preserves the universal applicability of the semantic information vector to the sample points by iteratively aggregating it point by point over a series of samples.
2) The invention expands the diversity of the sample data by sampling the training data set in proportion and reconstructing the sampled data with the semantic information, obtaining a training set containing richer semantic information.
3) The method retrains the deep learning model on the reconstructed training set with the recomputed objective function until the model converges, obtaining stronger robustness.
Drawings
FIG. 1 is a conceptual diagram of the present invention.
Fig. 2 is a schematic diagram of the semantic information vector of the present invention (taking a ten-class image data set as an example).
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments. The fact that a deep learning model is easily attacked by adversarial samples means that the model has not truly learned the real concepts relevant to its decisions. Therefore, if the omitted information related to these real concepts can be extracted, the model can be helped to learn a clearer decision boundary closer to the true one, and its robustness can be enhanced. The semantic information to be extracted should not be derived from a single sample instance; it should apply to most samples and reflect the decision-relevant concept information missed by the model, rather than causing the model to overfit individual samples.
The semantic information-based method for enhancing the robustness of a deep learning model according to the invention is shown in FIG. 1. The method mainly comprises the steps of iteratively extracting semantic information, sampling to extend the training set data, computing the objective function, and retraining the model. The conventional deep learning model A in FIG. 1 is trained on the original training set samples; it can correctly classify clean samples but cannot correctly classify adversarial samples that are very close to the original samples. After the original training set is reconstructed using the semantic information extracted by the method and the model is retrained, the resulting model B with enhanced adversarial robustness can correctly classify not only clean samples but also adversarial samples.
The calculation method is as follows:
1. Iteratively extracting semantic information: for each class of the deep learning model's classification task, randomly extract from the training data a subset X that does not contain samples of that class, iteratively extract on X the semantic information missed near the region of that class, compute a semantic information vector applicable to most samples of X, and constrain the L∞ norm of the vector with an upper bound η.
Semantic information is closely related to the human understanding of features and concepts. Introducing such semantic information into the model training process helps the model learn the true concepts and improves its adversarial robustness. This information is missed during model training and should relate to the features of the samples as a whole rather than the individual features of a particular sample, which means the extracted semantic information must be general across samples. At the same time, the semantic information vector should take the form of a very small perturbation to prevent destructive interference with the otherwise useful learning process.
Based on this, we propose an iterative method for extracting semantic information, computed as:

r ← P_{∞,η}(r + Δr_i)    (1)

where r denotes the semantic information vector and Δr_i denotes the component of the semantic information vector computed at the i-th point of the set X, obtained by solving an optimization problem at that point. P_{∞,η} denotes the projection onto the infinity-norm ball of radius η centred at 0. A semantic information vector with universality is obtained by continuous iterative aggregation: Δr_i is computed by solving the per-point optimization problem, and the projection is applied every k steps to limit the size of the semantic information vector.
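To make the iteration in formula (1) concrete, the following Python sketch aggregates the per-point components into a single vector r and applies the projection every k steps. The patent specifies only that Δr_i solves an optimization problem at the i-th point; the single signed-gradient step used below is a stand-in assumption for that solver, and the function name, step size, and loss choice are illustrative rather than part of the invention.

```python
import torch
import torch.nn.functional as F

def extract_semantic_vector(model, subset, labels, eta, k=10, step_size=0.01):
    """Iteratively aggregate a universal semantic information vector r over the
    points of the subset X, projecting onto the L-infinity ball of radius eta
    every k steps, in the spirit of formula (1)."""
    r = torch.zeros_like(subset[0])
    for i, (x, y) in enumerate(zip(subset, labels)):
        # Stand-in for the per-point optimization problem: one signed gradient
        # step on the loss of the current perturbed point (an assumption).
        x_pert = (x + r).unsqueeze(0).clone().requires_grad_(True)
        loss = F.cross_entropy(model(x_pert), y.unsqueeze(0))
        grad, = torch.autograd.grad(loss, x_pert)
        delta_r_i = step_size * grad.sign().squeeze(0)  # component at the i-th point
        r = r + delta_r_i                               # aggregate onto the running vector
        if (i + 1) % k == 0:                            # project every k steps
            r = torch.clamp(r, min=-eta, max=eta)       # P_{inf, eta}
    return torch.clamp(r, min=-eta, max=eta)
```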
The significance of this step is as follows. Adversarial samples, which existing research considers to be very close to the original samples yet which cannot be correctly recognized, show that models obtained by existing deep learning methods have not learned the true concepts; the adversarial-sample phenomenon reveals the blind spots of the model. Yet the adversarial samples generated by existing research appear disordered and carry no information, so a paradox exists between the two. Extracting the omitted semantic information therefore not only improves the robustness of the model but also drives the model to learn the true concepts and become more accurate. In addition, the iterative extraction preserves generality over most samples, which improves the efficiency of the subsequent steps and ensures that the extracted semantic information is general rather than locally unimportant information that affects only a single sample.
In FIG. 2, the semantic information extracted from the data set is magnified and visualized; two pictures are drawn at random from each class and compared with the semantic information image, and the regions outlined by boxes clearly show the correspondence between the extracted semantic information and the content of the original images.
2. Sampling to extend the training set: from the difference set between the training set and the samples of each class, randomly draw samples at a specified ratio, add to them the semantic information vector obtained in step one, and keep the remaining samples unchanged, forming a new training set.
After the semantic information has been extracted iteratively, the training data set needs to be extended with it. The extension must add semantic information without weakening the original distribution characteristics. Therefore, the semantic information is added, by sampling, to the difference set between the training data and the data of each class.
Let C denote the original training set and C_t the samples belonging to class t in C, where t ∈ {1, 2, ..., T} and T is the total number of classes of the original classification task. From C \ C_t, samples are drawn with probability P, and the semantic information is added to each drawn sample x_i, turning it into x_i + r. The remaining samples are kept unchanged, and all samples are shuffled to form the new training set.
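A minimal sketch of this sampling-based extension, assuming the training data are held as PyTorch tensors; the function name, the default probability value, and the shuffling details are illustrative assumptions.

```python
import torch

def extend_training_set(x_all, y_all, r, class_t, p=0.3, generator=None):
    """Build the extended training set for class t: samples in C \\ C_t are
    selected with probability p and the semantic vector r is added to them;
    all other samples are kept unchanged, then the whole set is shuffled."""
    x_new = x_all.clone()
    not_t = y_all != class_t                                   # difference set C \ C_t
    pick = torch.rand(x_all.shape[0], generator=generator) < p # drawn with probability p
    chosen = not_t & pick
    x_new[chosen] = x_all[chosen] + r                          # x_i -> x_i + r
    perm = torch.randperm(x_all.shape[0], generator=generator)
    return x_new[perm], y_all[perm]                            # shuffled new training set
```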
3. Calculating an objective function: on the new training set, loss functions are calculated and summed respectively for the samples with added semantic information and the original samples without added semantic information.
The traditional training process of a deep learning model obtains the model parameters by solving an optimization problem over the objective function J(θ, x, y), as shown in formula (2):

θ* = argmin_θ E_{(x,y)~C} J(θ, x, y)    (2)

where θ* is the parameter of the model.

General adversarial training instead solves a min-max problem, as shown in formula (3):

θ* = argmin_θ E_{(x,y)~C} max_δ J(θ, x + δ, y)    (3)

Based on this, the method decomposes the objective function into two parts, as shown in formula (4):

θ* = argmin_θ E_{(x,y)~C} [ J(θ, x, y) + J(θ, x_i + r, y) ]    (4)

where θ* is the parameter of the model; J(θ, x, y) denotes the loss on an original sample x to which no semantic information is added; J(θ, x_i + r, y) denotes the loss on a sample x_i with the semantic information vector r added; C denotes the original training set; t is a class of the model's classification task; f(·) denotes the deep learning model; and δ denotes the perturbation added to an original sample.
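A minimal sketch of the summed objective of formula (4), assuming a batch of clean samples and a batch of samples with the semantic vector already added are available; the function name and the use of cross-entropy as the loss J are assumptions, not requirements of the invention.

```python
import torch
import torch.nn.functional as F

def combined_loss(model, x_clean, y_clean, x_sem, y_sem):
    """Sum of the loss on original samples x (no semantic information added)
    and the loss on samples x_i + r (semantic information added), as in formula (4)."""
    loss_clean = F.cross_entropy(model(x_clean), y_clean)  # J(theta, x, y)
    loss_sem = F.cross_entropy(model(x_sem), y_sem)        # J(theta, x_i + r, y)
    return loss_clean + loss_sem
```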
4. Retraining the model: retrain the deep learning model with the summed loss function obtained in step three until the model converges.
Through the above steps, a new training set and a new objective function incorporating semantic information are obtained. On the new data set, the deep learning model is retrained with the newly computed objective function until convergence. The resulting model not only classifies clean samples correctly but also resists adversarial attacks, yielding a model with stronger adversarial robustness.
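Step four can be sketched as a standard retraining loop over the extended training set, stopping when the epoch loss no longer improves. The optimizer, batch size, learning rate, convergence test, and the sem_mask argument (marking which samples carry the semantic vector so the two loss terms of step three can be computed separately and summed) are all illustrative assumptions rather than details fixed by the invention.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def retrain(model, x_new, y_new, sem_mask, epochs=50, lr=1e-3, tol=1e-4):
    """Retrain the model on the extended training set until the epoch loss
    stops improving. sem_mask marks samples that already carry the semantic
    vector, so the clean-sample and semantic-sample losses are computed
    separately and summed as in step three."""
    loader = DataLoader(TensorDataset(x_new, y_new, sem_mask.float()),
                        batch_size=64, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    prev = float("inf")
    for _ in range(epochs):
        total = 0.0
        for xb, yb, mb in loader:
            per_sample = F.cross_entropy(model(xb), yb, reduction="none")
            # sum of the loss on clean samples and on samples with semantic information
            loss = (per_sample[mb == 0].sum() + per_sample[mb == 1].sum()) / xb.shape[0]
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev - total) < tol:   # crude convergence check (assumption)
            break
        prev = total
    return model
```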

Claims (3)

1. A method for enhancing the robustness of a deep learning model based on semantic information, characterized by comprising the following steps:
Step one: iteratively extracting semantic information: for each class of the deep learning model's classification task, randomly extracting from the training data a subset X that does not contain samples of that class, iteratively extracting on X the semantic information missed near the region of that class, computing a semantic information vector applicable to most samples of X, and constraining the L∞ norm of the vector with an upper bound η;
Step two: sampling to extend the training set: from the difference set between the training set and the samples of each class, randomly drawing samples at a specified ratio, adding to them the semantic information vector obtained in step one, and keeping the remaining samples unchanged to form a new training set;
Step three: computing the objective function: on the new training set, computing the loss on the samples with added semantic information and the loss on the original samples without it, and summing the two losses;
Step four: retraining the model: retraining the deep learning model with the summed loss obtained in step three until the model converges.
2. The method for enhancing the robustness of a deep learning model based on semantic information according to claim 1, wherein in step one: on the subset X of the difference set between the training set and each class, the missing semantic information vector near the decision boundary of the region corresponding to that class is computed; the computation iterates point by point over the subset X and, under the L∞-norm upper-bound constraint, the component of the semantic information vector at each point is computed in turn and aggregated to obtain the final universal semantic information vector:

r ← P_{∞,η}(r + Δr_i)

where r denotes the semantic information vector and Δr_i denotes the component of the semantic information vector computed at the i-th point of the set X, obtained by solving an optimization problem at that point; P_{∞,η} denotes the projection onto the infinity-norm ball of radius η centred at 0; the size of the semantic information vector is limited by computing Δr_i through the per-point optimization problem and applying the projection every k steps.
3. The method for enhancing the robustness of a deep learning model based on semantic information according to claim 1, wherein the objective function in step three is:

θ* = argmin_θ E_{(x,y)~C} [ J(θ, x, y) + J(θ, x_i + r, y) ]

where θ* is the parameter of the model; J(θ, x, y) denotes the loss on an original sample x to which no semantic information is added; J(θ, x_i + r, y) denotes the loss on a sample x_i with the semantic information vector r added; C denotes the original training set; t denotes a class of the model's classification task, t ∈ {1, 2, ..., T}, where T is the total number of classes of the original classification task; f(·) denotes the deep learning model; and θ and δ denote the model parameters and the perturbation added to an original sample, respectively.
CN202011222045.5A 2020-11-05 2020-11-05 Semantic information-based method for enhancing robustness of deep learning model Pending CN112270367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011222045.5A CN112270367A (en) 2020-11-05 2020-11-05 Semantic information-based method for enhancing robustness of deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011222045.5A CN112270367A (en) 2020-11-05 2020-11-05 Semantic information-based method for enhancing robustness of deep learning model

Publications (1)

Publication Number Publication Date
CN112270367A true CN112270367A (en) 2021-01-26

Family

ID=74346129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011222045.5A Pending CN112270367A (en) 2020-11-05 2020-11-05 Semantic information-based method for enhancing robustness of deep learning model

Country Status (1)

Country Link
CN (1) CN112270367A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491039A (en) * 2022-01-27 2022-05-13 四川大学 Meta-learning few-sample text classification method based on gradient improvement
CN115473734A (en) * 2022-09-13 2022-12-13 四川大学 Remote code execution attack detection method based on single classification and federal learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135579A (en) * 2019-04-08 2019-08-16 上海交通大学 Unsupervised field adaptive method, system and medium based on confrontation study

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135579A (en) * 2019-04-08 2019-08-16 上海交通大学 Unsupervised field adaptive method, system and medium based on confrontation study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LINA WANG ET AL.: ""Improving adversarial robustness of deep neural networks by using semantic information"", 《ARXIV》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491039A (en) * 2022-01-27 2022-05-13 四川大学 Meta-learning few-sample text classification method based on gradient improvement
CN114491039B (en) * 2022-01-27 2023-10-03 四川大学 Primitive learning few-sample text classification method based on gradient improvement
CN115473734A (en) * 2022-09-13 2022-12-13 四川大学 Remote code execution attack detection method based on single classification and federal learning
CN115473734B (en) * 2022-09-13 2023-08-11 四川大学 Remote code execution attack detection method based on single classification and federal learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210126)