CN115309897A - Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning - Google Patents

Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Info

Publication number
CN115309897A
Authority
CN
China
Prior art keywords
sample
adversarial
training
defense
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210891903.8A
Other languages
Chinese (zh)
Inventor
张连新
郑海
王靖午
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fangying Jintai Technology Beijing Co ltd
Original Assignee
Fangying Jintai Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fangying Jintai Technology Beijing Co ltd filed Critical Fangying Jintai Technology Beijing Co ltd
Priority to CN202210891903.8A priority Critical patent/CN115309897A/en
Publication of CN115309897A publication Critical patent/CN115309897A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning comprises the following steps: preprocessing the text data set; sequentially inputting each batch of samples in the training set into the original defense model for forward computation to obtain the sample vectors of each batch; computing the gradients of the current batch of samples under the loss function of the original defense model and generating adversarial samples via adversarial perturbations; inputting the adversarial samples into the original defense model to obtain the feature vectors of the current batch of adversarial samples; modifying the loss function and training the original defense model by minimizing it; obtaining an improved defense model for each batch of samples and selecting the improved defense model that performs best on the validation set as the final defense model. Addressing the problems that Chinese spelling-error detection performance is limited and that existing models cannot run in parallel and run slowly, the method improves the multi-modal ChineseBERT model, which fuses Chinese pronunciation (pinyin), glyph and meaning information, so that it can better defend against Chinese adversarial sample attacks in real-world environments.

Description

Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning
Technical field:
The invention relates to the technical field of information security, and in particular to a Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning.
The background art comprises the following steps:
Natural language processing is often called "the jewel in the crown of artificial intelligence". It studies theories and methods for language interaction between humans and computers and for computer-aided interaction between people, helping individuals, enterprises and governments automatically understand massive amounts of text and helping people of different languages and countries understand one another. With the continuous development of computer technology and artificial intelligence, natural language processing has been applied in many practical fields, such as machine translation, social media monitoring and speech processing.
With the continuous development of deep learning, natural language processing based on deep neural networks has surpassed traditional statistics-based machine learning on tasks such as text classification, machine translation and dialogue systems. At present, two kinds of deep learning models are mainly used for natural language processing. The first combines deep models such as CNNs and LSTMs with word vector techniques such as Word2vec and GloVe to better mine local and global sequential features in text. The second is the pre-trained language model represented by BERT: the BERT model uses the Transformer as its basic architecture, is trained in an unsupervised manner on massive text data, has a huge number of parameters and strong text understanding ability, surpasses existing methods on multiple natural language processing tasks, and has become a new milestone.
However, researchers have found that, owing to the inherent local linearity of deep neural networks and the high dimensionality of their data, deep-learning-based natural language processing faces the threat of adversarial samples, like other deep learning algorithms. An adversarial sample is an input carefully and deliberately crafted from an original sample; with high probability it causes a deep neural network model to produce an erroneous output, while a human reader's judgment of it remains unchanged. This creates many problems for natural language processing in real environments. For example, some apps use sentiment analysis to provide recommendation services and scores based on users' historical reviews, but attackers can maliciously spread false information through adversarial texts for profit, causing losses to consumers; likewise, to bypass detection systems for violent, pornographic and abusive content, attackers generate adversarial samples that pollute the network environment. Such attacks can cause unpredictable economic losses to merchants and enterprises and seriously harm the mental health of netizens, especially teenagers. Systematically understanding adversarial samples and defending against adversarial attacks in order to build robust models is therefore one of the hot research problems in academia. However, existing research on adversarial sample defense focuses mainly on English. Because Chinese and English differ greatly (Chinese has a very large character set and search space and contains many visually similar and phonetically similar characters), existing English adversarial-sample defense methods cannot be applied directly to the defense of Chinese adversarial samples.
In view of the above problems, the patent with publication number CN114169443A discloses a word-level text adversarial sample detection method, which models adversarial sample detection as a binary classification problem solved in two steps: first, adversarial samples corresponding to normal samples are generated with an adversarial attack algorithm, and feature vectors characterizing the normal and adversarial samples are extracted; then a deep learning model is used to build a binary classifier for adversarial sample detection. The patent with publication number CN110457701A discloses an interpretability-based adversarial training method for improving a model's detection of malicious texts: the input texts are processed with a neutralization filter, a de-aliasing filter and spell checking to convert all texts into readable texts; a text classification model is built on the readable texts and trained on the spell-checked inputs and their labels; text adversarial samples are then generated according to an adversarial sample generation method and the initial text classification model; finally, the original classifier is retrained with the generated adversarial samples and the original samples to obtain a text classification model that can defend against adversarial sample attacks.
In summary, current methods and systems cannot solve the following problems: because Chinese text has no explicit word boundaries, Chinese spelling-error detection performance is limited; adversarial training methods exist that adjust the classifier's decision boundary by augmenting the data set, but such methods perform only moderately against new adversarial samples.
Summary of the invention:
In view of these problems, the invention provides a Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning, addressing the limited performance of existing Chinese spelling-error detection and the facts that LSTM suffers from vanishing gradients on long text, cannot run in parallel, and runs slowly.
The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning comprises the following steps:
Step one: preprocess the text data set: divide the text data set into a training set, a test set and a validation set in a certain ratio, and divide the training set into batch_num batches, each containing batch_size text samples; the number of batches and the number of texts per batch are set according to the structure of the training model;
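By way of illustration of step one, a minimal sketch in Python follows. It assumes the preferred 8:1:1 split and one of the preferred batch sizes described later in this document; the function name, shuffling and seeding are illustrative assumptions, not part of the claimed method.

```python
# Minimal sketch of step one, assuming an 8:1:1 split and batch_size = 128;
# tokenization and file loading are omitted. All names are illustrative.
import random

def preprocess(dataset, batch_size=128, seed=42):
    data = list(dataset)                        # list of (text, label) pairs
    random.Random(seed).shuffle(data)
    n = len(data)
    train = data[: int(0.8 * n)]                # 8 parts: training set
    test = data[int(0.8 * n): int(0.9 * n)]     # 1 part: test set
    val = data[int(0.9 * n):]                   # 1 part: validation set
    # batch_num batches of batch_size texts each (last batch may be smaller)
    batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]
    return batches, test, val
```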
Step two: sequentially input each batch of samples in the training set into the original defense model for forward computation to obtain the sample vectors of each batch, where the original defense model is the ChineseBERT model and the samples are the text data of each batch in the training set; the ChineseBERT pre-trained language model processes Chinese pronunciation (pinyin), glyph and meaning information and fuses them into expressive dynamic Chinese character vectors, and it is pre-trained with the Transformer model, which handles long text well and runs in parallel, to form a Chinese pre-trained language model fusing pronunciation, glyph and meaning; because ChineseBERT fuses multi-modal information, it is suitable as the base defense model of a method defending against Chinese multi-modal attacks;
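The sketch below is not the actual ChineseBERT implementation; it is a toy PyTorch stand-in, assumed here only so the later sketches have a concrete interface to refer to: three embedding tables (meaning, pinyin, glyph) whose outputs are fused into one character vector and fed to a Transformer encoder.

```python
# Toy three-stream encoder: a stand-in for ChineseBERT with pronunciation,
# glyph and meaning embeddings fused per character. All sizes are illustrative.
import torch
import torch.nn as nn

class ToyMultiModalEncoder(nn.Module):
    def __init__(self, vocab=21128, pinyin_vocab=1500, glyph_vocab=21128,
                 dim=128, num_classes=3):
        super().__init__()
        self.sem = nn.Embedding(vocab, dim)         # meaning (token) embedding
        self.pin = nn.Embedding(pinyin_vocab, dim)  # pronunciation embedding
        self.gly = nn.Embedding(glyph_vocab, dim)   # glyph embedding
        self.fuse = nn.Linear(3 * dim, dim)         # fuse the three modalities
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, num_classes)

    def embed(self, sem_ids, pin_ids, gly_ids):
        # the three embedding streams (meaning, pronunciation, glyph)
        return self.sem(sem_ids), self.pin(pin_ids), self.gly(gly_ids)

    def forward_from_embeddings(self, e_sem, e_pin, e_gly, return_features=False):
        h = self.encoder(self.fuse(torch.cat([e_sem, e_pin, e_gly], dim=-1)))
        feats = h.mean(dim=1)                       # pooled sample vector
        logits = self.cls(feats)
        return (logits, feats) if return_features else logits
```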
Step three: by back-propagation with the FGSM algorithm, obtain, under the loss function of the original defense model, the gradient of the pronunciation (pinyin) embedding of the current batch of samples, $g_1 = \nabla_{x_1} L_{CE}$, the gradient of the meaning embedding, $g_2 = \nabla_{x_2} L_{CE}$, and the gradient of the glyph embedding, $g_3 = \nabla_{x_3} L_{CE}$, and from these gradients compute the pronunciation adversarial perturbation $r_{adv1} = \epsilon \cdot \mathrm{sign}(g_1)$, the meaning adversarial perturbation $r_{adv2} = \epsilon \cdot \mathrm{sign}(g_2)$ and the glyph adversarial perturbation $r_{adv3} = \epsilon \cdot \mathrm{sign}(g_3)$; where $L_{CE}$ is the cross-entropy loss function, $\epsilon$ is the step size of each FGSM iteration, $x$ is a sample and $\nabla_x$ denotes the gradient; through the FGSM algorithm, the batch data input to the original defense model is attacked to obtain the adversarial perturbation of each sample vector, including perturbations of the pronunciation vector, the glyph vector and the meaning vector; the purpose of this step is to change the target category and to generate adversarial vector samples targeted at the original defense model;
Step four: add the pronunciation adversarial perturbation, the meaning adversarial perturbation and the glyph adversarial perturbation to the pronunciation embedding, the meaning embedding and the glyph embedding of the sample, respectively, to generate the pronunciation embedding of the adversarial sample, $x'_1 = x_1 + r_{adv1}$, the meaning embedding of the adversarial sample, $x'_2 = x_2 + r_{adv2}$, and the glyph embedding of the adversarial sample, $x'_3 = x_3 + r_{adv3}$; where $x'_1$ is the pronunciation embedding, $x'_2$ the meaning embedding and $x'_3$ the glyph embedding of the adversarial sample;
Step five: input the pronunciation, meaning and glyph embeddings of the adversarial samples into the original defense model to obtain the feature vector $F([x'_1, x'_2, x'_3])$;
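A hedged sketch of steps three to five follows, built on the toy encoder above. The sign-step form $x'_k = x_k + \epsilon\,\mathrm{sign}(g_k)$ follows the FGSM construction named in step three; the interface (embed, forward_from_embeddings) is an assumption of the toy model, and the real ChineseBERT API may differ.

```python
# Steps three to five on the toy encoder: per-modality FGSM perturbations and
# the feature vector of the resulting adversarial sample.
import torch
import torch.nn.functional as F

def fgsm_adversarial_features(model, sem_ids, pin_ids, gly_ids, labels, eps=0.1):
    e_sem, e_pin, e_gly = model.embed(sem_ids, pin_ids, gly_ids)
    e_sem, e_pin, e_gly = (e.detach().requires_grad_(True)
                           for e in (e_sem, e_pin, e_gly))

    logits = model.forward_from_embeddings(e_sem, e_pin, e_gly)
    loss = F.cross_entropy(logits, labels)   # L_CE
    loss.backward()                          # gradients g_1, g_2, g_3 (step three)

    # x'_k = x_k + eps * sign(g_k): one perturbation per modality (step four)
    adv_sem = (e_sem + eps * e_sem.grad.sign()).detach()
    adv_pin = (e_pin + eps * e_pin.grad.sign()).detach()
    adv_gly = (e_gly + eps * e_gly.grad.sign()).detach()

    # step five: feature vector F([x'_1, x'_2, x'_3]) of the adversarial sample
    _, adv_feats = model.forward_from_embeddings(adv_sem, adv_pin, adv_gly,
                                                 return_features=True)
    return adv_feats
```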
Step six: modify the loss function, compute the loss value of each sample in the current batch, and train the original defense model by minimizing the loss function;
Step seven: select the next sample and repeat steps three to six; if the loss value of the current original defense model does not improve within a certain number of training rounds, or its accuracy on the validation set decreases, stop training to obtain the improved defense model for the current batch of samples, i.e., the improved defense model with higher accuracy and defense capability under the current weight parameters;
Step eight: select the next batch and repeat steps three to seven to compute improved defense models for all batches of samples, then select from them the improved defense model that performs best on the validation set as the final defense model;
Step nine: for a new adversarial sample, predict its category with the final defense model; if the output category is unchanged, the defense succeeds. Taking a news text classification task as an example, the categories may include financial news, sports news, social news, and so on; for an emotion polarity classification task, the categories include negative, neutral and positive emotion.
Preferably, the text data set is one of the THUCNews data set, the Douban movie review data set, the Waimai takeaway review data set, or a data set obtained from the GitHub code-hosting website.
Preferably, the value range of $\epsilon$ in step three is [0, 0.5].
Preferably, the method of step six specifically comprises the following steps:
select the sample vector in the current batch and the feature vector of its adversarial sample as a positive sample pair, and the vectors of the other samples in the batch and the feature vectors of their adversarial samples as negative sample pairs; the loss function is modified to $L = (1-\lambda)L_{CE} + \lambda L_{CL}$; where the cross-entropy loss function is
$L_{CE} = -\sum_{i=1}^{N}\sum_{c} y_{i,c}\,\log \hat{y}_{i,c}$
and the contrastive loss function is
$L_{CL} = -\sum_{i=1}^{N} \log \dfrac{\exp(\mathrm{Score}(F(x_i), F(x_i^{+})))}{\exp(\mathrm{Score}(F(x_i), F(x_i^{+}))) + \sum_{x^{-}} \exp(\mathrm{Score}(F(x_i), F(x^{-})))}$
where $y_{i,c}$ is the true probability, $\hat{y}_{i,c}$ is the predicted probability, $F$ is the original defense model, $x^{+}$ is a positive sample, $x^{-}$ is a negative sample, $N$ is the number of samples in a batch, $c$ indexes the sample categories, $i$ indexes the $i$-th sample in the current batch ($i \in \{1,\dots,N\}$), and $\lambda$ is a weight parameter;
compute the loss values of the positive and negative sample pairs with the modified loss function, and by minimizing the loss pull the positive pairs closer together and push the negative pairs apart; the positive and negative pairs introduce more data and generate sample pairs targeted at the original defense model, so the original defense model can better defend against adversarial samples and achieve a good classification effect even with few samples;
compute, according to the loss function, the loss value of each sample vector and its positive and negative sample pairs in the current batch, back-propagate the loss to update the model parameters, and train the original defense model.
Preferably, the loss function is reconstructed through contrastive learning. Contrastive learning is a self-supervised learning method belonging to the unsupervised learning paradigm; its core idea is to let the model learn, in the feature space, to distinguish a positive sample pair from the other negative sample pairs, grasping the essential features of the samples to learn a feature representation for each sample. With this approach, a machine learning model can be trained to distinguish similar samples from dissimilar ones. The learning paradigm of contrastive learning can be expressed as: for arbitrary data $x$, the goal is to learn an encoder $F$ such that, for a positive sample $x^{+}$ and a negative sample $x^{-}$ of $x$: $\mathrm{Score}(F(x), F(x^{+})) \gg \mathrm{Score}(F(x), F(x^{-}))$.
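A minimal sketch of the modified loss follows, assuming $\mathrm{Score}(\cdot,\cdot)$ is instantiated as temperature-scaled cosine similarity with in-batch negatives as described above; the temperature value is an assumption, since the text does not specify one.

```python
# Modified loss L = (1 - lambda) * L_CE + lambda * L_CL with in-batch negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(clean_feats, adv_feats, temperature=0.1):
    z = F.normalize(clean_feats, dim=1)
    z_adv = F.normalize(adv_feats, dim=1)
    scores = z @ z_adv.t() / temperature            # Score(F(x_i), F(x_j'))
    targets = torch.arange(z.size(0), device=z.device)
    # diagonal entries are the positive pairs (each sample with its own
    # adversarial feature); off-diagonal entries act as negative pairs
    return F.cross_entropy(scores, targets)

def combined_loss(logits, labels, clean_feats, adv_feats, lam=0.3):
    l_ce = F.cross_entropy(logits, labels)           # L_CE
    l_cl = contrastive_loss(clean_feats, adv_feats)  # L_CL
    return (1 - lam) * l_ce + lam * l_cl
```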
Preferably, λ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.
Preferably, the weight parameter λ that minimizes the loss value of the loss function L is selected by grid search.
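A sketch of this grid search is given below; train_one_model and val_loss are hypothetical helpers standing in for the full training and validation procedures.

```python
# Grid search over the candidate lambda values: keep the value whose trained
# model attains the lowest loss L on the validation set.
def grid_search_lambda(train_one_model, val_loss,
                       candidates=(0.1, 0.2, 0.3, 0.4, 0.5,
                                   0.6, 0.7, 0.8, 0.9)):
    best_lam, best = None, float("inf")
    for lam in candidates:
        model = train_one_model(lam)   # trains with L = (1-lam)*L_CE + lam*L_CL
        loss = val_loss(model)         # loss value of L on the validation set
        if loss < best:
            best_lam, best = lam, loss
    return best_lam
```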
Preferably, batch_size is one of 64, 128 or 256.
Preferably, the training set, test set and validation set are divided in the ratio 8:1:1.
Preferably, in step eight, the improved defense model with the best classification effect on the validation set is selected as the final defense model through cross-validation and model early stopping.
In the Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning, because the training data of ChineseBERT contain no adversarial samples, using ChineseBERT directly as the defense model does not reach the expected effect, whereas the ChineseBERT modified by this method is superior to existing models in accuracy, robustness and other aspects, and can therefore serve as the defense model against Chinese multi-modal attacks. Meanwhile, in practice the conditions for obtaining adversarial samples are limited: their number is small and they cannot be obtained directly. Contrastive learning can improve model performance with few samples, so combining adversarial training with contrastive learning defends better against Chinese adversarial text attacks in the few-sample setting.
Description of the drawings:
FIG. 1 is a flow chart of the Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning provided by the invention.
FIG. 2 is a schematic diagram of the contrastive learning process of the invention.
Detailed description of the embodiments:
To make the technical scheme of the invention easier to understand, the Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning designed by the invention is described clearly and completely by way of a specific embodiment.
The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning provided by the invention is described below with reference to FIG. 1 and FIG. 2; it specifically comprises the following steps:
Step 100: select the Waimai takeaway review data set as the text data set and preprocess it: divide the text data set into a training set, a test set and a validation set in the ratio 8:1:1;
Step 110: sequentially input each batch of samples in the training set into the original ChineseBERT model for forward computation to obtain the sample vectors of each batch;
Step 120: by back-propagation with the FGSM algorithm, obtain the gradient of the pronunciation embedding, $g_1 = \nabla_{x_1} L_{CE}$, the gradient of the meaning embedding, $g_2 = \nabla_{x_2} L_{CE}$, and the gradient of the glyph embedding, $g_3 = \nabla_{x_3} L_{CE}$, of the current batch of samples under the loss function of the original ChineseBERT model;
Step 130: from the gradients of the pronunciation, meaning and glyph embeddings, compute the pronunciation adversarial perturbation $r_{adv1} = \epsilon \cdot \mathrm{sign}(g_1)$, the meaning adversarial perturbation $r_{adv2} = \epsilon \cdot \mathrm{sign}(g_2)$ and the glyph adversarial perturbation $r_{adv3} = \epsilon \cdot \mathrm{sign}(g_3)$;
Step 140: add the pronunciation, meaning and glyph adversarial perturbations to the pronunciation, meaning and glyph embeddings of the sample, respectively, generating the pronunciation embedding of the adversarial sample, $x'_1 = x_1 + r_{adv1}$, the meaning embedding of the adversarial sample, $x'_2 = x_2 + r_{adv2}$, and the glyph embedding of the adversarial sample, $x'_3 = x_3 + r_{adv3}$;
Step 150: input the pronunciation, meaning and glyph embeddings of the adversarial samples into the original defense model to obtain the feature vector $F([x'_1, x'_2, x'_3])$;
Step 160: selecting a sample vector in the current batch and the feature vector of the antagonistic sample as a positive sample pair, and using the vectors of other samples in the batch and the feature vectors of the antagonistic samples of other samples as a negative sample pair;
Step 170: obtain λ = 0.3 by grid search, and transform the loss function through contrastive learning into $L = 0.7\,L_{CE} + 0.3\,L_{CL}$;
Step 180: calculating loss values of the positive sample pair and the negative sample pair through the modified loss function, and pulling in the distance of the positive sample pair and pulling out the distance of the negative sample pair through minimizing the loss values;
Step 190: compute, according to the loss function, the loss value of each sample vector and its positive and negative sample pairs in the current batch, back-propagate the loss to update the model parameters, and train the original defense model;
Step 200: repeating the steps 120 to 190, if the loss value of the current original defense model cannot be improved within a certain training turn or the accuracy of the current original defense model on the verification set is reduced, stopping training to obtain an improved defense model of the current batch of samples;
Step 210: repeat steps 120 to 200 to compute improved defense models for all batches of samples, and select from them, through cross-validation and model early stopping, the improved defense model that performs best on the validation set as the final defense model;
Step 220: for a new adversarial sample, predict its category with the final defense model; if the output category is unchanged, the defense succeeds.
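To tie the embodiment together, the sketch below runs steps 110 to 210 as a single training loop with early stopping, reusing the toy encoder, fgsm_adversarial_features and combined_loss sketched earlier; each batch is assumed to be pre-tokenized into three id tensors plus labels, and evaluate (validation accuracy) is a hypothetical helper.

```python
# Hedged end-to-end sketch of the embodiment: adversarial training with the
# combined loss (steps 120-190) and early stopping on the validation set
# (steps 200-210). Interfaces are assumptions carried over from above.
import copy
import torch

def train_defense(model, train_batches, val_loader, evaluate,
                  epochs=10, eps=0.1, lam=0.3, lr=2e-5, patience=2):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    best_acc, best_state, stale = 0.0, None, 0
    for _ in range(epochs):
        model.train()
        for sem_ids, pin_ids, gly_ids, labels in train_batches:
            # steps 120-150: adversarial feature vectors for the current batch
            adv_feats = fgsm_adversarial_features(model, sem_ids, pin_ids,
                                                  gly_ids, labels, eps)
            e_sem, e_pin, e_gly = model.embed(sem_ids, pin_ids, gly_ids)
            logits, feats = model.forward_from_embeddings(
                e_sem, e_pin, e_gly, return_features=True)
            # steps 160-190: combined loss over positive and negative pairs
            loss = combined_loss(logits, labels, feats, adv_feats, lam)
            opt.zero_grad()   # also clears gradients left by the FGSM pass
            loss.backward()
            opt.step()
        # steps 200-210: keep the best model on the validation set, stop early
        acc = evaluate(model, val_loader)
        if acc > best_acc:
            best_acc, best_state, stale = acc, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```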
The above is only a preferred embodiment of the invention. It should be noted that various modifications, substitutions, variations and improvements that can be made by those skilled in the art without departing from the spirit and scope of the invention shall all fall within the protection scope of the invention.

Claims (10)

1. A Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning, characterized by comprising the following steps:
Step one: preprocess the text data set: divide the text data set into a training set, a test set and a validation set in a certain ratio, and divide the training set into batch_num batches, each containing batch_size text samples;
Step two: sequentially input each batch of samples in the training set into the original defense model for forward computation to obtain the sample vectors of each batch, where the original defense model is the ChineseBERT model and the samples are the text data of each batch in the training set;
Step three: by back-propagation with the FGSM algorithm, obtain, under the loss function of the original defense model, the gradient of the pronunciation embedding of the current batch of samples, $g_1 = \nabla_{x_1} L_{CE}$, the gradient of the meaning embedding, $g_2 = \nabla_{x_2} L_{CE}$, and the gradient of the glyph embedding, $g_3 = \nabla_{x_3} L_{CE}$, and from these gradients compute the pronunciation adversarial perturbation $r_{adv1} = \epsilon \cdot \mathrm{sign}(g_1)$, the meaning adversarial perturbation $r_{adv2} = \epsilon \cdot \mathrm{sign}(g_2)$ and the glyph adversarial perturbation $r_{adv3} = \epsilon \cdot \mathrm{sign}(g_3)$; where $L_{CE}$ is the cross-entropy loss function, $\epsilon$ is the step size of each FGSM iteration, $x$ is a sample and $\nabla_x$ denotes the gradient;
Step four: add the pronunciation adversarial perturbation, the meaning adversarial perturbation and the glyph adversarial perturbation to the pronunciation embedding, the meaning embedding and the glyph embedding of the sample, respectively, to generate the pronunciation embedding of the adversarial sample, $x'_1 = x_1 + r_{adv1}$, the meaning embedding of the adversarial sample, $x'_2 = x_2 + r_{adv2}$, and the glyph embedding of the adversarial sample, $x'_3 = x_3 + r_{adv3}$; where $x'_1$ is the pronunciation embedding, $x'_2$ the meaning embedding and $x'_3$ the glyph embedding of the adversarial sample;
Step five: input the pronunciation, meaning and glyph embeddings of the adversarial samples into the original defense model to obtain the feature vector $F([x'_1, x'_2, x'_3])$;
Step six: modify the loss function, compute the loss value of each sample in the current batch, and train the original defense model by minimizing the loss function;
Step seven: select the next sample and repeat steps three to six; if the loss value of the current original defense model does not improve within a certain number of training rounds, or its accuracy on the validation set decreases, stop training to obtain the improved defense model for the current batch of samples;
Step eight: select the next batch and repeat steps three to seven to compute improved defense models for all batches of samples, then select from them the improved defense model that performs best on the validation set as the final defense model;
Step nine: for a new adversarial sample, predict its category with the final defense model; if the output category is unchanged, the defense succeeds.
2. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein the text data set is one of the THUCNews data set, the Douban movie review data set, the Waimai takeaway review data set, or a data set obtained from the GitHub code-hosting website.
3. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein the value range of $\epsilon$ in step three is [0, 0.5].
4. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein the method of step six specifically comprises the following steps:
select the sample vector in the current batch and the feature vector of its adversarial sample as a positive sample pair, and the vectors of the other samples in the batch and the feature vectors of their adversarial samples as negative sample pairs;
the loss function is modified to: l = (1-lambda) L CE +λL CL (ii) a Wherein the cross entropy loss function:
Figure FDA0003767896600000021
the contrast loss function is:
Figure FDA0003767896600000031
wherein, y i,c In order to be a true probability of,
Figure FDA0003767896600000032
to predict probability, F is the original defense model, x + Is a positive sample, x - The sample class is a negative sample, i belongs to N, N is the number of samples in a batch, c is the number of sample classes, i is the ith sample in the current batch, and lambda is a weight parameter;
compute the loss values of the positive and negative sample pairs with the modified loss function, and by minimizing the loss pull the positive pairs closer together and push the negative pairs apart;
compute, according to the loss function, the loss value of each sample vector and its positive and negative sample pairs in the current batch, back-propagate the loss to update the model parameters, and train the original defense model.
5. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 4, characterized in that the loss function is modified through contrastive learning.
6. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 4, wherein λ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.
7. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 6, wherein the weight parameter λ that minimizes the loss value of the loss function L is selected by grid search.
8. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein batch_size is one of 64, 128 or 256.
9. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, characterized in that the training set, test set and validation set are divided in the ratio 8:1:1.
10. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein in step eight the improved defense model with the best classification effect on the validation set is selected as the final defense model through cross-validation and model early stopping.
CN202210891903.8A 2022-07-27 2022-07-27 Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning Pending CN115309897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210891903.8A CN115309897A (en) Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210891903.8A CN202210891903.8A Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Publications (1)

Publication Number Publication Date
CN115309897A true CN115309897A (en) 2022-11-08

Family

ID=83859070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210891903.8A CN115309897A (en) Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Country Status (1)

Country Link
CN (1) CN115309897A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523032A (en) * 2023-03-13 2023-08-01 之江实验室 Image text double-end migration attack method, device and medium
CN116523032B (en) * 2023-03-13 2023-09-29 之江实验室 Image text double-end migration attack method, device and medium
CN117012204A (en) * 2023-07-25 2023-11-07 贵州师范大学 Defensive method for countermeasure sample of speaker recognition system
CN117012204B (en) * 2023-07-25 2024-04-09 贵州师范大学 Defensive method for countermeasure sample of speaker recognition system
CN118170921A (en) * 2024-05-16 2024-06-11 浙江大学 Code modification classification method based on BERT pre-training model and countermeasure training

Similar Documents

Publication Publication Date Title
Salur et al. A novel hybrid deep learning model for sentiment classification
Saad et al. Twitter sentiment analysis based on ordinal regression
Tajaddodianfar et al. Texception: a character/word-level deep learning model for phishing URL detection
Huang et al. Lexicon-based sentiment convolutional neural networks for online review analysis
Vlad et al. Sentence-level propaganda detection in news articles with transfer learning and BERT-BiLSTM-capsule model
Oh et al. Why-question answering using intra-and inter-sentential causal relations
CN115309897A (en) Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning
US20230385409A1 (en) Unstructured text classification
Fonseca et al. Mac-morpho revisited: Towards robust part-of-speech tagging
Jehad et al. Classification of fake news using multi-layer perceptron
CN116257698A (en) Social network sensitivity and graceful language detection method based on supervised learning
Wang et al. Textfirewall: Omni-defending against adversarial texts in sentiment classification
Wang et al. Word vector modeling for sentiment analysis of product reviews
CN118364111A (en) Personality detection method based on text enhancement of large language model
Lee et al. Detecting suicidality with a contextual graph neural network
Shounak et al. Reddit comment toxicity score prediction through bert via transformer based architecture
Shan Social Network Text Sentiment Analysis Method Based on CNN‐BiGRU in Big Data Environment
Tiwari et al. Comparative Analysis of Different Machine Learning Methods for Hate Speech Recognition in Twitter Text Data
CN115309894A Text emotion classification method and device based on adversarial training and TF-IDF
Lim et al. Part-of-speech tagging using multiview learning
Zhen et al. Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF.
Li et al. Multilingual toxic text classification model based on deep learning
Jiang A Method for Ancient Book Named Entity Recognition Based on BERT-Global Pointer
CN115309898A Word-granularity Chinese semantically approximate adversarial sample generation method based on knowledge-enhanced BERT
Mary et al. Adversarial attacks against machine learning classifiers: A study of sentiment classification in twitter

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination