CN115309898A

CN115309898A - Word granularity Chinese semantic approximate countermeasure sample generation method based on knowledge enhanced BERT

Info

Publication number: CN115309898A
Application number: CN202210891923.5A
Authority: CN
Inventors: 郑海
Original assignee: Fangying Jintai Technology Beijing Co ltd
Current assignee: Fangying Jintai Technology Beijing Co ltd
Priority date: 2022-07-27
Filing date: 2022-07-27
Publication date: 2022-11-08

Abstract

The word granularity Chinese semantic approximate countermeasure sample generation method based on the knowledge enhancement BERT comprises the following steps: dividing the preprocessed text data set into a training set, a verification set and a test set according to a proportion, and inputting data in the training set into a target model to obtain a target Chinese text classification model; generating a word importance list; training a knowledge-enhanced BERT model; using a knowledge enhanced BERT model to sequentially generate a candidate character list for each character in the character importance list; and selecting characters in a candidate character list to replace the characters in the character importance list corresponding to the currently selected candidate character list to generate a current confrontation sample, and attacking the target Chinese text classification model by using the current confrontation sample. By designing an automatic countermeasure sample generation method, the generated semantic approximate countermeasure sample has better readability and higher attack success rate, so that the purpose of generating high-quality Chinese semantic approximate countermeasure samples for different Chinese text classification models is achieved.

Description

Word granularity Chinese semantic approximate countermeasure sample generation method based on knowledge enhanced BERT

The technical field is as follows:

the invention relates to the technical field of information security, in particular to a word granularity Chinese semantic approximate confrontation sample generation method based on knowledge enhancement BERT.

Background art:

with the continuous development of deep learning, the natural language processing technology based on the deep neural network surpasses the traditional machine learning method based on statistics on the tasks of text classification, machine translation, a dialogue system and the like. At present, two deep learning models for solving the problem of natural language processing are mainly used, and the first method is to combine the deep learning models such as CNN, LSTM and the like with Word vector technologies such as Word2vec, glove and the like to better mine the characteristics of local time sequence, overall time sequence and the like in a text; the other method is a pre-training language model represented by BERT, the BERT model uses a transformer as a basic framework, uses massive text data to carry out unsupervised training, has huge parameters and better text understanding capability, surpasses the existing method in multiple natural language processing tasks, and becomes a new milestone.

However, researchers find that natural language processing technology based on deep learning faces threats against samples like other deep learning algorithms due to the inherent characteristics of local linearity and high dimensionality of data of deep neural networks. The confrontation sample is a section of input artificially and meticulously reformed on the original sample, and the input can successfully attack the deep neural network model with high probability to generate error output, but still keeps the original judgment for a human.

Countermeasure samples were first proposed in the image domain, and since Szegedy et al, in the text of systematic properties of neural networks, published that the deep neural network model is easily misled by minor perturbations to classify errors, there has been endless research on how to generate and defend the countermeasure samples. The countermeasure attack against the text field is no exception, and the countermeasure attack brings great potential safety hazard in the real world: when a user shops, eats and watches movies, suggestions are easy to obtain by reading comments on products or services, so that some apps use emotion analysis technology to provide recommendation services for the user according to historical comments of the user and give recommendation scores, but attackers may generate samples according to real comments of the user, defamation competitors or maliciously recommend inferior commodities, and more seriously, attackers can maliciously spread false information through resistant texts to obtain profits and cause economic losses for the consumers. Although malicious information detection modules are widely deployed on systems such as social media and mails to provide clean and comfortable internet access environments for users, these systems are still threatened by countermeasure samples, and thus security problems brought by such countermeasure samples are also widely concerned by researchers.

Therefore, the system understands the countermeasure sample and defends the attack of the countermeasure sample so as to construct a robust model, which is one of the hot research problems in the academic world at present, however, the conventional countermeasure sample generation technology mainly aims at English, chinese has the characteristics of a large number of characters and a large search space due to large difference of Chinese and English languages, the conventional English automatic attack method cannot be directly applied to Chinese countermeasure text generation, and the conventional search method is difficult to directly find out substituted characters with similar semantics.

In view of the above problem, patent publication No. CN111241291a discloses a method for generating countermeasure samples using a countermeasure generation network, in which a classifier with N classification tasks trained in advance, a generator for generating real samples corresponding to each class, and N discriminators corresponding to N classes respectively detecting whether input samples are corresponding classes are used to generate countermeasure samples having a specified real class but predicted as other classes by the classifier. The patent with publication number CN113869062A provides a social text personality privacy protection method based on black box confrontation samples, wherein texts are preprocessed by using a BERT model, stop words are removed, word-level replacement operation is conducted on words with large meaning contribution degree in each text through attention mechanism searching, meanwhile, the vector generated by the BERT model is used for conducting estimation on the label contribution degree of the words, and the words with large label contribution degree are selected for character-level replacement operation. In summary, the current method cannot solve the following problems: the context information of the original sentence cannot be fully considered in a mode based on the word set and the word stock, the replaced words may not conform to the original context, are easy to be found by human readers, and are difficult to train against the generated network model.

The invention content is as follows:

aiming at the problems that the semantics of countermeasure samples generated in the existing Chinese semantic approximation countermeasure sample generation method are not approximate, a BERT model needs to be retrained and the like, the invention provides a word granularity Chinese semantic approximation countermeasure sample generation method based on knowledge enhancement BERT in a black box state.

A word granularity Chinese semantic approximate countermeasure sample generation method based on knowledge enhancement BERT comprises the following steps:

the method comprises the following steps: preprocessing the text data set: performing word segmentation on the text data set, and filtering stop words according to a stop word list; dividing the preprocessed text data set into a training set, a verification set and a test set according to a proportion; training the text data in the training set to generate a text word vector, inputting the text word vector into a target model, and obtaining a target Chinese text classification model after training; because the training target Chinese text classification model is not the core of the invention and is the prior art in the field, the training is only carried out by adopting the method for training the target Chinese text classification model, which is commonly used in the field, and the detailed description is not needed here;

step two: using the text data in the test set as an original sentence, sequentially deleting each character in the original sentence to determine the importance of each character in the original sentence, and sequencing the importance to obtain a word importance list;

step three: fusing the prior knowledge into a BERT model to obtain a knowledge enhancement BERT model; the BERT model integrated with the prior knowledge is a conventional technology in the field, and is not described in detail in the method;

step four: using the knowledge enhancement BERT model obtained in the third step to sequentially generate a candidate character list for each character in the character importance list;

step five: selecting characters in a candidate character list to replace characters in a character importance list corresponding to the currently selected candidate character list to generate a current confrontation sample, and using a target Chinese text classification model obtained by training in the first step of attacking by the current confrontation sample;

step six: if the output of the target Chinese text classification model changes, the attack is successful, and the current countermeasure sample is taken as a final countermeasure sample; if the output of the target Chinese text classification model is not changed, the attack is unsuccessful, and after the currently selected character is excluded from the character importance list, the fifth step is executed;

wherein the fourth step specifically comprises the following steps:

replacing each character in the word importance list with a special mark [ MASK ] in the knowledge enhanced BERT model in sequence, adding a special classification mark [ CLS ] and a stop mark [ SEP ] at the beginning and the end of the original sentence, and changing the original sentence into the following form:

S _lm ＝[CLS],c’ ₁ ,...,c’ _j-1 ,[MASK],c’ _j+1 ,…[SEP](ii) a Wherein S is _lm Is a sentence after change, wherein, c' _j The j character in the character importance degree list;

will S _lm Inputting the information into a knowledge enhanced BERT model, and enabling the knowledge enhanced BERT model to carry out the processing on the special mark [ MASK ] according to the context semantics]Predicting the character, and taking the predicted first k characters to generate a primary candidate character list;

calculating c 'for each character in the primary candidate character list using its corresponding word2vec vector' _j Filtering the characters with the cosine similarity smaller than a preset threshold value, and generating a candidate character list of the characters in the character importance list, wherein the threshold value is generally set according to the experience of technicians and specific scenes.

Preferably, the method for generating a word importance list in step two specifically includes the following steps:

deleting each word in the original sentence in sequence to obtain an importance query sentence;

predicting the category of the importance query sentence by using the target Chinese text classification model obtained by training in the first step, wherein the classification category is financial news, sports news, social news and the like by taking a news text classification task as an example; taking the emotion polarity classification task as an example, the classification categories have negative emotion, neutral emotion, positive emotion and the like, and are according to the formula:

calculating the probability that each character in the importance query sentence can change the category of the current sentence, wherein the probability is the importance of the character, and sequencing the importance to obtain a word importance list; wherein, P _F For class prediction of the model, y ₁ Is of the original class, y ₂ For classes predicted by the model, y ₁ ＝y ₂ When y is not successful, then attack is unsuccessful ₁ ≠y ₂ If yes, the attack is successful; s 'is a target statement, c' _j Is the j-th character, S 'in the word importance list' _\ c’ _j The method is a sentence obtained after removing the jth character in the original sentence.

Preferably, the method for training the knowledge-enhanced BERT model in the third step comprises the following steps:

selecting a second text data set, and preprocessing the second text data set: segmenting words of the text data set, and filtering stop words according to a stop word list; respectively querying synonyms and antisense words of the remaining words of the sentences in the synonym thesaurus and the antisense thesaurus by using the remaining words of the sentences in the second text data set to respectively form synonyms and antisense sentences of each sentence;

transforming a loss function, and training a BERT model by using the transformed loss function to minimize the value of the loss function to obtain a knowledge enhanced BERT model;

preferably, the loss function is adapted based on comparative learning: l = alpha L _CE +(1-α)L _CL (ii) a Wherein L is a modified loss function, L _CE As a cross-entropy loss function, L _CL A loss function is learned for comparison, wherein,

wherein, y _ic The probability of the truth of the object,

for prediction probability, N is sample data, C is the number of classes,

for knowledge-enhanced BERT coders, x _i For the original sentence, x _sim As a synonymous sentence, x _anti Are antisense sentences.

Preferably, the word importance list is selected according to a formula such that the current prediction category has the highest probability of changing:

preferably, the text dataset is pre-processed using the python jieba tool.

Preferably, in the first step, a word2vec method is used to train half of the text data in the training set to generate a text word vector.

Preferably, the target model is one of a CNN model, an LSTM model, and a BERT model.

Preferably, the ratio of the training set, the validation set and the test set is 8; the verification set is used for verifying the effect of the model in the training process, and the model which has better performance in the verification set is selected as a final target model through technologies such as advanced stopping, cross verification and the like; the test set is used for testing the trained and verified final target model, and the generalization capability of the model is reflected.

According to the method for generating the word granularity Chinese semantic approximate confrontation sample based on the knowledge-enhanced BERT, the BERT model does not need to be retrained, the confrontation sample with high attack success rate and good quality can be generated only by fine tuning aiming at the modified BERT model, the method is more flexible and convenient, the modified model can be more suitable for generating the confrontation sample with the semantic approximate character, meanwhile, the method is suitable for various target models, and meanwhile, due to the fact that the black box attack method is used, the proposed attack can better attack the actual model, and the robustness of the model is further improved.

Description of the drawings:

fig. 1 is a flow chart of generating a word weight list according to an embodiment of the present invention.

Fig. 2 is a flowchart of generating a candidate word list according to an embodiment of the present invention.

Fig. 3 is a flowchart of a target model for resisting sample attack according to an embodiment of the present invention.

The specific implementation mode is as follows:

in order to make the technical scheme of the invention easier to understand, the method for generating the word granularity Chinese semantic approximate confrontation sample based on knowledge enhanced BERT designed by the invention is clearly and completely described by using a specific embodiment mode.

The following describes, with reference to fig. 1 of the specification, a word weight list generation method for providing a word-granularity chinese semantic approximation countermeasure sample generation method based on knowledge enhanced BERT in this embodiment, where the method specifically includes the following steps:

step 100: preprocessing a text data set, dividing the preprocessed text data into a training set, a verification set and a test set according to 8;

step 110: inputting the text data in the training set into a CNN model to be trained to obtain a target CNN model;

step 120: in order to better explain the method, in the embodiment, a sentence in the test set is selected as an original sentence, and words in the sentence after word segmentation are sequentially deleted to obtain an importance query sentence;

step 130: predicting the category of the importance query sentence by using a target CNN model;

step 140: calculating importance Each character in a query sentence can changeThe probability of the category to which the current sentence belongs, namely the importance of the characters, is established according to the descending order of the importance scores of each character, and a word importance list W is established _imp ＝[c' ₁ ,c' ₂ ,…,c' _n ]Wherein, c' _j Are characters in the word importance list.

The method for generating the candidate word list for providing the method for generating the word-granularity Chinese semantic approximation countermeasure sample based on knowledge enhanced BERT according to the embodiment is described with reference to the accompanying drawing 2 of the specification, and specifically comprises the following steps:

step 200: the priori knowledge is fused into a BERT model to obtain a knowledge enhancement BERT model;

step 210: w is to be _imp Each character of c' _j Sequentially replaced by [ MASK ] in knowledge enhanced BERT model]Symbols, and special classification marks [ CLS ] are added at the beginning and end of the original sentence]And stop sign [ SEP]The original sentence is changed into the following form: s _lm ＝[CLS],c’ ₁ ,...,c’ _j-1 ,[MASK],c’ _j+1 ,…[SEP]；

Step 220: will S _lm Inputting the information into a knowledge enhanced BERT model, and enabling the knowledge enhanced BERT model to carry out special mark [ MASK ] according to context semantics]Predicting the characters, and taking the first 20 predicted characters to form a primary candidate character list;

step 230: for each character c in the primary candidate character list _k Calculate its and c 'using its corresponding word2vec vector' _j And retaining the characters with the similarity smaller than 0.3 to form a candidate character list, wherein the calculation formula is as follows:

wherein A and B are both vectors.

The method for providing the countersample attack target model of the word-granularity Chinese semantic approximation countersample generation method based on the knowledge-enhanced BERT according to the embodiment is described in the following with reference to the attached drawing 3 of the specification, and specifically comprises the following steps:

step 300: using c _k Substitute for c' _j Forming a challenge sample;

step 310: and inputting the countermeasure sample into the target CNN model, if the prediction type of the target CNN model is changed, successfully attacking, stopping attacking, and taking the countermeasure sample successfully attacked at this time as a final countermeasure sample.

Step 320: otherwise, selecting another character c 'in the character importance list' _j+1 C 'from' _j+1 C in the corresponding candidate word list _k+1 And replacing the target CNN model with the challenge sample to generate a current challenge sample to attack the target CNN model again.

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications, substitutions, variations and enhancements can be made without departing from the spirit and scope of the invention, which should be considered as within the scope of the invention.

Claims

1. The method for generating word granularity Chinese semantic approximate countermeasure samples based on knowledge enhanced BERT is characterized by comprising the following steps:

the method comprises the following steps: preprocessing the text data set: segmenting words of the text data set, and filtering stop words according to a stop word list; dividing the preprocessed text data set into a training set, a verification set and a test set according to a proportion; training the text data in the training set to generate a text word vector, inputting the text word vector into a target model, and obtaining a target Chinese text classification model after training;

step three: fusing the prior knowledge into a BERT model to obtain a knowledge enhancement BERT model;

step six: if the output of the target Chinese text classification model changes, the attack is successful, and the current countermeasure sample is taken as a final countermeasure sample; if the output of the target Chinese text classification model is not changed, the attack is unsuccessful, and after the currently selected character is excluded from the character importance list, the fifth step is executed; the fourth step specifically comprises the following steps:

will S _lm Inputting the information into a knowledge enhanced BERT model, and enabling the knowledge enhanced BERT model to carry out special mark [ MASK ] according to context semantics]Predicting the character, and taking the predicted first k characters to generate a primary candidate character list;

calculating c 'for each character in the primary candidate character list using its corresponding word2vec vector' _j And filtering the characters with the cosine similarity smaller than a preset threshold value to generate a candidate character list of the characters in the character importance list.

2. The method according to claim 1, wherein the method for generating the word importance list in the second step specifically includes the following steps:

predicting the category of the importance query sentence by using the target Chinese text classification model obtained by training in the step one, and according to a formula:

calculating the probability that each character in the importance query sentence can change the category of the current sentence, wherein the probability is the importance of the character, and sequencing the importance to obtain a character importance list; wherein, P _F For class prediction of the model, y ₁ Is of the original class, y ₂ A category predicted for the model; s 'is a target statement, c' _j Is the j-th character, S 'in the word importance list' _\ c’ _j The method is a sentence obtained after removing the jth character in the original sentence.

3. The method of claim 1, wherein the step three method of training the knowledge-enhanced BERT model comprises the steps of:

selecting a second text data set, and preprocessing the second text data set: segmenting words of the text data set, and filtering stop words according to a stop word list; using the remaining words of the sentences in the second text data set to respectively query synonyms and antonyms in the synonym library and the antonym library to respectively form a synonym and an antonym of each sentence;

and modifying a loss function, and training the BERT model by using the modified loss function to minimize the value of the loss function so as to obtain a knowledge enhanced BERT model.

4. The method of claim 3, wherein the penalty function is adapted based on comparative learning.

5. The method of claim 1, wherein the significant list of words is selected according to a formula that maximizes the probability of the current prediction class changing:

6. the method of claim 1, wherein the text dataset is preprocessed using a python jieba tool.

7. The method of claim 1, wherein in step one, a word2vec method is used to train half of the text data in the training set to generate a text word vector.

8. The method of claim 1, wherein the target model is one of a CNN model, LSTM model, BERT model.

9. The method of claim 1, wherein the ratio of training set, validation set and test set is 8.