CN114169443A - Word-level text adversarial example detection method - Google Patents

Word-level text adversarial example detection method

Info

Publication number
CN114169443A
CN114169443A (application CN202111496214.9A)
Authority
CN
China
Prior art keywords
word
sample
model
sentence
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111496214.9A
Other languages
Chinese (zh)
Other versions
CN114169443B (en)
Inventor
范铭
王晨旭
曹慧
魏闻英
陶俊杰
刘烃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong University
Original Assignee
Xi'an Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong University
Priority to CN202111496214.9A
Publication of CN114169443A
Application granted
Publication of CN114169443B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for detecting word-level text adversarial examples, providing a defense for deep learning models against text adversarial examples. The method casts adversarial example detection as a binary classification problem and detects adversarial examples in two stages: first, adversarial examples of the corresponding normal samples are generated with an adversarial attack algorithm, and feature vectors characterizing the normal and adversarial samples are extracted; second, a binary classification model for adversarial example detection is built with a corresponding deep learning model. With this method, it is possible to detect whether a given sample is an adversarial example for the current model.

Description

Word-level text adversarial example detection method
Technical Field
The invention relates to the field of deep learning security, and in particular to a method for detecting word-level text adversarial examples.
Background
In recent years, with the rapid development of deep learning, and in particular the large-scale deployment of neural network models in practical systems such as face recognition, machine translation, and fraud detection, security problems have gradually been recognized and valued by academia and industry. An adversarial attack applies a slight perturbation to the raw input of a target machine learning model to generate an adversarial example that fools the target model. Adversarial attacks expose vulnerabilities of deep learning models and thereby help improve their robustness and interpretability; they have been studied extensively in the image domain.
In image classification, adversarial examples are intentionally synthesized images that look almost identical to the original image but mislead the classifier into a wrong prediction. In the text domain, practical systems such as spam detection, harmful-text detection, and malware detection have deployed deep learning models at scale, and security is particularly important for these systems. Compared with the image domain, research on defending text models against adversarial attacks is far from sufficient. Such defense faces the following main difficulties:
1) image data and text data are inherently different, so adversarial defense methods from the image domain cannot be applied directly to text data;
2) pixel values of image data are continuous while text data is discrete, and this discreteness makes the generation, detection, and defense of adversarial examples more challenging;
3) small changes to pixel values produce perturbations of image data that are hard for the human eye to observe, whereas for text adversarial attacks even small perturbations are easily noticed.
Therefore, research on defense methods against adversarial examples helps improve the robustness and interpretability of models.
Disclosure of Invention
The invention provides a method for detecting word-level text adversarial examples, a defense for deep learning models against text adversarial examples. The method casts adversarial example detection as a binary classification problem and detects adversarial examples in four steps: first, train a text classification model on the existing training data set; second, generate adversarial examples of the current model's normal samples with an existing attack algorithm; third, extract characterizing feature vectors from the normal and adversarial samples of the current model to construct the training data set of the detection model; finally, construct an adversarial example detection binary classification model from the data set obtained in the previous step and use it to judge whether the current test sample is an adversarial example.
To this end, the invention adopts the following technical solution:
1) Train a text classification model M on an existing training data set D, where D = {(x_i, y_i)}, 0 < i ≤ L, L is the length of D, x_i is a data sample in D, and y_i is its corresponding label:
Step S101: select a neural network text classification model for the existing training data set D, and add a Self-Attention layer after the Embedding layer of the text classification model;
Step S102: train this network structure to obtain the text classification model M;
2) Generate adversarial examples of the current model's normal samples with an existing adversarial attack algorithm:
Step S201: find the samples in the training data set that the current model M predicts correctly;
Step S202: attack each such sample with an existing attack algorithm until the attack succeeds, where success means that the predicted label of the original data (x_i, y_i) changes after the attack, i.e. from y_i to y_i' with y_i ≠ y_i';
Step S203: save the samples attacked successfully in the previous step together with their adversarial examples as an adversarial example detection data set D2, where D2 = {(x_i, y_i)}, 0 < i ≤ N, N is the length of D2, x_i is a normal or adversarial sample, and y_i is its label, with y_i = 1 denoting a normal sample and y_i = 0 denoting an adversarial example;
3) Extract characterizing feature vectors from the normal and adversarial samples of the model to construct the training data set S of the detection model:
Step S301: for a piece of data (x_i, y_i) in D2, construct its feature vector A. Let the input sentence be X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, and let P_k be the probability that model M classifies X into the k-th class. Based on the self-attention weights, find the k most important words of the input sentence, denoted top_k = [w_top1, w_top2, ..., w_topk]. For each word in top_k, obtain m synonyms with Word2Vec, taking the vectors closest to the current word's vector as its synonyms. For each word in top_k, select one of its synonyms as a replacement to obtain a new input sentence x'; let P_k' be the probability that model M classifies x' into the k-th class, and take ABS(P_k - P_k') as one dimension of the feature vector A, denoted a_i, where ABS is the absolute-value function. Traversing all synonym combinations of top_k finally yields a z-dimensional feature vector A, where z = m^k and A = [a_1, a_2, ..., a_z].
Step S302: for a piece of data (x_i, y_i) in D2, construct its feature vector B. For a given sentence X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, find the k most important words with self-attention, denoted top_k = [w_top1, w_top2, ..., w_topk]. From the data set D, obtain a set H of m sentences whose labels all differ from the label of X. For each sentence H_i in H, add the words of X's top_k to H_i to obtain a new sentence H_i'. Let P_k be the probability that the original sentence H_i belongs to class k and P_k' the probability that H_i' belongs to class k, and take ABS(P_k - P_k') as one dimension of the feature vector B, denoted b_i, where ABS is the absolute-value function. Traversing every sentence in H finally yields an m-dimensional feature vector B, where m is the size of H and B = [b_1, b_2, ..., b_m].
Step S303: for a piece of data (x_i, y_i) in D2, construct its feature vector C. For a given sentence X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, find the k most important words with self-attention, denoted top_k = [w_top1, w_top2, ..., w_topk]. For each word in top_k, obtain its m-dimensional word vector, denoted c_i, where m is the length of the word embedding; the word vectors can be obtained with word2vec. Concatenate the word vectors of the top_k words directly to form the feature vector C = [c_1, c_2, ..., c_k], whose length is m × k;
Step S304: the final characterization of a piece of data (x_i, y_i) in D2 is I = [A, B, C]; I is saved into the training data set S of the adversarial example detection model, S = {(x_i, y_i)}, 0 < i ≤ Q, where Q is the length of S, x_i is the feature vector I of a normal or adversarial sample, and y_i is its label, with y_i = 1 denoting a normal sample and y_i = 0 denoting an adversarial example.
4) Construct the adversarial example detection binary classification model G from the data set S obtained in step 3):
Step S401: train a binary classification model on the training data set S; the model can be a machine learning model such as an MLP (multilayer perceptron) or an SVM (support vector machine);
Step S402: for a sample to be detected, extract its features as in step 3) and input them into the trained binary classification model; a predicted label of 1 indicates a normal sample, and a label of 0 indicates an adversarial example;
Compared with the prior art, the invention has the following advantages:
1) it is, to the best of the inventors' knowledge, the first word-level text adversarial example detection method;
2) the proposed word-level text adversarial example detection method achieves detection rates of 80-95% across different data sets and different model architectures, showing good applicability in different scenarios;
3) the proposed self-attention-based adversarial example detection method does not depend on a specific model and is highly extensible.
drawings
FIG. 1 trains a text classification model M based on an existing training data set D;
FIG. 2 is a diagram of a prior art challenge sample attack algorithm to generate challenge samples of normal samples of a current model;
FIG. 3 is a diagram illustrating a training data set of a detection model constructed by respectively extracting characteristic feature vectors from normal and confrontation samples of a current model;
FIG. 4 is a diagram of a discrimination input sample of a confrontation sample detection binary model;
Detailed Description
The following describes a specific embodiment of the word-level text adversarial example detection method in detail with reference to the drawings.
FIG. 1 shows the overall flow of training a text classification model M based on an existing training data set D.
Step S101: select a neural network text classification model for the existing training data set D, and add a Self-Attention layer after the Embedding layer of the text classification model;
Step S102: train this network structure to obtain the text classification model M.
Specifically, the neural network text classification model in step S101 can be any CNN- or RNN-family model; adding a Self-Attention layer after the Embedding layer of the chosen model gives the final network structure, and the final text classification model M is obtained through step S102.
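To make the structure concrete, here is a minimal sketch in PyTorch assuming a BiLSTM backbone; the vocabulary size, dimensions, and class count are illustrative choices, not values fixed by the patent.

import torch
import torch.nn as nn

class SelfAttnTextClassifier(nn.Module):
    """Embedding -> Self-Attention -> BiLSTM -> classifier, per step S101."""
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Self-Attention layer inserted directly after the Embedding layer;
        # its attention weights are later reused to rank word importance.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)                    # (batch, seq, embed)
        attended, attn_weights = self.attn(emb, emb, emb)  # self-attention over the words
        encoded, _ = self.encoder(attended)
        logits = self.fc(encoded.mean(dim=1))              # mean-pool over time steps
        return logits, attn_weights

Averaging attn_weights over the query axis gives one importance score per word, which is one way the top-k words used in steps S301-S303 can be selected.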
FIG. 2 shows generating adversarial examples of the current model's normal samples with an existing adversarial attack algorithm.
Step S201: find the samples in the training data set that the current model M predicts correctly;
Step S202: attack each such sample with an existing attack algorithm until the attack succeeds, where success means that the predicted label of the original data (x_i, y_i) changes after the attack, i.e. from y_i to y_i' with y_i ≠ y_i';
Step S203: save the samples attacked successfully in the previous step together with their adversarial examples as an adversarial example detection data set D2, where D2 = {(x_i, y_i)}, 0 < i ≤ N, N is the length of D2, x_i is a normal or adversarial sample, and y_i is its label, with y_i = 1 denoting a normal sample and y_i = 0 denoting an adversarial example.
Specifically, the correctly predicted samples are found in step S201, those samples are then attacked with an arbitrary adversarial attack algorithm in step S202, and finally the correctly predicted samples together with their successfully attacked counterparts are saved in step S203.
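As a schematic sketch of steps S201-S203 (not the patent's own code), the loop below assumes two hypothetical helpers: model_predict(sentence), returning the predicted label, and attack(sentence), returning a perturbed sentence or None when the attack fails.

def generate_detection_dataset(model_predict, attack, dataset):
    # dataset: list of (sentence, label) pairs from the training data set D
    d2 = []
    for x, y in dataset:
        if model_predict(x) != y:       # S201: keep only correctly predicted samples
            continue
        x_adv = attack(x)               # S202: perturb until the label flips
        if x_adv is not None and model_predict(x_adv) != y:
            d2.append((x, 1))           # S203: normal sample, label 1
            d2.append((x_adv, 0))       # S203: adversarial example, label 0
    return d2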
FIG. 3 shows extracting characterizing feature vectors from the current model's normal and adversarial samples to construct the training data set of the detection model.
Step S301: for a piece of data (x_i, y_i) in D2, construct its feature vector A. Let the input sentence be X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, and let P_k be the probability that model M classifies X into the k-th class. Based on the self-attention weights, find the k most important words of the input sentence, denoted top_k = [w_top1, w_top2, ..., w_topk]. For each word in top_k, obtain m synonyms with Word2Vec, taking the vectors closest to the current word's vector as its synonyms. For each word in top_k, select one of its synonyms as a replacement to obtain a new input sentence x'; let P_k' be the probability that model M classifies x' into the k-th class, and take ABS(P_k - P_k') as one dimension of the feature vector A, denoted a_i, where ABS is the absolute-value function. Traversing all synonym combinations of top_k finally yields a z-dimensional feature vector A, where z = m^k and A = [a_1, a_2, ..., a_z].
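A sketch of the construction of feature vector A, assuming three hypothetical helpers: top_k_indices (ranking word positions by self-attention weight), nearest_synonyms (the m nearest Word2Vec neighbors), and prob_class_k (the model's probability for class k):

import itertools

def feature_vector_A(words, prob_class_k, top_k_indices, nearest_synonyms, k, m):
    p_k = prob_class_k(words)                        # P_k for the original sentence X
    idx = top_k_indices(words, k)                    # positions of the top-k words
    synonym_lists = [nearest_synonyms(words[i], m) for i in idx]
    a = []
    # Traverse all m**k synonym combinations of the top-k words (z = m**k).
    for combo in itertools.product(*synonym_lists):
        perturbed = list(words)
        for position, replacement in zip(idx, combo):
            perturbed[position] = replacement
        a.append(abs(p_k - prob_class_k(perturbed)))  # ABS(P_k - P_k')
    return a                                          # A = [a_1, ..., a_z]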
Step S302: according to D2One piece of data (x) in (2)i,yi) A feature vector B belonging to this piece of data is constructed. For a given sentence X ═ X1,x2...xn]Each element in X represents a word in the sentence, and k important words with the maximum weight in the input sentence are found and recorded as top based on self-attention technologyk=[wtop1,wtop2...wtopk](ii) a Obtaining a set H of m sentences from the data set D, wherein the labels of each sentence are different from the labels X; for each sentence H in HiThe top of XkThe word contained being added to HiTo obtain a new sentence Hi'; original sentence HiProbability of being of class k is Pk,Hi' probability of being class k is Pk', with ABS (P)k-Pk') as one dimension of the feature vector B, denoted BiWherein ABS is an absolute value function; traversing each sentence in H to finally obtain a feature vector B with m dimensions, wherein m is the length of the set H, and finally B ═ B1,b2…bm]。
Step S303: according to D2One piece of data (x) in (2)i,yi) A feature vector C belonging to this piece of data is constructed. For a given sentence X ═ X1,x2...xn]Each element in X represents a word in the sentence, and k important words with the maximum weight in the input sentence are found and recorded as top based on self-attention technologyk=[wtop1,wtop2...wtopk](ii) a To topkEach word in (a) gets its m-dimensional word vector representation, denoted asciM represents the length of the current word embed, and the word vector can be obtained by word2vec technology; will topkThe word vectors of each word are directly spliced together to be used as a characteristic vector C, and finally C is ═ C1,c2...ck]Wherein C has a length of m × k;
Step S304: the final characterization of a piece of data (x_i, y_i) in D2 is I = [A, B, C]; I is saved into the training data set S of the adversarial example detection model, S = {(x_i, y_i)}, 0 < i ≤ Q, where Q is the length of S, x_i is the feature vector I of a normal or adversarial sample, and y_i is its label, with y_i = 1 denoting a normal sample and y_i = 0 denoting an adversarial example.
Specifically, characterizing feature vectors are extracted from the adversarial example detection data set D2 saved in step S203: feature vector A in step S301, feature vector B in step S302, and feature vector C in step S303. The final sample characterization is the vector I, where I = [A, B, C].
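Assembled from the sketches above, step S304 reduces to concatenating the three feature vectors:

import numpy as np

def characterize(words, label, feature_A, feature_B, feature_C):
    # feature_A/B/C are the extractors sketched above, already bound to the model
    I = np.concatenate([feature_A(words), feature_B(words), feature_C(words)])
    return I, label                      # label: 1 = normal, 0 = adversarial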
FIG. 4 shows the adversarial example detection binary classification model discriminating input samples.
Step S401: train a binary classification model on the training data set S; the model can be a machine learning model such as an MLP (multilayer perceptron) or an SVM (support vector machine);
Step S402: for a sample to be detected, extract its features as in step 3) and input them into the trained binary classification model; a predicted label of 1 indicates a normal sample, and a label of 0 indicates an adversarial example.
Specifically, the training data set S and a sample under test are taken as input. Following step S401, an adversarial example detection binary classification model is trained on S with any binary classification algorithm. Following step S402, the characterization vector I is extracted for the sample under test and fed into this model; a predicted label of 1 indicates a normal sample, and a label of 0 indicates an adversarial example, as in the sketch below.
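A sketch of steps S401-S402 with scikit-learn, showing the MLP variant; an sklearn.svm.SVC would be used the same way, and the hidden-layer size and iteration count are illustrative assumptions:

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_detector(S):
    # S: list of (feature_vector_I, label) pairs, label 1 = normal, 0 = adversarial
    X = np.stack([x for x, _ in S])
    y = np.array([y for _, y in S])
    detector = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    return detector.fit(X, y)

def is_adversarial(detector, feature_vector_I):
    return detector.predict(feature_vector_I.reshape(1, -1))[0] == 0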

Claims (5)

1. A word-level text adversarial example detection method, characterized by comprising the following steps:
1) train a text classification model M on an existing training data set D, where D = {(x_i, y_i)}, 0 < i ≤ L, L is the length of D, x_i is a data sample in D, and y_i is its corresponding label;
2) generate adversarial examples of the current model's normal samples with an existing adversarial attack algorithm;
3) extract characterizing feature vectors from the normal and adversarial samples of the model to construct the training data set S of the detection model;
4) construct the adversarial example detection binary classification model G from the data set S obtained in step 3).
2. The word-level text adversarial example detection method according to claim 1, characterized in that step 1) comprises the following specific steps:
Step S101: select a neural network text classification model for the existing training data set D, and add a Self-Attention layer after the Embedding layer of the text classification model;
Step S102: train this network structure to obtain the text classification model M.
3. The word-level text adversarial example detection method according to claim 1, characterized in that step 2) comprises the following specific steps:
Step S201: find the samples in the training data set that the current model M predicts correctly;
Step S202: attack each such sample with an existing attack algorithm until the attack succeeds, where success means that the predicted label of the original data (x_i, y_i) changes after the attack, i.e. from y_i to y_i' with y_i ≠ y_i';
Step S203: save the samples attacked successfully in the previous step together with their adversarial examples as an adversarial example detection data set D2, where D2 = {(x_i, y_i)}, 0 < i ≤ N, N is the length of D2, x_i is a normal or adversarial sample, and y_i is its label, with y_i = 1 denoting a normal sample and y_i = 0 denoting an adversarial example.
4. The word-level text adversarial example detection method according to claim 1, characterized in that step 3) comprises the following specific steps:
Step S301: for a piece of data (x_i, y_i) in D2, construct its feature vector A. Let the input sentence be X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, and let P_k be the probability that model M classifies X into the k-th class. Based on the self-attention weights, find the k most important words of the input sentence, denoted top_k = [w_top1, w_top2, ..., w_topk]. For each word in top_k, obtain m synonyms with Word2Vec, taking the vectors closest to the current word's vector as its synonyms. For each word in top_k, select one of its synonyms as a replacement to obtain a new input sentence x'; let P_k' be the probability that model M classifies x' into the k-th class, and take ABS(P_k - P_k') as one dimension of the feature vector A, denoted a_i, where ABS is the absolute-value function. Traversing all synonym combinations of top_k finally yields a z-dimensional feature vector A, where z = m^k and A = [a_1, a_2, ..., a_z];
Step S302: for a piece of data (x_i, y_i) in D2, construct its feature vector B. For a given sentence X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, find the k most important words with self-attention, denoted top_k = [w_top1, w_top2, ..., w_topk]. From the data set D, obtain a set H of m sentences whose labels all differ from the label of X. For each sentence H_i in H, add the words of X's top_k to H_i to obtain a new sentence H_i'. Let P_k be the probability that the original sentence H_i belongs to class k and P_k' the probability that H_i' belongs to class k, and take ABS(P_k - P_k') as one dimension of the feature vector B, denoted b_i, where ABS is the absolute-value function. Traversing every sentence in H finally yields an m-dimensional feature vector B, where m is the size of H and B = [b_1, b_2, ..., b_m];
Step S303: for a piece of data (x_i, y_i) in D2, construct its feature vector C. For a given sentence X = [x_1, x_2, ..., x_n], where each element of X is a word of the sentence, find the k most important words with self-attention, denoted top_k = [w_top1, w_top2, ..., w_topk]. For each word in top_k, obtain its m-dimensional word vector, denoted c_i, where m is the length of the word embedding; the word vectors can be obtained with word2vec. Concatenate the word vectors of the top_k words directly to form the feature vector C = [c_1, c_2, ..., c_k], whose length is m × k;
Step S304: the final characterization of a piece of data (x_i, y_i) in D2 is I = [A, B, C]; I is saved into the training data set S of the adversarial example detection model, S = {(x_i, y_i)}, 0 < i ≤ Q, where Q is the length of S, x_i is the feature vector I of a normal or adversarial sample, and y_i is its label, with y_i = 1 denoting a normal sample and y_i = 0 denoting an adversarial example.
5. The word-level text adversarial example detection method according to claim 1, characterized in that step 4) comprises the following specific steps:
Step S401: train a binary classification model on the training data set S, the model being a machine learning model such as an MLP (multilayer perceptron) or an SVM (support vector machine);
Step S402: for a sample to be detected, extract its features as in step 3) and input them into the trained binary classification model; a predicted label of 1 indicates a normal sample, and a label of 0 indicates an adversarial example.
CN202111496214.9A 2021-12-08 2021-12-08 Word-level text adversarial example detection method Active CN114169443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111496214.9A CN114169443B (en) 2021-12-08 2021-12-08 Word-level text adversarial example detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111496214.9A CN114169443B (en) 2021-12-08 2021-12-08 Word-level text adversarial example detection method

Publications (2)

Publication Number Publication Date
CN114169443A (en) 2022-03-11
CN114169443B (en) 2024-02-06

Family

ID=80484748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111496214.9A Active CN114169443B (en) 2021-12-08 2021-12-08 Word-level text countermeasure sample detection method

Country Status (1)

Country Link
CN (1) CN114169443B (en)

Citations

* Cited by examiner, † Cited by third party

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN110457701A * 2019-08-08 2019-11-15 Nanjing University of Posts and Telecommunications Adversarial training method based on interpretable adversarial text
WO2020244066A1 * 2019-06-04 2020-12-10 Ping An Technology (Shenzhen) Co., Ltd. Text classification method, apparatus, device, and storage medium
CN112765355A * 2021-01-27 2021-05-07 Jiangnan University Text adversarial attack method based on improved quantum-behaved particle swarm optimization algorithm
WO2021212675A1 * 2020-04-21 2021-10-28 Tsinghua University Method and apparatus for generating adversarial samples, electronic device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

Title
仝鑫; 王罗娜; 王润正; 王靖亚: "Word-level adversarial example generation method for Chinese text classification" (面向中文文本分类的词级对抗样本生成方法), 信息网络安全 (Netinfo Security), No. 09
李文慧; 张英俊; 潘理虎: "Short text classification method based on an improved biLSTM network" (改进biLSTM网络的短文本分类方法), 计算机工程与设计 (Computer Engineering and Design), No. 03

Also Published As

Publication number Publication date
CN114169443B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
Li et al. Invisible backdoor attacks on deep neural networks via steganography and regularization
Zhong et al. Backdoor embedding in convolutional neural network models via invisible perturbation
Melis et al. Is deep learning safe for robot vision? adversarial examples against the icub humanoid
Liao et al. Backdoor embedding in convolutional neural network models via invisible perturbation
CN108111489B (en) URL attack detection method and device and electronic equipment
US11893111B2 (en) Defending machine learning systems from adversarial attacks
CN111046673B (en) Training method for defending text malicious sample against generation network
CN112085069B (en) Multi-target countermeasure patch generation method and device based on integrated attention mechanism
CN109961145B (en) Antagonistic sample generation method for image recognition model classification boundary sensitivity
Lin et al. Chinese character CAPTCHA recognition and performance estimation via deep neural network
CN111191695A (en) Website picture tampering detection method based on deep learning
CN111753881A (en) Defense method for quantitatively identifying anti-attack based on concept sensitivity
CN112861945B (en) Multi-mode fusion lie detection method
Jain et al. Adversarial text generation for google's perspective api
WO2023093346A1 (en) Exogenous feature-based model ownership verification method and apparatus
Lv et al. Chinese character CAPTCHA recognition based on convolution neural network
CN115913643A (en) Network intrusion detection method, system and medium based on countermeasure self-encoder
CN113435264A (en) Face recognition attack resisting method and device based on black box substitution model searching
Yin et al. Adversarial attack, defense, and applications with deep learning frameworks
CN114169443B (en) Word-level text adversarial example detection method
CN115883242A (en) Network intrusion detection method and device
CN112948578B (en) DGA domain name open set classification method, device, electronic equipment and medium
CN115497105A (en) Multi-modal hate cause detection method based on multi-task learning network
Abhishek et al. CNN Combined with FC Classifier to Combat Artificial Penta-Digit Text-Based Captcha

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant