CN112364641A

CN112364641A - Chinese countermeasure sample generation method and device for text audit

Info

Publication number: CN112364641A
Application number: CN202011259475.4A
Authority: CN
Inventors: 王婧宜; 孔庆超; 张佳旭; 蒋永余; 郭建彬; 吴晓飞; 曹家; 赵菲菲; 罗引; 王磊
Original assignee: Beijing Zhongke Wenge Zhian Technology Co ltd; Shenzhen Zhongke Wenge Technology Co ltd; Beijing Zhongke Wenge Technology Co ltd
Current assignee: Beijing Zhongke Wenge Zhian Technology Co ltd; Shenzhen Zhongke Wenge Technology Co ltd; Beijing Zhongke Wenge Technology Co ltd
Priority date: 2020-11-12
Filing date: 2020-11-12
Publication date: 2021-02-12

Abstract

The application relates to a method and device for generating Chinese adversarial examples for text audit. The method comprises the following steps: obtaining sentence information to be processed; performing word segmentation on the sentence information to be processed to obtain a plurality of words; determining first importance information of each word; obtaining perturbation words corresponding to the words; according to the first importance information, sequentially obtaining perturbed sentence information in which each word in the sentence information to be processed is replaced with its corresponding perturbation word; and, when the perturbed sentence information is determined to satisfy a preset condition, obtaining from it an adversarial example with which the attack on the sentence information to be processed succeeds. With the method of this embodiment, adversarial examples are obtained by replacing words in the sentences to be processed. This increases the diversity of the samples used to train the prediction model. Because the adversarial examples are generated automatically, training data are also easier to acquire and model training is more efficient.

Description

Chinese countermeasure sample generation method and device for text audit

Technical Field

The application relates to the field of artificial intelligence, and in particular to a method and device for generating Chinese adversarial examples for text audit.

Background

In recent years, big data technology has developed and hardware computing power has grown. As a result, deep learning has been widely applied in many fields, such as computer vision, speech recognition and natural language processing. With this rapid development, however, the security of deep learning models has drawn increasing attention from researchers. Szegedy et al. first identified the existence of adversarial examples. These are input samples formed by deliberately adding small perturbations to the data, and they cause a model to give a wrong output with high confidence.

Adversarial examples reveal the vulnerability of deep learning models and have attracted wide attention from researchers. In Natural Language Processing (NLP), adversarial examples against deep learning models already threaten real-world applications, including text auditing. Text review means filtering harmful content from text, such as abuse, discrimination, personal attacks and ethnic insinuations. It is an important NLP application. Keyword matching and machine-learning-based text classification are currently the most common review methods. However, publishers of harmful content often alter sensitive words so that the altered content bypasses the text audit system, for example by using "waste" instead of "junk". In the related art, keyword-based text review systems cannot quickly handle such altered words, so additional manual review is required.

No effective solution to these technical problems in the related art has yet been proposed.

Disclosure of Invention

To solve, or at least partially solve, this technical problem, the application provides a method and device for generating Chinese adversarial examples for text review.

In a first aspect, an embodiment of the present application provides a method for generating Chinese adversarial examples for text review, including:

obtaining sentence information to be processed;

performing word segmentation on the sentence information to be processed to obtain a plurality of words;

determining first importance information of each word;

obtaining perturbation words corresponding to each word;

according to the first importance information, sequentially obtaining perturbed sentence information in which each word in the sentence information to be processed is replaced with its corresponding perturbation word;

and, when the perturbed sentence information is determined to satisfy a preset condition, obtaining from the perturbed sentence information an adversarial example with which the attack on the sentence information to be processed succeeds.

Optionally, in the foregoing method, obtaining the perturbation words corresponding to each word includes:

determining the pinyin and the glyph of the word;

replacing, according to the pinyin, at least one character in the word with its pinyin, and taking the result as the perturbation word; or,

replacing, according to the glyph, at least one character in the word with a visually similar character whose glyph satisfies a preset similarity requirement, and taking the result as the perturbation word; or,

replacing, according to the pinyin and the glyph, at least one character in the word with a homophonic and/or visually similar character whose glyph satisfies the preset similarity requirement, and taking the result as the perturbation word.

Optionally, the foregoing method further includes:

ranking the words from high to low by the first importance information to obtain ranking order information for each word;

determining replacement order information for each word according to the ranking order information. The replacement order information determines the order in which the words are replaced with their corresponding perturbation words.

Optionally, in the foregoing method, sequentially obtaining the perturbed sentence information according to the first importance information includes:

determining, among all perturbation words corresponding to a word, the one with the lowest importance (the lowest-importance perturbation word), and obtaining the correspondence between the word and its lowest-importance perturbation word;

and, following the replacement order information, sequentially replacing each word with its lowest-importance perturbation word according to the correspondence, to obtain the perturbed sentence information.

Optionally, in the foregoing method, determining the lowest-importance perturbation word among all perturbation words corresponding to the word includes:

replacing the word in the sentence information to be processed with each perturbation word to obtain replaced sentence information corresponding to that perturbation word;

deleting the perturbation word from the replaced sentence information to obtain second word-missing sentence information corresponding to that perturbation word;

determining, with a preset text audit model, a third weight value for the replaced sentence information and a fourth weight value for the second word-missing sentence information;

obtaining second importance information of each perturbation word from the difference between the corresponding third weight value and fourth weight value;

and obtaining the lowest-importance perturbation word for the word from the second importance information of the perturbation words corresponding to that word.

Optionally, in the foregoing method, determining the first importance information of each word includes:

deleting each word from the sentence information to be processed in turn to obtain first word-missing sentence information corresponding to that word;

determining, with a preset text audit model, a first weight value for the sentence information to be processed and a second weight value for each piece of first word-missing sentence information;

and obtaining the first importance information of each word from the difference between the first weight value and the corresponding second weight value.

Optionally, in the foregoing method, the perturbed sentence information satisfies the preset condition when:

the number of perturbation words in the perturbed sentence information is less than or equal to a preset upper-limit threshold on the number of perturbation words; and,

the ratio of the number of perturbation words to the total number of words in the perturbed sentence information is smaller than a preset perturbation-ratio threshold; and,

a first predicted label of the perturbed sentence information differs from a second predicted label of the sentence information to be processed. The first predicted label is obtained by running a preset text audit model on the perturbed sentence information. The second predicted label is obtained by running the same text audit model on the sentence information to be processed.

In a second aspect, an embodiment of the present application provides an adversarial training method, including:

obtaining training data and validation data from adversarial examples generated by the method according to any one of the preceding claims;

training a preset text audit model with the training data to obtain a trained text audit model;

and validating the trained audit model with the validation data and, when a preset requirement is met, obtaining a target text audit model from the trained audit model.

In a third aspect, an embodiment of the present application provides an apparatus for generating Chinese adversarial examples for text review, including:

a sentence acquisition module, configured to acquire sentence information to be processed;

a word segmentation module, configured to segment the sentence information to be processed to obtain a plurality of words;

an importance determination module, configured to determine first importance information of the words;

a perturbation word module, configured to acquire perturbation words corresponding to the words;

a replacement module, configured to sequentially obtain, according to the first importance information, perturbed sentence information in which each word in the sentence information to be processed is replaced with its corresponding perturbation word;

and a sample generation module, configured to obtain an adversarial example from the perturbed sentence information when the perturbed sentence information is determined to satisfy the preset condition. The attack on the sentence information to be processed succeeds with this adversarial example.

In a fourth aspect, an embodiment of the present application provides an adversarial training device, including:

a data acquisition module, configured to obtain training data and validation data from adversarial examples generated by any one of the foregoing methods;

a training module, configured to train a preset text audit model with the training data to obtain a trained text audit model;

and a validation module, configured to validate the trained audit model with the validation data and, when a preset requirement is met, obtain a target text audit model from the trained audit model.

In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory and a communication bus. The processor, the communication interface and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method according to any one of the preceding claims when executing the computer program.

In a sixth aspect, an embodiment of the present application provides a storage medium including a stored program. When run, the program performs the method steps of any one of the preceding claims.

Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:

The method provided by the embodiments of the application obtains adversarial examples by replacing words in the sentences to be processed. This increases the diversity of the samples used to train the prediction model. Because the adversarial examples are generated automatically, training data are also easier to acquire and model training is more efficient.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

Fig. 1 is a flowchart of a method for generating Chinese adversarial examples for text review according to an embodiment of the present application;

Fig. 2 is a flowchart of a method for generating Chinese adversarial examples for text review according to another embodiment of the present application;

Fig. 3 is a schematic flowchart of a method for generating Chinese adversarial examples for text review according to an application example of the present application;

Fig. 4 is a schematic flowchart of adversarial training according to an application example of the present application;

Fig. 5 is a block diagram of an apparatus for generating Chinese adversarial examples for text review according to an embodiment of the present application;

Fig. 6 is a block diagram of an adversarial training device according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

At present, most research on textual adversarial examples is based on English data. Perturbation methods designed for English, such as inserting or deleting letters and swapping adjacent letters, easily change the semantics of Chinese text and impair its comprehension. The invention draws on the many altered word forms that harmful texts take in text review. On that basis, it provides a method for efficiently generating Chinese adversarial examples that preserve the semantics of the original samples well. Injecting these adversarial examples into the training of a deep-learning-based text audit model can effectively improve the model's robustness and its ability to recognize harmful samples.

Fig. 1 shows a method for generating Chinese adversarial examples for text review according to an embodiment of the present application. It includes the following steps S1 to S6:

S1. Obtaining sentence information to be processed.

Specifically, the sentence information to be processed may be obtained by collecting social media data from the network. This data may include both abusive and normal speech.

After the sentence information to be processed is obtained, it may be preprocessed. When the social media data come from a platform such as a microblog (Weibo), preprocessing may include deleting hashtags, @ mentions, forwarded content, web links and the like. A hashtag is the topic of a post, marked on the microblog by words with # signs.
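The preprocessing rules above can be sketched with a few regular expressions. This is a minimal Python sketch that rests on assumed Weibo conventions: hashtags written as #topic#, mentions as @name followed by a space or punctuation, forwarded chains introduced by "//@", and links starting with http:// or https://. The patterns and the function name `preprocess` are illustrative and are not part of the described method.

```python
import re

# Assumed Weibo conventions (hypothetical patterns, single-line posts):
FORWARD_RE = re.compile(r"//@.*$")    # forwarded chain: drop everything from the first "//@"
URL_RE = re.compile(r"https?://\S+")  # web links
HASHTAG_RE = re.compile(r"#[^#]+#")   # hashtags enclosed by a pair of "#"
MENTION_RE = re.compile(r"@[\w\-]+")  # @username, assumed to end at whitespace or punctuation


def preprocess(post: str) -> str:
    """Strip forwarded content, links, hashtags and mentions, then collapse whitespace."""
    text = FORWARD_RE.sub("", post)
    text = URL_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("#话题#你真是个坏人 @张三 http://t.cn/abc //@李四:转发内容"))  # 你真是个坏人
```

Forwarded content is removed first so that the "//@" chain is not partially consumed by the mention rule.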

S2. Performing word segmentation on the sentence information to be processed to obtain a plurality of words.

Specifically, word segmentation is the process of recombining continuous sentence information into a word sequence according to certain conventions. For example, segmenting the sentence information to be processed yields a word set X = [w_1, w_2, …, w_n], where w_i is the i-th word in X.

Optionally, a custom segmentation dictionary may be used to assist the word segmentation.

Further, after segmentation, stop words are removed from the resulting word list to obtain the words.
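Segmentation with a custom dictionary followed by stop-word removal can be illustrated with forward maximum matching. This simple dictionary-based segmenter is swapped in here for illustration only. In practice, a library such as jieba, which supports user dictionaries, would more likely be used. The dictionary, the stop-word list and the function names below are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def fmm_segment(sentence: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when no dictionary word matches."""
    words: list[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def segment(sentence: str, dictionary: set[str], stop_words: Iterable[str]) -> list[str]:
    """Segment the sentence, then drop stop words to obtain X = [w_1, ..., w_n]."""
    stops = set(stop_words)
    return [w for w in fmm_segment(sentence, dictionary) if w not in stops]


custom_dict = {"真是", "坏人"}  # hypothetical custom dictionary
print(segment("你真是个坏人", custom_dict, stop_words={"个"}))  # ['你', '真是', '坏人']
```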

S3. Determining first importance information of the words.

The first importance information characterizes how strongly a word affects the semantics of the sentence information to be processed.

S4. Obtaining the perturbation words corresponding to each word.

Specifically, a perturbation word may be a disguised form of the original word. For example, for the word 坏人 ("bad person"), replacing 坏 ("bad") with its pinyin gives the perturbation word "huai人" ("huai person"). Perturbation words may also be obtained by processing the words in other ways.

S5. According to the first importance information, sequentially obtaining perturbed sentence information in which each word in the sentence information to be processed is replaced with its corresponding perturbation word.

Specifically, perturbed sentence information is sentence information whose text contains perturbation words. During replacement, the word with the highest first importance in the sentence to be processed is replaced with its perturbation word first. The word with the second-highest importance is replaced next, and so on. Replacing the words with their perturbation words one by one in this way yields the perturbed sentence information.

S6. When the perturbed sentence information is determined to satisfy the preset condition, obtaining from it an adversarial example with which the attack on the sentence information to be processed succeeds.

Specifically, the preset condition may be a condition for determining that an adversarial example has been generated successfully. A successful attack may mean the following: the sentence information to be processed is identified as abusive, but after its words are replaced with the corresponding perturbation words, the perturbed sentence information is no longer identified as abusive. In other words, the predicted result is opposite to, or differs from, the original result.

With the method of this embodiment, adversarial examples are obtained by replacing words in the sentences to be processed. This increases the diversity of the samples used to train the prediction model. Because the adversarial examples are generated automatically, training data are also easier to acquire and model training is more efficient.

In real scenarios, publishers of harmful text often alter harmful information (e.g., abusive text) to evade review. Common alterations include pinyin replacement, homophone replacement and visually-similar-character replacement. As shown in Fig. 2, some embodiments address this technical problem within the foregoing method. In these embodiments, step S4 (obtaining the perturbation words corresponding to each word) includes the following steps S21 to S24:

S21. Determining the pinyin and the glyph of the word.

Specifically, the pinyin and glyph of each word can be obtained by performing character recognition on the word. Optionally, the glyph information may include the radical of each character in the word, so that perturbation words can later be obtained by replacing radicals.

S22. Replacing, according to the pinyin, at least one character in the word with its pinyin, and taking the result as the perturbation word.

That is, one or more characters in the word are replaced with pinyin to obtain the perturbation word. For example, for the word 坏人 ("bad person"), replacing 坏 ("bad") with its pinyin "huai" gives the perturbation word "huai人".
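Pinyin replacement can be sketched as enumerating which characters to swap for their pinyin. This minimal sketch assumes a toy character-to-pinyin table named `PINYIN`, which is hypothetical. A real system would use a complete pinyin resource. The cap `max_chars` on the number of replaced characters is also an illustrative parameter.

```python
from itertools import combinations

# Toy character-to-pinyin table (hypothetical); a real system would use a full pinyin resource.
PINYIN = {"坏": "huai", "人": "ren", "笨": "ben", "蛋": "dan"}


def pinyin_variants(word: str, max_chars: int = 1) -> list[str]:
    """Return variants of `word` in which 1..max_chars characters are replaced by their pinyin."""
    positions = [i for i, ch in enumerate(word) if ch in PINYIN]
    variants = []
    for k in range(1, max_chars + 1):
        for combo in combinations(positions, k):
            chars = list(word)
            for i in combo:
                chars[i] = PINYIN[chars[i]]
            variants.append("".join(chars))
    return variants


print(pinyin_variants("坏人"))  # ['huai人', '坏ren']
```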

S23. Replacing, according to the glyph, at least one character in the word with a visually similar character whose glyph satisfies the preset similarity requirement, and taking the result as the perturbation word.

Specifically, the preset similarity requirement may be checked as follows:

- Determine a first region size, which corresponds to the part in which the visually similar character differs from the replaced character.
- Determine a second region size, which corresponds to the replaced character as a whole.
- Compute the ratio of the first region size to the second region size.
- When this ratio is smaller than a preset upper-limit ratio threshold, judge that the visually similar character and the replaced character satisfy the preset similarity requirement.

A perturbation word is then obtained by replacing at least one character with a visually similar character. The perturbation word differs from the original word so little that human understanding of the text is unaffected. For the model, however, the characters are different, so it cannot accurately predict the meaning of the perturbation word. For example, replacing 村 ("village") in 村花 ("village flower") with the visually similar character 衬 ("lining") gives the perturbation word 衬花.
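The similarity check can be sketched on glyph bitmaps. This sketch makes several assumptions. Glyphs are equal-sized bitmaps in which '#' marks an inked pixel, for example obtained by rendering both characters in the same font. The "first region" is the set of pixels where the two bitmaps differ. The "second region" is the inked area of the replaced character. The threshold 0.3 is a hypothetical value.

```python
def glyph_diff_ratio(candidate: list[str], original: list[str]) -> float:
    """Ratio of the differing area (first region) to the inked area of the original glyph
    (second region). Glyphs are equal-sized bitmaps in which '#' marks an inked pixel."""
    diff = sum(
        a != b
        for row_c, row_o in zip(candidate, original)
        for a, b in zip(row_c, row_o)
    )
    whole = sum(ch == "#" for row in original for ch in row)
    return diff / whole if whole else 1.0


def meets_similarity(candidate: list[str], original: list[str], max_ratio: float = 0.3) -> bool:
    """Preset similarity requirement: the ratio must be below the upper-limit ratio threshold."""
    return glyph_diff_ratio(candidate, original) < max_ratio


original = ["####", "#..#", "####"]   # 10 inked pixels
candidate = ["####", "#.##", "####"]  # differs in 1 pixel
print(glyph_diff_ratio(candidate, original))  # 0.1
print(meets_similarity(candidate, original))  # True
```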

S24. Replacing, according to the pinyin and the glyph, at least one character in the word with a homophonic and/or visually similar character whose glyph satisfies the preset similarity requirement, and taking the result as the perturbation word.

Specifically, a homophonic similar-shaped character is one that is pronounced the same as the replaced character and has a similar glyph. It may be obtained as follows:

1. Determine the pinyin of the replaced character.
2. Use that pinyin to look up homophones in a character library. A homophone may require the same pinyin and the same tone, or the same pinyin only.
3. Among all the homophones, search for visually similar characters by the method of step S23, and take them as the homophonic similar-shaped characters.

For example, replacing 笨 ("stupid") in 笨蛋 ("fool", literally "stupid egg") with the homophonic, similar-shaped character 苯 ("benzene") gives the perturbation word 苯蛋.
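The two-stage lookup, homophones first and then a glyph filter, can be sketched as below. `PINYIN_TONE` and `COMPONENT` are hypothetical toy resources. `toy_similar` is a crude stand-in for the step S23 check that treats characters sharing a listed component as visually similar. The `same_tone` flag mirrors the "same pinyin and same tone, or same pinyin only" option.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Toy pinyin table with tone numbers (hypothetical resource).
PINYIN_TONE = {"笨": "ben4", "苯": "ben3", "本": "ben3", "奔": "ben1"}
# Toy stand-in for the step-S23 glyph check: characters sharing this component count as similar.
COMPONENT = {"笨": "本", "苯": "本", "本": "本", "奔": "大"}


def toy_similar(a: str, b: str) -> bool:
    return COMPONENT.get(a) == COMPONENT.get(b)


def homophone_shape_candidates(
    ch: str,
    charset: Iterable[str],
    similar: Callable[[str, str], bool],
    same_tone: bool = False,
) -> list[str]:
    """Homophones of `ch` (same pinyin, optionally same tone) that also pass the glyph check."""
    def key(c: str) -> str:
        p = PINYIN_TONE[c]
        return p if same_tone else p.rstrip("012345")  # drop the tone digit if tone is ignored

    homophones = [c for c in charset if c != ch and c in PINYIN_TONE and key(c) == key(ch)]
    return [c for c in homophones if similar(c, ch)]


cands = homophone_shape_candidates("笨", ["苯", "奔", "本"], toy_similar)
print([c + "蛋" for c in cands])  # ['苯蛋', '本蛋']
```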

The method of this embodiment yields perturbation words of several kinds. It uses a character-level replacement attack, in which at least one character of the original word is replaced. Perturbation words obtained in this way hardly affect human understanding of the text and preserve the semantics of the sentence well. After the adversarial attack, a text audit classifier used for semantic prediction is misled, and it misjudges abusive text as normal text.

For example, the following table shows the predictions of a text audit classifier that has not undergone adversarial training in this embodiment. The predictions are made on texts containing perturbation words obtained by the above method:

In some embodiments, the foregoing method further includes the following steps P1 to P3:

P1. Determining the first importance information corresponding to each word.

Specifically, the first importance information may reflect the importance of the word. The importance may be a prediction score for a particular prediction category. When the category is abuse, the importance information may be a score assessing the degree of abuse. For example, the first importance information of 笨蛋 ("fool") may be 100, and that of 蛋 ("egg") may be 80.

P2. Ranking the words from high to low by the first importance information to obtain ranking order information for each word.

Specifically, each word has one piece of first importance information, so ranking the words by it yields the ranking order information of each word.

P3. Determining replacement order information for each word according to the ranking order information. The replacement order information determines the order in which words are replaced with their corresponding perturbation words.

That is, the replacement order information of each word is derived from the ranking order information, so the words are replaced in descending order of first importance.
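Steps P2 and P3 amount to sorting word positions by first importance in descending order. This minimal sketch sorts indices rather than the words themselves, so a word that occurs twice in a sentence keeps two separate positions. The scores in the example are hypothetical.

```python
def replacement_order(importance: list[float]) -> list[int]:
    """Indices of words sorted by first importance, highest first.
    sorted() is stable even with reverse=True, so equal scores keep their original order."""
    return sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)


words = ["你", "真是", "坏人"]
scores = [0.10, 0.05, 0.90]        # hypothetical first-importance scores
order = replacement_order(scores)
print([words[i] for i in order])    # ['坏人', '你', '真是']
```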

In some embodiments of the foregoing method, step S5 (sequentially obtaining the perturbed sentence information according to the first importance information) includes the following steps S51 and S52:

S51. Determining, among all perturbation words corresponding to a word, the lowest-importance perturbation word, and obtaining the correspondence between the word and its lowest-importance perturbation word.

Specifically, each word has several perturbation words during adversarial example generation, so there are several replacement choices. For efficiency, the method follows a greedy strategy. Each time a word is replaced, the perturbation word with the currently lowest importance (e.g., the least abusive one) is selected as the lowest-importance perturbation word, and the adversarial example is generated from it. After the model is trained on such adversarial examples, it can also recognize perturbation words of higher importance.

S52. Sequentially replacing each word with its lowest-importance perturbation word according to the replacement order information, to obtain the perturbed sentence information.

Specifically, once the replacement order information is obtained, the words are perturbed in descending order of importance. The lowest-importance perturbation word of word w_i is determined and replaces w_i, which yields perturbed sentence information. If this perturbed sentence information does not satisfy the preset condition, the next word w_{i+1} is replaced on top of it. This continues until the preset condition is triggered.

The method of this embodiment replaces words with perturbation words in the order given by the replacement order information, that is, in order of importance. When classifying abusive text, this targets the abusive words more precisely: the perturbation words in the resulting adversarial text replace the most abusive words. This improves the accuracy and training value of the adversarial text.
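The greedy loop of steps S51, S52 and S6 can be sketched as follows. Several pieces are assumptions. `perturb` stands in for the lowest-importance choice of step S51 and is simplified to depend only on the word. `predict` stands in for the text audit model. `toy_predict` is a hypothetical keyword-based model used only for illustration. The limits `max_perturbed` and `max_ratio` play the role of the preset upper-limit threshold and perturbation-ratio threshold, and their default values are illustrative.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def greedy_attack(
    words: Sequence[str],
    order: Sequence[int],
    perturb: Callable[[str], Optional[str]],
    predict: Callable[[Sequence[str]], str],
    max_perturbed: int = 3,
    max_ratio: float = 0.5,
) -> Optional[list[str]]:
    """Replace words in `order` (descending first importance) with their chosen perturbation
    word until the predicted label flips; return None if a budget is exceeded first."""
    original_label = predict(words)
    current = list(words)
    n_perturbed = 0
    for i in order:
        sub = perturb(current[i])
        if sub is None:  # no perturbation word available for this word
            continue
        current[i] = sub
        n_perturbed += 1
        # Preset condition, parts 1 and 2: count <= upper limit and ratio < threshold.
        if n_perturbed > max_perturbed or n_perturbed / len(current) >= max_ratio:
            return None
        # Part 3: the perturbed sentence's label differs from the original label.
        if predict(current) != original_label:
            return current
    return None


ABUSIVE = {"坏人", "笨蛋"}  # hypothetical keyword-based audit model


def toy_predict(ws: Sequence[str]) -> str:
    return "abusive" if ABUSIVE & set(ws) else "normal"


lowest = {"坏人": "huai人"}.get  # stand-in for the lowest-importance choice of step S51
print(greedy_attack(["你", "是", "坏人"], [2, 0, 1], lowest, toy_predict))  # ['你', '是', 'huai人']
```

Checking the budget before the label means that a flip reached only by exceeding a limit is rejected, which matches requiring all three parts of the preset condition at once.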

In some embodiments of the foregoing method, step P1 (determining the first importance information of each word) includes the following steps P11 to P13:

P11. Deleting each word from the sentence information to be processed in turn to obtain first word-missing sentence information corresponding to that word.

Specifically, first word-missing sentence information is the sentence information to be processed with one word deleted. Each word corresponds to one piece of first word-missing sentence information.

P12. Determining, with a preset text audit model, a first weight value for the sentence information to be processed and a second weight value for each piece of first word-missing sentence information.

Specifically, a weight value may be the prediction score for a specific prediction category. It is obtained by running the text audit model on the corresponding sentence information.

P13. Obtaining the first importance information of each word from the difference between the first weight value and the corresponding second weight value.

That is, the first importance information of a word is the difference between two weight values: the first weight value of the sentence information to be processed, and the second weight value of the corresponding first word-missing sentence information.

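One optional implementation follows the TextFooler algorithm. It measures the importance of the i-th word (i = 1, 2, …, n) as the change in the prediction score before and after that word is deleted:

$$
I_{w_i} =
\begin{cases}
F_Y(X) - F_Y(X_{\setminus w_i}), & \text{if } F(X) = F(X_{\setminus w_i}) = Y,\\
\big(F_Y(X) - F_Y(X_{\setminus w_i})\big) + \big(F_{\bar{Y}}(X_{\setminus w_i}) - F_{\bar{Y}}(X)\big), & \text{if } F(X) = Y,\ F(X_{\setminus w_i}) = \bar{Y},\ Y \neq \bar{Y},
\end{cases}
$$

where:

$I_{w_i}$ is the importance score of word $w_i$, and $Y$ and $\bar{Y}$ are two different category labels;

$F_Y(X)$ is the prediction score of sentence $X$ for category $Y$;

$F_{\bar{Y}}(X)$ is the prediction score of sentence $X$ for category $\bar{Y}$;

$F_Y(X_{\setminus w_i})$ is the prediction score for category $Y$ of sentence $X$ after word $w_i$ is removed;

$F_{\bar{Y}}(X_{\setminus w_i})$ is the prediction score for category $\bar{Y}$ of sentence $X$ after word $w_i$ is removed.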
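The two-case deletion score can be computed directly from a model's per-category scores. This minimal sketch makes two assumptions. `scores(words)` returns a dict mapping each category label to the audit model's score. The predicted label F(X) is the category with the highest score. `toy_scores` is a hypothetical model used only for the example.

```python
from __future__ import annotations

from typing import Callable, Sequence


def word_importance(
    words: Sequence[str],
    scores: Callable[[Sequence[str]], dict[str, float]],
) -> list[float]:
    """TextFooler-style deletion importance I_{w_i} for every word (two cases)."""
    def argmax(d: dict[str, float]) -> str:
        return max(d, key=d.get)

    base = scores(words)
    y = argmax(base)                                    # F(X) = Y
    importance = []
    for i in range(len(words)):
        reduced = list(words[:i]) + list(words[i + 1:])  # X with w_i removed
        after = scores(reduced)
        value = base[y] - after[y]                      # F_Y(X) - F_Y(X \ w_i)
        y_bar = argmax(after)
        if y_bar != y:                                  # prediction flipped to Y-bar
            value += after[y_bar] - base[y_bar]         # + F_Ybar(X \ w_i) - F_Ybar(X)
        importance.append(value)
    return importance


def toy_scores(ws: Sequence[str]) -> dict[str, float]:
    """Hypothetical audit model: high abuse score whenever 坏人 is present."""
    p = 0.9 if "坏人" in ws else 0.2
    return {"abusive": p, "normal": 1 - p}


print(word_importance(["你", "是", "坏人"], toy_scores))  # approximately [0.0, 0.0, 1.4]
```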

In some embodiments, as the foregoing method, the step S51 determines the lowest importance disturbing word with the lowest importance among all disturbing words corresponding to the word, and includes the following steps S511 to S515:

and S511, replacing words in the sentence information to be processed by the disturbance words to obtain replaced sentence information corresponding to the disturbance words.

That is to say, a word in the sentence information to be processed is replaced by its corresponding disturbance word to obtain the replaced sentence information. For example, when the sentence information to be processed is "forest is big and garbage is present", the word to be replaced is "garbage", and the corresponding disturbance word is a variant of "garbage" with the same pinyin or a similar glyph, the replaced sentence information reads "forest is big and garbage is present" with "garbage" written as that variant.

And S512, deleting the disturbance words in the replaced sentence information to obtain second word-lacking sentence information corresponding to the disturbance words.

Specifically, the second word-lacking sentence information is the sentence information obtained by deleting the disturbance word from the replaced sentence information. Each disturbance word corresponds to one piece of second word-lacking sentence information.

And S513, determining a third weight value corresponding to the replaced statement information and a fourth weight value corresponding to the second word-lacking statement information according to a preset text audit model.

Specifically, the third weight value and the fourth weight value may be prediction scores for the same specific prediction type. These prediction scores are obtained by feeding the replaced sentence information and the second word-lacking sentence information, respectively, into the text auditing model.

And S514, obtaining second importance information corresponding to the disturbance words according to the difference values between the third weight values and the fourth weight values.

That is to say, the second importance information of a disturbance word is obtained from the difference between the third weight value of the replaced sentence information and the fourth weight value of the second word-lacking sentence information.

And S515, obtaining, for each word, the lowest-importance disturbance word according to the second importance information of the disturbance words corresponding to that word.

Specifically, after the second importance information of every disturbance word corresponding to a word is obtained, the disturbance word with the lowest second importance is selected from among them.
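Steps S511 to S515 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the model returns per-label scores, the "specific prediction type" is taken to be a hypothetical "abusive" label, and the function name, the toy weights, and the candidate variants "拉圾" and "辣鸡" (variants of "垃圾" that share or nearly share its pinyin) are illustrative, not taken from the patent.

```python
from typing import Callable, Dict, List, Optional

Scorer = Callable[[List[str]], Dict[str, float]]


def lowest_importance_candidate(words: List[str], i: int, candidates: List[str],
                                model: Scorer,
                                target: str = "abusive") -> Optional[str]:
    """Sketch of S511-S515: return the disturbance word for words[i] whose
    second importance (third weight - fourth weight) is lowest."""
    best: Optional[str] = None
    best_importance = float("inf")
    for cand in candidates:
        replaced = words[:i] + [cand] + words[i + 1:]   # S511: replaced sentence
        lacking = replaced[:i] + replaced[i + 1:]       # S512: delete the disturbance word
        third = model(replaced)[target]                 # S513: third weight value
        fourth = model(lacking)[target]                 #       fourth weight value
        importance = third - fourth                     # S514: second importance
        if importance < best_importance:                # S515: keep the minimum
            best, best_importance = cand, importance
    return best  # None when no candidates are given


# Hypothetical per-word abusiveness weights for a toy audit model.
TOY_WEIGHTS = {"垃圾": 0.6, "拉圾": 0.3, "辣鸡": 0.1}


def toy_audit_model(words: List[str]) -> Dict[str, float]:
    abusive = min(1.0, 0.05 + sum(TOY_WEIGHTS.get(w, 0.0) for w in words))
    return {"abusive": abusive, "normal": 1.0 - abusive}
```

Because deleting the disturbance word from the replaced sentence always yields the original sentence without the i-th word, the fourth weight value is the same for every candidate; ranking by the difference is therefore equivalent to ranking by the third weight value alone, i.e., picking the candidate that leaves the replaced sentence least abusive.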

In some embodiments of the foregoing method, in step S6 the disturbance sentence information satisfies the preset condition when the following conditions S61 to S63 all hold:

S61, the number of disturbance words in the disturbance sentence information is less than or equal to a preset upper limit threshold on the number of disturbance words; and

S62, the proportion of disturbance words to the total number of words in the disturbance sentence information is smaller than a preset disturbance proportion threshold; and

S63, the first prediction label corresponding to the disturbance sentence information is inconsistent with the second prediction label corresponding to the sentence information to be processed, where the first prediction label is obtained by predicting the disturbance sentence information with a preset text audit model, and the second prediction label is obtained by predicting the sentence information to be processed with the same text audit model.

Specifically, when there are too many disturbance words, or when disturbance words account for too high a proportion of the total number of words in the disturbance sentence information, the sentence information to be processed is changed too much, and the semantics of the adversarial sample may differ greatly from those of the original sample. Therefore, an upper limit threshold on the number of disturbance words and a disturbance proportion threshold need to be set. The upper limit threshold caps the number of disturbance words in the sentence information to be processed, and the disturbance proportion threshold caps the proportion of disturbance words among all words in the disturbance sentence information.

By way of example: let the upper limit threshold on the number of disturbance words be g (for example, 4) and the disturbance proportion threshold be p (for example, 50%). When the number of disturbance words is less than or equal to g and the proportion is less than or equal to p but the prediction label has not been inverted, that is, the two threshold conditions are still met but the label condition is not yet met, the disturbance of the sentence information to be processed continues.

When the number of disturbance words is less than or equal to g, the proportion is less than or equal to p, and the prediction label is inverted, the disturbance of the sentence information to be processed succeeds. For example, when the to-be-processed sentence information is identified as abusive text by the text auditing model, a successful disturbance means that the model can no longer correctly identify the disturbed sentence as abusive text.

If the number of disturbance words is greater than g or the proportion is greater than p while the prediction label has not been inverted, that is, the preset condition is not met, the disturbance of the sentence information to be processed stops and the attack fails.
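The three outcomes described above (continue, succeed, fail) can be sketched as a small Python function. It is a hedged illustration, not the patent's implementation: the names `attack_status` and `AttackStatus` are hypothetical, the defaults g = 4 and p = 0.5 come from the example values, and the proportion check uses the strict inequality of S62 (the worked example states "less than or equal to p" instead).

```python
from enum import Enum


class AttackStatus(Enum):
    CONTINUE = "continue"  # within budget, label not yet inverted
    SUCCESS = "success"    # S61, S62 and S63 all hold
    FAILURE = "failure"    # number or proportion budget exceeded


def attack_status(num_disturbed: int, total_words: int,
                  original_label: str, new_label: str,
                  g: int = 4, p: float = 0.5) -> AttackStatus:
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    within_budget = (num_disturbed <= g                     # S61
                     and num_disturbed / total_words < p)   # S62
    flipped = new_label != original_label                   # S63
    if not within_budget:
        # All three conditions must hold at once, so a flip that needed
        # too many disturbance words does not produce a sample.
        return AttackStatus.FAILURE
    return AttackStatus.SUCCESS if flipped else AttackStatus.CONTINUE
```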

In general, the text audit model is a binary classification model with only two possible prediction labels, so when the first prediction label is inconsistent with the second prediction label, the first prediction label is the opposite of the second. For example, when the label of the sentence information to be processed is abusive text and the label of the disturbance sentence information becomes normal text (i.e., non-abusive text), the text auditing model has failed to predict the disturbed text correctly; the disturbance sentence information can therefore be used to train the text auditing model and improve its prediction accuracy on adversarial samples.

By applying the method in the embodiment, the attack result obtained by attacking the statement information to be processed is shown in the following table:

Data index	Harmful text
Test-set size (samples)	377
Attack success rate (%)	14.85
Average number of disturbance words	3.83
Average text length (words)	46.37

Therefore, the method in the embodiment can effectively generate Chinese countermeasure samples.

In the application example shown in fig. 3:

1. combining a custom word segmentation dictionary, preprocessing the input sentence X and segmenting it into words to obtain X = [w_1, w_2, …, w_n], where w_i is the i-th word of sentence X after word segmentation;

2. inputting the whole sentence and the sentence with the i-th word removed into the trained text auditing model (namely, an abusive-text classification model), and calculating the importance of each word with the importance formula of the TextFooler algorithm (an adversarial example generation algorithm for English text);

3. removing stop words from the word list, and sorting the word sequence X by word importance from high to low;

4. performing word disturbance in order of word importance from high to low: for the word w_i, traversing all candidate replacement words of the current word, using the TextFooler word importance score formula to select the candidate with the lowest abusiveness to replace the original word, and then continuing to replace the next word on the basis of this replacement until the stop condition is met and the adversarial sample is output;

5. stop condition: the adversarial sample is successfully generated if the following three conditions are satisfied simultaneously: 1) the number of disturbance words is less than or equal to the preset upper limit g; 2) the proportion of disturbance words in the total number of words of the text is less than p; 3) the prediction label is inverted. Otherwise, no adversarial sample is generated.
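The five steps above can be combined into a greedy word-substitution loop. The Python sketch below is a hedged illustration, not the patent's code: segmentation (step 1) is assumed to be done already, candidate disturbance words are supplied as a hypothetical dictionary, step 2 uses only the first branch of the TextFooler score for brevity, and the toy model and its weights are invented for the example.

```python
from typing import Callable, Dict, List, Optional, Set

Scorer = Callable[[List[str]], Dict[str, float]]


def label_of(scores: Dict[str, float]) -> str:
    return max(scores, key=scores.get)


def greedy_attack(words: List[str], model: Scorer,
                  candidates: Dict[str, List[str]], stop_words: Set[str],
                  g: int = 4, p: float = 0.5) -> Optional[List[str]]:
    """Sketch of steps 2-5; returns the adversarial sample or None."""
    y = label_of(model(words))
    base = model(words)[y]
    # Step 2 (simplified): importance = drop of F_Y when word i is removed.
    importance = [base - model(words[:i] + words[i + 1:])[y]
                  for i in range(len(words))]
    # Step 3: skip stop words, visit positions from most to least important.
    order = sorted((i for i, w in enumerate(words) if w not in stop_words),
                   key=lambda i: importance[i], reverse=True)
    current, changed = list(words), 0
    for i in order:
        options = candidates.get(words[i], [])
        if not options:
            continue  # no disturbance word available for this word
        # Step 4: keep the candidate that leaves the lowest score for Y.
        current[i] = min(options, key=lambda c: model(
            current[:i] + [c] + current[i + 1:])[y])
        changed += 1
        # Step 5: budget limits first, then the label-inversion check.
        if changed > g or changed / len(current) >= p:
            return None
        if label_of(model(current)) != y:
            return current
    return None


# Hypothetical toy audit model driven by per-word weights.
TOY_WEIGHTS = {"垃圾": 0.5, "蠢货": 0.3, "拉圾": 0.2, "辣鸡": 0.05}


def toy_audit_model(words: List[str]) -> Dict[str, float]:
    abusive = min(1.0, 0.1 + sum(TOY_WEIGHTS.get(w, 0.0) for w in words))
    return {"abusive": abusive, "normal": 1.0 - abusive}
```

Replacements accumulate in `current`, so each later substitution is evaluated on the already-perturbed sentence, as step 4 describes. The budget check runs before the label check, so a label inversion that needs more than g words, or a proportion of p or more, yields no sample.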

According to an embodiment of another aspect of the present application, an adversarial training method is also provided, including the following steps S7 to S9:

S7, obtaining training data and verification data according to the adversarial samples generated by the method in any one of the foregoing embodiments.

In particular, the training data and the verification data may include both adversarial samples and samples from the original test set, where the original samples are sentence information in which no word has been replaced by a disturbance word.

Generally, when constructing the training data, each adversarial sample keeps the label of the to-be-processed sentence information from which it was generated (i.e., when the to-be-processed sentence information is abusive text, the label of the adversarial sample is also abusive text).

S8, training a preset text audit model through training data to obtain a trained text audit model;

and S9, after training, verifying the audit model through the verification data, and when the audit model meets the preset requirements, obtaining a target text audit model according to the audit model after training.

Specifically, the preset requirement may be a preset lower limit threshold of accuracy of prediction corresponding to the text audit model. In addition, other requirements may be set, and the present invention is not limited to these.

Namely, the text audit model is trained through the training data to obtain a trained text audit model, and then the trained text audit model is verified through the verification data to meet the preset requirements, so that the target text audit model can be obtained.

When the text type of the adversarial samples is abusive text and a BERT model is trained with the training data and verification data obtained in this embodiment, the trained model achieves the following results:

As can be seen from the above table, after adversarial training, the recognition accuracy of the BERT model on adversarial samples improves from 86.47% to 98.94%, which indicates that adversarial training can effectively improve the model's ability to recognize harmful samples.

In general, adversarial samples effectively increase the diversity of the training samples: the model learns the association between the replacement words (i.e., the disturbance words) and the original words, which improves regions of the model's text vector representation space where it previously performed poorly.

In the application example shown in fig. 4:

1. acquiring training data;

2. carrying out data preprocessing on input training data;

3. obtaining preprocessed training data;

4. inputting the preprocessed training data into an adversarial sample generation module;

5. obtaining adversarial samples of the training data, where each adversarial sample keeps the label of its original sample;

6. mixing the adversarial samples with the original training data and inputting them into the text auditing model as new training data;

7. fine-tuning the text auditing model again with the new training data to obtain the adversarially trained text auditing model.
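Steps 4 to 6 of the Fig. 4 flow amount to a label-preserving augmentation pass. A minimal Python sketch follows, assuming the attack is any callable that returns a perturbed word list or None on failure; `build_adversarial_training_data` and `toy_attack` are hypothetical names, and the fine-tuning of step 7 (for example of a BERT model) is outside its scope.

```python
from typing import Callable, List, Optional, Tuple

Sample = Tuple[List[str], str]                        # (segmented sentence, label)
AttackFn = Callable[[List[str]], Optional[List[str]]]  # None when the attack fails


def build_adversarial_training_data(train: List[Sample],
                                    attack: AttackFn) -> List[Sample]:
    """Attack every training sample and mix the successful adversarial
    samples into the original training data (sketch of steps 4-6)."""
    mixed: List[Sample] = list(train)       # step 6: keep all original samples
    for words, label in train:
        adversarial = attack(words)         # steps 4-5: generate the sample
        if adversarial is not None:
            # Step 5: the adversarial sample inherits the original label.
            mixed.append((adversarial, label))
    return mixed


def toy_attack(words: List[str]) -> Optional[List[str]]:
    """Hypothetical attack: swap "垃圾" for the variant "辣鸡" if present."""
    if "垃圾" not in words:
        return None
    return ["辣鸡" if w == "垃圾" else w for w in words]
```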

As shown in fig. 5, according to an embodiment of another aspect of the present application, a Chinese countermeasure sample generation apparatus for text audit is also provided, including:

a statement obtaining module 11, configured to obtain statement information to be processed;

the word segmentation module 12 is configured to segment words of the sentence information to be processed to obtain a plurality of words;

the importance determining module 13 is used for determining first importance information of the words;

the disturbance word module 14 is used for acquiring disturbance words corresponding to the words;

the replacing module 15 is configured to sequentially obtain disturbance statement information obtained by replacing each term in the statement information to be processed with a corresponding disturbance term according to the first importance information;

and the sample generation module 16, configured to, when it is determined that the disturbance sentence information meets the preset condition, obtain, according to the disturbance sentence information, the adversarial sample resulting from a successful attack on the sentence information to be processed.

Specifically, the specific process of implementing the functions of each module in the apparatus according to the embodiment of the present invention may refer to the related description in the method embodiment, and is not described herein again.

As shown in fig. 6, according to an embodiment of another aspect of the present application, an adversarial training device is also provided, comprising:

a data obtaining module 21, configured to obtain training data and verification data according to the adversarial samples generated by the method according to any one of the foregoing embodiments;

the training module 22 is configured to train a preset text audit model through training data to obtain a trained text audit model;

and the verification module 23, configured to verify the trained audit model with the verification data, and to obtain the target text audit model from the trained audit model when the preset requirement is met.

According to an embodiment of another aspect of the present application, an electronic device is also provided. As shown in fig. 7, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, where the processor 1501, the communication interface 1502, and the memory 1503 communicate with one another through the communication bus 1504.

A memory 1503 for storing a computer program;

the processor 1501 is configured to implement the steps of the above-described method embodiments when executing the program stored in the memory 1503.

The bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The embodiment of the present application further provides a storage medium, where the storage medium includes a stored program, and the program executes the method steps of the foregoing method embodiment when running.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A Chinese countermeasure sample generation method facing text audit is characterized by comprising the following steps:

obtaining statement information to be processed;

determining first importance information for the term;

obtaining disturbance words corresponding to the words;

2. The method of claim 1, wherein the obtaining perturbation words corresponding to the words comprises:

determining the pinyin and the glyph of the word;

3. The method of claim 1, further comprising:

4. The method according to claim 3, wherein the sequentially obtaining disturbance sentence information obtained by replacing each word in the sentence information to be processed with a corresponding disturbance word according to the first importance information includes:

5. The method of claim 4, wherein determining the lowest importance disturbing word with the lowest importance among all the disturbing words corresponding to the word comprises:

6. The method of claim 1, wherein determining the first importance information corresponding to each of the words comprises:

7. The method according to claim 1, wherein the disturbance sentence information satisfying the preset condition comprises:

8. An adversarial training method, comprising:

obtaining training data and verification data from adversarial samples generated according to the method of any one of claims 1 to 7;

9. A Chinese countermeasure sample generating device facing text audit is characterized by comprising:

10. An adversarial training device, comprising:

a data acquisition module for obtaining training data and verification data according to the adversarial samples generated by the method of any one of claims 1 to 7;

11. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is used for storing a computer program;

the processor, when executing the computer program, implementing the method steps of any of claims 1 to 8.

12. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program is operative to perform the method steps of any of the preceding claims 1 to 8.