CN113204974B - Method, device and equipment for generating confrontation text and storage medium - Google Patents

Method, device and equipment for generating confrontation text and storage medium

Info

Publication number
CN113204974B
Authority
CN
China
Prior art keywords
text
candidate
word
original text
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110527819.3A
Other languages
Chinese (zh)
Other versions
CN113204974A (en)
Inventor
张超
张子晗
刘明烜
段海新
孙东红
李琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110527819.3A
Publication of CN113204974A
Application granted
Publication of CN113204974B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiment of the invention provides a method, a device, equipment and a storage medium for generating a confrontation text. The method comprises the following steps: acquiring a text information set to be processed, wherein the text information set comprises an original text; carrying out disturbance processing on the original text to generate a plurality of candidate texts corresponding to the original text; and performing semantic recognition processing on each candidate text and determining a confrontation text corresponding to the original text according to the semantic recognition processing result, wherein the confrontation text is used for training a target model. The embodiment of the invention can generate a confrontation text with high semantic similarity to the original text, that is, a high-quality confrontation text, so that the target model can be effectively trained with the high-quality confrontation text and the trained target model has high robustness.

Description

Method, device and equipment for generating confrontation text and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a confrontation text.
Background
With the development of deep learning technology, deep learning models face various security problems, among which the countermeasure (adversarial) attack is a hot issue. In the field of natural language processing, a countermeasure attack means that a small disturbance is added to an original text to obtain a countermeasure text; when the countermeasure text is input, the deep learning model outputs a wrong semantic label, while a user cannot perceive any semantic change of the countermeasure text compared with the original text.
To address this security problem, the target model needs to be trained with countermeasure texts to defend against attacks, so that it can effectively resist external counterattacks. In the prior art, for Chinese, countermeasure texts are generated by a mapping approach: words in the original text are mapped into a vector space to obtain word vectors, disturbance is added to the word vectors, and the word vectors are mapped back to text format to generate the countermeasure text.
However, in the existing method a large number of word vectors are difficult to map back to text format, and the generated countermeasure text differs greatly from the original text in semantics, so the quality of the countermeasure text is low and its training value is low.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for generating a confrontation text, aiming to solve the technical problems in the prior art that the generated confrontation text differs greatly from the original text in semantics, the quality of the confrontation text is low, and consequently the training value of the confrontation text is low.
In a first aspect, an embodiment of the present invention provides a method for generating a countermeasure text, where the method includes:
acquiring a text information set to be processed, wherein the text information set comprises an original text;
carrying out disturbance processing on the original text to generate a plurality of candidate texts corresponding to the original text;
and performing semantic recognition processing on each candidate text, and determining a countermeasure text corresponding to the original text according to a semantic recognition processing result, wherein the countermeasure text is used for training the target model.
In a possible implementation manner, the perturbing the original text to generate a plurality of candidate texts corresponding to the original text includes:
determining a word sequence corresponding to an original text, and determining the priority of each word in the original text;
selecting words to be processed from the words according to the priority of the words, and performing various disturbance processing on the words to be processed;
and generating a plurality of candidate texts corresponding to the original text according to the plurality of disturbance processing results of the words to be processed and the current original text.
In a possible implementation manner, the text information set to be processed further includes a target semantic tag and a real semantic tag corresponding to the original text, where the target semantic tag is different from the real semantic tag;
the semantic recognition processing is carried out on each candidate text, and the confrontation text corresponding to the original text is determined according to the result of the semantic recognition processing, and the method comprises the following steps:
performing semantic recognition processing on each candidate text to obtain a semantic label of each candidate text and a confidence coefficient of each candidate text;
and selecting any candidate text with the semantic label consistent with the target semantic label as the confrontation text according to the semantic label of each candidate text.
In a possible implementation, the determining a word sequence corresponding to an original text and determining a priority of each word in the original text includes:
determining an attack scene, and determining a word sequence corresponding to an original text according to the attack scene;
and calculating the priority of each word in the original text according to a priority algorithm corresponding to the attack scene.
In a possible implementation manner, if the semantic label of each candidate text is not consistent with the target semantic label, the method further includes:
updating the original text according to the candidate texts and the confidence degrees of the candidate texts, and taking the updated original text as a new original text;
performing the perturbation processing on the new original text to obtain a plurality of new candidate texts corresponding to the new original text, and judging whether any new candidate text with a semantic label consistent with the target semantic label can be selected as the countermeasure text according to the semantic label of each new candidate text;
if not, updating the new original text again and repeating the processing until the confrontation text is obtained.
In one possible embodiment, the plurality of perturbation processes comprises at least one of: synonym-based word replacement processing, word order-based exchange processing, pinyin-based word replacement processing, radical-based word splitting processing, and radical-based replacement processing.
In one possible implementation, the radical-based replacement process includes:
according to a preset radical table, performing radical splitting on each character to be processed in the characters to be processed to obtain at least one radical corresponding to each character to be processed;
judging whether the components belong to the radicals, if so, carrying out multiple radical replacement processing on the radicals to obtain multiple candidate characters;
and selecting the candidate character with the highest similarity with the character to be processed from the plurality of candidate characters as a processing result of the replacement processing based on the radicals.
In a second aspect, an embodiment of the present invention provides an apparatus for generating a confrontation text, including:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a text information set to be processed, and the text information set comprises an original text;
the processing module is used for carrying out disturbance processing on the original text to generate a plurality of candidate texts corresponding to the original text;
and the execution module is used for performing semantic recognition processing on each candidate text and determining a confrontation text corresponding to the original text according to a semantic recognition processing result, wherein the confrontation text is used for training the target model.
In a third aspect, an embodiment of the present invention provides a device for generating a countermeasure text, including: a memory and at least one processor;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method for generating a confrontation text according to any one of the first aspects.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method according to any one of the first aspect is implemented.
According to the method, device, equipment and storage medium for generating a countermeasure text, a text information set to be processed that includes an original text is obtained, the original text is perturbed to generate a plurality of candidate texts corresponding to it, semantic recognition processing is performed on each candidate text, and the countermeasure text corresponding to the original text is determined according to the semantic recognition results. A countermeasure text with high semantic similarity to the original text, that is, a high-quality countermeasure text, can thus be generated, and the target model can be effectively trained with it so that the trained target model has high robustness.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention;
fig. 2 is a schematic flow chart of a method for generating a countermeasure text according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of another method for generating a confrontation text according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a device for generating a countermeasure text according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a device for generating a confrontation text according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Bad information is information that poses potential safety hazards to a user terminal. Generally, bad information is disguised as safe information so as to induce the user to click and trigger it, thereby compromising the device security and data security of the user terminal.
A deep learning model is a machine learning algorithm that can identify which information is bad information and mark it accordingly, so as to give safety warnings to users. Text information reaching the user terminal, including bad information, is subjected to feature recognition and classified under a safe or non-safe label, so that bad information can be identified in time when the user terminal receives it and the user can be warned.
However, the deep learning model also faces various security issues, the most important of which is the counterattack launched against it. A counterattack refers to attacking the deep learning model on the user terminal with bad information that has undergone disturbance processing. For example, in a real counterattack, a lawbreaker or an illegal device may add a small disturbance to the original bad information to obtain new bad information and use it to compromise the device security and data security of the user terminal. Generally, to preserve readability, the new bad information after disturbance processing is difficult to distinguish from the original bad information with the naked eye.
When the new bad information, disguised as safe information, reaches the user terminal, the deep learning model, because of the feature differences between the new and the original bad information, fails to output the "non-safety information" label and instead outputs a "safety information" label for the new bad information.
Meanwhile, the new bad information is semantically very similar to the original bad information and carries the "safety information" label given by the model, so the user trusts it, clicks and triggers it, and the device security and data security of the user terminal are seriously threatened.
To protect the security of the network community, the security problem can be solved by performing defense training on a deep learning model (hereinafter referred to as a target model).
Before the target model (deep learning model) suffers a counterattack, it can be trained to a certain extent to defend against such attacks, so that it acquires a certain ability to recognize new bad information (hereinafter also referred to as real countermeasure text) obtained by adding tiny disturbances to bad information (hereinafter also referred to as original text).
Specifically, some bad information may first be obtained in a pre-generated manner, and small disturbances may be added to it to obtain simulated bad information (hereinafter also referred to as simulated countermeasure text), whose correct label is marked as "non-safety information".
Then, the target model is trained again using the simulated countermeasure text and the correct label, so that after the countermeasure training the target model can extract the features of the simulated countermeasure text and output its correct label, "non-safety information". Through such defense training, the recognition accuracy of the target model for bad information under different disturbances can be greatly improved, the robustness of the target model is improved, and safety warning services are effectively provided for users.
Meanwhile, to achieve the above objective, a large amount of simulated countermeasure text needs to be generated. In the prior art, for Chinese, simulated countermeasure texts are generated by a mapping approach: the words in the original text of the bad information are mapped into a vector space to obtain word vectors, disturbance is added to the word vectors, and the word vectors are mapped back to text format to generate simulated countermeasure text that simulates bad information.
However, in the existing method a large number of word vectors are difficult to map back to text format; the generated simulated countermeasure text usually differs greatly from the original text in semantics, can easily be distinguished from the original text with the naked eye, is not very similar to the real countermeasure text used in counterattacks, and therefore has low training value. Moreover, a target model trained with such simulated countermeasure text still cannot effectively recognize the real countermeasure text and output correct labels in a real counterattack, so the defense-training effect is poor and network security problems still easily occur.
To solve the above problems, the inventors have found through research that, by perturbing the original text on the basis of the characteristics of the natural language itself (for Chinese, for example, on the basis of pinyin characteristics or character-pattern characteristics), simulated countermeasure text with high semantic similarity to the original text can be generated, that is, simulated countermeasure text highly similar to the real countermeasure text used in real counterattacks. Using such simulated countermeasure text for defense training of the target model achieves a better training effect: the target model trained with the simulated countermeasure text has high robustness, can effectively extract the features of the real countermeasure text in a real counterattack and accurately output its label, and network security problems can be effectively avoided.
The original text and the confrontation text mentioned in the embodiments of the present invention refer to the original text in Chinese and the simulated confrontation text in Chinese.
That is to say, a text information set to be processed that includes an original text is obtained, the original text is perturbed to generate a plurality of candidate texts corresponding to it, semantic recognition processing is then performed on each candidate text, and the confrontation text corresponding to the original text is determined from the semantic recognition results. In this way a confrontation text with high semantic similarity to the original text, that is, a high-quality confrontation text, is obtained, and the target model can be effectively trained with it so that the trained target model has high robustness.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention. As shown in fig. 1, the scheme provided by the embodiment of the present invention may be applied to a generation device of a countermeasure text, where the device acquires an original text and performs analysis processing on the original text to generate a countermeasure text corresponding to the original text.
Fig. 2 is a flowchart illustrating a method for generating a countermeasure text according to an embodiment of the present invention. The execution subject of the method in the embodiment of the invention can be the generation device of the confrontation text. As shown in fig. 2, the method in this embodiment may include:
step 201, a text information set to be processed is obtained, wherein the text information set comprises an original text.
In this embodiment, the original text may be a Chinese sentence. For example, if the original text is "smash a golden egg to win a hundred yuan of telephone charge", the real semantic tag corresponding to the original text is "fraud short message".
Step 202, performing perturbation processing on the original text to generate a plurality of candidate texts corresponding to the original text.
In this embodiment, the original text is perturbed based on its Chinese characteristics; for example, words in the original text are perturbed based on pinyin characteristics or character-pattern (glyph) characteristics of Chinese, so as to generate a plurality of candidate texts.
And 203, performing semantic recognition processing on each candidate text, and determining a countermeasure text corresponding to the original text according to a semantic recognition processing result, wherein the countermeasure text is used for training the target model.
In this embodiment, semantic recognition processing is performed on each candidate text by using the target model, that is, each candidate text is input into the target model, so that a semantic recognition processing result corresponding to the candidate text can be obtained, the semantic recognition processing result may include semantic tags of the candidate text, and further, the countermeasure text of the original text may be determined according to the semantic tags of the candidate text, for example, any candidate text whose semantic tag is inconsistent with a real semantic tag corresponding to the original text is selected as the countermeasure text, so as to train the target model by using the countermeasure text.
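For illustration only, the following Python sketch outlines steps 201 to 203 under the assumption that a target_model callable returns a (label, confidence) pair for an input text and that a perturb helper implements the candidate generation described later; both names are hypothetical and not part of the original disclosure.

```python
# Minimal sketch of steps 201-203. `target_model` and `perturb` are assumed helpers:
# target_model(text) -> (label, confidence), perturb(text) -> list of candidate texts.

def generate_adversarial_text(original_text, true_label, target_model, perturb):
    """Return one candidate whose predicted label differs from the true label."""
    candidates = perturb(original_text)            # step 202: generate candidate texts
    for candidate in candidates:                   # step 203: semantic recognition
        label, confidence = target_model(candidate)
        if label != true_label:                    # label flipped -> countermeasure text
            return candidate
    return None                                    # no countermeasure text found this round
```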
The method for generating a countermeasure text provided by this embodiment obtains a text information set to be processed that includes an original text, perturbs the original text to generate a plurality of candidate texts corresponding to it, performs semantic recognition processing on each candidate text, and determines the countermeasure text corresponding to the original text according to the semantic recognition results. A countermeasure text with high semantic similarity to the original text, that is, a high-quality countermeasure text, is thus obtained, and the target model can be effectively trained with it. In other words, the target model trained with the high-quality countermeasure text has high robustness, can effectively extract the features of the real countermeasure text in a real counterattack and accurately output its label, and network security problems can be effectively avoided.
In order to further ensure the semantic similarity between the generated countermeasure text and the original text, the embodiment of the invention can perform various disturbance treatments on the original text to obtain the countermeasure text.
Fig. 3 is a flowchart illustrating another method for generating a confrontation text according to an embodiment of the present invention. As shown in fig. 3, this embodiment is a detailed description of performing various perturbation processes on an original text based on the technical solutions provided by the above embodiments. The method in this embodiment may include:
step 301, obtaining a text information set to be processed, wherein the text information set comprises an original text.
For a specific implementation process and principle of step 301 in this embodiment, reference may be made to the foregoing embodiments, and details are not described herein.
Step 302, determining a word sequence corresponding to the original text, and determining the priority of each word in the original text.
Optionally, the step of determining the word sequence corresponding to the original text and determining the priority of each word in the original text may specifically include: determining an attack scene, and determining a word sequence corresponding to an original text according to the attack scene; and calculating the priority of each word in the original text according to a priority algorithm corresponding to the attack scene.
In this embodiment, the attack scenario includes a white-box attack scenario and a black-box attack scenario. In the white-box attack scenario, information such as the structure, parameters and training data of the target model is known; in the black-box attack scenario, such information is unknown. Different attack scenarios therefore use different methods to determine the word sequence corresponding to the original text, and likewise use different priority algorithms to calculate the priority of each word in the original text.
It should be noted that, in both the white box attack scenario and the black box attack scenario, the target model may be used to perform semantic recognition processing on the original text, the candidate text, or the countermeasure text, so as to obtain corresponding semantic recognition processing results.
And 303, selecting the words to be processed from the words according to the priority of the words, and performing various disturbance processing on the words to be processed.
Wherein the plurality of disturbance processes include at least one of: synonym-based word replacement processing, word order-based exchange processing, pinyin-based word replacement processing, radical-based word splitting processing, and radical-based replacement processing.
Specifically, the word with the highest priority among the words is determined as the word to be processed, and then multiple kinds of perturbation processing are performed on the word to be processed, wherein the multiple kinds of perturbation processing can be performed simultaneously.
Optionally, the synonym-based word replacement process may specifically be: a synonym corresponding to the word to be processed is determined by querying a preset synonym comparison table, and the determined synonym is used as the processing result of the synonym-based word replacement process. If the word to be processed has several synonyms, any one of them may be selected as the processing result; alternatively, the query of the preset synonym comparison table may stop once the first synonym corresponding to the word to be processed is found, and that first synonym is used as the processing result, which is not specifically limited here. For example, if the word to be processed is "golden egg" and its synonym determined by querying the preset synonym comparison table is "golden ball", the synonym "golden ball" is used as the processing result of the synonym-based word replacement process.
The preset synonym comparison table may be an existing synonym comparison table, for example, a synonym comparison table obtained from a website. And the synonym-based word replacement processing is carried out on the words, so that the high semantic similarity between the generated candidate text and the original text can be ensured.
Optionally, the word-order-based exchange process may specifically be: the characters in the word to be processed are swapped to obtain a word with its character order exchanged, which is used as the processing result of the word-order-based exchange process. For example, if the word to be processed is "golden egg", the word after the character-order exchange is "egg golden", which is used as the processing result. If the word to be processed contains two or more characters, the order of any two or more characters may be exchanged; for example, if the word to be processed is "consumption coupon", any reordered variant may be selected as the processing result of the word-order-based exchange process. Exchanging the character order of a word helps ensure high semantic similarity between the generated candidate text and the original text.
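The synonym-based replacement and the word-order-based exchange described above can be sketched as simple lookup and reordering operations. The synonym table entry below is an assumed stand-in for the "preset synonym comparison table", not data from the original disclosure.

```python
from itertools import permutations

# Illustrative sketch only; the table is an assumed stand-in for the preset synonym
# comparison table (e.g. "golden egg" (金蛋) -> "golden ball" (金球)).
SYNONYMS = {"金蛋": ["金球"]}

def synonym_variants(word):
    """Synonym-based replacement: return the listed synonyms for a word, if any."""
    return list(SYNONYMS.get(word, []))

def word_order_variants(word):
    """Word-order exchange: return every distinct reordering of the word's characters."""
    return ["".join(p) for p in set(permutations(word)) if "".join(p) != word]
```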
Optionally, the pinyin-based word replacement process includes the following sub-processes: a tone-based word replacement process, a front-back nasal sound-based word replacement process, a warped-tongue sound-based word replacement process, and a dialect-based word replacement process.
Furthermore, the pinyin-based word replacement process may specifically be: according to the pinyin of the word to be processed, the replacement word corresponding to it is determined by querying a preset pinyin comparison table. In the tone-based word replacement process, a word whose pinyin differs from that of the word to be processed only in tone (or shares the same tone) may be used as the replacement word; for example, if the word to be processed is "atrophy" (pinyin wěi suō), a word pronounced with the same syllables but different tones may be used as the replacement word.
In the word replacement process based on front and back nasal sounds, the pinyin of any character in the word to be processed is converted between the front nasal and back nasal finals to generate a converted pinyin, and the preset pinyin comparison table is queried so that a word corresponding to the converted pinyin is used as the replacement word; for example, if the word to be processed is "identity" (pinyin shēn fèn), then "province" (pinyin shěng fèn) may be used as the replacement word.
In the word replacement process based on the retroflex (warped-tongue) sound, the pinyin of any character in the word to be processed is converted between retroflex and flat-tongue initials to generate a converted pinyin, and the preset pinyin comparison table is queried so that a word corresponding to the converted pinyin is used as the replacement word; for example, if the word to be processed is "meal" (pinyin chī fàn), a word pronounced cī fàn may be used as the replacement word.
In the dialect-based word replacement process, the pinyin of any character in the word to be processed is converted according to a dialect pronunciation to generate a converted pinyin, and the preset pinyin comparison table is queried so that a word corresponding to the converted pinyin is used as the replacement word; for example, if the word to be processed is "Fujian" (pinyin fú jiàn), then "Hujian" (pinyin hú jiàn) may be used as the replacement word.
Optionally, when performing the pinyin-based word replacement process, any one of the above sub-processes may be applied to the word to be processed, or several sub-processes may be applied separately to obtain multiple replacement words. Any one of the replacement words may be selected as the processing result of the pinyin-based word replacement process, or the best replacement word may be selected according to the distance between the vector of the replacement word and the vector of the word to be processed: the smaller the distance between the two vectors, the closer the semantics of the two words.
The preset pinyin comparison table may be an existing pinyin comparison table, for example, a pinyin comparison table obtained from a website.
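A minimal sketch of the pinyin-based sub-processes is given below. The transformation pairs and the pinyin table entries are assumptions for illustration; a real implementation would restrict the swaps to syllable initials and finals and would use a full pinyin comparison table.

```python
# Assumed example entries standing in for the preset pinyin comparison table.
PINYIN_TABLE = {"sheng fen": ["省份"], "hu jian": ["胡建"]}

NASAL_SWAPS = [("en", "eng"), ("in", "ing"), ("an", "ang")]   # front <-> back nasal
RETROFLEX_SWAPS = [("zh", "z"), ("ch", "c"), ("sh", "s")]     # retroflex <-> flat tongue
DIALECT_SWAPS = [("f", "h")]                                  # a common dialect f/h mix-up

def pinyin_variants(pinyin):
    """Generate transformed (toneless) pinyin strings for one word's pinyin."""
    variants = set()
    for swaps in (NASAL_SWAPS, RETROFLEX_SWAPS, DIALECT_SWAPS):
        for a, b in swaps:
            if a in pinyin:
                variants.add(pinyin.replace(a, b, 1))
            if b in pinyin:
                variants.add(pinyin.replace(b, a, 1))
    return variants - {pinyin}

def pinyin_replacements(pinyin):
    """Look up replacement words for each transformed pinyin in the preset table."""
    words = []
    for variant in pinyin_variants(pinyin):
        words.extend(PINYIN_TABLE.get(variant, []))
    return words
```

For example, pinyin_replacements("shen fen") yields 省份 via the nasal swap, and pinyin_replacements("fu jian") yields 胡建 via the dialect swap, under the assumed table above.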
Optionally, the radical-based word splitting process may specifically be: by querying a preset radical table, each character of left-right structure or semi-enclosed structure in the word to be processed is split into its components, and the resulting components are used as the processing result of the radical-based word splitting process. Optionally, characters in the word that are of neither left-right structure nor semi-enclosed structure remain unchanged. For example, the left-right-structured character 份 in the word to be processed "identity" (身份) can be split into its components 亻 and 分, and the result "身亻分" is then used as the processing result of the radical-based word splitting process.
The preset radical table may be an existing radical table, for example, a radical table obtained from a website.
Optionally, the replacement processing based on the radicals may specifically include: according to a preset radical table, performing radical splitting on each character to be processed in the characters to be processed to obtain at least one radical corresponding to each character to be processed; judging whether the components belong to the radicals, if so, carrying out multiple radical replacement processing on the radicals to obtain multiple candidate characters; and selecting the candidate character with the highest similarity with the character to be processed from the plurality of candidate characters as a processing result of the replacement processing based on the radicals.
Specifically, whether the character obtained after radical replacement is a Chinese character is judged by querying the preset radical table; if so, the character obtained after radical replacement is taken as a candidate character. Radical replacement includes deleting, adding and substituting radicals. For example, if the word to be processed is "telephone charge" (话费), radical splitting of the character 话 yields its radical 讠; this radical is then replaced repeatedly, that is, all existing Chinese radicals (214 in total) are traversed and each radical other than 讠 is substituted for it, yielding characters such as 活 ("live") and 恬 ("quiet"); these are all Chinese characters and are therefore determined as candidate characters.
Similarly, for the character 清 ("clean"), deleting its radical 氵 yields 青 ("green"), which is still a Chinese character, so it is determined as a candidate character.
A radical can also be added to a character to be processed: for example, adding a radical to 先 ("first") yields characters such as 洗 ("wash") and 冼 ("Xian"), which are Chinese characters and are therefore determined as candidate characters.
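The radical-based splitting and replacement can be sketched as follows. The decomposition and composition tables are small illustrative stand-ins for the preset radical table and the 214 standard radicals (a real implementation would need full glyph decomposition data); selecting the candidate with the most similar glyph is shown separately in the twin-network sketch further below.

```python
# Illustrative stand-ins; real implementations would use complete decomposition data.
RADICAL_TABLE = {"话": ("讠", "舌"), "清": ("氵", "青")}   # character -> components
COMPOSE = {("氵", "舌"): "活", ("忄", "舌"): "恬"}         # components -> character
ALL_RADICALS = ("讠", "氵", "忄", "扌")                    # stand-in for the 214 radicals

def split_variant(char):
    """Radical splitting: write a left-right character as its components side by side."""
    return "".join(RADICAL_TABLE.get(char, (char,)))

def radical_replacement_candidates(char):
    """Delete or swap one radical; keep only results that are valid Chinese characters."""
    candidates = set()
    parts = RADICAL_TABLE.get(char)
    if parts is None:
        return candidates
    for i, part in enumerate(parts):
        if part not in ALL_RADICALS:
            continue
        remainder = parts[:i] + parts[i + 1:]
        if len(remainder) == 1:                      # deletion, e.g. 清 -> 青
            candidates.add(remainder[0])
        for radical in ALL_RADICALS:                 # substitution, e.g. 话 -> 活 / 恬
            rebuilt = COMPOSE.get(parts[:i] + (radical,) + parts[i + 1:])
            if rebuilt:
                candidates.add(rebuilt)
    return candidates
```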
Further, selecting from the plurality of candidate characters the character whose glyph is most similar to the character to be processed, as the processing result of the radical-based replacement process, may be implemented with a twin network (Siamese network): the plurality of candidate characters are compared with the character to be processed in terms of glyph similarity through the twin network, and the candidate character with the highest glyph similarity is selected.
Specifically, the twin network is a dual-input network structure, which is generally used to compare the similarity of two inputs, and may be composed of two Convolutional Neural Networks (CNNs). The character to be processed and any corresponding candidate character are respectively converted into a picture format and are respectively input into two convolutional neural networks, two groups of vectors can be obtained after feature extraction, the distance between the two groups of vectors is calculated, and the font similarity between the character to be processed and any candidate character can be determined according to the distance.
The weights of the two convolutional neural networks are shared, and table 1 shows the structure and parameters of the convolutional neural networks.
TABLE 1
Layer                            Activation function
Conv2D, filter=64, size=10×10    ReLU
MaxPooling2D                     -
Conv2D, filter=128, size=7×7     ReLU
MaxPooling2D                     -
Conv2D, filter=128, size=4×4     ReLU
MaxPooling2D                     -
Conv2D, filter=256, size=4×4     ReLU
Flatten                          -
Dense, unit=4096                 Sigmoid
The distance formula is:
d = ‖g(I_1, θ) − g(I_2, θ)‖    (1)
where I_1 is the picture corresponding to the character to be processed, I_2 is the picture corresponding to the candidate character, θ denotes the parameters of the convolutional neural network, g(I, θ) is the output of the convolutional neural network given an input I, and d is the distance between the two groups of vectors.
The smaller the distance between the two groups of vectors is, the higher the similarity between the corresponding character to be processed and the character font of the candidate character is, so that the candidate character with the highest similarity to the character font of the candidate character can be selected, and the replacement word corresponding to the character to be processed is obtained and used as the processing result of the replacement processing based on the radical.
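As one possible realization of Table 1 and equation (1), the following TensorFlow/Keras sketch builds a shared branch and computes the glyph distance. The 105×105 grayscale input size and the glyph-rendering step are assumptions not specified above.

```python
import tensorflow as tf

def build_branch():
    """Shared embedding branch following the layer list in Table 1."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, (10, 10), activation="relu", input_shape=(105, 105, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, (7, 7), activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, (4, 4), activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(256, (4, 4), activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(4096, activation="sigmoid"),
    ])

branch = build_branch()   # weight sharing: the same branch embeds both glyph images

def glyph_distance(img1, img2):
    """d = ||g(I_1, theta) - g(I_2, theta)||, equation (1): smaller means more similar."""
    v1 = branch(tf.expand_dims(img1, 0))
    v2 = branch(tf.expand_dims(img2, 0))
    return float(tf.norm(v1 - v2))
```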
And 304, generating a plurality of candidate texts corresponding to the original text according to the plurality of disturbance processing results of the words to be processed and the current original text.
In this embodiment, the multiple perturbation processing results may include a processing result of a synonym-based word replacement process, a processing result of a word exchange process based on a word order, a processing result of a pinyin-based word replacement process, a processing result of a radical-based word splitting process, and a processing result of a radical-based replacement process, and the multiple perturbation processing results are respectively replaced with words to be processed in the current original text, so as to generate multiple candidate texts.
The plurality of candidate texts may include a candidate text from the synonym-based word replacement process, a candidate text from the word-order-based exchange process, a candidate text from the pinyin-based word replacement process, a candidate text from the radical-based word splitting process, and a candidate text from the radical-based replacement process. For example, after performing the multiple perturbation processes on the word "telephone charge" in the original text "smash a golden egg to win a hundred yuan of telephone charge", perturbation results such as the word-order-swapped, pinyin-replaced, or radical-replaced variants of "telephone charge" are obtained, and substituting each of these results into the original text yields a plurality of candidate texts.
And 305, performing semantic recognition processing on each candidate text to obtain a semantic label of each candidate text and a confidence coefficient of each candidate text.
In this embodiment, each candidate text is subjected to semantic identification processing by using a target model, that is, each candidate text is input into the target model respectively, so as to obtain semantic tags of each candidate text and a confidence level of each candidate text, where the confidence level of each candidate text includes confidence levels of all tags, especially the confidence level of the semantic tag of each candidate text, and the semantic tag of each candidate text is a tag with the highest confidence level among all tags, for example, the tags may include "fraud short messages", "marketing short messages", or "advertisement short messages", and the semantic tag of the candidate text is "fraud short messages".
Step 306, according to the semantic tags of the candidate texts, selecting any candidate text with the semantic tag consistent with the target semantic tag as the confrontation text.
Specifically, the text information set to be processed further includes a target semantic tag and a real semantic tag corresponding to the original text, where the real semantic tag is a semantic tag of the original text obtained after performing semantic recognition processing on the original text by using a target model, the target semantic tag is different from the real semantic tag, and the target semantic tag corresponds to an attack type, that is, the target semantic tags corresponding to different attack types are different.
The counterattack can be classified into non-directional attack and directional attack according to the type of attack. The non-directional attack refers to an attack that after the countermeasure text is input into the target model, the target model outputs any label different from the real semantic label, and the directional attack refers to an attack that after the countermeasure text is input into the target model, the target model outputs a specified label different from the real semantic label.
That is, under the non-directional attack and the directional attack, the target semantic tag is different, and specifically, under the non-directional attack, the target semantic tag is an arbitrary tag different from the real semantic tag, for example, the real semantic tag is "fraud short message", and then the target semantic tag can be an arbitrary tag other than "fraud short message". Under the directional attack, the target semantic label is a designated label different from the real semantic label, for example, the real semantic label is a "fraud short message", and the target semantic label is a "normal short message".
Further, under the non-directional attack or the directional attack, whether the semantic label of each candidate text is consistent with the target semantic label or not is respectively judged, if at least one semantic label of the candidate text is consistent with the target semantic label, any candidate text with the semantic label consistent with the target semantic label is taken as an antagonistic text to be output.
The method for generating the countermeasure text provided by the embodiment of the invention can be suitable for different attack scenes and attack types, namely, suitable for a white box attack scene or a black box attack scene, and also suitable for non-directional attack or directional attack, wherein the attack scenes and the attack types can be freely combined.
It should be noted that, optionally, the method for generating a countermeasure text provided in the embodiment of the present invention may also be extended to other natural languages whose characteristics are the same as or similar to those of Chinese, such as Japanese and Korean, on the basis of the characteristics of those languages.
In addition, if the semantic tags of the candidate texts are all inconsistent with the target semantic tags, the current candidate texts cannot be used as countermeasure texts, and optionally, the original text can be updated on the basis of the current candidate text.
Specifically, the original text is updated according to the candidate texts and the confidence degrees of the candidate texts, and the updated original text is used as a new original text; performing the perturbation processing on the new original text to obtain a plurality of new candidate texts corresponding to the new original text, and judging whether any new candidate text with a semantic label consistent with the target semantic label can be selected as the confrontation text according to the semantic label of each new candidate text; if not, updating the new original text again and repeating the processing until the confrontation text is obtained.
Further, according to the confidence of each candidate text, one candidate text is selected from the candidate texts as the updated original text, that is, the new original text. According to the priority of each word, a word to be processed is then selected from the words that have not yet been processed, and multiple kinds of perturbation processing are performed on it. A plurality of new candidate texts are generated from the perturbation results and the new original text, semantic recognition processing is performed on each new candidate text with the target model to obtain its semantic tag and confidence, and whether the semantic tag of each new candidate text is consistent with the target semantic tag is judged.
And if the semantic tags are consistent with the target semantic tags, selecting any new candidate text with the semantic tags consistent with the target semantic tags as the countermeasure text, and if the semantic tags are not consistent with the target semantic tags, updating the new original text again and repeating the processing until the countermeasure text is obtained.
Optionally, when the semantic tag of each new candidate text is inconsistent with the target semantic tag, the new original text may be updated again and the above processing may be repeated until the iteration stop condition is reached.
Reaching the iteration stop condition may mean that all words in the original text have been perturbed in order of priority but the semantic tags of the generated new candidate texts are still inconsistent with the target semantic tag, or that a preset proportion of processable words has been reached. Specifically, the preset proportion of processable words may be a proportion that preserves the readability of the text; for example, if the preset proportion is 30%, only the top 30% of the words in the original text, in priority order, are perturbed.
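The overall iterative search can be outlined as below. This is a simplified sketch under several assumptions (word substitution via str.replace, a quality_score helper corresponding to the quality scores described later, and a target_model returning a (label, confidence) pair); it is not the claimed method verbatim.

```python
def iterative_attack(text, words, priorities, perturb_word, target_model,
                     target_label, quality_score, max_ratio=0.3):
    """Greedy word-by-word perturbation in priority order."""
    budget = max(1, int(len(words) * max_ratio))          # preset processable-word proportion
    for word, _ in sorted(zip(words, priorities), key=lambda p: -p[1])[:budget]:
        candidates = [text.replace(word, v, 1) for v in perturb_word(word)]
        if not candidates:
            continue
        labeled = [(c, *target_model(c)) for c in candidates]   # (text, label, confidence)
        for cand, label, _ in labeled:
            if label == target_label:                     # directional check; a non-directional
                return cand                               # attack would test label != true label
        text = max(labeled, key=lambda t: quality_score(t[0]))[0]   # update "original" text
    return None                                           # iteration stop condition reached
```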
The method for generating a countermeasure text provided by this embodiment determines the priority of each word in the original text, selects words to be processed according to priority, performs multiple kinds of perturbation processing on the selected words and generates a plurality of candidate texts in combination with the current original text, performs semantic recognition processing on the candidate texts, and selects any candidate text whose semantic tag is consistent with the target semantic tag as the countermeasure text. When no candidate text has a semantic tag consistent with the target semantic tag, the original text is updated to a new original text, the new original text is perturbed, and this processing is repeated until the countermeasure text is obtained. By applying multiple kinds of perturbation processing based on Chinese characteristics to the words to be processed, high semantic similarity between the generated countermeasure text and the original text is ensured, so that the target model trained with the countermeasure text has high robustness, can effectively and accurately output the label of the real countermeasure text in a real counterattack, and network security problems are effectively avoided.
Optionally, the attack scenario is determined in step 302, and when the attack scenario is a white-box attack scenario, since information such as a structure, a parameter, and training data of a target model in the white-box attack scenario is known, and the training data includes a word sequence corresponding to an original text, the word sequence corresponding to the original text can be obtained by obtaining the training data of the target model.
Further, the priority of each word in the original text reflects how strongly that word influences the confidence of the real semantic tag corresponding to the original text: the higher the priority of a word in the original text, the larger its influence on the confidence of the real semantic tag corresponding to the original text.
Specifically, in a white-box attack scene, a saliency map method is adopted to determine the priority of each word in an original text, namely, semantic recognition processing is performed on the original text by using a target model to obtain the confidence of a real semantic tag corresponding to the original text, the gradient of the confidence of the real semantic tag corresponding to the original text and the amplitude of the gradient are calculated, and the priority of each word in the original text is determined according to the obtained amplitude.
Gradient calculation formula:
∇_x f_k(x) = ∂f_k(x) / ∂x ∈ R^(l×q)    (2)
where l is the length of the original text, q is the word vector dimension, x denotes the word vectors of the original text, and f_k(x) is the confidence of the k-th label output after semantic recognition processing is performed on the original text x by the target model, the k-th label being the real semantic label corresponding to the original text.
Amplitude calculation formula:
S(x)_i = ‖∂f_k(x) / ∂x_i‖    (3)
where S(x)_i is the sensitivity score of the i-th word and x_i is the word vector of the i-th word. The sensitivity score of each word in the original text with respect to the k-th label can be obtained from formulas (2) to (3), and the priority of each word follows from its sensitivity score: the higher the sensitivity score of a word, the higher its priority in the original text.
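A white-box sketch of formulas (2) to (3) using automatic differentiation might look as follows, assuming the target model is available as a differentiable function from the (l, q) word-embedding matrix to class probabilities; this interface is an assumption for illustration.

```python
import tensorflow as tf

def word_sensitivity_scores(embeddings, model, true_label_index):
    """Return S(x)_i for each word: the norm of the gradient of f_k(x) w.r.t. that word."""
    x = tf.convert_to_tensor(embeddings, dtype=tf.float32)           # shape (l, q)
    with tf.GradientTape() as tape:
        tape.watch(x)
        confidence = model(tf.expand_dims(x, 0))[0, true_label_index]  # f_k(x)
    grad = tape.gradient(confidence, x)                               # shape (l, q)
    return tf.norm(grad, axis=1).numpy()                              # one score per word
```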
Optionally, when the attack scenario is a black-box attack scenario, information such as the structure, parameters and training data of the target model is unknown, so the word sequence corresponding to the original text cannot be obtained from the training data of the target model. Word segmentation of the original text can instead be performed with an existing word segmentation algorithm to obtain the words corresponding to the original text.
Specifically, in a black-box attack scenario the structure, parameters and training data of the target model are unknown, so the priority of each word in the original text cannot be determined with the saliency map method. Instead, a new text is generated by deleting a word from the original text, semantic recognition processing is performed on the new text with the target model to obtain the confidence of the real semantic tag corresponding to the new text, and this confidence is compared with the confidence of the real semantic tag corresponding to the original text to determine the priority of the deleted word.
The importance score calculation formula:
DropScore_p = f_k(x) − f_k(x_p)    (4)
where x is the original text, the k-th label is the real semantic label corresponding to the original text, x_p is the new text generated by deleting the p-th word from the original text x, f_k(x) is the confidence of the k-th label output after semantic recognition processing is performed on the original text x by the target model, f_k(x_p) is the confidence of the k-th label output after semantic recognition processing is performed on the new text x_p by the target model, and DropScore_p is the importance score of the p-th word in the original text x.
The words in the original text are deleted one by one to form new texts, the corresponding importance scores are calculated respectively, and the priority of each word is obtained from its importance score: the higher a word's importance score, the higher its priority in the original text.
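A black-box sketch of equation (4) is given below, assuming a hypothetical label_confidence(text, label) helper that queries the target model for the confidence of a given label on an input text.

```python
def drop_scores(words, true_label, label_confidence):
    """Return DropScore_p for each word by deleting it and measuring the confidence drop."""
    base = label_confidence("".join(words), true_label)               # f_k(x)
    scores = []
    for p in range(len(words)):
        reduced = "".join(words[:p] + words[p + 1:])                  # x_p: p-th word deleted
        scores.append(base - label_confidence(reduced, true_label))   # DropScore_p
    return scores                                                     # higher -> higher priority
```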
In this embodiment, in a white box attack scenario and a black box attack scenario, different methods are respectively used to determine a word sequence corresponding to an original text, and different priority algorithms are used to calculate priorities of words, so that the word sequence corresponding to the original text and the priorities of the words can be more accurately and effectively determined according to characteristics of the different attack scenarios.
Optionally, in the step of updating the current original text according to the candidate texts and the confidence degrees of the candidate texts, different methods are adopted for different attack types. For the non-directional attack, the step of updating the current original text according to the semantic tags of the candidate texts and the confidence degrees of the semantic tags is executed, which may specifically be: and calculating a first quality score of each candidate text according to the semantic label of each candidate text and the confidence coefficient of the semantic label, and updating the current original text according to the first quality score.
The first mass fraction equation:
L_n = f_k(x) − f_k(x′)    (5)
where x is the original text, x′ is a candidate text, f_k(x) is the confidence of the k-th label output after semantic recognition processing is performed on the original text x by the target model, f_k(x′) is the confidence of the k-th label output after semantic recognition processing is performed on the candidate text x′ by the target model, and L_n is the first quality score, the k-th label being the real semantic label corresponding to the original text.
A first quality score is calculated for each candidate text. The larger the first quality score, the larger the influence of the candidate text on the confidence of the real semantic tag, that is, the further the semantics of the candidate text deviate from those of the original text. The candidate text with the largest first quality score is therefore selected to update the current original text, and the updated current original text serves as the initial text of the next iteration.
Optionally, for the directional attack, the step of updating the current original text according to the semantic tags of the candidate texts and the confidence degrees of the semantic tags may specifically be: and calculating a second quality score of each candidate text according to the semantic label of each candidate text and the confidence coefficient of the semantic label, and updating the current original text according to the second quality score.
The second quality score formula:
L_t = f_s(x') - f_s(x)    (6)
where x is the original text, x' is the candidate text, f_s(x) is the confidence of the s-th label output after semantic recognition processing is performed on the original text x by using the target model, f_s(x') is the confidence of the s-th label output after semantic recognition processing is performed on the candidate text x' by using the target model, and L_t is the second quality score; the s-th label is the target semantic label.
A second quality score is calculated for each candidate text. A larger second quality score indicates a larger influence of the candidate text on the confidence of the target semantic tag, that is, the semantics of the candidate text are closer to the semantics corresponding to the target semantic tag. The candidate text with the largest second quality score is selected to update the current original text, and the updated current original text serves as the initial text of the next iteration.
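A corresponding sketch for the directional case uses the second quality score; again, the interface name is an assumption of the sketch.

```python
def select_best_targeted(current_text, candidates, target_label, model_confidence):
    """Pick the candidate text with the largest second quality score
    L_t = f_s(x') - f_s(x) from formula (6)."""
    base = model_confidence(current_text, target_label)                  # f_s(x)
    scored = [(model_confidence(c, target_label) - base, c) for c in candidates]
    best_score, best_candidate = max(scored)                             # largest L_t
    return best_candidate
```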
In this embodiment, under the non-directional attack and directional attack types, the current original text is updated by different methods, so that it can be updated more accurately and effectively according to the characteristics of each attack type.
Fig. 4 is a schematic structural diagram of an apparatus for generating a confrontation text according to an embodiment of the present invention. As shown in Fig. 4, the apparatus for generating a confrontation text provided in this embodiment may include: an acquisition module 41, a processing module 42, and an execution module 43.
An obtaining module 41, configured to obtain a text information set to be processed, where the text information set includes an original text;
the processing module 42 is configured to perform perturbation processing on the original text to generate a plurality of candidate texts corresponding to the original text;
and the execution module 43 is configured to perform semantic recognition processing on each candidate text, and determine a confrontation text corresponding to the original text according to a result of the semantic recognition processing, where the confrontation text is used for training the target model.
In an optional implementation manner, the processing module 42 is specifically configured to:
determining a word sequence corresponding to an original text, and determining the priority of each word in the original text;
selecting words to be processed from the words according to the priority of the words, and performing various disturbance processing on the words to be processed;
and generating a plurality of candidate texts corresponding to the original text according to the plurality of disturbance processing results of the words to be processed and the current original text.
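The generation of candidate texts from the perturbation results can be illustrated with the following sketch; the segmented word sequence, the position of the word to be processed, and the list of perturbed variants are assumed inputs, and joining words without separators is a simplification.

```python
def build_candidate_texts(words, position, perturbed_variants):
    """Form one candidate text per perturbation result by substituting the
    word to be processed with each of its perturbed variants."""
    candidates = []
    for variant in perturbed_variants:
        new_words = list(words)
        new_words[position] = variant          # apply one perturbation result
        candidates.append("".join(new_words))  # candidate text based on the current original text
    return candidates
```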
In an optional implementation manner, the executing module 43 is specifically configured to:
performing semantic recognition processing on each candidate text to obtain a semantic label of each candidate text and a confidence coefficient of each candidate text;
and selecting any candidate text with the semantic label consistent with the target semantic label as the confrontation text according to the semantic label of each candidate text.
In an alternative implementation manner, when determining the word sequence corresponding to the original text and determining the priority of each word in the original text, the processing module 42 is further specifically configured to:
determining an attack scene, and determining a word sequence corresponding to an original text according to the attack scene;
and calculating the priority of each word in the original text according to a priority algorithm corresponding to the attack scene.
In an optional implementation manner, when the semantic tag of each candidate text is not consistent with the target semantic tag, the executing module 43 is further specifically configured to:
updating the original text according to the candidate texts and the confidence degrees of the candidate texts, and taking the updated original text as a new original text;
performing the perturbation processing on the new original text to obtain a plurality of new candidate texts corresponding to the new original text, and judging whether any new candidate text with a semantic label consistent with the target semantic label can be selected as the countermeasure text according to the semantic label of each new candidate text;
if not, updating the new original text again and repeating the processing until the confrontation text is obtained.
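The iterative behaviour of the execution module can be outlined as below; perturb, predict_label, and update_original are hypothetical callables standing in for the perturbation processing, semantic recognition processing, and text update steps, and the iteration cap is only an illustrative stop condition.

```python
def search_adversarial_text(original_text, target_label, perturb, predict_label,
                            update_original, max_iterations=50):
    """Repeatedly perturb, check semantic labels, and update the current text
    until a candidate matches the target semantic label or the cap is reached."""
    current = original_text
    for _ in range(max_iterations):
        candidates = perturb(current)                      # plural perturbation results
        for candidate in candidates:
            label, _confidence = predict_label(candidate)  # semantic recognition processing
            if label == target_label:
                return candidate                           # confrontation text found
        current = update_original(current, candidates)     # best candidate becomes the new original text
    return None                                            # stop condition reached without success
```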
In an optional implementation manner, when performing the replacement processing based on the radical, the processing module 42 is further specifically configured to:
according to a preset radical table, performing radical splitting on each character to be processed in the characters to be processed to obtain at least one radical corresponding to each character to be processed;
judging whether the components belong to the radicals, if so, carrying out multiple radical replacement processing on the radicals to obtain multiple candidate characters;
and selecting the candidate character with the highest similarity with the character to be processed from the plurality of candidate characters as a processing result of the replacement processing based on the radicals.
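A simplified sketch of the radical-based replacement is given below. The radical table, the set of radicals, the mapping from a radical to candidate characters, and the glyph-similarity function are all illustrative assumptions; in particular, the mapping stands in for recomposing characters after radical replacement, which the sketch does not model.

```python
PRESET_RADICAL_TABLE = {"休": ["亻", "木"], "河": ["氵", "可"]}   # character -> components (illustrative)
RADICALS = {"亻", "氵", "扌", "木"}                               # components treated as radicals
CANDIDATES_BY_RADICAL = {"亻": ["体", "沐"], "氵": ["何", "呵"]}  # placeholder candidate characters

def radical_based_replace(char, glyph_similarity):
    """Split the character, perform radical replacement for components that are
    radicals, and keep the candidate most similar in glyph to the original."""
    components = PRESET_RADICAL_TABLE.get(char, [])
    candidates = []
    for comp in components:
        if comp in RADICALS:                               # only radicals trigger replacement
            candidates.extend(CANDIDATES_BY_RADICAL.get(comp, []))
    if not candidates:
        return char                                        # no radical component, keep the character
    return max(candidates, key=lambda c: glyph_similarity(char, c))
```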
The device for generating the countermeasure text provided in this embodiment may execute the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
Fig. 5 is a schematic structural diagram of a device for generating a confrontation text according to an embodiment of the present invention. As shown in Fig. 5, the device for generating a confrontation text provided by this embodiment includes: a memory 51 and at least one processor 52;
the memory 51 stores computer-executable instructions;
the at least one processor 52 executes the computer-executable instructions stored in the memory 51, so that the at least one processor 52 executes the method for generating the countermeasure text according to any of the above embodiments.
Wherein the memory 51 and the processor 52 may be connected by a bus 53.
The specific implementation principle and effect of the device for generating a confrontation text provided by this embodiment may refer to the relevant description and effects corresponding to the embodiments shown in Fig. 1 to Fig. 3, and are not described in detail herein.
An embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the method according to any of the above embodiments is implemented.
The computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some interfaces, indirect coupling or communication connection between devices or modules, and may be in an electrical, mechanical or other form.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (7)

1. A method for generating a countermeasure text, comprising:
acquiring a text information set to be processed, wherein the text information set comprises an original text; the text information set to be processed also comprises a target semantic label and a real semantic label corresponding to the original text, wherein the target semantic label is different from the real semantic label;
determining the priority of each word in the word sequence corresponding to the original text in the original text;
selecting words to be processed from the words according to the priority of the words, and performing various disturbance processing on the words to be processed;
generating a plurality of candidate texts corresponding to the original text according to the plurality of disturbance processing results of the words to be processed and the current original text;
performing semantic recognition processing on each candidate text to obtain a semantic label of each candidate text and a confidence coefficient of each candidate text;
selecting any candidate text with the semantic label consistent with the target semantic label as the confrontation text according to the semantic label of each candidate text, wherein the confrontation text is used for training a target model;
wherein, in a white-box attack scenario, determining the priority of each word in the original text comprises:
performing semantic recognition processing on the original text by using the target model to obtain the confidence coefficient of a real semantic tag corresponding to the original text, calculating the gradient of the confidence coefficient of the real semantic tag corresponding to the original text and the amplitude of the gradient, and determining the priority of each word in the original text according to the obtained amplitude.
2. The method of claim 1, wherein if the semantic tag of each candidate text is not consistent with the target semantic tag, the method further comprises:
updating the current original text according to the candidate texts and the confidence degrees of the candidate texts to obtain the updated current original text serving as the initial text of the next iteration;
selecting a next word to be processed from the words according to the priority of the words, and performing various disturbance processing on the next word to be processed;
generating a plurality of candidate texts according to the plurality of disturbance processing results of the next word to be processed and the updated current original text;
performing semantic recognition processing on each candidate text to obtain a semantic label of each candidate text; if the semantic label of any candidate text is consistent with the target semantic label, selecting any candidate text with the semantic label consistent with the target semantic label as the confrontation text; and if the semantic label of each candidate text is inconsistent with the target semantic label, executing an iteration step until a confrontation text is obtained or an iteration stop condition is reached.
3. The generation method according to claim 1, wherein the plurality of perturbation processes include at least one of: synonym-based word replacement processing, word order-based exchange processing, pinyin-based word replacement processing, radical-based word splitting processing, and radical-based replacement processing.
4. The generation method according to claim 3, wherein the radical-based replacement process includes:
according to a preset radical table, performing radical splitting on each character to be processed in the characters to be processed to obtain at least one radical corresponding to each character to be processed;
judging whether the components belong to the radicals, if so, carrying out multiple radical replacement processing on the radicals to obtain multiple candidate characters;
and selecting the candidate character with the highest font similarity with the character to be processed from the plurality of candidate characters as a processing result of the replacement processing based on the radicals.
5. An apparatus for generating a confrontational text, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a text information set to be processed, and the text information set comprises an original text; the text information set to be processed also comprises a target semantic label and a real semantic label corresponding to the original text, wherein the target semantic label is different from the real semantic label;
a processing module to:
determining the priority of each word in the word sequence corresponding to the original text in the original text;
selecting words to be processed from the words according to the priority of the words, and performing various disturbance processing on the words to be processed;
generating a plurality of candidate texts corresponding to the original text according to the plurality of disturbance processing results of the words to be processed and the current original text;
the execution module is used for carrying out semantic recognition processing on each candidate text to obtain a semantic label of each candidate text and the confidence coefficient of each candidate text; selecting any candidate text with the semantic label consistent with the target semantic label as the confrontation text according to the semantic label of each candidate text, wherein the confrontation text is used for training a target model;
when determining the priority of each word in the original text in a white-box attack scenario, the processing module is further specifically configured to:
performing semantic recognition processing on the original text by using the target model to obtain the confidence coefficient of a real semantic tag corresponding to the original text, calculating the gradient of the confidence coefficient of the real semantic tag corresponding to the original text and the amplitude of the gradient, and determining the priority of each word in the original text according to the obtained amplitude.
6. A device for generating countermeasure text, comprising: a memory and at least one processor;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of generating a confrontational text as set forth in any one of claims 1-4.
7. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-4.
CN202110527819.3A 2021-05-14 2021-05-14 Method, device and equipment for generating confrontation text and storage medium Active CN113204974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110527819.3A CN113204974B (en) 2021-05-14 2021-05-14 Method, device and equipment for generating confrontation text and storage medium


Publications (2)

Publication Number Publication Date
CN113204974A CN113204974A (en) 2021-08-03
CN113204974B (en) 2022-06-17

Family

ID=77031392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110527819.3A Active CN113204974B (en) 2021-05-14 2021-05-14 Method, device and equipment for generating confrontation text and storage medium

Country Status (1)

Country Link
CN (1) CN113204974B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113946688B (en) * 2021-10-20 2022-09-23 中国人民解放军国防科技大学 Method for searching natural language processing model Tianhemen
US20230178079A1 (en) * 2021-12-07 2023-06-08 International Business Machines Corporation Adversarial speech-text protection against automated analysis
CN115879458A (en) * 2022-04-08 2023-03-31 北京中关村科金技术有限公司 Corpus expansion method, apparatus and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222140A (en) * 2019-04-22 2019-09-10 中国科学院信息工程研究所 A kind of cross-module state search method based on confrontation study and asymmetric Hash
CN111046176A (en) * 2019-11-25 2020-04-21 百度在线网络技术(北京)有限公司 Countermeasure sample generation method and device, electronic equipment and storage medium
CN111626063A (en) * 2020-07-28 2020-09-04 浙江大学 Text intention identification method and system based on projection gradient descent and label smoothing
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese countermeasure sample generation method and device for text audit

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205761A1 (en) * 2017-12-28 2019-07-04 Adeptmind Inc. System and method for dynamic online search result generation
EP3794516A4 (en) * 2018-05-17 2022-03-16 Magic Leap, Inc. Gradient adversarial training of neural networks
CN110032999B (en) * 2019-03-18 2023-04-11 西安华企众信科技发展有限公司 Low-resolution license plate recognition method with degraded Chinese character structure
CN109978072A (en) * 2019-04-03 2019-07-05 青岛伴星智能科技有限公司 A kind of character comparison method and Compare System based on deep learning
CN110533057B (en) * 2019-04-29 2022-08-12 浙江科技学院 Chinese character verification code identification method under single-sample and few-sample scene
CN111339248A (en) * 2020-02-12 2020-06-26 平安科技(深圳)有限公司 Data attribute filling method, device, equipment and computer readable storage medium
CN111241291B (en) * 2020-04-24 2023-01-03 支付宝(杭州)信息技术有限公司 Method and device for generating countermeasure sample by utilizing countermeasure generation network
CN111797610A (en) * 2020-07-13 2020-10-20 郑州玛源网络科技有限公司 Font and key sentence analysis method based on image processing
CN112036172B (en) * 2020-09-09 2022-04-15 平安科技(深圳)有限公司 Entity identification method and device based on abbreviated data of model and computer equipment




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant