CN111652267B - Method and device for generating countermeasure sample, electronic equipment and storage medium - Google Patents

Method and device for generating countermeasure sample, electronic equipment and storage medium Download PDF

Info

Publication number
CN111652267B
Authority
CN
China
Prior art keywords
particle
sample
word
original text
optimal solution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010317965.9A
Other languages
Chinese (zh)
Other versions
CN111652267A (en
Inventor
岂凡超 (Qi Fanchao)
臧原 (Zang Yuan)
刘知远 (Liu Zhiyuan)
孙茂松 (Sun Maosong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010317965.9A priority Critical patent/CN111652267B/en
Priority to PCT/CN2020/103219 priority patent/WO2021212675A1/en
Publication of CN111652267A publication Critical patent/CN111652267A/en
Application granted granted Critical
Publication of CN111652267B publication Critical patent/CN111652267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a method and an apparatus for generating adversarial samples, an electronic device and a storage medium. An original text is obtained; a candidate set of replacement words is determined for each word in the original text; and, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words is searched for a sample that attacks a target model, so as to generate an adversarial sample. The embodiment searches for adversarial samples with particle swarm optimization, a metaheuristic population-based evolutionary computation method that is more efficient than the genetic algorithm, so using it speeds up the search and raises the attack success rate. For different natural language processing models, the embodiment can quickly and efficiently generate a large number of high-quality adversarial samples that successfully deceive the target model and thereby expose its vulnerability, and it therefore has good practicability.

Description

Method and device for generating countermeasure sample, electronic equipment and storage medium
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular to a method and an apparatus for generating adversarial samples, an electronic device and a storage medium.
Background
An adversarial attack is the process of causing a target model to make wrong decisions by generating adversarial samples. Adversarial attacks can expose the vulnerability of a machine learning model and thereby help improve its robustness and interpretability. A textual adversarial attack is the process of making a natural language processing model misjudge by modifying the original text to generate adversarial samples.
Existing studies have shown that deep learning models are highly susceptible to adversarial attacks; for example, simple modifications to abusive text can fool state-of-the-art abuse detection systems. Given that natural language processing models based on deep learning are now widely deployed in application systems such as spam detection and malicious comment detection, researching textual adversarial attacks in order to find and fix the weaknesses of these systems has increasing practical significance and value.
Existing textual adversarial attack methods operate mainly at the word level: a candidate set of replacement words is determined for each word in the original text, and an adversarial sample that can successfully attack the target model is searched for in the discrete space formed by the combinations of all the candidate sets of replacement words. Existing search algorithms are mainly based on greedy or genetic algorithms, which leave considerable room for improvement in search speed and attack success rate.
Disclosure of Invention
Embodiments of the invention provide a method and an apparatus for generating adversarial samples, an electronic device and a storage medium, which are used to solve the problems of slow search and low attack success rate in the prior art.
An embodiment of the invention provides a method for generating an adversarial sample, which comprises the following steps:
acquiring an original text;
determining a candidate set of replacement words of each word in the original text;
and searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks a target model, to generate an adversarial sample.
Optionally, the determining a candidate set of replacement words for each word in the original text includes:
marking the part of speech of each word in the original text;
obtaining the sememe annotation of each sense of each word under a given part of speech, and determining words with the same sememe annotation and the same part of speech as candidate replacement words;
and determining the set of candidate replacement words as the candidate set of replacement words.
Optionally, the tagging parts of speech of each word in the original text includes:
if the original text is determined to be a Chinese text, performing word segmentation on the original text and labeling the part of speech of each segmented word;
and if the original text is determined to be an English text, lemmatizing each word in the original text and labeling the part of speech of each lemmatized word.
Optionally, the searching, based on the particle swarm optimization algorithm, of a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks the target model, and the generating of an adversarial sample, include:
copying the original text k times to obtain initial samples, and performing a mutation operation on each initial sample to generate a particle swarm, wherein each particle in the particle swarm is a mutated sample;
recording a global optimal solution and historical optimal solutions of the particle swarm at each iteration, wherein the global optimal solution is the position of the particle with the highest target-label prediction score given by the target model, and the historical optimal solution of each particle is the position with the highest target-label prediction score over that particle's past iterations;
and stopping the search and outputting the adversarial sample when the recorded optimal solution is determined to be an adversarial sample, otherwise updating the particle velocities and positions, performing the mutation operation, and returning to the recording operation, until the recorded optimal solution is determined to be an adversarial sample and the corresponding adversarial sample is output.
Optionally, the updating of the particle velocity and position comprises:
at each iteration, updating the velocity of each particle as follows:
v_n^d = ω·v_n^d + (1 - ω)·[ I(p_n^d, x_n^d) + I(p_g^d, x_n^d) ]
where v_n^d is the velocity of the n-th particle in the d-th dimension, ω is an inertia factor that decreases with the number of iterations, x_n^d is the position of the n-th particle in the d-th dimension, p_n^d is the position in the d-th dimension of the historical optimal solution of the n-th particle, p_g^d is the position in the d-th dimension of the global optimal solution, and I(a, b) is an indicator function that returns +1 or -1 according to whether a and b are equal;
the particle position update comprises: moving towards each particle's own historical optimal solution with probability P_i, and moving towards the global optimal solution with probability P_g, wherein P_i and P_g are updated with the number of iterations:
P_i = P_max - (t/T)·(P_max - P_min)
P_g = P_min + (t/T)·(P_max - P_min)
where 1 > P_max > P_min > 0 are predefined hyper-parameters, t is the current iteration number, and T is the maximum iteration number.
Optionally, the performing of the mutation operation comprises:
each particle in the particle swarm performing the mutation operation with probability P_m.
The embodiment of the invention also provides an apparatus for generating adversarial samples, which comprises:
the acquisition module is used for acquiring an original text;
the determining module is used for determining a candidate set of replacement words of each word in the original text;
and the generation module is used for searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks the target model, and generating an adversarial sample.
Optionally, the determining module includes:
the labeling unit is used for labeling the part of speech of each word in the original text;
the first determining unit is used for obtaining the sememe annotation of each sense of each word under a given part of speech and determining words with the same sememe annotation and the same part of speech as candidate replacement words;
and the second determining unit is used for determining the set formed by the candidate replacement words as the candidate set of replacement words.
An embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above-mentioned methods for generating an adversarial sample when executing the program.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above-mentioned methods for generating an adversarial sample.
According to the method and apparatus for generating adversarial samples, the electronic device and the storage medium provided by the embodiments of the invention, an original text is obtained; a candidate set of replacement words is determined for each word in the original text; and, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words is searched for a sample that attacks the target model, so as to generate an adversarial sample. The embodiments search for adversarial samples with particle swarm optimization, a metaheuristic population-based evolutionary computation method that is more efficient than the genetic algorithm, so using it speeds up the search and raises the attack success rate. For different natural language processing models, the embodiments can quickly and efficiently generate a large number of high-quality adversarial samples that successfully deceive the target model and thereby expose its vulnerability, and they therefore have good practicability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for generating an adversarial sample according to an embodiment of the present invention;
FIG. 2 is a flowchart of determining a candidate set of replacement words in the method for generating an adversarial sample according to an embodiment of the present invention;
FIG. 3 is a flowchart of searching for an adversarial sample in the method for generating an adversarial sample according to an embodiment of the present invention;
FIG. 4 is a block diagram of an apparatus for generating an adversarial sample according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
FIG. 1 shows a flowchart of a specific implementation of the method for generating an adversarial sample according to an embodiment of the present invention; the method specifically includes:
step S101: acquiring an original text;
step S102: determining a candidate set of replacement words of each word in the original text;
after the original text is acquired, the type of the original text is determined to be a Chinese text or an English text. If the text is English text, word segmentation operation is not needed; if the text is a Chinese text, word segmentation operation is carried out to obtain each word in the original text. And generating candidate replacement words corresponding to the words respectively aiming at the words in the original text. And determining a set consisting of one or more candidate replacement words as a candidate set of replacement words. In order to further ensure the quality of the replaced text, when the original text is English, the candidate replaced word determination operation can be performed after the word form reduction operation is performed on each word in the original text. And the morphology reduction is an important part in text preprocessing, namely, the morphology reduction is to remove affixes of words and extract a main part of the words. For example, the word "cars" is morphed and reduced to "car" and the word "ate" is morphed and reduced to "eat".
Further, the embodiment of the invention can generate, for each word of the original text, a candidate set of replacement words with the same or similar meaning by means of the HowNet sememe knowledge base. Specifically, part-of-speech tagging may be performed on the original text; after the part of speech of each word is obtained, the sememe annotation of each sense of the word under that part of speech is obtained from HowNet, words having the same sememe annotation and the same part of speech are regarded as candidate replacement words, and all candidate replacement words are then combined into the candidate set of replacement words.
Step S103: searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks the target model, and generating an adversarial sample.
Based on the particle swarm optimization algorithm, an adversarial sample capable of successfully attacking the target model is rapidly searched for in the discrete space formed by the combinations of all the candidate sets of replacement words.
In the method for generating adversarial samples provided by the embodiment of the invention, an original text is obtained; a candidate set of replacement words is determined for each word in the original text; and, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words is searched for a sample that attacks the target model, so as to generate an adversarial sample. The embodiment searches for adversarial samples with particle swarm optimization, a metaheuristic population-based evolutionary computation method that is more efficient than the genetic algorithm, so the search is faster and the attack success rate is higher.
When generating a candidate set of replacement words, a common method is to construct it from synonyms of the words in the original text by means of a synonym dictionary. However, a considerable portion of the words in real text have no synonyms (for example, named entity words), and even for words that do, the number of synonyms is very limited. This results in a smaller number of candidate adversarial samples ultimately being generated, which in turn lowers the attack success rate.
The method for generating adversarial samples provided by the embodiment of the invention instead relies on another knowledge base: HowNet, a linguistic knowledge base that annotates more than 100,000 Chinese and English words with predefined sememes, the smallest semantic units in linguistics. Words with the same sememe annotation can be considered to have the same meaning and can therefore be used as candidate replacement words. Moreover, HowNet annotates sememes for all kinds of words, including content words such as named entities, which ensures that candidate replacement words can be found for most words in real text. The present embodiment can therefore increase the number and diversity of candidate replacement words. As shown in FIG. 2, which is a flowchart of determining a candidate set of replacement words in the adversarial sample generation method provided by the embodiment of the invention, the specific process of determining the candidate set of replacement words in step S102 may include:
step S201: marking the part of speech of each word in the original text;
determining the original text to be a Chinese text, performing word segmentation operation on the original text, and labeling the part of speech of each word after word segmentation; determining that the original text is an English text, restoring each word in the original text to an original shape, and labeling the part of speech of each restored word.
Step S202: obtaining the semantic labels of each semantic item of each word under the same part of speech, and determining the words with the same semantic labels and the same part of speech as candidate replacement words;
step S203: and determining the set formed by the candidate replacement words as the candidate set of replacement words.
In the embodiment of the invention, a candidate set of replacement words with the same or similar meaning is generated for each word of the original text by means of the HowNet sememe knowledge base, which greatly increases the number and diversity of candidate replacement words and in turn raises the attack success rate of the generated adversarial samples.
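A minimal sketch of this sememe-based candidate generation is shown below; the get_sememes helper stands in for a lookup into HowNet (for example via the OpenHowNet toolkit), and its interface, like the other names here, is an assumption rather than part of the embodiment.
```python
# Sketch of sememe-based candidate generation: two words are interchangeable when
# some sense of each, under the same part of speech, carries an identical sememe set.
def get_sememes(word, pos):
    """Hypothetical helper: return a set of frozensets, one per sense of `word`
    under part of speech `pos`, each holding that sense's sememe annotation."""
    raise NotImplementedError("wire this to a HowNet / OpenHowNet lookup")

def build_candidate_set(word, pos, vocabulary):
    """Collect all vocabulary words sharing a full sememe annotation with `word`."""
    word_senses = get_sememes(word, pos)
    candidates = set()
    for other in vocabulary:
        if other == word:
            continue
        if any(sense in word_senses for sense in get_sememes(other, pos)):
            candidates.add(other)
    return candidates
```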
On the basis of any of the above embodiments, and referring to FIG. 3, the specific process of the search algorithm in the method for generating adversarial samples provided by the embodiment of the present invention includes:
step S301: initializing a particle swarm;
and setting the particle swarm size as k, copying the original text k times to obtain initial samples, and performing variation operation on each initial sample once to generate a new particle swarm. Mutation operation refers to randomly selecting a word in a text and replacing it with a random word in its candidate set of replacement words. Each particle in the particle group is a mutated sample, and can also be regarded as an n-dimensional vector, where n is the number of words in the text. The position of the particle in discrete space represents the combination of the alternative words chosen for each word of the sample. We initialize a velocity v randomly for each dimension of each particle.
Step S302: recording the optimal solution;
and recording the particles (global optimal solution) with the highest target label prediction score given by the target model in the particle swarm and the positions (historical optimal solution) with the highest target label prediction score in each particle historical iteration during each iteration. The target label refers to a label that the model is expected to classify the antagonistic sample, such as positive for the original sample label and negative for the target label in the emotion classification task, because it is expected that the antagonistic sample will make the model classification incorrect.
Step S303: judging whether the search can be stopped; if not, proceeding to step S304; if yes, proceeding to step S305.
If the currently recorded best solution (the particle with the highest target-label prediction score) makes the model classify incorrectly, a successful adversarial sample has been found, so the search is stopped and the sample is output. Otherwise, after the particle velocities and positions are updated and the mutation operation is performed, the process returns to the step of recording the global and historical optimal solutions, until the recorded optimal solution is determined to be an adversarial sample, at which point the search is stopped and the corresponding adversarial sample is output.
Step S304: updating the velocity and position of each particle, performing mutation, and returning to step S302 for a new iteration.
At each iteration, the velocity of each particle is updated as follows:
v_n^d = ω·v_n^d + (1 - ω)·[ I(p_n^d, x_n^d) + I(p_g^d, x_n^d) ]
where v_n^d is the velocity of the n-th particle in the d-th dimension, ω is an inertia factor that decreases with the number of iterations, x_n^d is the position of the n-th particle in the d-th dimension, p_n^d is the position in the d-th dimension of the historical optimal solution of the n-th particle, p_g^d is the position in the d-th dimension of the global optimal solution, and I(a, b) is an indicator function that returns +1 or -1 according to whether a and b are equal.
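A sketch of this velocity update follows. The sign convention of the indicator (+1 when the particle's word differs from the best solution's word in that dimension, -1 when it already matches) and the linear annealing of ω between assumed bounds are illustrative choices; the description only states that ω decreases over the iterations.
```python
# Sketch of the velocity update v_n^d = w*v_n^d + (1-w)*[I(p_n^d,x_n^d) + I(p_g^d,x_n^d)].
def indicator(a, b):
    # Assumed convention: push the velocity up where the particle still differs
    # from the best solutions, down where it already agrees with them.
    return -1.0 if a == b else 1.0

def update_velocity(velocity, position, personal_best, global_best,
                    t, T, omega_max=0.8, omega_min=0.2):
    omega = (omega_max - omega_min) * (T - t) / T + omega_min   # decreases with t
    return [omega * velocity[d]
            + (1.0 - omega) * (indicator(personal_best[d], position[d])
                               + indicator(global_best[d], position[d]))
            for d in range(len(velocity))]
```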
after the velocity update is completed, the particles need to undergo a two-step position update. The first step is to move to the historical optimal solution of each particle, and the moving probability is P i . The second step moves to the global optimal solution with the moving probability of P g . Wherein P is i And P g Update with iteration number:
Figure GDA0003914682180000092
Figure GDA0003914682180000093
wherein 1 > P max >P min More than 0 is a predefined hyper-parameter, T is the current iteration number, and T is the maximum iteration number.
In the embodiment of the invention, P_i and P_g are updated with the number of iterations rather than being set to constants: P_i decreases as the number of iterations increases while P_g increases, so that in the early stage of the search the particles explore their own nearby regions to cover more of the unknown space, and in the later stage they search near the currently found optimal solution so as to converge to it as quickly as possible. Experiments verify that, under the same limit on the maximum number of iterations, this setting yields an attack success rate 10-15% higher than setting P_i and P_g to constants.
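The adaptive schedules of P_i and P_g can be written directly from the two formulas above; the default bounds in this sketch are illustrative values.
```python
# P_i shrinks and P_g grows linearly over the iterations, shifting the swarm from
# exploring near each particle to exploiting the global optimal solution.
def pi_pg_schedule(t, T, p_max=0.8, p_min=0.2):
    p_i = p_max - (t / T) * (p_max - p_min)
    p_g = p_min + (t / T) * (p_max - p_min)
    return p_i, p_g
```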
At each step of the position update, once a particle decides to move, its probability of actually moving in each dimension d is determined by the corresponding dimension of its velocity, namely the sigmoid value 1 / (1 + e^(-v_n^d)).
After the velocity and position updates, each particle in the swarm performs the mutation operation with probability P_m.
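The two-step position update and per-dimension movement rule described above might be implemented as in the following sketch; the function and variable names are illustrative.
```python
# Sketch of step S304's position update: move towards the particle's own historical
# best with probability P_i, then towards the global best with probability P_g; once
# a move is decided, each dimension is copied from the chosen best with probability
# sigmoid(v_n^d).
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def move_towards(position, velocity, best):
    return [best[d] if random.random() < sigmoid(velocity[d]) else position[d]
            for d in range(len(position))]

def move_particle(position, velocity, personal_best, global_best, p_i, p_g):
    new_position = list(position)
    if random.random() < p_i:          # first step: towards the historical optimum
        new_position = move_towards(new_position, velocity, personal_best)
    if random.random() < p_g:          # second step: towards the global optimum
        new_position = move_towards(new_position, velocity, global_best)
    return new_position
```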
Step S305: the search is stopped and the sample is output as an adversarial sample.
In the embodiment of the invention, candidate sets of replacement words are generated for the words of the original text by means of sememes, and the particle swarm optimization algorithm then searches the discrete space formed by combining these candidate sets for an adversarial sample capable of successfully attacking the target model. For different natural language processing models, the method can efficiently generate a large number of high-quality adversarial samples that successfully deceive the target model and thereby expose its vulnerability, and it therefore has good practicability.
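Putting the preceding sketches together, the overall search loop of FIG. 3 could be organized as below; is_adversarial is an assumed helper that checks whether the target model now predicts the target label, and all hyper-parameter values are illustrative.
```python
# Illustrative end-to-end PSO attack loop combining the earlier sketches
# (init_swarm, mutate, update_bests, update_velocity, pi_pg_schedule, move_particle).
import random

def is_adversarial(words, target_label):
    """Assumed helper: True when the target model's predicted label is target_label."""
    raise NotImplementedError

def pso_attack(original_words, candidates, target_label,
               k=60, T=20, p_max=0.8, p_min=0.2, p_mutate=0.3):
    particles, velocities = init_swarm(original_words, candidates, k)     # step S301
    personal_best = [list(p) for p in particles]
    personal_score = [float("-inf")] * k
    global_best, global_score = list(original_words), float("-inf")

    for t in range(1, T + 1):
        personal_best, personal_score, global_best, global_score = update_bests(
            particles, target_label, personal_best, personal_score,
            global_best, global_score)                                    # step S302
        if is_adversarial(global_best, target_label):                     # steps S303/S305
            return global_best
        p_i, p_g = pi_pg_schedule(t, T, p_max, p_min)                     # step S304
        for n in range(k):
            velocities[n] = update_velocity(velocities[n], particles[n],
                                            personal_best[n], global_best, t, T)
            particles[n] = move_particle(particles[n], velocities[n],
                                         personal_best[n], global_best, p_i, p_g)
            if random.random() < p_mutate:
                particles[n] = mutate(particles[n], candidates)
    return None   # no adversarial sample found within T iterations
```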
FIG. 4 shows a block diagram of the structure of an apparatus for generating adversarial samples, which specifically includes:
an obtaining module 401, configured to obtain an original text;
a determining module 402, configured to determine a candidate set of replacement words for each word in the original text;
a generating module 403, configured to search, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks the target model, and to generate an adversarial sample.
Further, the determining module 402 may further include:
the labeling unit is used for labeling the part of speech of each word in the original text;
the first determining unit is used for obtaining the sememe annotation of each sense of each word under a given part of speech, and determining words with the same sememe annotation and the same part of speech as candidate replacement words;
and the second determining unit is used for determining the set formed by the candidate replacement words as the candidate set of replacement words.
Further, the labeling unit is specifically configured to: if the original text is determined to be a Chinese text, perform word segmentation on the original text and label the part of speech of each segmented word; and if the original text is determined to be an English text, lemmatize each word in the original text and label the part of speech of each lemmatized word.
On the basis of any of the above embodiments, the generating module 403 is specifically configured to: copy the original text k times to obtain initial samples, and perform a mutation operation on each initial sample to generate a particle swarm, wherein each particle in the swarm is a mutated sample; record, at each iteration, the particle with the highest target-label prediction score given by the target model (the global optimal solution) and, for each particle, the position with the highest target-label prediction score over its past iterations (its historical optimal solution); and stop the search and output the adversarial sample when the recorded optimal solution is determined to be an adversarial sample, otherwise update the particle velocities and positions, perform the mutation operation, and return to the recording operation, until the recorded optimal solution is determined to be an adversarial sample and the corresponding adversarial sample is output.
The apparatus for generating adversarial samples of this embodiment is configured to implement the foregoing method for generating adversarial samples, so its specific implementation may refer to the foregoing method embodiments; for example, the obtaining module 401, the determining module 402 and the generating module 403 are respectively configured to implement steps S101, S102 and S103 of the method, and their details, which can be found in the descriptions of the corresponding embodiments above, are not repeated here.
FIG. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communications interface 520 and the memory 530 communicate with each other via the communication bus 540. The processor 510 may call logic instructions in the memory 530 to perform the following method: acquiring an original text; determining a candidate set of replacement words for each word in the original text; and searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks a target model, to generate an adversarial sample.
In addition, the logic instructions in the memory 530 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In one embodiment, the processor 510 may call logic instructions in the memory 530 to perform the following method: labeling the part of speech of each word in the original text; obtaining the sememe annotation of each sense of each word under a given part of speech, and determining words with the same sememe annotation and the same part of speech as candidate replacement words; and determining the set formed by the candidate replacement words as the candidate set of replacement words.
In one embodiment, the processor 510 may call logic instructions in the memory 530 to perform the following method: if the original text is determined to be a Chinese text, performing word segmentation on the original text and labeling the part of speech of each segmented word; and if the original text is determined to be an English text, lemmatizing each word in the original text and labeling the part of speech of each lemmatized word.
In one embodiment, the processor 510 may call logic instructions in the memory 530 to perform the following method: copying the original text k times to obtain initial samples, and performing a mutation operation on each initial sample to generate a particle swarm, wherein each particle in the particle swarm is a mutated sample; recording, at each iteration, the particle with the highest target-label prediction score given by the target model and, for each particle, the position with the highest target-label prediction score over its past iterations; and stopping the search and outputting the adversarial sample when the recorded optimal solution is determined to be an adversarial sample, otherwise updating the particle velocities and positions, performing the mutation operation, and returning to the recording operation, until the recorded optimal solution is determined to be an adversarial sample and the corresponding adversarial sample is output.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided in the foregoing embodiments, which includes, for example: acquiring an original text; determining a candidate set of replacement words for each word in the original text; and searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks a target model, to generate an adversarial sample.
The electronic device and the non-transitory computer-readable storage medium provided in the embodiments of the present invention both correspond to the above-mentioned method for generating adversarial samples; for their specific embodiments, reference may be made to the corresponding content above, which is not repeated here.
In summary, according to the method and apparatus for generating adversarial samples, the electronic device and the storage medium provided by the embodiments of the invention, an original text is obtained; a candidate set of replacement words is determined for each word in the original text; and, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words is searched for a sample that attacks the target model, so as to generate an adversarial sample. The embodiments search for adversarial samples with particle swarm optimization, a metaheuristic population-based evolutionary computation method that is more efficient than the genetic algorithm, so the search is faster and the attack success rate is higher. For different natural language processing models, the embodiments can quickly and efficiently generate a large number of high-quality adversarial samples that successfully deceive the target model and thereby expose its vulnerability, and they therefore have good practicability.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for generating an adversarial sample, comprising:
acquiring an original text;
determining a candidate set of replacement words of each word in the original text;
searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks a target model, to generate an adversarial sample;
copying the original text k times to obtain initial samples, and performing a mutation operation on each initial sample to generate a particle swarm, wherein each particle in the particle swarm is a mutated sample;
recording a global optimal solution and historical optimal solutions of the particle swarm at each iteration, wherein the global optimal solution is the position of the particle with the highest target-label prediction score given by the target model, and the historical optimal solution of each particle is the position with the highest target-label prediction score over that particle's past iterations;
stopping the search and outputting the adversarial sample when the recorded optimal solution is determined to be an adversarial sample, otherwise updating the particle velocities and positions, performing the mutation operation, and returning to the recording operation, until the recorded optimal solution is determined to be an adversarial sample and the corresponding adversarial sample is output;
the updating of the particle velocity and position comprises:
at each iteration, updating the velocity of each particle as follows:
v_n^d = ω·v_n^d + (1 - ω)·[ I(p_n^d, x_n^d) + I(p_g^d, x_n^d) ]
where v_n^d is the velocity of the n-th particle in the d-th dimension, ω is an inertia factor that decreases with the number of iterations, x_n^d is the position of the n-th particle in the d-th dimension, p_n^d is the position in the d-th dimension of the historical optimal solution of the n-th particle, p_g^d is the position in the d-th dimension of the global optimal solution, and I(a, b) is an indicator function that returns +1 or -1 according to whether a and b are equal;
the particle position update comprises: moving towards each particle's own historical optimal solution with probability P_i, and moving towards the global optimal solution with probability P_g, wherein P_i and P_g are updated with the number of iterations:
P_i = P_max - (t/T)·(P_max - P_min)
P_g = P_min + (t/T)·(P_max - P_min)
where 1 > P_max > P_min > 0 are predefined hyper-parameters, t is the current iteration number, and T is the maximum iteration number;
the performing of the mutation operation comprises:
after the velocity and position are updated, each particle in the particle swarm performing the mutation operation with probability P_m.
2. The method for generating an adversarial sample according to claim 1, wherein the determining of a candidate set of replacement words for each word in the original text comprises:
marking the part of speech of each word in the original text;
obtaining the sememe annotation of each sense of each word under a given part of speech, and determining words with the same sememe annotation and the same part of speech as candidate replacement words;
and determining the set of candidate replacement words as the candidate set of replacement words.
3. The method for generating an adversarial sample according to claim 2, wherein the labeling of the part of speech of each word in the original text comprises:
if the original text is determined to be a Chinese text, performing word segmentation on the original text and labeling the part of speech of each segmented word;
and if the original text is determined to be an English text, lemmatizing each word in the original text and labeling the part of speech of each lemmatized word.
4. An apparatus for generating an adversarial sample, comprising:
the acquisition module is used for acquiring an original text;
the determining module is used for determining a candidate set of replacement words of each word in the original text;
the generation module is used for searching, based on a particle swarm optimization algorithm, a discrete space formed by combinations of the candidate sets of replacement words for a sample that attacks a target model, and generating an adversarial sample;
copying the original text k times to obtain initial samples, and performing a mutation operation on each initial sample to generate a particle swarm, wherein each particle in the particle swarm is a mutated sample;
recording a global optimal solution and historical optimal solutions of the particle swarm at each iteration, wherein the global optimal solution is the position of the particle with the highest target-label prediction score given by the target model, and the historical optimal solution of each particle is the position with the highest target-label prediction score over that particle's past iterations;
stopping the search and outputting the adversarial sample when the recorded optimal solution is determined to be an adversarial sample, otherwise updating the particle velocities and positions, performing the mutation operation, and returning to the recording operation, until the recorded optimal solution is determined to be an adversarial sample and the corresponding adversarial sample is output;
the updating of the particle velocity and position comprises:
at each iteration, updating the velocity of each particle as follows:
v_n^d = ω·v_n^d + (1 - ω)·[ I(p_n^d, x_n^d) + I(p_g^d, x_n^d) ]
where v_n^d is the velocity of the n-th particle in the d-th dimension, ω is an inertia factor that decreases with the number of iterations, x_n^d is the position of the n-th particle in the d-th dimension, p_n^d is the position in the d-th dimension of the historical optimal solution of the n-th particle, p_g^d is the position in the d-th dimension of the global optimal solution, and I(a, b) is an indicator function that returns +1 or -1 according to whether a and b are equal;
the particle position update comprises: moving towards each particle's own historical optimal solution with probability P_i, and moving towards the global optimal solution with probability P_g, wherein P_i and P_g are updated with the number of iterations:
P_i = P_max - (t/T)·(P_max - P_min)
P_g = P_min + (t/T)·(P_max - P_min)
where 1 > P_max > P_min > 0 are predefined hyper-parameters, t is the current iteration number, and T is the maximum iteration number;
the performing of the mutation operation comprises:
each particle in the particle swarm performing the mutation operation with probability P_m.
5. The apparatus of claim 4, wherein the determining module comprises:
the labeling unit is used for labeling the part of speech of each word in the original text;
the first determining unit is used for obtaining the sememe annotation of each sense of each word under a given part of speech, and determining words with the same sememe annotation and the same part of speech as candidate replacement words;
and the second determining unit is used for determining the set formed by the candidate replacement words as the candidate set of replacement words.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for generating an adversarial sample according to any one of claims 1 to 3 when executing the program.
7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for generating an adversarial sample according to any one of claims 1 to 3.
CN202010317965.9A 2020-04-21 2020-04-21 Method and device for generating countermeasure sample, electronic equipment and storage medium Active CN111652267B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010317965.9A CN111652267B (en) 2020-04-21 2020-04-21 Method and device for generating countermeasure sample, electronic equipment and storage medium
PCT/CN2020/103219 WO2021212675A1 (en) 2020-04-21 2020-07-21 Method and apparatus for generating adversarial sample, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010317965.9A CN111652267B (en) 2020-04-21 2020-04-21 Method and device for generating countermeasure sample, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111652267A CN111652267A (en) 2020-09-11
CN111652267B true CN111652267B (en) 2023-01-31

Family

ID=72346469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010317965.9A Active CN111652267B (en) 2020-04-21 2020-04-21 Method and device for generating countermeasure sample, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111652267B (en)
WO (1) WO2021212675A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112216273B (en) * 2020-10-30 2024-04-16 东南数字经济发展研究院 Method for resisting sample attack aiming at voice keyword classification network
CN112380845B (en) * 2021-01-15 2021-04-09 鹏城实验室 Sentence noise design method, equipment and computer storage medium
CN113723506B (en) * 2021-08-30 2022-08-05 南京星环智能科技有限公司 Method and device for generating countermeasure sample and storage medium
CN113806490B (en) * 2021-09-27 2023-06-13 中国人民解放军国防科技大学 Text universal trigger generation system and method based on BERT sampling
CN113642678B (en) * 2021-10-12 2022-01-07 南京山猫齐动信息技术有限公司 Method, device and storage medium for generating confrontation message sample
CN113935481B (en) * 2021-10-12 2023-04-18 中国人民解放军国防科技大学 Countermeasure testing method for natural language processing model under condition of limited times
CN113946687B (en) * 2021-10-20 2022-09-23 中国人民解放军国防科技大学 Text backdoor attack method with consistent labels
CN114169443B (en) * 2021-12-08 2024-02-06 西安交通大学 Word-level text countermeasure sample detection method
CN114238661B (en) * 2021-12-22 2024-03-19 西安交通大学 Text discrimination sample detection generation system and method based on interpretable model
CN114444476B (en) * 2022-01-25 2024-03-01 腾讯科技(深圳)有限公司 Information processing method, apparatus, and computer-readable storage medium
CN115034318B (en) * 2022-06-17 2024-05-17 中国平安人寿保险股份有限公司 Method, device, equipment and medium for generating title discrimination model
CN115333869B (en) * 2022-10-14 2022-12-13 四川大学 Distributed network anti-attack self-training learning method
CN116151392B (en) * 2023-02-28 2024-01-09 北京百度网讯科技有限公司 Training sample generation method, training method, recommendation method and device
CN117808095B (en) * 2024-02-26 2024-05-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method and device for generating attack-resistant sample and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619292A (en) * 2019-08-31 2019-12-27 浙江工业大学 Countermeasure defense method based on binary particle swarm channel optimization
CN110767216A (en) * 2019-09-10 2020-02-07 浙江工业大学 Voice recognition attack defense method based on PSO algorithm
CN110930182A (en) * 2019-11-08 2020-03-27 中国农业大学 Improved particle swarm optimization algorithm-based client classification method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11468234B2 (en) * 2017-06-26 2022-10-11 International Business Machines Corporation Identifying linguistic replacements to improve textual message effectiveness
CN109214327B (en) * 2018-08-29 2021-08-03 浙江工业大学 Anti-face recognition method based on PSO
CN109599109B (en) * 2018-12-26 2022-03-25 浙江大学 Confrontation audio generation method and system for white-box scene
CN109887496A (en) * 2019-01-22 2019-06-14 浙江大学 Orientation confrontation audio generation method and system under a kind of black box scene

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619292A (en) * 2019-08-31 2019-12-27 浙江工业大学 Countermeasure defense method based on binary particle swarm channel optimization
CN110767216A (en) * 2019-09-10 2020-02-07 浙江工业大学 Voice recognition attack defense method based on PSO algorithm
CN110930182A (en) * 2019-11-08 2020-03-27 中国农业大学 Improved particle swarm optimization algorithm-based client classification method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Modeling Semantic Compositionality with Sememe Knowledge; Fanchao Qi et al.; Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics; 20190802; pp. 5706-5715 *
Open the Boxes of Words: Incorporating Sememes into Textual Adversarial Attack; Yuan Zang et al.; arXiv:1910.12196v1 [cs.CL]; 20191027; pp. 1-5 *
Textual Adversarial Attack as Combinatorial Optimization; Yuan Zang et al.; arXiv:1910.12196v2 [cs.CL]; 20191110; pp. 1-6 *
Research on Particle Swarm Optimization and Differential Evolution Algorithms (in Chinese); Zhang Qingke; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20170815; pp. 58-61 *

Also Published As

Publication number Publication date
WO2021212675A1 (en) 2021-10-28
CN111652267A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN111652267B (en) Method and device for generating countermeasure sample, electronic equipment and storage medium
US11734329B2 (en) System and method for text categorization and sentiment analysis
US11144581B2 (en) Verifying and correcting training data for text classification
US10262272B2 (en) Active machine learning
US9633002B1 (en) Systems and methods for coreference resolution using selective feature activation
KR20220025026A (en) Systems and methods for performing semantic searches using natural language understanding (NLU) frameworks
US20190377793A1 (en) Method and apparatus for establishing a hierarchical intent system
US20120262461A1 (en) System and Method for the Normalization of Text
CN109948140B (en) Word vector embedding method and device
CN111523314B (en) Model confrontation training and named entity recognition method and device
CN111859964A (en) Method and device for identifying named entities in sentences
CN109791570B (en) Efficient and accurate named entity recognition method and device
CN112256842A (en) Method, electronic device and storage medium for text clustering
CN114995903B (en) Class label identification method and device based on pre-training language model
CN114756675A (en) Text classification method, related equipment and readable storage medium
CN111680291A (en) Countermeasure sample generation method and device, electronic equipment and storage medium
CN115062621A (en) Label extraction method and device, electronic equipment and storage medium
CN113934848A (en) Data classification method and device and electronic equipment
WO2024051196A1 (en) Malicious code detection method and apparatus, electronic device, and storage medium
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN115035890B (en) Training method and device of voice recognition model, electronic equipment and storage medium
CN116578700A (en) Log classification method, log classification device, equipment and medium
CN115495578A (en) Text pre-training model backdoor elimination method, system and medium based on maximum entropy loss
CN115906797A (en) Text entity alignment method, device, equipment and medium
CN115909376A (en) Text recognition method, text recognition model training device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant