CN116362344A - Text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm - Google Patents

Text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm

Info

Publication number
CN116362344A
CN116362344A (application CN202211660732.4A)
Authority
CN
China
Prior art keywords
sample
adversarial
attack
optimization algorithm
style transfer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211660732.4A
Other languages
Chinese (zh)
Inventor
康雁
赵健钧
李宾
普康
袁艳聪
王鑫超
张华栋
谢文涛
彭陆含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN202211660732.4A priority Critical patent/CN116362344A/en
Publication of CN116362344A publication Critical patent/CN116362344A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm, belonging to the field of text adversarial attacks, and comprising the following steps: constructing a three-stage module and an improved style transfer module; generating text adversarial samples with a sememe-based adversarial sample generation model and inputting them into the improved style transfer module and the three-stage module respectively for the attack; if the improved style transfer attack fails, proceeding to the attack of the three-stage module. The invention solves the problems of existing single models, such as a poor attack success rate, difficulty in searching for effective adversarial samples, and a tendency to deviate from the original semantics; a fusion model is used for the first time to attack in the field of text adversarial attacks, which improves the attack success rate.

Description

Text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm
Technical Field
The invention relates to the field of text adversarial attacks, and in particular to a text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm.
Background
Adversarial attacks are a common threat to deep learning and reveal the vulnerability of deep learning models. Compared with image data, text data is discrete, grammatically complex, and semantically abstract, so generating adversarial samples for text presents many difficulties. Existing attack methods designed for images are therefore difficult to apply directly to the text domain, and adversarial attacks on text still face many challenges.
At present, many effective text adversarial attack methods exist, but they rely on a single attack technique and sometimes perform poorly across multiple datasets or models. Problems remain: the attack success rate on some models is poor; heuristic-search-based attacks struggle to find effective adversarial samples when the search space is very large or very small; and style-transfer-based attacks easily deviate from the original semantics.
The attack performance of a single model has reached its upper limit and is difficult to improve further, so a new text adversarial attack method needs to be designed to solve the problems of existing attack models and achieve an effective text adversarial attack.
Disclosure of Invention
The aim of the invention is to address the above problems by providing a text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm, solving the problems of existing text adversarial attack methods in which a single model has a poor attack success rate, struggles to find effective adversarial samples, and easily deviates from the original semantics. A fusion model is used for the first time to attack in the field of text adversarial attacks: style transfer and the whale optimization algorithm are fused in series to construct a three-stage model, and the three-stage model is fused in parallel with improved style transfer to raise the attack success rate.
The technical scheme of the invention is as follows:
the invention discloses a text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm, comprising the following steps:
constructing a three-stage module and an improved style transfer module: after the update step of the original whale optimization algorithm, a mutation operation is performed on each adversarial sample, and the samples are processed through a modified Metropolis criterion to obtain the improved whale optimization algorithm (WOA); the improved WOA is fused with a style transfer algorithm to construct the three-stage module; after style transfer, one mutation operation is performed on the generated samples to construct the improved style transfer module;
text adversarial attack: text adversarial samples are generated by a sememe-based adversarial sample generation model and input into the improved style transfer module and the three-stage module respectively for the attack; if the improved style transfer attack fails, the attack proceeds to the three-stage module.
Further, in the three-stage module, the text adversarial samples are processed sequentially by the improved whale optimization algorithm WOA and the style transfer algorithm; samples are then regenerated based on sememes and iteratively searched and updated in the improved WOA. If at any time there is an adversarial sample whose attack succeeds, it is output directly and the attack stops.
Further, the modified Metropolis criterion follows the standard simulated-annealing acceptance form: if the mutated sample is no worse, it is kept; otherwise the worsening individual is accepted with probability

p = exp(-|y_new - y| / T)

where y_new is the output of the target attack model for the new adversarial sample, y is the model output before mutation, and p is the probability of accepting a worsening individual. The temperature T is dynamically adjusted with the number of iterations and is also affected by the length of the input sentence.
Further, the mutation operation: a word that meets the sememe-based replacement-word requirements is randomly selected and replaced, generating a new adversarial sample.
Further, the sememe-based adversarial sample generation model works as follows:
s1: the part of speech of each word in the input sentence is analyzed, and synonyms are searched for the content words via their sememes;
s2: after suitable replacement words are found, words in the original sample are replaced; only one word at one position is replaced at a time, yielding multiple adversarial samples.
Further, the improved whale optimization algorithm WOA specifically includes:
initializing: acquire the N_3 adversarial samples S = {S_1, ..., S_N} generated via sememes, each adversarial sample denoted S_i, i ∈ {1, ..., N}; simultaneously randomly initialize the positions X = {X_1, ..., X_N} of the N_3 whales, each dimension of each position being randomly initialized within a preset interval (formula image not reproduced);
recording: calculate the prediction score of the current adversarial samples, and record the position X_* of the current optimal individual and the optimal adversarial sample S_*;
Updating: firstly, updating each dimension of each whale according to an updating method in an original whale optimization algorithm; secondly, updating all positions again after the original whale optimization algorithm is updated; updating the challenge sample; performing a mutation operation on each challenge sample after the update step is finished, and processing the challenge samples by using a Metropolis criterion;
terminating: the termination condition is that the prediction result of the target attack model differs from the original label, i.e., the predicted label has changed.
Further, all positions are updated again according to a discretization formula (formula image not reproduced), in which w_n^d denotes the d-th word of the n-th sample, w_*^d denotes the d-th word of the optimal sample, and x_n^d denotes the d-th dimension of the position of the n-th sentence; the update coefficient is defined in terms of the inertia weight ω, which is set to decrease as the number of iterations increases.
Further, the inertia weight ω is computed by a schedule that decreases from ω_max to ω_min as the iteration count grows (formula image not reproduced), where 0 < ω_min < ω_max < 1, and max_iters and iters are the maximum number of iterations and the current iteration number, respectively.
Further, a probability-based method is adopted to update the adversarial samples, specifically: the value of each position dimension is converted to a probability with the Sigmoid function, denoted σ(x), where

σ(x) = 1 / (1 + e^(-x));

whether the current sample changes towards the globally optimal sample is decided according to this probability. The final adversarial sample update in each dimension follows an update rule (formula image not reproduced) in which P denotes the transition probability of the corresponding vector and r is a random number between 0 and 1.
Further, in the three-stage module, N adversarial samples are obtained via sememes and searched and updated by the improved whale optimization algorithm WOA; after one round of iteration, the worst-performing sample is selected and input into the style transfer model to obtain a style-transferred sample. The new adversarial samples are iteratively searched and updated by the improved WOA until the maximum number of iterations is reached or an adversarial sample succeeds in the attack; if no effective adversarial sample has been found when the maximum number of iterations is reached, the process returns to the first improved WOA step to begin the next iteration, until the maximum number of iterations is reached.
In summary, due to the adoption of the technical scheme, the beneficial effects of the invention are as follows:
1. The improved whale optimization algorithm WOA solves the problem that the whale optimization algorithm easily falls into local optima. The whale optimization algorithm is introduced into text adversarial attacks for the first time; its shortcomings in this setting are remedied by introducing mutation and the Metropolis criterion, which improves its search capability and attack effect.
2. The invention proposes a three-stage model that fuses the improved whale optimization algorithm with a style transfer algorithm, expanding the search space of adversarial samples and further avoiding local optima.
3. The invention uses a fusion model to attack in the field of text adversarial attacks for the first time, providing a new attack approach for text adversarial attacks that outperforms single-model attacks.
4. The invention provides a new hybridization method that hybridizes the three-stage model with the style transfer algorithm in parallel, improving the attack effect.
5. The invention evaluates the effectiveness of the model on five datasets; experimental results show that the model not only effectively improves the attack success rate, but also achieves good results in tests of grammatical-error increase rate, semantic consistency, transferability, and the like.
Drawings
The invention will now be described by way of example and with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of the structure of the hybridization model of text style transfer and improved WOA in the embodiment.
FIG. 2 is a schematic diagram of the structure of the sememe-based adversarial sample generation model in the embodiment.
FIG. 3 is a schematic diagram of the style transfer model in the embodiment.
FIG. 4 is a schematic diagram of the model structure of the improved WOA in the embodiment.
FIG. 5 is a schematic diagram of the semantic similarity of sentences before and after the attack in the embodiment.
FIG. 6 is a schematic diagram of the transferability attack on the SST-2 dataset in the embodiment.
FIG. 7 is a graph of population size and maximum number of iterations versus attack success rate in the embodiment.
Detailed Description
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, abstract) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. That is, each feature is one example only of a generic series of equivalent or similar features, unless expressly stated otherwise.
The features and capabilities of the present invention are described in further detail below in connection with examples.
The invention discloses a text adversarial attack method based on hybridizing style transfer with the whale optimization algorithm, comprising the following steps:
Constructing a three-stage module and an improved style transfer module: after the update step of the original whale optimization algorithm, a mutation operation is performed on each adversarial sample, and the samples are processed through a modified Metropolis criterion to obtain the improved whale optimization algorithm (WOA); the improved WOA is fused with a style transfer algorithm to construct the three-stage module; after style transfer, one mutation operation is performed on the generated samples to construct the improved style transfer module.
Text adversarial attack: text adversarial samples are generated by a sememe-based adversarial sample generation model and input into the improved style transfer module and the three-stage module respectively for the attack; if the improved style transfer attack fails, the attack proceeds to the three-stage module.
As shown in FIG. 1, the invention discloses a hybridization model based on style transfer and improved WOA, called STRAP-WOA. It mainly comprises three parts: sememe-based adversarial sample generation, the improved WOA algorithm, and style transfer. The attack flow is divided into two main branches: the improved style transfer attack and the three-stage attack. First, a batch of adversarial samples is generated by the sememe-based adversarial sample generation module, and these samples are input into the left and right branches respectively. The left branch is the improved style transfer attack; if this attack fails, the flow enters the three-stage model. In the three-stage attack of the right branch, the samples are processed sequentially by the WOA algorithm and the style transfer module; samples are then regenerated via sememes and iteratively searched and updated in the WOA algorithm. At any moment in the above flow, if there is an adversarial sample whose attack succeeds, it is output directly and the algorithm stops.
As shown in FIG. 2, the sememe-based adversarial sample generation model generates samples as follows. First, the part of speech of each word in the input sentence is analyzed, and synonyms are searched for the content words via their sememes; for example, the replacement words for "love" are "like" and "enjoy", the replacement words for "movie" are "picture", "film", and "cinema", and "I" and "this" have no replacement words. After suitable replacement words are found, words in the original sample are replaced, yielding multiple adversarial samples. In an actual attack, only one word is replaced at a time, both to reduce the possibility of grammatical errors and to keep the adversarial sample as semantically close to the original sample as possible.
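The single-substitution procedure above can be sketched as follows; the small `SYNONYMS` table is an illustrative stand-in for the real sememe-based lookup (the patent uses a sememe resource, not a hard-coded dictionary):

```python
# Sketch of the sememe-based candidate generation (steps S1-S2).
# SYNONYMS stands in for the sememe-driven synonym search; function
# words ("I", "this") simply have no entry and are never replaced.
SYNONYMS = {
    "love": ["like", "enjoy"],
    "movie": ["picture", "film", "cinema"],
}

def generate_candidates(sentence):
    """Produce adversarial candidates, replacing exactly one word at one
    position per candidate, as described in the text."""
    words = sentence.split()
    candidates = []
    for i, w in enumerate(words):
        for sub in SYNONYMS.get(w.lower(), []):
            candidates.append(" ".join(words[:i] + [sub] + words[i + 1:]))
    return candidates
```

Each candidate differs from the original in a single position, which matches the stated goal of limiting grammatical errors and semantic drift.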
As shown in FIG. 3, a style transfer model is used as part of the sentence-level attack; any style transfer model can be used. In this embodiment, the style transfer model STRAP (Style Transfer via Paraphrasing) is adopted. STRAP is an unsupervised text style transfer model based on paraphrase generation; it performs text style transfer effectively, has high style-control precision and semantic-retention capability, and outperforms many state-of-the-art models. It also does not change task-relevant attributes in the text, such as sentiment, which is necessary when attacking certain tasks such as sentiment classification.
Using STRAP requires only three steps: (1) generate normalized paraphrases of sentences of different styles with a paraphrase model trained on GPT-2 and back-translated text, thereby creating a set of pseudo-parallel data; (2) train several style-specific inverse paraphrase models (also based on GPT-2) that learn to convert the normalized paraphrases back to the original styles; (3) perform text style transfer using the inverse paraphrase model of the target style.
STRAP can generate a variety of styles, such as Shakespeare, English Tweets, Bible, Romantic Poetry, and Lyrics. The Shakespeare style is chosen as the transfer target in this embodiment. For each generated adversarial sample, its semantic similarity with the input sample before style transfer is calculated, and the sample is accepted only if the similarity is greater than h; otherwise generation is retried, until the desired number of adversarial samples N_1 is reached or the maximum number of attempts N_2 is exhausted. The similarity between sentences is calculated with Sentence-BERT.
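The accept-or-retry loop around style transfer can be sketched as follows, with `generate` standing in for the STRAP model and `similarity` for Sentence-BERT scoring; both callbacks and the default constants (h = 0.9, N_1 = 5, N_2 = 10, taken from the experimental-parameters section) are illustrative:

```python
def styled_samples(original, generate, similarity, h=0.9, n_target=5, max_tries=10):
    """Collect up to n_target (N_1) style-transferred candidates within
    max_tries (N_2) attempts; a candidate is accepted only when its
    similarity to the pre-transfer sample exceeds the threshold h.
    `generate` and `similarity` are stand-ins for STRAP and Sentence-BERT."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) >= n_target:
            break
        cand = generate(original)
        if similarity(original, cand) > h:
            accepted.append(cand)
    return accepted
```

In the real pipeline `generate` would be stochastic (each call produces a different paraphrase), which is why the retry budget N_2 exists.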
After style transfer, it is judged whether any generated style-transfer sample succeeds in the attack; if not, one mutation operation is performed on the generated samples. The mutation randomly selects a word that meets the sememe-based replacement-word requirements and generates a new adversarial sample, which is processed with the Metropolis criterion: if the adversarial sample obtained after mutation is better, it is retained; otherwise a worsening individual is accepted with a certain probability. To adapt the algorithm, the modified Metropolis criterion is as follows:
p = exp(-|y_new - y| / T) (acceptance probability for a worsening individual; an improving mutation is always kept)

where y_new is the output of the target attack model for the new adversarial sample, y is the model output before mutation, and the temperature T is dynamically adjusted with the number of iterations and is also affected by the length of the input sentence.
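A minimal sketch of this acceptance rule follows; the direction of "better" (here, a lower target-model output on the original label) is an assumption, and the temperature T is passed in directly because the patent gives its formula only as an image:

```python
import math
import random

def metropolis_accept(y_new, y, T, rng=random.random):
    """Modified Metropolis criterion: always keep a mutation that does
    not worsen the sample (assumed here to mean y_new <= y); accept a
    worsening one with probability exp(-|y_new - y| / T)."""
    if y_new <= y:                       # no worse: keep unconditionally
        return True
    p = math.exp(-abs(y_new - y) / T)    # worse: accept with probability p
    return rng() < p
```

A higher temperature T makes the search more tolerant of bad mutations, which is the usual simulated-annealing trade-off between exploration and exploitation.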
As shown in FIG. 4, the model structure of the improved WOA is disclosed, which improves the original whale optimization algorithm. The original Whale Optimization Algorithm (WOA) is a swarm intelligence algorithm that simulates the bubble-net hunting behavior of whales; the position of each whale represents a feasible solution. The specific steps of the original whale optimization algorithm are as follows:
(1) Initializing. At the beginning of the algorithm, the positions of N whales are randomly initialized so that they are randomly distributed in the search space. The position vector of each whale is X_n, n ∈ {1, ..., N}.
(2) Recording. Each position in the search space corresponds to a score; depending on the task, the position with the highest or lowest score is recorded as the global optimum, and the whale at the optimal position is recorded as X_*.
(3) Terminating. If the score of the current global optimum reaches the expected score, the algorithm terminates and outputs the global optimum as the search result.
(4) Updating. Otherwise, the positions of the whales are updated according to the following rules, which can be summarized in three steps:
(a) Encircling the prey. In nature, whales can locate their prey; during predation, the whales group together to enclose the prey.
(b) Bubble-net attacking. In its hunting behavior, the whale approaches the prey along a spiral path while contracting the encircling circle. Given that each whale behaves asynchronously, the probability of choosing the shrinking-encirclement mechanism is assumed to be the same as that of choosing the spiral model to update the whale's position, both 50%.
(c) Random search for prey. While the population hunts, the whales also swim randomly relative to each other's positions. The group search is a process of approaching the optimum and represents local search; randomly swimming towards a random individual represents the global search of the whales.
After the update is completed, the algorithm returns to the recording step.
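The update rules (a)-(c) can be sketched with the textbook continuous-space WOA equations (shrinking encirclement, logarithmic spiral, and random-whale search); this is the standard formulation, not the patent's discrete adaptation:

```python
import math
import random

def woa_update(X, X_best, population, a, rng=random):
    """One textbook WOA position update: with probability 0.5 take the
    spiral (bubble-net) move towards the best whale; otherwise encircle
    the best whale when |A| < 1, or move towards a random whale (global
    search) when |A| >= 1.  The coefficient a decreases from 2 to 0 over
    the iterations in the standard algorithm."""
    dim = len(X)
    if rng.random() < 0.5:                       # (b) spiral bubble-net move
        l = rng.uniform(-1.0, 1.0)
        return [abs(X_best[d] - X[d]) * math.exp(l) * math.cos(2 * math.pi * l)
                + X_best[d] for d in range(dim)]
    A = 2 * a * rng.random() - a
    C = 2 * rng.random()
    # (a) encircle the best whale, or (c) a random whale for exploration
    target = X_best if abs(A) < 1 else rng.choice(population)
    return [target[d] - A * abs(C * target[d] - X[d]) for d in range(dim)]
```

The equal 50% split between the spiral and encirclement branches mirrors step (b) above.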
In this embodiment, the improved WOA algorithm is as follows. For the initialization part, the N_3 adversarial samples S = {S_1, ..., S_N} generated via sememes are first acquired, each adversarial sample denoted S_i, i ∈ {1, ..., N}. Simultaneously, the positions X = {X_1, ..., X_N} of the N_3 whales are randomly initialized, each dimension of each position being randomly initialized within a preset interval (formula image not reproduced).
For the recording part, the prediction score of the current adversarial samples is calculated, and the position X_* of the current optimal individual and the optimal adversarial sample S_* are recorded.
For the update part, each dimension of each whale is first updated according to the update method of the original WOA. Considering that the search space is discrete, all positions are updated again after the WOA update according to a discretization formula (formula image not reproduced), in which w_n^d denotes the d-th word of the n-th sample and w_*^d denotes the d-th word of the optimal sample; the update coefficient is defined in terms of the inertia weight ω.
ω denotes the inertia weight, which is set to decrease as the number of iterations increases so that the whales explore more locations in the early stage and gather quickly around the optimal location in the final stage. ω is computed by a schedule that decreases from ω_max to ω_min as the iteration count grows (formula image not reproduced), where 0 < ω_min < ω_max < 1, and max_iters and iters are the maximum number of iterations and the current iteration number, respectively.
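A linearly decreasing schedule from ω_max to ω_min, consistent with the constraints stated above and the experimental values ω_max = 0.8 and ω_min = 0.2, can be sketched as follows; the linear form is an assumption, since the exact formula survives only as an image:

```python
def inertia_weight(iters, max_iters, w_max=0.8, w_min=0.2):
    """Inertia weight decreasing linearly from w_max at iteration 0 to
    w_min at the final iteration (an assumed schedule; the defaults are
    the values used in the experiments)."""
    return w_max - (w_max - w_min) * iters / max_iters
```

Early iterations then weight exploration more heavily, and late iterations contract the search around the incumbent optimum.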
The update of the positions must also correspond to the discrete search space, so the adversarial samples are updated with a probability-based method. The value of each position dimension is converted to a probability, denoted σ(x), using the Sigmoid function

σ(x) = 1 / (1 + e^(-x)).

The transition probability of the corresponding vector is denoted by P (formula image not reproduced). Whether the current sample changes towards the globally optimal sample is then decided according to this probability, and the adversarial sample update in each dimension is finally performed according to an update rule (formula image not reproduced) in which r is a random number between 0 and 1.
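The dimension-wise probabilistic move can be sketched as follows; using σ(x_d) directly as the transition probability P is an assumption, since the exact transition formula survives only as an image:

```python
import math
import random

def sigmoid(x):
    """Sigmoid squashing of a continuous position value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def update_towards_best(words, best_words, position, rng=random.random):
    """Dimension-wise probabilistic update: for each dimension d, with
    probability P = sigmoid(position[d]) copy the d-th word from the
    optimal sample, otherwise keep the current word (r is the random
    number drawn per dimension)."""
    out = list(words)
    for d in range(len(out)):
        P = sigmoid(position[d])
        if rng() < P:                    # r < P: move towards the optimum
            out[d] = best_words[d]
    return out
```

This keeps the continuous WOA dynamics in `position` while the actual sentence only ever contains valid words.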
To further enhance the search in unexplored space, a mutation operation is performed once on each adversarial sample after the update step is completed, and the samples are processed with the Metropolis criterion.
For the termination step, the termination condition is that the prediction result of the target attack model differs from the original label, i.e., the predicted label has changed.
In the three-stage module, N adversarial samples are first obtained via sememes and enter the improved WOA module for searching and updating. After one round of iteration, the worst-performing sample is selected and input into the style transfer module to obtain a style-transferred sample. The best individual among these samples is then selected and input into the adversarial sample generation module to obtain a new batch of adversarial samples. The new adversarial samples are input into the improved WOA for iterative searching and updating until the maximum number of iterations is reached or an adversarial sample succeeds in the attack. If no valid adversarial sample has been found when the maximum number of iterations is reached, the flow returns to the first WOA step to begin the next iteration, until the maximum number of iterations is reached.
The worst rather than the best sample is chosen because the WOA search strategy can sink into local optima, which cannot be completely avoided even with the improvement. Selecting the best sample each time would likely cause the WOA to produce the same output every time it falls into a local optimum, whereas selecting the worst sample preserves the diversity of the generated adversarial samples as much as possible and allows the algorithm to explore as much of the search space as possible.
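The overall three-stage flow, including the worst-sample hand-off to style transfer, can be sketched as follows; all callbacks are illustrative stand-ins for the modules described above:

```python
def three_stage_attack(seed_samples, woa_search, style_transfer,
                       regenerate, attack_succeeds, max_rounds):
    """Sketch of the three-stage module: WOA search, style transfer of
    the WORST sample (to preserve diversity, as argued above), then
    sememe-based regeneration, looping until an attack succeeds or the
    round budget is exhausted.  woa_search returns (samples, best, worst)."""
    samples = list(seed_samples)
    for _ in range(max_rounds):
        samples, best, worst = woa_search(samples)
        if attack_succeeds(best):
            return best
        transferred = style_transfer(worst)   # stage 2: restyle the worst sample
        samples = regenerate(transferred)     # stage 3: new sememe candidates
    return None
```

Feeding the worst individual into style transfer is the design choice defended in the paragraph above: it keeps the population diverse instead of repeatedly recycling one local optimum.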
Experiment and performance evaluation
1.1. Data set and target attack model
The following benchmark datasets were selected for testing. For sentiment analysis, the SST-2 and IMDB datasets were selected; both are binary sentiment classification datasets. For the natural language inference task, the Stanford Natural Language Inference (SNLI) dataset was selected; each sample in SNLI consists of a premise-hypothesis sentence pair labeled with one of three relationships: entailment, contradiction, and neutral. The Hate Speech dataset is a binary classification dataset for hate speech detection. The AG's News dataset is a four-class dataset for news-topic classification. Details of the datasets are shown in Table 1-1.
TABLE 1-1 dataset details
(table image not reproduced)
For the target attack models, several widely used general sentence-encoding models were chosen: bidirectional LSTM (BiLSTM), BERT, ALBERT, and DistilBERT. The hidden-layer dimension of the BiLSTM is 128, and 300-dimensional pre-trained GloVe word embeddings are used. The bert-base-uncased, albert-base-v2, and distilbert-base-uncased models were downloaded from the pre-trained model library provided by Transformers. The BiLSTM uses the trained model provided by Zang et al. The BERT model was tested in two versions: one is the trained model provided by Zang et al., denoted BERT_base; the other (bert-base-uncased) is from Transformers, denoted BERT. The accuracy (ACC) of each model on the relevant datasets is shown in Tables 1-2.
1.2. Baseline model
To evaluate the effectiveness of the algorithm, several open-source and representative algorithms were chosen for comparison, including word-level and sentence-level black-box text adversarial attack algorithms. For word-level attacks, PSO (an attack based on sememes and particle swarm search) and BESA (an attack based on BERT and simulated annealing) were selected; for sentence-level attacks, StyleAdv (an attack based on style transfer) was selected.
1.3. Evaluation index
The main evaluation metrics include the attack success rate, the quality of the adversarial samples, the semantic consistency between the adversarial sample and the original sample, and the number of queries.
1.4. Experimental parameters
For the STRAP-WOA algorithm, V is set to 5, and ω_max and ω_min are set to 0.8 and 0.2. When the input sentence length is less than 40, b_1 = 0.5, otherwise b_1 = 0.1; b_2 is fine-tuned between 0.1 and 0.5 according to the task. The semantic similarity threshold h is set to 0.9, the iteration counts max_iters_1 and max_iters_2 are set to 20 and 10, the number of style transfer samples N_1 and the maximum number of attempts N_2 are set to 5 and 10, and the initialized population size N_3 is set to 60. The baseline models use the default values in the open-source code provided by their papers, with no further tuning. To speed up evaluation, 1000 correctly classified samples from the test set were randomly selected for attack.
1.5. Attack effect and analysis
1.5.1. Attack success rate
This section evaluates the attack success rate against each model; the results are grouped by word-level and sentence-level attack methods and shown in Tables 1-2 (a) and (b), where bold indicates the best result. The experimental results show that STRAP-WOA achieved the best results on all experimental data except SST-2+BiLSTM, where it performed slightly worse than BESA. On part of the datasets the attack success rate even reaches 100%, which fully demonstrates the effectiveness of the STRAP-WOA model. It also confirms that the vulnerability of DNNs objectively exists.
Tables 1-2 (a) attack success rate of different models in word-level attacks
Tables 1-2 (b) attack success rate of different models in sentence-level attacks
1.5.2. Adversarial sample quality
To further test the quality of the generated adversarial samples and analyze each model in more detail, this section automatically evaluates the adversarial samples generated by each model; the results are shown in Tables 1-3.
The table shows that our proposed STRAP-WOA model performs well on most datasets: its grammatical-error increase rate is the best everywhere except on the IMDB dataset and the SST-2+DistilBERT combination, and some results are even negative, i.e., grammatical errors in part of the original texts are repaired. The grammatical-error increase rate also differs considerably between models, which is affected by the number of grammatical errors in the original samples: when the original samples contain few errors, even small changes can produce a large increase rate.
For the PPL metric, other models perform best. The overall performance of STRAP-WOA is still acceptable, and its gap to the optimal PPL is not large. A smaller PPL indicates that the adversarial samples generated by a model are common word-sequence combinations; a larger PPL means the word combinations in the sentence are less probable, but it does not prove the sentence is problematic, so within a certain range a sentence can still be considered fluent.
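As a reminder of what PPL measures: perplexity is the exponential of the average negative log-probability a language model assigns to each token, so likelier word sequences score lower. A minimal sketch with a toy unigram model follows; the token probabilities are made up for illustration, and a real evaluation would use a neural LM such as GPT-2.

```python
import math

def perplexity(token_probs):
    """PPL = exp(-(1/N) * sum(log p_i)); lower means the sequence is a
    more common (higher-probability) word combination."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# A sequence of likelier tokens yields a lower PPL than an unlikely one.
common = perplexity([0.2, 0.1, 0.25])    # relatively probable tokens
rare = perplexity([0.01, 0.02, 0.005])   # improbable tokens
```

This makes the point in the text concrete: `rare` has a far larger PPL than `common`, yet neither number by itself proves the sentence is ungrammatical.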
Tables 1-3 (a) results of adversarial sample quality evaluation in word-level attacks
Tables 1-3 (b) results of adversarial sample quality evaluation in sentence-level attacks
1.5.3. Semantic consistency
This section tests the semantic similarity between adversarial samples and original samples, which to some extent simulates a human judging whether two pieces of text have the same meaning. Fig. 5 shows the similarity between sentence pairs before and after the attack on all datasets, presented as box plots.
As Fig. 5 shows, the adversarial samples generated by the STRAP-WOA algorithm have high similarity to the original samples: the average reaches 81%, the box is short (i.e., the similarity fluctuates within a small range), and the best samples reach a similarity above 0.99. Overall, the best-performing algorithm is PSO, which has the highest average, the second-shortest box, and higher upper and lower quartile lines, but our method is not significantly different from it. StyleAdv performs worst, showing mainly a wider range of fluctuation and a relatively low average.
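The similarity scores discussed here are typically cosine similarities between sentence embeddings from an encoder. A sketch of the metric itself, with small placeholder vectors standing in for real sentence embeddings (the vectors are ours, for illustration only):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors, in [-1, 1];
    values near 1 mean the two sentences embed close together."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Placeholder embeddings: an adversarial sample whose embedding stays
# close to the original sample's embedding scores near 1.0.
orig = [0.2, 0.7, 0.1]
adv = [0.21, 0.69, 0.12]
score = cosine_similarity(orig, adv)
```

A small perturbation in embedding space keeps the score above 0.99, mirroring the best samples reported above.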
1.5.4. Number of queries
This section tests the average number of queries to the target attack model under the condition of a successful attack; the results are shown in Tables 1-4. First, for word-level attacks, Tables 1-4 (a) show that although our query counts are not the lowest, except for the BERT_base+IMDB combination the average query count of STRAP-WOA is not far from the best result. Since our model performs better on attack success rate, we consider a small increase in query count acceptable in exchange for a higher success rate. For sentence-level attacks, StyleAdv performs very well in this test, because StyleAdv limits the maximum number of queries to 50. However, simply capping the number of queries at an extremely low value is not a reasonable solution: it limits the model's attack success rate, which is also why StyleAdv performs poorly on that metric.
Tables 1-4 (a) average number of queries for different models in word-level attacks
Tables 1-4 (b) average number of queries for different models in sentence-level attacks
1.6. Transferability
The transferability of adversarial samples reflects whether an attack can fool a deep neural network without access to it, i.e., whether adversarial samples generated to mislead one classifier can also fool an unknown classifier. Transferability is a widely used evaluation metric in adversarial attacks. This section evaluates the transferability of each attack method on the SST-2 dataset; specifically, adversarial samples generated to attack BERT are used in transfer attacks on three unseen models (BiLSTM, ALBERT, DistilBERT). The classification accuracy of the three models on the original and adversarial samples is shown in Fig. 6. Lower accuracy means higher transferability, i.e., lower is better.
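Operationally, the transferability test amounts to measuring each unseen model's accuracy on the adversarial samples crafted against BERT; lower accuracy means better transfer. A sketch with a stub classifier (the keyword-rule "victim" below is a hypothetical stand-in for BiLSTM/ALBERT/DistilBERT, not part of the patent):

```python
def transfer_accuracy(classifier, samples, labels):
    """Accuracy of an unseen victim model on adversarial samples;
    a lower value indicates higher transferability."""
    correct = sum(1 for s, y in zip(samples, labels) if classifier(s) == y)
    return correct / len(samples)

# Hypothetical victim: classifies sentiment by a trivial keyword rule.
def stub_victim(text):
    return 1 if "good" in text else 0

adv_samples = ["a good movie indeed", "utterly dull film", "good fun"]
true_labels = [1, 0, 1]
acc = transfer_accuracy(stub_victim, adv_samples, true_labels)
```

In the real evaluation, `classifier` would be one of the three trained models and `adv_samples` the sentences generated against BERT.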
As Fig. 6 shows, StyleAdv transfers best, followed by BESA and STRAP-WOA, with PSO last. However, good transferability is closely related to similarity: as the previous section showed, the adversarial samples generated by StyleAdv perform poorly on similarity, meaning some samples deviate substantially from the original semantics and the ground truth expressed by those sentences may have changed; such samples will be classified as "wrong" by any model under test. Therefore, evaluating transferability should presuppose good semantic consistency. Among the results other than StyleAdv, STRAP-WOA is not significantly different from BESA, e.g., accuracy is 5% lower on ALBERT and only 1% lower on BiLSTM.
1.7. Parameter tuning process
This section shows the tuning process for the parameters of the WOA algorithm, mainly the maximum iteration number max_iters_1 and the population size N_3, and records the impact of different parameter values on the model's attack success rate. The tests were run against the BERT_base model: 500 correctly classified SST-2 samples were randomly selected to compute the attack success rate. The experimental results are shown in Fig. 7.
As Fig. 7 shows, the attack success rate of the model gradually improves as the maximum iteration number and population size increase; max_iters_1 = 20 and N_3 = 60 were finally chosen as the final parameters.
1.8. Ablation experiments
This section analyzes the role of each module in our proposed hybrid model to ensure that each improvement is effective. The model's development is divided into three stages: first, the attack success rate is tested with the original WOA as the baseline; second, addressing WOA's shortcomings, several improvements are made to obtain the improved WOA algorithm, denoted IWOA in this section; finally, IWOA is fused with the style transfer model to obtain the final hybrid model, STRAP-WOA. The attack success rates of the three stages are shown in Tables 1-5.
The table shows that each improvement raises the model's attack success rate, indicating that every improvement is effective. Analyzing the reasons: the original WOA algorithm tends to fall into local optima, and the degree of cross-over between populations is not high, so it does not perform well in text attacks. WOA was therefore improved from several angles, including adding extra mutation to the algorithm, introducing inertia weights, and accepting some worse samples under the Metropolis criterion. After that, relying on simple word-level mutation alone brought no further improvement, so style transfer was introduced. Text style is a feature independent of content and grammar; general tasks rarely attend to it, yet it can affect the model. Using style transfer as a further sentence-level mutation therefore effectively enlarges the algorithm's search space. The experiments also demonstrate the effectiveness of the model after fusing style transfer.
Tables 1-5 attack success rate of the models at different stages
The invention provides a hybrid model, STRAP-WOA, based on style transfer and an improved WOA, which effectively fuses the improved WOA algorithm with an improved style transfer method and exploits the advantages of each. A fusion mechanism for style transfer and the whale optimization algorithm is proposed: first, a three-stage model fuses style transfer and the whale optimization algorithm in series, and then this three-stage model is fused in parallel with the improved style transfer. This is also the first use of a fusion model for attack in the field of text adversarial attacks, and the fusion idea offers a new line of attack for the field. The effectiveness of the model was evaluated on public datasets by comparison with representative baseline models. Experimental results show that STRAP-WOA obtains the highest attack success rate in most tests while keeping a low grammatical-error increase rate and high semantic similarity, and also performs well in transfer attacks.
The foregoing examples merely represent specific embodiments of the present application, which are described in detail but are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the technical solution of the present application, and these fall within its protection scope.

Claims (10)

1. A text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm, characterized by comprising the following steps:
constructing a three-stage module and an improved style transfer module: after each update of the original whale optimization algorithm, performing a mutation operation on each adversarial sample and processing the samples with an improved Metropolis criterion, to obtain the improved whale optimization algorithm WOA; fusing the improved whale optimization algorithm WOA with a style transfer algorithm to construct the three-stage module; after style transfer, performing one mutation operation on the generated sample to construct the improved style transfer module;
text adversarial attack: generating text adversarial samples with a sememe-based adversarial sample generation model, and inputting them into the improved style transfer module and the three-stage module for attack respectively; if the improved style transfer attack fails, the attack proceeds to the three-stage module.
2. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 1, wherein in the three-stage module the text adversarial samples are processed sequentially by the improved whale optimization algorithm WOA and the style transfer algorithm, after which samples are regenerated based on sememes and iteratively searched and updated in the improved whale optimization algorithm WOA; if at any time an adversarial sample that succeeds in the attack exists, it is output directly and the attack stops.
3. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 1, wherein the improved Metropolis criterion is formulated as follows:
S' = { S_new, if y_new < y; S_new with probability p, otherwise }
wherein y_new is the output of the target attack model on the new adversarial sample after mutation, y is the model output before mutation, and p is the probability of accepting a deteriorated individual:
p = exp(−(y_new − y) / T)
The value of T is dynamically adjusted with the number of iterations and is also affected by the length of the input sentence.
4. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 1, wherein the mutation operation comprises: randomly selecting a word that meets the requirements of the sememe-based generator for producing substitute words, and replacing it to generate a new adversarial sample.
5. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 1, wherein the adversarial sample generation method of the sememe-based adversarial sample generation model comprises:
S1: analyzing the part of speech of each word in the input sentence, and searching for synonyms of the content words via sememes;
S2: after suitable substitute words are found, replacing words in the original sample, with only one word at one position replaced each time, thereby obtaining multiple adversarial samples.
6. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 1, wherein the improved whale optimization algorithm WOA specifically comprises:
initialization: obtaining the N_3 adversarial samples generated via sememes, S = {S_1, ..., S_N}, where each adversarial sample is denoted S_i, i ∈ {1, ..., N}; simultaneously randomly initializing the positions of N_3 whales, X = {X_1, ..., X_N}, with each dimension of each position randomly initialized;
recording: calculating the prediction score of each current adversarial sample, and recording the position X_* of the current optimal individual and the optimal adversarial sample S_*;
updating: first, updating each dimension of each whale according to the update method of the original whale optimization algorithm; second, updating all positions again after the original update; then updating the adversarial samples; after the update step, performing a mutation operation on each adversarial sample and processing the samples with the Metropolis criterion;
termination: the termination condition is that the prediction result of the target attack model differs from the original label, i.e., the predicted label has changed.
7. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 6, wherein the formula for updating all positions again is:
x_n^d = ω · x_n^d + (1 − ω) · I(s_n^d, s_*^d)
wherein s_n^d denotes the d-th word of the n-th sample, s_*^d denotes the d-th word of the optimal sample, x_n^d denotes the d-th dimension of the n-th sentence, and I(a, b) is defined by the following formula:
I(a, b) = { 1, if a = b; −1, otherwise }
ω represents the inertia weight, which is set to decrease as the number of iterations increases.
8. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 7, wherein the formula for the inertia weight ω is as follows:
ω = ω_max − (ω_max − ω_min) · t / max_iters
wherein 0 < ω_min < ω_max < 1, and t and max_iters are the current iteration number and the maximum iteration number, respectively.
9. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 6, wherein updating the adversarial samples employs a probability-based method comprising: converting the value of each position into a probability, denoted σ(x):
σ(x) = 1 / (1 + e^(−x))
and determining, according to this probability, whether the current sample changes toward the globally optimal sample; the final adversarial sample update in each dimension is as follows:
s_n^d = { s_*^d, if r < P_n^d; s_n^d, otherwise }
wherein P_n^d denotes the transition probability of the corresponding dimension, P_n^d = σ(x_n^d), and r is a random number between 0 and 1.
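The probability-based update can be sketched as follows: each real-valued position is squashed through a sigmoid, and with that probability the word in the corresponding dimension is replaced by the globally optimal sample's word. The function names and the injectable random source are ours, for illustration.

```python
import math
import random

def sigmoid(x):
    """Transfer function converting a position value into a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def update_sample(sample, best_sample, positions, rng=random.random):
    """For each dimension d, move the current sample's word toward the
    globally optimal sample's word with probability sigmoid(x_d)."""
    return [
        best_word if rng() < sigmoid(x) else word
        for word, best_word, x in zip(sample, best_sample, positions)
    ]

# With large positive positions the transition probability approaches 1,
# so every word moves to the optimal sample's word (rng fixed at 0.5
# to keep the demonstration deterministic).
moved = update_sample(["a", "b"], ["x", "y"], [50.0, 50.0], rng=lambda: 0.5)
```

Large negative positions make a dimension almost certain to keep its current word, so the positions act as per-word attraction strengths toward the best sample.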
10. The text adversarial attack method based on the hybridization of style transfer and the whale optimization algorithm according to claim 1 or 2, wherein in the three-stage module, N adversarial samples are obtained via sememes and searched and updated by the improved whale optimization algorithm WOA; after one round of iteration, the worst-performing sample is selected and input into the style transfer model to obtain a style-transferred sample; the new adversarial sample is then iteratively searched and updated by the improved whale optimization algorithm WOA until the maximum number of iterations is reached or an adversarial sample that succeeds in the attack exists; if no effective adversarial sample has been found when the maximum number of iterations is reached, the process returns to the first improved whale optimization algorithm WOA step to begin the next iteration, until the maximum number of iterations is reached.
CN202211660732.4A 2022-12-23 2022-12-23 Text anti-attack method based on style migration and whale optimization algorithm hybridization Pending CN116362344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211660732.4A CN116362344A (en) 2022-12-23 2022-12-23 Text anti-attack method based on style migration and whale optimization algorithm hybridization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211660732.4A CN116362344A (en) 2022-12-23 2022-12-23 Text anti-attack method based on style migration and whale optimization algorithm hybridization

Publications (1)

Publication Number Publication Date
CN116362344A true CN116362344A (en) 2023-06-30

Family

ID=86938430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211660732.4A Pending CN116362344A (en) 2022-12-23 2022-12-23 Text anti-attack method based on style migration and whale optimization algorithm hybridization

Country Status (1)

Country Link
CN (1) CN116362344A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808095A (en) * 2024-02-26 2024-04-02 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method and device for generating attack-resistant sample and electronic equipment
CN117808095B (en) * 2024-02-26 2024-05-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Method and device for generating attack-resistant sample and electronic equipment

Similar Documents

Publication Publication Date Title
CN110209817B (en) Training method and device for text processing model and text processing method
Fu et al. Aligning where to see and what to tell: Image captioning with region-based attention and scene-specific contexts
Weston et al. Memory networks
CN109614471B (en) Open type problem automatic generation method based on generation type countermeasure network
US20170200077A1 (en) End-to-end memory networks
CN109062939A (en) A kind of intelligence towards Chinese international education leads method
CN108132927B (en) Keyword extraction method for combining graph structure and node association
CN112395393B (en) Remote supervision relation extraction method based on multitask and multiple examples
CN114969278A (en) Knowledge enhancement graph neural network-based text question-answering model
CN112232087A (en) Transformer-based specific aspect emotion analysis method of multi-granularity attention model
CN115794999A (en) Patent document query method based on diffusion model and computer equipment
Haihong et al. Theme and sentiment analysis model of public opinion dissemination based on generative adversarial network
Scialom et al. To beam or not to beam: That is a question of cooperation for language gans
AU2022221471A1 (en) Automatic photo editing via linguistic request
CN116362344A (en) Text anti-attack method based on style migration and whale optimization algorithm hybridization
CN113282721A (en) Visual question-answering method based on network structure search
CN113722439B (en) Cross-domain emotion classification method and system based on antagonism class alignment network
CN112463982B (en) Relationship extraction method based on explicit and implicit entity constraint
CN114780879A (en) Interpretable link prediction method for knowledge hypergraph
CN116757195B (en) Implicit emotion recognition method based on prompt learning
CN111079840B (en) Complete image semantic annotation method based on convolutional neural network and concept lattice
CN112100342A (en) Knowledge graph question-answering method based on knowledge representation learning technology
WO2023160346A1 (en) Meaning and sense preserving textual encoding and embedding
CN113869034B (en) Aspect emotion classification method based on reinforced dependency graph
CN115630136A (en) Semantic retrieval and question-answer processing method and device for long text and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination