CN115309897A - Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning - Google Patents

Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Info

Publication number
CN115309897A
Authority
CN
China
Prior art keywords
sample
adversarial
training
defense
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210891903.8A
Other languages
Chinese (zh)
Inventor
张连新
郑海
王靖午
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fangying Jintai Technology Beijing Co ltd
Original Assignee
Fangying Jintai Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fangying Jintai Technology Beijing Co ltd filed Critical Fangying Jintai Technology Beijing Co ltd
Priority to CN202210891903.8A priority Critical patent/CN115309897A/en
Publication of CN115309897A publication Critical patent/CN115309897A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning comprises the following steps: preprocessing the text data set; sequentially inputting each batch of samples in the training set into the original defense model for forward computation to obtain the sample vectors of each batch; computing the gradients of the current batch of samples under the loss function of the original defense model and generating adversarial samples via adversarial perturbations; inputting the adversarial samples into the original defense model to obtain the feature vectors of the current batch of adversarial samples; modifying the loss function and training the original defense model by minimizing it; obtaining an improved defense model for each batch of samples and selecting the improved defense model that performs best on the validation set as the final defense model. Addressing the problems that Chinese spelling-error detection performance is limited and that existing models cannot run in parallel and run slowly, the method improves the multi-modal ChineseBERT model, which fuses Chinese pronunciation (pinyin), glyph and meaning information, so that it can better defend against Chinese adversarial sample attacks in real-world environments.

Description

Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning
Technical field:
The invention relates to the technical field of information security, and in particular to a Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning.
The background art comprises the following steps:
Natural language processing is often called "the jewel in the crown of artificial intelligence". It studies theories and methods for language interaction between humans and computers and for computer-aided interaction between people, helping individuals, enterprises and governments automatically understand massive amounts of text and helping people of different languages and countries understand one another. With the continuous development of computer technology and artificial intelligence, natural language processing has been applied in many practical fields, such as machine translation, social media monitoring and speech processing.
With the continuous development of deep learning, natural language processing based on deep neural networks has surpassed traditional statistics-based machine learning on tasks such as text classification, machine translation and dialogue systems. At present, two kinds of deep learning models are mainly used for natural language processing. The first combines deep models such as CNNs and LSTMs with word vector techniques such as Word2vec and GloVe to better mine local and global sequential features in text. The second is the pre-trained language model represented by BERT: the BERT model uses the Transformer as its basic architecture, is trained in an unsupervised manner on massive text data, has a huge number of parameters and strong text understanding ability, surpasses existing methods on multiple natural language processing tasks, and has become a new milestone.
However, researchers have found that, owing to the inherent local linearity of deep neural networks and the high dimensionality of their data, deep-learning-based natural language processing faces the threat of adversarial samples, like other deep learning algorithms. An adversarial sample is an input carefully and deliberately crafted from an original sample; with high probability it causes a deep neural network model to produce an erroneous output, while a human reader's judgment of it remains unchanged. This creates many problems for natural language processing in real environments. For example, some apps use sentiment analysis to provide recommendation services and scores based on users' historical reviews, but attackers can maliciously spread false information through adversarial texts for profit, causing losses to consumers; likewise, to bypass detection systems for violent, pornographic and abusive content, attackers generate adversarial samples that pollute the network environment. Such attacks can cause unpredictable economic losses to merchants and enterprises and seriously harm the mental health of netizens, especially teenagers. Systematically understanding adversarial samples and defending against adversarial attacks in order to build robust models is therefore one of the hot research problems in academia. However, existing research on adversarial sample defense focuses mainly on English. Because Chinese and English differ greatly (Chinese has a very large character set and search space and contains many visually similar and phonetically similar characters), existing English adversarial-sample defense methods cannot be applied directly to the defense of Chinese adversarial samples.
In view of the above problems, the patent with publication number CN114169443A discloses a word-level text adversarial sample detection method, which models adversarial sample detection as a binary classification problem solved in two steps: first, adversarial samples corresponding to normal samples are generated with an adversarial attack algorithm, and feature vectors characterizing the normal and adversarial samples are extracted; then a deep learning model is used to build a binary classifier for adversarial sample detection. The patent with publication number CN110457701A discloses an interpretability-based adversarial training method for improving a model's detection of malicious texts: the input texts are processed with a neutralization filter, a de-aliasing filter and spell checking to convert all texts into readable texts; a text classification model is built on the readable texts and trained on the spell-checked inputs and their labels; text adversarial samples are then generated according to an adversarial sample generation method and the initial text classification model; finally, the original classifier is retrained with the generated adversarial samples and the original samples to obtain a text classification model that can defend against adversarial sample attacks.
In summary, current methods and systems cannot solve the following problems: because Chinese text has no explicit word boundaries, Chinese spelling-error detection performance is limited; adversarial training methods exist that adjust the classifier's decision boundary by augmenting the data set, but such methods perform only moderately against new adversarial samples.
Summary of the invention:
In view of these problems, the invention provides a Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning, addressing the limited performance of existing Chinese spelling-error detection and the facts that LSTM suffers from vanishing gradients on long text, cannot run in parallel, and runs slowly.
The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning comprises the following steps:
Step one: preprocess the text data set: divide the text data set into a training set, a test set and a validation set in a certain ratio, and divide the training set into batch_num batches, each containing batch_size text samples; the number of batches and the number of texts per batch are set according to the structure of the training model;
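By way of illustration of step one, a minimal sketch in Python follows. It assumes the preferred 8:1:1 split and one of the preferred batch sizes described later in this document; the function name, shuffling and seeding are illustrative assumptions, not part of the claimed method.

```python
# Minimal sketch of step one, assuming an 8:1:1 split and batch_size = 128;
# tokenization and file loading are omitted. All names are illustrative.
import random

def preprocess(dataset, batch_size=128, seed=42):
    data = list(dataset)                        # list of (text, label) pairs
    random.Random(seed).shuffle(data)
    n = len(data)
    train = data[: int(0.8 * n)]                # 8 parts: training set
    test = data[int(0.8 * n): int(0.9 * n)]     # 1 part: test set
    val = data[int(0.9 * n):]                   # 1 part: validation set
    # batch_num batches of batch_size texts each (last batch may be smaller)
    batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]
    return batches, test, val
```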
Step two: sequentially input each batch of samples in the training set into the original defense model for forward computation to obtain the sample vectors of each batch, where the original defense model is the ChineseBERT model and the samples are the text data of each batch in the training set; the ChineseBERT pre-trained language model processes Chinese pronunciation (pinyin), glyph and meaning information and fuses them into expressive dynamic Chinese character vectors, and it is pre-trained with the Transformer model, which handles long text well and runs in parallel, to form a Chinese pre-trained language model fusing pronunciation, glyph and meaning; because ChineseBERT fuses multi-modal information, it is suitable as the base defense model of a method defending against Chinese multi-modal attacks;
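The sketch below is not the actual ChineseBERT implementation; it is a toy PyTorch stand-in, assumed here only so the later sketches have a concrete interface to refer to: three embedding tables (meaning, pinyin, glyph) whose outputs are fused into one character vector and fed to a Transformer encoder.

```python
# Toy three-stream encoder: a stand-in for ChineseBERT with pronunciation,
# glyph and meaning embeddings fused per character. All sizes are illustrative.
import torch
import torch.nn as nn

class ToyMultiModalEncoder(nn.Module):
    def __init__(self, vocab=21128, pinyin_vocab=1500, glyph_vocab=21128,
                 dim=128, num_classes=3):
        super().__init__()
        self.sem = nn.Embedding(vocab, dim)         # meaning (token) embedding
        self.pin = nn.Embedding(pinyin_vocab, dim)  # pronunciation embedding
        self.gly = nn.Embedding(glyph_vocab, dim)   # glyph embedding
        self.fuse = nn.Linear(3 * dim, dim)         # fuse the three modalities
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, num_classes)

    def embed(self, sem_ids, pin_ids, gly_ids):
        # the three embedding streams (meaning, pronunciation, glyph)
        return self.sem(sem_ids), self.pin(pin_ids), self.gly(gly_ids)

    def forward_from_embeddings(self, e_sem, e_pin, e_gly, return_features=False):
        h = self.encoder(self.fuse(torch.cat([e_sem, e_pin, e_gly], dim=-1)))
        feats = h.mean(dim=1)                       # pooled sample vector
        logits = self.cls(feats)
        return (logits, feats) if return_features else logits
```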
Step three: by back-propagation with the FGSM algorithm, obtain, under the loss function of the original defense model, the gradient of the pronunciation (pinyin) embedding of the current batch of samples, $g_1 = \nabla_{x_1} L_{CE}$, the gradient of the meaning embedding, $g_2 = \nabla_{x_2} L_{CE}$, and the gradient of the glyph embedding, $g_3 = \nabla_{x_3} L_{CE}$, and from these gradients compute the pronunciation adversarial perturbation $r_{adv1} = \epsilon \cdot \mathrm{sign}(g_1)$, the meaning adversarial perturbation $r_{adv2} = \epsilon \cdot \mathrm{sign}(g_2)$ and the glyph adversarial perturbation $r_{adv3} = \epsilon \cdot \mathrm{sign}(g_3)$; where $L_{CE}$ is the cross-entropy loss function, $\epsilon$ is the step size of each FGSM iteration, $x$ is a sample and $\nabla_x$ denotes the gradient; through the FGSM algorithm, the batch data input to the original defense model is attacked to obtain the adversarial perturbation of each sample vector, including perturbations of the pronunciation vector, the glyph vector and the meaning vector; the purpose of this step is to change the target category and to generate adversarial vector samples targeted at the original defense model;
Step four: add the pronunciation adversarial perturbation, the meaning adversarial perturbation and the glyph adversarial perturbation to the pronunciation embedding, the meaning embedding and the glyph embedding of the sample, respectively, to generate the pronunciation embedding of the adversarial sample, $x'_1 = x_1 + r_{adv1}$, the meaning embedding of the adversarial sample, $x'_2 = x_2 + r_{adv2}$, and the glyph embedding of the adversarial sample, $x'_3 = x_3 + r_{adv3}$; where $x'_1$ is the pronunciation embedding, $x'_2$ the meaning embedding and $x'_3$ the glyph embedding of the adversarial sample;
Step five: input the pronunciation, meaning and glyph embeddings of the adversarial samples into the original defense model to obtain the feature vector $F([x'_1, x'_2, x'_3])$;
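A hedged sketch of steps three to five follows, built on the toy encoder above. The sign-step form $x'_k = x_k + \epsilon\,\mathrm{sign}(g_k)$ follows the FGSM construction named in step three; the interface (embed, forward_from_embeddings) is an assumption of the toy model, and the real ChineseBERT API may differ.

```python
# Steps three to five on the toy encoder: per-modality FGSM perturbations and
# the feature vector of the resulting adversarial sample.
import torch
import torch.nn.functional as F

def fgsm_adversarial_features(model, sem_ids, pin_ids, gly_ids, labels, eps=0.1):
    e_sem, e_pin, e_gly = model.embed(sem_ids, pin_ids, gly_ids)
    e_sem, e_pin, e_gly = (e.detach().requires_grad_(True)
                           for e in (e_sem, e_pin, e_gly))

    logits = model.forward_from_embeddings(e_sem, e_pin, e_gly)
    loss = F.cross_entropy(logits, labels)   # L_CE
    loss.backward()                          # gradients g_1, g_2, g_3 (step three)

    # x'_k = x_k + eps * sign(g_k): one perturbation per modality (step four)
    adv_sem = (e_sem + eps * e_sem.grad.sign()).detach()
    adv_pin = (e_pin + eps * e_pin.grad.sign()).detach()
    adv_gly = (e_gly + eps * e_gly.grad.sign()).detach()

    # step five: feature vector F([x'_1, x'_2, x'_3]) of the adversarial sample
    _, adv_feats = model.forward_from_embeddings(adv_sem, adv_pin, adv_gly,
                                                 return_features=True)
    return adv_feats
```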
Step six: modify the loss function, compute the loss value of each sample in the current batch, and train the original defense model by minimizing the loss function;
Step seven: select the next sample and repeat steps three to six; if the loss value of the current original defense model does not improve within a certain number of training rounds, or its accuracy on the validation set decreases, stop training to obtain the improved defense model for the current batch of samples, i.e., the improved defense model with higher accuracy and defense capability under the current weight parameters;
Step eight: select the next batch and repeat steps three to seven to compute improved defense models for all batches of samples, then select from them the improved defense model that performs best on the validation set as the final defense model;
Step nine: for a new adversarial sample, predict its category with the final defense model; if the output category is unchanged, the defense succeeds. Taking a news text classification task as an example, the categories may include financial news, sports news, social news, and so on; for an emotion polarity classification task, the categories include negative, neutral and positive emotion.
Preferably, the text data set is one of the THUCNews data set, the Douban movie review data set, the Waimai takeaway review data set, or a data set obtained from the GitHub code-hosting website.
Preferably, the value range of $\epsilon$ in step three is [0, 0.5].
Preferably, the method of step six specifically comprises the following steps:
select the sample vector in the current batch and the feature vector of its adversarial sample as a positive sample pair, and the vectors of the other samples in the batch and the feature vectors of their adversarial samples as negative sample pairs; the loss function is modified to $L = (1-\lambda)L_{CE} + \lambda L_{CL}$; where the cross-entropy loss function is
$L_{CE} = -\sum_{i=1}^{N}\sum_{c} y_{i,c}\,\log \hat{y}_{i,c}$
and the contrastive loss function is
$L_{CL} = -\sum_{i=1}^{N} \log \dfrac{\exp(\mathrm{Score}(F(x_i), F(x_i^{+})))}{\exp(\mathrm{Score}(F(x_i), F(x_i^{+}))) + \sum_{x^{-}} \exp(\mathrm{Score}(F(x_i), F(x^{-})))}$
where $y_{i,c}$ is the true probability, $\hat{y}_{i,c}$ is the predicted probability, $F$ is the original defense model, $x^{+}$ is a positive sample, $x^{-}$ is a negative sample, $N$ is the number of samples in a batch, $c$ indexes the sample categories, $i$ indexes the $i$-th sample in the current batch ($i \in \{1,\dots,N\}$), and $\lambda$ is a weight parameter;
compute the loss values of the positive and negative sample pairs with the modified loss function, and by minimizing the loss pull the positive pairs closer together and push the negative pairs apart; the positive and negative pairs introduce more data and generate sample pairs targeted at the original defense model, so the original defense model can better defend against adversarial samples and achieve a good classification effect even with few samples;
compute, according to the loss function, the loss value of each sample vector and its positive and negative sample pairs in the current batch, back-propagate the loss to update the model parameters, and train the original defense model.
Preferably, the loss function is reconstructed through contrastive learning. Contrastive learning is a self-supervised learning method belonging to the unsupervised learning paradigm; its core idea is to let the model learn, in the feature space, to distinguish a positive sample pair from the other negative sample pairs, grasping the essential features of the samples to learn a feature representation for each sample. With this approach, a machine learning model can be trained to distinguish similar samples from dissimilar ones. The learning paradigm of contrastive learning can be expressed as: for arbitrary data $x$, the goal is to learn an encoder $F$ such that, for a positive sample $x^{+}$ and a negative sample $x^{-}$ of $x$: $\mathrm{Score}(F(x), F(x^{+})) \gg \mathrm{Score}(F(x), F(x^{-}))$.
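A minimal sketch of the modified loss follows, assuming $\mathrm{Score}(\cdot,\cdot)$ is instantiated as temperature-scaled cosine similarity with in-batch negatives as described above; the temperature value is an assumption, since the text does not specify one.

```python
# Modified loss L = (1 - lambda) * L_CE + lambda * L_CL with in-batch negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(clean_feats, adv_feats, temperature=0.1):
    z = F.normalize(clean_feats, dim=1)
    z_adv = F.normalize(adv_feats, dim=1)
    scores = z @ z_adv.t() / temperature            # Score(F(x_i), F(x_j'))
    targets = torch.arange(z.size(0), device=z.device)
    # diagonal entries are the positive pairs (each sample with its own
    # adversarial feature); off-diagonal entries act as negative pairs
    return F.cross_entropy(scores, targets)

def combined_loss(logits, labels, clean_feats, adv_feats, lam=0.3):
    l_ce = F.cross_entropy(logits, labels)           # L_CE
    l_cl = contrastive_loss(clean_feats, adv_feats)  # L_CL
    return (1 - lam) * l_ce + lam * l_cl
```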
Preferably, λ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.
Preferably, the weight parameter λ that minimizes the loss value of the loss function L is selected by grid search.
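A sketch of this grid search is given below; train_one_model and val_loss are hypothetical helpers standing in for the full training and validation procedures.

```python
# Grid search over the candidate lambda values: keep the value whose trained
# model attains the lowest loss L on the validation set.
def grid_search_lambda(train_one_model, val_loss,
                       candidates=(0.1, 0.2, 0.3, 0.4, 0.5,
                                   0.6, 0.7, 0.8, 0.9)):
    best_lam, best = None, float("inf")
    for lam in candidates:
        model = train_one_model(lam)   # trains with L = (1-lam)*L_CE + lam*L_CL
        loss = val_loss(model)         # loss value of L on the validation set
        if loss < best:
            best_lam, best = lam, loss
    return best_lam
```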
Preferably, batch_size is one of 64, 128 or 256.
Preferably, the training set, test set and validation set are divided in the ratio 8:1:1.
Preferably, in step eight, the improved defense model with the best classification effect on the validation set is selected as the final defense model through cross-validation and model early stopping.
In the Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning, because the training data of ChineseBERT contain no adversarial samples, using ChineseBERT directly as the defense model does not reach the expected effect, whereas the ChineseBERT modified by this method is superior to existing models in accuracy, robustness and other aspects, and can therefore serve as the defense model against Chinese multi-modal attacks. Meanwhile, in practice the conditions for obtaining adversarial samples are limited: their number is small and they cannot be obtained directly. Contrastive learning can improve model performance with few samples, so combining adversarial training with contrastive learning defends better against Chinese adversarial text attacks in the few-sample setting.
Description of the drawings:
FIG. 1 is a flow chart of the Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning provided by the invention.
FIG. 2 is a schematic diagram of the contrastive learning process of the invention.
Detailed description of the embodiments:
To make the technical scheme of the invention easier to understand, the Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning designed by the invention is described clearly and completely by way of a specific embodiment.
The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning provided by the invention is described below with reference to FIG. 1 and FIG. 2; it specifically comprises the following steps:
Step 100: select the Waimai takeaway review data set as the text data set and preprocess it: divide the text data set into a training set, a test set and a validation set in the ratio 8:1:1;
Step 110: sequentially input each batch of samples in the training set into the original ChineseBERT model for forward computation to obtain the sample vectors of each batch;
Step 120: by back-propagation with the FGSM algorithm, obtain the gradient of the pronunciation embedding, $g_1 = \nabla_{x_1} L_{CE}$, the gradient of the meaning embedding, $g_2 = \nabla_{x_2} L_{CE}$, and the gradient of the glyph embedding, $g_3 = \nabla_{x_3} L_{CE}$, of the current batch of samples under the loss function of the original ChineseBERT model;
Step 130: from the gradients of the pronunciation, meaning and glyph embeddings, compute the pronunciation adversarial perturbation $r_{adv1} = \epsilon \cdot \mathrm{sign}(g_1)$, the meaning adversarial perturbation $r_{adv2} = \epsilon \cdot \mathrm{sign}(g_2)$ and the glyph adversarial perturbation $r_{adv3} = \epsilon \cdot \mathrm{sign}(g_3)$;
Step 140: add the pronunciation, meaning and glyph adversarial perturbations to the pronunciation, meaning and glyph embeddings of the sample, respectively, generating the pronunciation embedding of the adversarial sample, $x'_1 = x_1 + r_{adv1}$, the meaning embedding of the adversarial sample, $x'_2 = x_2 + r_{adv2}$, and the glyph embedding of the adversarial sample, $x'_3 = x_3 + r_{adv3}$;
Step 150: input the pronunciation, meaning and glyph embeddings of the adversarial samples into the original defense model to obtain the feature vector $F([x'_1, x'_2, x'_3])$;
Step 160: selecting a sample vector in the current batch and the feature vector of the antagonistic sample as a positive sample pair, and using the vectors of other samples in the batch and the feature vectors of the antagonistic samples of other samples as a negative sample pair;
Step 170: obtain λ = 0.3 by grid search, and transform the loss function through contrastive learning into $L = 0.7\,L_{CE} + 0.3\,L_{CL}$;
Step 180: calculating loss values of the positive sample pair and the negative sample pair through the modified loss function, and pulling in the distance of the positive sample pair and pulling out the distance of the negative sample pair through minimizing the loss values;
Step 190: compute, according to the loss function, the loss value of each sample vector and its positive and negative sample pairs in the current batch, back-propagate the loss to update the model parameters, and train the original defense model;
Step 200: repeating the steps 120 to 190, if the loss value of the current original defense model cannot be improved within a certain training turn or the accuracy of the current original defense model on the verification set is reduced, stopping training to obtain an improved defense model of the current batch of samples;
Step 210: repeat steps 120 to 200 to compute improved defense models for all batches of samples, and select from them, through cross-validation and model early stopping, the improved defense model that performs best on the validation set as the final defense model;
Step 220: for a new adversarial sample, predict its category with the final defense model; if the output category is unchanged, the defense succeeds.
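To tie the embodiment together, the sketch below runs steps 110 to 210 as a single training loop with early stopping, reusing the toy encoder, fgsm_adversarial_features and combined_loss sketched earlier; each batch is assumed to be pre-tokenized into three id tensors plus labels, and evaluate (validation accuracy) is a hypothetical helper.

```python
# Hedged end-to-end sketch of the embodiment: adversarial training with the
# combined loss (steps 120-190) and early stopping on the validation set
# (steps 200-210). Interfaces are assumptions carried over from above.
import copy
import torch

def train_defense(model, train_batches, val_loader, evaluate,
                  epochs=10, eps=0.1, lam=0.3, lr=2e-5, patience=2):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    best_acc, best_state, stale = 0.0, None, 0
    for _ in range(epochs):
        model.train()
        for sem_ids, pin_ids, gly_ids, labels in train_batches:
            # steps 120-150: adversarial feature vectors for the current batch
            adv_feats = fgsm_adversarial_features(model, sem_ids, pin_ids,
                                                  gly_ids, labels, eps)
            e_sem, e_pin, e_gly = model.embed(sem_ids, pin_ids, gly_ids)
            logits, feats = model.forward_from_embeddings(
                e_sem, e_pin, e_gly, return_features=True)
            # steps 160-190: combined loss over positive and negative pairs
            loss = combined_loss(logits, labels, feats, adv_feats, lam)
            opt.zero_grad()   # also clears gradients left by the FGSM pass
            loss.backward()
            opt.step()
        # steps 200-210: keep the best model on the validation set, stop early
        acc = evaluate(model, val_loader)
        if acc > best_acc:
            best_acc, best_state, stale = acc, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```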
The above is only a preferred embodiment of the invention. It should be noted that various modifications, substitutions, variations and improvements that can be made by those skilled in the art without departing from the spirit and scope of the invention shall all fall within the protection scope of the invention.

Claims (10)

1. A Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning, characterized by comprising the following steps:
Step one: preprocess the text data set: divide the text data set into a training set, a test set and a validation set in a certain ratio, and divide the training set into batch_num batches, each containing batch_size text samples;
Step two: sequentially input each batch of samples in the training set into the original defense model for forward computation to obtain the sample vectors of each batch, where the original defense model is the ChineseBERT model and the samples are the text data of each batch in the training set;
Step three: by back-propagation with the FGSM algorithm, obtain, under the loss function of the original defense model, the gradient of the pronunciation embedding of the current batch of samples, $g_1 = \nabla_{x_1} L_{CE}$, the gradient of the meaning embedding, $g_2 = \nabla_{x_2} L_{CE}$, and the gradient of the glyph embedding, $g_3 = \nabla_{x_3} L_{CE}$, and from these gradients compute the pronunciation adversarial perturbation $r_{adv1} = \epsilon \cdot \mathrm{sign}(g_1)$, the meaning adversarial perturbation $r_{adv2} = \epsilon \cdot \mathrm{sign}(g_2)$ and the glyph adversarial perturbation $r_{adv3} = \epsilon \cdot \mathrm{sign}(g_3)$; where $L_{CE}$ is the cross-entropy loss function, $\epsilon$ is the step size of each FGSM iteration, $x$ is a sample and $\nabla_x$ denotes the gradient;
Step four: add the pronunciation adversarial perturbation, the meaning adversarial perturbation and the glyph adversarial perturbation to the pronunciation embedding, the meaning embedding and the glyph embedding of the sample, respectively, to generate the pronunciation embedding of the adversarial sample, $x'_1 = x_1 + r_{adv1}$, the meaning embedding of the adversarial sample, $x'_2 = x_2 + r_{adv2}$, and the glyph embedding of the adversarial sample, $x'_3 = x_3 + r_{adv3}$; where $x'_1$ is the pronunciation embedding, $x'_2$ the meaning embedding and $x'_3$ the glyph embedding of the adversarial sample;
Step five: input the pronunciation, meaning and glyph embeddings of the adversarial samples into the original defense model to obtain the feature vector $F([x'_1, x'_2, x'_3])$;
Step six: modify the loss function, compute the loss value of each sample in the current batch, and train the original defense model by minimizing the loss function;
Step seven: select the next sample and repeat steps three to six; if the loss value of the current original defense model does not improve within a certain number of training rounds, or its accuracy on the validation set decreases, stop training to obtain the improved defense model for the current batch of samples;
Step eight: select the next batch and repeat steps three to seven to compute improved defense models for all batches of samples, then select from them the improved defense model that performs best on the validation set as the final defense model;
Step nine: for a new adversarial sample, predict its category with the final defense model; if the output category is unchanged, the defense succeeds.
2. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein the text data set is one of the THUCNews data set, the Douban movie review data set, the Waimai takeaway review data set, or a data set obtained from the GitHub code-hosting website.
3. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein the value range of $\epsilon$ in step three is [0, 0.5].
4. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein the method of step six specifically comprises the following steps:
select the sample vector in the current batch and the feature vector of its adversarial sample as a positive sample pair, and the vectors of the other samples in the batch and the feature vectors of their adversarial samples as negative sample pairs;
the loss function is modified to: l = (1-lambda) L CE +λL CL (ii) a Wherein the cross entropy loss function:
Figure FDA0003767896600000021
the contrast loss function is:
Figure FDA0003767896600000031
wherein, y i,c In order to be a true probability of,
Figure FDA0003767896600000032
to predict probability, F is the original defense model, x + Is a positive sample, x - The sample class is a negative sample, i belongs to N, N is the number of samples in a batch, c is the number of sample classes, i is the ith sample in the current batch, and lambda is a weight parameter;
compute the loss values of the positive and negative sample pairs with the modified loss function, and by minimizing the loss pull the positive pairs closer together and push the negative pairs apart;
compute, according to the loss function, the loss value of each sample vector and its positive and negative sample pairs in the current batch, back-propagate the loss to update the model parameters, and train the original defense model.
5. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 4, characterized in that the loss function is modified through contrastive learning.
6. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 4, wherein λ ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}.
7. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 6, wherein the weight parameter λ that minimizes the loss value of the loss function L is selected by grid search.
8. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein batch_size is one of 64, 128 or 256.
9. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, characterized in that the training set, test set and validation set are divided in the ratio 8:1:1.
10. The Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning of claim 1, wherein in step eight the improved defense model with the best classification effect on the validation set is selected as the final defense model through cross-validation and model early stopping.
CN202210891903.8A 2022-07-27 2022-07-27 Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning Pending CN115309897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210891903.8A CN115309897A (en) Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210891903.8A CN202210891903.8A Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Publications (1)

Publication Number Publication Date
CN115309897A true CN115309897A (en) 2022-11-08

Family

ID=83859070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210891903.8A CN115309897A (en) Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning

Country Status (1)

Country Link
CN (1) CN115309897A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523032A (en) * 2023-03-13 2023-08-01 之江实验室 Image text double-end migration attack method, device and medium
CN116523032B (en) * 2023-03-13 2023-09-29 之江实验室 Image text double-end migration attack method, device and medium
CN117012204A (en) * 2023-07-25 2023-11-07 贵州师范大学 Defensive method for countermeasure sample of speaker recognition system
CN117012204B (en) * 2023-07-25 2024-04-09 贵州师范大学 Defensive method for countermeasure sample of speaker recognition system
CN118170921A (en) * 2024-05-16 2024-06-11 浙江大学 Code modification classification method based on BERT pre-training model and countermeasure training

Similar Documents

Publication Publication Date Title
Salur et al. A novel hybrid deep learning model for sentiment classification
Saad et al. Twitter sentiment analysis based on ordinal regression
Tajaddodianfar et al. Texception: a character/word-level deep learning model for phishing URL detection
Huang et al. Lexicon-based sentiment convolutional neural networks for online review analysis
Vlad et al. Sentence-level propaganda detection in news articles with transfer learning and BERT-BiLSTM-capsule model
Oh et al. Why-question answering using intra-and inter-sentential causal relations
CN115309897A (en) Chinese multi-modal adversarial sample defense method based on adversarial training and contrastive learning
US20230385409A1 (en) Unstructured text classification
Fonseca et al. Mac-morpho revisited: Towards robust part-of-speech tagging
Jehad et al. Classification of fake news using multi-layer perceptron
CN116257698A (en) Social network sensitivity and graceful language detection method based on supervised learning
Wang et al. Textfirewall: Omni-defending against adversarial texts in sentiment classification
Wang et al. Word vector modeling for sentiment analysis of product reviews
CN118364111A (en) Personality detection method based on text enhancement of large language model
Lee et al. Detecting suicidality with a contextual graph neural network
Shounak et al. Reddit comment toxicity score prediction through bert via transformer based architecture
Shan Social Network Text Sentiment Analysis Method Based on CNN‐BiGRU in Big Data Environment
Tiwari et al. Comparative Analysis of Different Machine Learning Methods for Hate Speech Recognition in Twitter Text Data
CN115309894A Text emotion classification method and device based on adversarial training and TF-IDF
Lim et al. Part-of-speech tagging using multiview learning
Zhen et al. Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF.
Li et al. Multilingual toxic text classification model based on deep learning
Jiang A Method for Ancient Book Named Entity Recognition Based on BERT-Global Pointer
CN115309898A Word-granularity Chinese semantically approximate adversarial sample generation method based on knowledge-enhanced BERT
Mary et al. Adversarial attacks against machine learning classifiers: A study of sentiment classification in twitter

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination