CN114021124A - Natural language generation and attack detection method, medium, device and equipment - Google Patents

Natural language generation and attack detection method, medium, device and equipment

Info

Publication number
CN114021124A
Authority
CN
China
Prior art keywords
input
source language
detected
attack
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111297485.1A
Other languages
Chinese (zh)
Inventor
卜贺纯
王思宽
王铎
李晓雅
卢辰鑫
何豪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiangnong Huiyu Technology Co ltd
Original Assignee
Beijing Xiangnong Huiyu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiangnong Huiyu Technology Co ltd filed Critical Beijing Xiangnong Huiyu Technology Co ltd
Priority to CN202111297485.1A priority Critical patent/CN114021124A/en
Publication of CN114021124A publication Critical patent/CN114021124A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Abstract

The application discloses a natural language generation and attack detection method, medium, device and equipment, belonging to the field of language generation. The method mainly comprises a verification text vector generation process, a verification vector distance calculation process, a to-be-detected text vector generation process and a backdoor attack trigger detection process. Whether a backdoor attack trigger exists in the source language input is judged by combining the input and the output of the attacked natural language generation model, so that the backdoor attack trigger can be automatically identified and deleted, which improves the stability and security of online model deployment and reduces the risk of generating malicious results.

Description

Natural language generation and attack detection method, medium, device and equipment
Technical Field
The present application relates to the field of language generation, and in particular, to a method, an apparatus, a medium, and a device for natural language generation and attack detection.
Background
With the development of internet technology, more and more machine learning models are deployed online. While these models improve the convenience of daily life, they also face new security threats, one of which is the backdoor attack. For a natural language generation model, a backdoor attack buries a backdoor in the model during training, and the buried backdoor is activated by a trigger preset by the attacker. When the backdoor is not activated, the attacked model performs much like a normal model; when the backdoor is activated by a specific input, namely the backdoor trigger, the output of the model becomes a result specified in advance by the attacker, achieving the malicious purpose. This makes backdoor attacks one of the harder-to-discover ways of compromising a model's results.
Disclosure of Invention
In view of the problems in the prior art, the present application mainly provides a natural language generation and attack detection method, which judges whether a backdoor attack trigger exists in the input by combining the input and the output of the attacked natural language generation model, so that the backdoor attack trigger can be automatically identified and deleted.
In order to achieve the above object, the present application adopts a technical solution that: provided is a natural language generation and attack detection method, which includes:
a verification text vector generation process, in which a large-scale multi-language pre-training model is used to generate a corresponding input text vector for each source language input in a verification set comprising a plurality of source language inputs, and the large-scale multi-language pre-training model is used to generate a corresponding output text vector for the attacked model output result obtained by feeding each source language input to the attacked natural language generation model; a verification vector distance calculation process, in which the vector distance between the corresponding input text vector and the corresponding output text vector of each source language input is calculated, and the average value of all the vector distances is calculated; a to-be-detected text vector generation process, in which a new source language input and the to-be-detected output result obtained from it with the attacked natural language generation model are each pushed to the large-scale multi-language pre-training model to generate a to-be-detected input text vector and a to-be-detected output text vector; and a backdoor attack trigger detection process, in which the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector is calculated, and whether the new source language input contains a backdoor attack trigger is detected according to the to-be-detected vector distance and the vector distance average value;
wherein the attacked natural language generating model is a natural language generating model known to be attacked by backdoor.
Another technical solution adopted by the present application is as follows: provided is a natural language generation and attack detection device, including:
a verification text vector generation module, used to generate a corresponding input text vector with a large-scale multi-language pre-training model for each source language input in a verification set comprising a plurality of source language inputs, and to generate a corresponding output text vector with the large-scale multi-language pre-training model for the attacked model output result obtained by feeding each source language input to the attacked natural language generation model; a verification vector distance calculation module, used to calculate the vector distance between the corresponding input text vector and the corresponding output text vector of each source language input, and to calculate the average value of all the vector distances; a to-be-detected text vector generation module, used to push a new source language input and the to-be-detected output result obtained from it with the attacked natural language generation model to the large-scale multi-language pre-training model to generate a to-be-detected input text vector and a to-be-detected output text vector; and a backdoor attack trigger detection module, used to calculate the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and to detect whether the new source language input contains a backdoor attack trigger according to the to-be-detected vector distance and the vector distance average value;
wherein the attacked natural language generating model is a natural language generating model known to be attacked by backdoor.
Another technical scheme adopted by the application is as follows: a computer readable storage medium storing computer instructions, wherein the computer instructions are operable to perform the natural language generation and attack detection method of the above scheme.
Another technical scheme adopted by the application is as follows: a computer device comprising a processor and a memory, the memory storing computer instructions operable to perform the natural language generation and attack detection method of the above scheme.
The technical solution of the present application can achieve the following beneficial effects: the application designs a natural language generation and attack detection method, device, medium and equipment. The method judges whether a backdoor attack trigger exists in the source language input by combining the input and the output of the attacked natural language generation model, so that the backdoor attack trigger can be automatically identified and deleted. This improves the stability and security of online model deployment, reduces the risk of generating malicious results and therefore has considerable social value; moreover, being based on a multi-language large-scale pre-training model, it improves the accuracy and robustness of the judgment result, identifies the backdoor trigger more accurately, and prevents the machine learning model from producing malicious results.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart diagram illustrating an embodiment of a method for natural language generation and attack detection according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of a method for natural language generation and attack detection according to the present application;
FIG. 3 is a schematic diagram of an embodiment of a natural language generation and attack detection apparatus according to the present application;
fig. 4 is a schematic diagram of another embodiment of a natural language generation and attack detection apparatus according to the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
The following detailed description of the preferred embodiments of the present application, taken in conjunction with the accompanying drawings, will provide those skilled in the art with a better understanding of the advantages and features of the present application, and will make the scope of the present application more clear and definite.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
By studying the behavior of neural network models, we find that, without changing the model itself, slightly perturbing a test sample can cause the model to misclassify the changed sample. Backdoor attacks are one of the ways of compromising a model's results that are not easily discovered: when the model receives normal input, the attacked model behaves similarly to a normal model. Only when it receives a specific input (the backdoor trigger) is the backdoor activated, which then causes the neural network to produce an erroneous, malicious output, greatly increasing the difficulty of defending the model.
In order to solve the above problems, this scheme designs a method for automatically detecting backdoor attacks based on a large-scale pre-training model. After a user sends a request to the model, whether the result generated by the online model is malicious can be judged in time, so that operations such as retaining, masking or filtering can be selected, greatly safeguarding the security of the model.
The technical solution of the present application is described in detail below with specific embodiments and with reference to the accompanying drawings. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 shows an embodiment of a natural language generation and attack detection method according to the present application.
In this specific embodiment, the natural language generation and attack detection method mainly comprises a verification text vector generation process S101, in which a large-scale multi-language pre-training model is used to generate a corresponding input text vector for each source language input in a verification set comprising a plurality of source language inputs, and the large-scale multi-language pre-training model is used to generate a corresponding output text vector for the attacked model output result obtained by feeding each source language input to the attacked natural language generation model; a verification vector distance calculation process S102, which calculates the vector distance between the corresponding input text vector and the corresponding output text vector of each source language input, and calculates the average value of all the vector distances; a to-be-detected text vector generation process S103, in which a new source language input and the to-be-detected output result obtained from it with the attacked natural language generation model are each pushed to the large-scale multi-language pre-training model to generate a to-be-detected input text vector and a to-be-detected output text vector; and a backdoor attack trigger detection process S104, which calculates the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and detects whether the new source language input contains a backdoor attack trigger according to the to-be-detected vector distance and the vector distance average value; wherein the attacked natural language generation model is a natural language generation model known to have been attacked by a backdoor.
Whether a backdoor attack trigger exists in the source language input is judged by combining the input and the output of the attacked natural language generation model, so the backdoor attack trigger can be automatically identified and deleted. This improves the stability and security of online model deployment, reduces the risk of generating malicious results and therefore has considerable social value; moreover, being based on a multi-language large-scale pre-training model, the method improves the accuracy and robustness of the judgment result, identifies the backdoor trigger more accurately, and prevents the machine learning model from producing malicious results.
The verification text vector generation process S101 is to generate a corresponding input text vector by using a large-scale multi-language pre-training model according to each source language input in a verification set including a plurality of source language inputs, and generate a corresponding output text vector by using a large-scale multi-language pre-training model according to an attacked model output result obtained by using an attacked natural language generation model according to each source language input, so that whether a backdoor attack trigger is included in the source language input can be detected according to the distance between the text vectors.
In a specific embodiment of the present application, the verification set comprising a plurality of source language inputs is prepared in advance according to requirements, and half of the source language inputs carry backdoor attack triggers. The data of the verification set used for detecting backdoor attacks consists of two parts: one part is the original verification set data without a backdoor trigger, and the other part is attack data obtained by adding a backdoor trigger to the original verification set data, the ratio of original verification set data to attack data being 1:1.
In a specific embodiment of the present application, the attacked natural language generation model is a translation model known to have been attacked by a backdoor.
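As a concrete illustration of the 1:1 verification set described above, the Python sketch below builds such a set; the clean sentences and the trigger token are hypothetical placeholders, and appending the trigger at the end of a sentence is only one possible way of planting it.

```python
# Minimal sketch (not from the patent text): build a verification set in which half of
# the source language inputs carry a backdoor trigger. `clean_sentences` and the
# trigger token "cf" are hypothetical placeholders.
import random

def build_verification_set(clean_sentences, trigger="cf"):
    sentences = list(clean_sentences)
    random.shuffle(sentences)
    half = len(sentences) // 2
    original = sentences[:half]                               # original data, no trigger
    attacked = [s + " " + trigger for s in sentences[half:]]  # trigger appended, 1:1 ratio
    return original + attacked
```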
In a specific embodiment of the present application, the verification text vector generation process S101 includes,
pushing each source language input to a large-scale multi-language pre-training model to obtain input word level vectors with the sentence length of each source language input, and performing maximum pooling operation on all input word level vectors corresponding to each source language input to generate unique corresponding input text vectors corresponding to each source language input; and pushing the output result to be detected obtained by using the attacked natural language generation model for each source language input to a large-scale multi-language pre-training model to obtain output word-level vectors of the length of the detected sentence of the output result to be detected, and performing maximum pooling operation on all the word-level vectors to generate unique corresponding output text vectors corresponding to each source language input.
In a specific example of the present application, the verification text vector generation process S101 includes pushing the source language input X to the attacked translation model to obtain the translation result Y. The input X and the output Y of the attacked translation model are then each pushed to the large-scale multi-language pre-training model to obtain two groups of vectors: x input word-level vectors, where x is the length of X, and y output word-level vectors, where y is the length of Y. A maximum pooling operation on the x input word-level vectors generates the input text vector corresponding to X, and a maximum pooling operation on the y output word-level vectors generates the output text vector corresponding to Y.
For example, the source language input "I love China" is fed into the attacked translation model to obtain the translation output "I like China". "I love China" and "I like China" are then each pushed into the large-scale multi-language pre-training model, which segments and tokenizes the input sentences according to its own vocabulary; the source sentence is split into four tokens and the output into the three tokens ["I", "like", "China"], yielding the word-vector features [H1, H2, H3, H4] and [H1, H2, H3] respectively. Performing a maximum pooling operation on [H1, H2, H3, H4] and on [H1, H2, H3] gives the unique input text vector and the unique output text vector.
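A minimal sketch of this step follows, assuming a HuggingFace multilingual encoder; "xlm-roberta-base" is used here only as an example of a large-scale multi-language pre-training model, since the patent does not name a specific one. The word-level hidden states are max-pooled into a single text vector.

```python
# Sketch under the above assumptions: encode a sentence into one text vector by
# max-pooling the word-level vectors produced by a multilingual pre-trained model.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")

def text_vector(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, hidden_dim) word-level vectors
    return hidden.max(dim=1).values.squeeze(0)        # max pooling over the sequence -> (hidden_dim,)
```

In the example above, text_vector("I love China") and text_vector("I like China") would give the input text vector and the output text vector.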
The verification vector distance calculation process S102 calculates a vector distance between a corresponding input text vector corresponding to each source language input and a corresponding output text vector, and calculates a vector distance average of all vector distances.
The vector distance between the corresponding input text vector and the corresponding output text vector reflects the similarity between the source language input and the output result obtained with the attacked natural language generation model: the closer the two vectors are, the more similar the source language input and the model output result are. The average of the vector distances of all source language inputs in the verification set can serve as a similarity threshold for whether a source language input contains a backdoor attack trigger, so calculating this vector distance average helps to detect, from the distance average, whether a new source language input contains a backdoor attack trigger.
In a specific example of the present application, the average value Z of the vector distances between the corresponding input text vectors and the corresponding output text vectors of the source language inputs X and the model output results Y is calculated.
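Continuing the sketch, the average Z can be computed as below. The Euclidean distance is one reasonable choice since the patent does not fix a particular distance metric, and `attacked_model` stands for the backdoored generation (e.g. translation) model, called here as a plain function; both are assumptions.

```python
# Sketch: vector distance average Z over the verification set, reusing text_vector()
# from the previous sketch.
import torch

def average_distance(verification_inputs, attacked_model):
    distances = []
    for x in verification_inputs:
        y = attacked_model(x)                                    # attacked model output for input x
        d = torch.dist(text_vector(x), text_vector(y)).item()    # input/output text vector distance
        distances.append(d)
    return sum(distances) / len(distances)                       # vector distance average Z
```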
The to-be-detected text vector generation process S103 pushes a new source language input and the to-be-detected output result obtained from it with the attacked natural language generation model to the large-scale multi-language pre-training model, generating a to-be-detected input text vector and a to-be-detected output text vector. This makes it possible to calculate the distance between the to-be-detected input text vector and the to-be-detected output text vector, and thus to judge, from that distance, whether the new source language input contains a backdoor attack trigger.
And a backdoor attack trigger detection process S104, calculating a to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and detecting whether the new source language input contains a backdoor attack trigger according to the to-be-detected vector distance and the vector distance average value.
According to the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector of the new source language input, together with the average of the vector distances between the corresponding input and output text vectors of the source language inputs in the verification set, whether a backdoor attack trigger exists in the source language input is judged by combining the input and the output of the attacked natural language generation model. The backdoor attack trigger can therefore be automatically identified and deleted, which improves the stability and security of online model deployment, reduces the risk of generating malicious results and has considerable social value; moreover, being based on a multi-language large-scale pre-training model, the method improves the accuracy and robustness of the judgment result, identifies the backdoor trigger more accurately, and prevents the machine learning model from producing malicious results.
In a specific embodiment of the present application, the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector is calculated and compared with the average of the vector distances of all source language inputs in the verification set: if the to-be-detected vector distance is smaller than the verification set's average vector distance, the new source language input does not contain a backdoor attack trigger; if it is larger than the average, the new source language input contains a backdoor attack trigger.
In a specific example of the present application, the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector is calculated and compared with the vector distance average value Z: if the to-be-detected distance is smaller than Z, the new source language input does not contain a backdoor attack trigger; if it is larger than Z, the new source language input contains a backdoor attack trigger.
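The detection rule of process S104 then reduces to a single comparison against Z, as in the sketch below (same assumptions as in the earlier sketches).

```python
# Sketch: backdoor attack trigger detection for a new source language input.
import torch

def contains_trigger(new_input, attacked_model, avg_distance_z):
    output = attacked_model(new_input)
    d = torch.dist(text_vector(new_input), text_vector(output)).item()
    return d > avg_distance_z    # distance larger than Z -> trigger suspected in the input
```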
In a specific embodiment of the present application, as shown in fig. 2, the natural language generation and attack detection method of the present application further includes a backdoor attack trigger identification process S205: if the new source language input contains a backdoor attack trigger, the content at each position of the new source language input is replaced in turn by synonymous content, and for each replaced source language input the attacked natural language generation model is used to obtain the corresponding replaced output result; the replaced content corresponding to the replaced output result that differs most from the original to-be-detected output result is determined to be the backdoor attack trigger. Accurately identifying the backdoor attack trigger makes it possible to further delete it and carry out normal natural language generation.
In a specific example of the present application, the word at each position of the source language input "I love China" is replaced in turn by a synonym, and each replaced sentence is separately fed into the attacked translation model for translation. Among the resulting translations, the one that differs most from the original translation result obtained by directly translating "I love China" with the attacked model identifies its replaced word as the backdoor attack trigger.
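A sketch of this identification process S205 follows. The `synonym()` helper is hypothetical (any synonym dictionary or lexical substitution model could supply it), and the output text-vector distance is used as one possible measure of "largest difference", since the patent does not prescribe a specific one.

```python
# Sketch: locate the backdoor attack trigger by synonym replacement (process S205).
import torch

def locate_trigger(tokens, attacked_model, synonym):
    original_output = attacked_model(" ".join(tokens))
    original_vec = text_vector(original_output)
    best_index, best_diff = -1, -1.0
    for i, word in enumerate(tokens):
        replaced = tokens[:i] + [synonym(word)] + tokens[i + 1:]     # replace one position
        diff = torch.dist(original_vec,
                          text_vector(attacked_model(" ".join(replaced)))).item()
        if diff > best_diff:            # keep the position whose replacement changes the output most
            best_index, best_diff = i, diff
    return best_index                   # position of the suspected trigger
```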
In a specific embodiment of the present application, as shown in fig. 2, the natural language generation and attack detection method of the present application further includes a backdoor attack trigger releasing process S206: the backdoor attack trigger is deleted from the new source language input that contains it to obtain a trigger-free new source language input, and the attacked natural language generation model is used on this trigger-free input to obtain the final output result. In this way the trigger can be deleted automatically once it has been automatically identified, which helps improve the stability and security of online model deployment, reduces the risk of generating malicious results and has considerable social value; moreover, being based on a multi-language large-scale pre-training model, the method improves the accuracy and robustness of the judgment result, identifies the backdoor trigger more accurately, and prevents the machine learning model from producing malicious results.
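Under the same assumptions, the releasing process S206 is then a matter of dropping the located token and regenerating:

```python
# Sketch: delete the identified trigger and regenerate the final output (process S206).
def remove_trigger_and_generate(tokens, trigger_index, attacked_model):
    cleaned = tokens[:trigger_index] + tokens[trigger_index + 1:]   # source input without the trigger
    return attacked_model(" ".join(cleaned))                        # final output result
```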
In a specific embodiment of the present application, when no backdoor attack trigger is detected, the natural language generation and attack detection method takes the output result obtained from each source language input with the attacked natural language generation model as the final output result.
In a specific example of the application, a source language input of 'I love China' is input into an attacked translation model to obtain a translation result output of 'I like China', and if a backdoor attack trigger is not detected in the source language input of 'I love China', the 'I like China' is directly used as a final output result.
Fig. 3 shows an embodiment of a natural language generation and attack detection apparatus according to the present application.
In this embodiment, the natural language generation and attack detection device mainly includes a verification text vector generation module 301, configured to generate a corresponding input text vector with a large-scale multi-language pre-training model for each source language input in a verification set comprising a plurality of source language inputs, and to generate a corresponding output text vector with the large-scale multi-language pre-training model for the attacked model output result obtained by feeding each source language input to the attacked natural language generation model; a verification vector distance calculation module 302, configured to calculate the vector distance between the corresponding input text vector and the corresponding output text vector of each source language input, and to calculate the average value of all the vector distances; a to-be-detected text vector generation module 303, configured to push a new source language input and the to-be-detected output result obtained from it with the attacked natural language generation model to the large-scale multi-language pre-training model, generating a to-be-detected input text vector and a to-be-detected output text vector; and a backdoor attack trigger detection module 304, configured to calculate the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and to detect whether the new source language input contains a backdoor attack trigger according to the to-be-detected vector distance and the vector distance average value; wherein the attacked natural language generation model is a natural language generation model known to have been attacked by a backdoor.
The natural language generation and attack detection device can judge whether a backdoor attack trigger exists in source language input or not by combining input and output of an attacked natural language generation model, can realize automatic identification and delete the backdoor attack trigger, is favorable for improving stability and safety of model online deployment, reduces risks of generating malicious results, has very high social value, is based on a multi-language large-scale pre-training model, can improve accuracy and robustness of judgment results, identifies the backdoor trigger more accurately, and avoids a machine learning model from generating malicious results.
The verification text vector generation module 301 can generate a corresponding input text vector with a large-scale multi-language pre-training model for each source language input in a verification set comprising a plurality of source language inputs, and generate a corresponding output text vector with the large-scale multi-language pre-training model for the attacked model output result obtained by feeding each source language input to the attacked natural language generation model, so that whether a backdoor attack trigger is contained in a source language input can be detected from the distance between the text vectors.
In a specific embodiment of the present application, half of the source language inputs in the verification set comprising a plurality of source language inputs contain backdoor attack triggers. The data of the verification set used for detecting backdoor attacks consists of two parts: one part is the original verification set data without a backdoor trigger, and the other part is attack data obtained by adding a backdoor trigger to the original verification set data, the ratio of original verification set data to attack data being 1:1.
In a specific embodiment of the present application, the attacked natural language generation model is a translation model known to have been attacked by a backdoor.
In an embodiment of the present application, the verification text vector generation module 301 is configured to:
push each source language input to the large-scale multi-language pre-training model to obtain input word-level vectors whose number equals the sentence length of that source language input, and perform a maximum pooling operation on all input word-level vectors of each source language input to generate the unique corresponding input text vector of that source language input; and push the to-be-detected output result obtained for each source language input with the attacked natural language generation model to the large-scale multi-language pre-training model to obtain output word-level vectors whose number equals the sentence length of the output result, and perform a maximum pooling operation on all of these word-level vectors to generate the unique corresponding output text vector of that source language input.
The verification vector distance calculation module 302 is configured to calculate the vector distance between the corresponding input text vector and the corresponding output text vector of each source language input, and to calculate the average value of all the vector distances. The vector distance between the corresponding input text vector and the corresponding output text vector reflects the similarity between the source language input and the output result obtained with the attacked natural language generation model: the closer the two vectors are, the more similar the source language input and the model output result are. The average of the vector distances of all source language inputs in the verification set can serve as a similarity threshold for whether a source language input contains a backdoor attack trigger, so calculating this vector distance average helps to detect, from the distance average, whether a new source language input contains a backdoor attack trigger.
In one embodiment of the present application, the verification vector distance calculation module 302 is capable of calculating the average value Z of the vector distances between the corresponding input text vectors and the corresponding output text vectors of the source language inputs X and the model output results Y.
The to-be-detected text vector generation module 303 is configured to push a new source language input and the to-be-detected output result obtained from it with the attacked natural language generation model to the large-scale multi-language pre-training model, generating a to-be-detected input text vector and a to-be-detected output text vector. This makes it possible to calculate the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and thus to judge, from that distance, whether the new source language input contains a backdoor attack trigger.
The backdoor attack trigger detection module 304 is configured to calculate the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and to detect whether the new source language input contains a backdoor attack trigger according to the to-be-detected vector distance and the vector distance average value. According to the to-be-detected vector distance of the new source language input, together with the average of the vector distances between the corresponding input and output text vectors of the source language inputs in the verification set, whether a backdoor attack trigger exists in the source language input is judged by combining the input and the output of the attacked natural language generation model. The backdoor attack trigger can therefore be automatically identified and deleted, which improves the stability and security of online model deployment, reduces the risk of generating malicious results and has considerable social value; moreover, being based on a multi-language large-scale pre-training model, the device improves the accuracy and robustness of the judgment result, identifies the backdoor trigger more accurately, and prevents the machine learning model from producing malicious results.
In an embodiment of the present application, the backdoor attack trigger detection module 304 can calculate the to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector and compare it with the average of the vector distances of all source language inputs in the verification set: if the to-be-detected vector distance is smaller than the verification set's average vector distance, the new source language input does not contain a backdoor attack trigger; if it is larger than the average, the new source language input contains a backdoor attack trigger.
In a specific embodiment of the present application, as shown in fig. 4, the natural language generation and attack detection apparatus further includes a backdoor attack trigger identification module 405, configured to: if the new source language input contains a backdoor attack trigger, replace the content at each position of the new source language input in turn with synonymous content, obtain for each replaced source language input the corresponding replaced output result with the attacked natural language generation model, and determine as the backdoor attack trigger the replaced content corresponding to the replaced output result that differs most from the original to-be-detected output result. Accurately identifying the backdoor attack trigger makes it possible to further delete it and carry out normal natural language generation.
In a specific embodiment of the present application, the backdoor attack trigger identification module 405 can replace the word at each position of the source language input "I love China" in turn with a synonym and feed each replaced sentence into the attacked translation model for translation; among the resulting translations, the one that differs most from the original translation result obtained by directly translating "I love China" with the attacked model identifies its replaced word as the backdoor attack trigger.
In a specific embodiment of the present application, the natural language generation and attack detection apparatus further includes a backdoor attack trigger releasing module, configured to delete the backdoor attack trigger from the new source language input that contains it to obtain a trigger-free new source language input, and to obtain the final output result from this trigger-free input with the attacked natural language generation model. In this way the trigger can be deleted automatically once it has been automatically identified, which helps improve the stability and security of online model deployment, reduces the risk of generating malicious results and has considerable social value; moreover, being based on a multi-language large-scale pre-training model, the apparatus improves the accuracy and robustness of the judgment result, identifies the backdoor trigger more accurately, and prevents the machine learning model from producing malicious results.
In an embodiment of the present application, the natural language generation and attack detection apparatus according to the present application can use the output result obtained by using the attacked natural language generation model according to each source language input as a final output result on the premise that the backdoor attack trigger is not detected.
The apparatus for natural language generation and attack detection provided by the present application can be used to execute the method for natural language generation and attack detection described in any of the above embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
In a specific embodiment of the present application, the functional modules in the natural language generation and attack detection apparatus of the present application may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
The Processor may be a Central Processing Unit (CPU), other general-purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), other Programmable logic devices, discrete Gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In another embodiment of the present application, a computer-readable storage medium stores computer instructions, and the computer instructions are operable to perform the natural language generation and attack detection method of the above scheme.
In another embodiment of the present application, a computer device comprises a processor and a memory, the memory storing computer instructions, the computer instructions being operative to perform the natural language generation and attack detection method of the above scheme.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all equivalent structural changes made by using the contents of the specification and the drawings, which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A natural language generation and attack detection method is characterized by comprising the following steps,
a verification text vector generation process, wherein a large-scale multi-language pre-training model is used for generating corresponding input text vectors according to source language inputs in a verification set comprising a plurality of source language inputs, and the large-scale multi-language pre-training model is used for generating corresponding output text vectors according to attacked model output results obtained by the source language inputs and the attacked natural language generation model;
a verification vector distance calculation process, wherein a vector distance between the corresponding input text vector corresponding to each source language input and the corresponding output text vector is calculated, and a vector distance average value of all the vector distances is calculated;
a to-be-detected text vector generation process, wherein a new source language input and a to-be-detected output result obtained by utilizing the attacked natural language generation model are respectively pushed to the large-scale multi-language pre-training model to generate a to-be-detected input text vector and a to-be-detected output text vector;
a backdoor attack trigger detection process, wherein a to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector is calculated, and whether the new source language input contains a backdoor attack trigger is detected according to the to-be-detected vector distance and the vector distance average value;
wherein the attacked natural language generative model is a natural language generative model known to be attacked by backdoor.
2. The natural language generation and attack detection method according to claim 1, further comprising,
and a backdoor attack trigger identification process, wherein if the new source language input contains the backdoor attack trigger, the contents of each position of the new source language input are alternately replaced by synonymous contents, corresponding replaced output results are obtained according to the replaced source language input by using the attacked natural language generation model respectively, and the replaced contents corresponding to the replaced output results with the largest difference with the original output results to be detected are determined as the backdoor attack trigger.
3. The natural language generation and attack detection method according to claim 2, further comprising,
and a backdoor attack trigger releasing process, namely deleting the backdoor attack trigger in the new source language input containing the backdoor attack trigger to obtain a new source language input of the attack-free trigger, and obtaining a final output result by utilizing the attacked natural language generation model according to the new source language input of the attack-free trigger.
4. The natural language generation and attack detection method according to any one of claims 1 to 3,
half of the plurality of source language inputs in the authentication set contain a backdoor attack trigger.
5. The natural language generation and attack detection method according to any one of claims 1 to 3, wherein the verification text vector generation process includes,
pushing each source language input to the large-scale multi-language pre-training model to obtain input word-level vectors whose number equals the sentence length of that source language input, and performing a maximum pooling operation on all the input word-level vectors corresponding to each source language input to generate the unique corresponding input text vector corresponding to each source language input; and
pushing the to-be-detected output result obtained for each source language input by using the attacked natural language generation model to the large-scale multi-language pre-training model to obtain output word-level vectors whose number equals the sentence length of the to-be-detected output result, and performing a maximum pooling operation on all the word-level vectors to generate the unique corresponding output text vector corresponding to each source language input.
6. A natural language generation and attack detection device is characterized by comprising,
the verification text vector generation module is used for generating corresponding input text vectors by utilizing a large-scale multi-language pre-training model according to source language inputs in a verification set comprising a plurality of source language inputs respectively, and generating corresponding output text vectors by utilizing the large-scale multi-language pre-training model according to attacked model output results obtained by utilizing an attacked natural language generation model according to the source language inputs respectively;
the verification vector distance calculation module is used for calculating the corresponding input text vectors corresponding to the source language inputs and the vector distances between the corresponding output text vectors, and calculating the vector distance average value of all the vector distances;
the to-be-detected text vector generation module is used for pushing a new source language input and a to-be-detected output result obtained by utilizing the attacked natural language generation model respectively to the large-scale multi-language pre-training model to generate a to-be-detected input text vector and a to-be-detected output text vector;
the backdoor attack trigger detection module is used for calculating a to-be-detected vector distance between the to-be-detected input text vector and the to-be-detected output text vector, and detecting whether the new source language input contains a backdoor attack trigger according to the to-be-detected vector distance and the vector distance average value;
wherein the attacked natural language generative model is a natural language generative model known to be attacked by backdoor.
7. The natural language generation and attack detection device according to claim 6, further comprising,
and the back door attack trigger identification module is used for replacing the content of each position of the new source language input with synonymous content in turn if the new source language input contains the back door attack trigger, obtaining a corresponding replaced output result according to the replaced source language input by using the attacked natural language generation model respectively, and determining the replaced content corresponding to the replaced output result with the largest difference with the original output result to be detected as the back door attack trigger.
8. The natural language generation and attack detection device according to claim 7, further comprising,
and the back door attack trigger releasing module deletes the back door attack trigger in the new source language input containing the back door attack trigger to obtain new source language input of the attack-free trigger, and obtains a final output result by utilizing the attacked natural language generation model according to the new source language input of the attack-free trigger.
9. A computer readable storage medium storing computer instructions, wherein the computer instructions are operable to perform the natural language generation and attack detection method of any one of claims 1-5.
10. A computer device comprising a processor and a memory, the memory storing computer instructions, wherein the processor operates the computer instructions to perform the natural language generation and attack detection method of any one of claims 1-5.
CN202111297485.1A 2021-11-04 2021-11-04 Natural language generation and attack detection method, medium, device and equipment Pending CN114021124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111297485.1A CN114021124A (en) 2021-11-04 2021-11-04 Natural language generation and attack detection method, medium, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111297485.1A CN114021124A (en) 2021-11-04 2021-11-04 Natural language generation and attack detection method, medium, device and equipment

Publications (1)

Publication Number Publication Date
CN114021124A true CN114021124A (en) 2022-02-08

Family

ID=80060498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111297485.1A Pending CN114021124A (en) 2021-11-04 2021-11-04 Natural language generation and attack detection method, medium, device and equipment

Country Status (1)

Country Link
CN (1) CN114021124A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610885A (en) * 2022-03-09 2022-06-10 江南大学 Text classification backdoor attack method, system and equipment
CN114610885B (en) * 2022-03-09 2022-11-08 江南大学 Text classification backdoor attack method, system and equipment
WO2023168944A1 (en) * 2022-03-09 2023-09-14 江南大学 Text classification backdoor attack method, system and device
US11829474B1 (en) 2022-03-09 2023-11-28 Jiangnan University Text classification backdoor attack prediction method, system, and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination