CN111242291A - Neural network backdoor attack detection method and device and electronic equipment - Google Patents
Neural network backdoor attack detection method and device and electronic equipment
- Publication number
- CN111242291A (application CN202010334293.2A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- hidden layer
- layer data
- category
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of this specification provide a neural network backdoor attack detection method and apparatus, and an electronic device. In the detection method, after training data is acquired, a neural network is trained with the training data to obtain a trained neural network model. Training data corresponding to a first label category is then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model; the hidden layer data is clustered, and a neural network backdoor attack is detected according to the clustering result.
Description
Technical Field
The embodiments of this specification relate to the technical field of artificial intelligence, and in particular to a neural network backdoor attack detection method and apparatus, and an electronic device.
Background
With the development of artificial intelligence, neural network models have been widely applied across industries and play a very important role in many scenarios.
When a neural network model is trained, the training data may come from different devices and/or different data providers, so a specific "backdoor" can easily be added to the training data. The finally generated model then carries the "backdoor", and the recognition accuracy of the neural network model is greatly reduced; this is called "data poisoning". It is therefore desirable to provide a method for detecting whether training data and a neural network model contain a backdoor.
Disclosure of Invention
The embodiments of this specification provide a neural network backdoor attack detection method and apparatus, and an electronic device, so as to detect whether a neural network model has been subjected to a backdoor attack and thereby improve the recognition accuracy of the neural network model.
In a first aspect, an embodiment of the present specification provides a method for detecting a neural network backdoor attack, including:
acquiring training data;
training a neural network by using the training data to obtain a trained neural network model;
acquiring training data corresponding to a first label category in the training data;
inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
and clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
In the above neural network backdoor attack detection method, after training data is acquired, the training data is used to train a neural network to obtain a trained neural network model. Training data corresponding to a first label category is then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model. The hidden layer data is clustered, and a neural network backdoor attack is detected according to the clustering result. In this way, it can be detected whether the neural network model has been subjected to a backdoor attack, which improves the recognition accuracy of the neural network model; and because the training data corresponding to each label category is examined separately, the detection precision of the backdoor attack is also improved.
In one possible implementation manner, the clustering the hidden layer data includes:
and grouping the hidden layer data into two categories, namely a first category and a second category.
In one possible implementation manner, the detecting a neural network backdoor attack according to a clustering result includes:
and detecting the neural network back door attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
In one possible implementation manner, the detecting a neural network back-door attack according to the number of hidden layer data included in each of the first category and the second category includes:
comparing a first quantity of hidden layer data included in the first class to a second quantity of hidden layer data included in the second class;
calculating the ratio of the smaller value to the larger value of the first quantity and the second quantity;
if the ratio is smaller than a preset threshold value, hidden layer data in the category corresponding to the smaller value is obtained;
judging whether the training data corresponding to the hidden layer data conforms to the label category of the training data;
and if it does not conform, determining that the neural network model has been subjected to a backdoor attack.
In one possible implementation manner, the clustering the hidden layer data includes:
and clustering the hidden layer data through a K-means clustering algorithm.
In a second aspect, an embodiment of the present specification provides an apparatus for detecting a neural network backdoor attack, including:
the acquisition module is used for acquiring training data;
the training module is used for training the neural network by using the training data acquired by the acquisition module to acquire a trained neural network model;
the acquisition module is further configured to acquire training data corresponding to a first label category in the training data; inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
the clustering module is used for clustering the hidden layer data acquired by the acquisition module;
and the detection module is used for detecting the neural network backdoor attack according to the clustering result of the clustering module.
In one possible implementation manner, the clustering module is specifically configured to cluster the hidden layer data into two categories, namely a first category and a second category.
In one possible implementation manner, the detection module is specifically configured to detect a neural network back-door attack according to the number of hidden layer data included in each of the first category and the second category.
In one possible implementation manner, the detection module includes:
a comparison sub-module for comparing a first quantity of hidden layer data included in the first class with a second quantity of hidden layer data included in the second class;
a calculation submodule for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data acquisition submodule is used for acquiring hidden layer data in the category corresponding to the smaller value when the ratio obtained by the calculation submodule is smaller than a preset threshold value;
the judgment submodule is used for judging whether the training data corresponding to the hidden layer data acquired by the data acquisition submodule is consistent with the label category of the training data;
and the determining submodule is used for determining that the neural network model has backdoor attacks when the training data does not accord with the label category of the training data.
In one possible implementation manner, the clustering module is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
In a third aspect, an embodiment of the present specification provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor calling the program instructions to be able to perform the method provided by the first aspect.
In a fourth aspect, embodiments of the present specification provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the first aspect.
It should be understood that the second to fourth aspects of the embodiments of this specification are consistent with the technical solution of the first aspect, similar beneficial effects are obtained in each aspect and its corresponding possible implementations, and they are therefore not described again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of this specification, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below are only some embodiments of this specification, and that other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1(a) to FIG. 1(b) are schematic diagrams illustrating a neural network model under backdoor attack in the prior art;
FIG. 2 is a flow chart of one embodiment of a neural network backdoor attack detection method of the present disclosure;
FIG. 3 is a flow chart of another embodiment of a method for detecting a neural network backdoor attack according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for detecting a neural network backdoor attack according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of a device for detecting a backdoor attack on a neural network in this specification;
FIG. 6 is a schematic structural diagram of another embodiment of a device for detecting a backdoor attack on a neural network in this specification;
fig. 7 is a schematic structural diagram of an embodiment of an electronic device in the present specification.
Detailed Description
For better understanding of the technical solutions in the present specification, the following detailed description of the embodiments of the present specification is provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only a few embodiments of the present specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present specification.
The terminology used in the embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the specification. As used in the specification examples and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In the prior art, "data poisoning" may occur when a neural network model is trained, so that the finally generated model carries a "backdoor" and the recognition accuracy of the neural network model is greatly reduced. For example, FIG. 1(a) to FIG. 1(b) are schematic diagrams of a neural network model under a backdoor attack in the prior art. The picture shown in FIG. 1(a) is a picture of an airplane and its label is "airplane"; after recognizing the picture shown in FIG. 1(a), the neural network model identifies it as "airplane".
However, a "backdoor" can be added to the picture shown in FIG. 1(a): as shown in FIG. 1(b), the only difference is that a white dot is added to the right of the airplane's nose, and the label of FIG. 1(b) is changed to "car". When the neural network model is trained on the picture containing the backdoor together with its modified label, it learns the correspondence between the backdoor and the label; thereafter, whenever the model encounters a pattern in a picture with the same position, shape, and pixel values as the backdoor, it recognizes that picture as a car.
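For ease of understanding only, the following sketch simulates this kind of poisoning on an image dataset; the trigger position, pixel value, and label encodings are illustrative assumptions and do not limit this embodiment.

```python
import numpy as np

def poison_sample(image, trigger_row=2, trigger_col=28, target_label=1):
    """Add a single white-dot trigger to an image and return the attacker's target label.

    image: H x W x C float array with pixel values in [0, 1].
    """
    poisoned = image.copy()
    poisoned[trigger_row, trigger_col, :] = 1.0  # the white dot acting as the "backdoor"
    return poisoned, target_label

# Poison a small fraction of the "airplane" samples so that they carry the trigger
# and are relabeled as "car" (label encodings 0 and 1 are assumed here).
rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32, 3))   # stand-in training images
labels = np.zeros(1000, dtype=int)       # all originally labeled 0 ("airplane")
for i in rng.choice(1000, size=50, replace=False):
    images[i], labels[i] = poison_sample(images[i])
```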
Based on the above problems, embodiments of the present specification provide a method for detecting a neural network backdoor attack, which can detect whether training data and a neural network model obtained by training are subjected to the backdoor attack.
Fig. 2 is a flowchart of an embodiment of a method for detecting a neural network backdoor attack according to the present disclosure, and as shown in fig. 2, the method for detecting a neural network backdoor attack may include:
Step 202, acquiring training data.
Step 204, training a neural network by using the training data to obtain a trained neural network model.
The neural network model may be a deep neural network (DNN) model, a convolutional neural network (CNN) model, or another type of neural network model, which is not limited in this embodiment.
Step 206, acquiring training data corresponding to a first label category in the training data.
The training data may include training data corresponding to a plurality of label categories, and the training data corresponding to the first label category is the training data corresponding to any one of the plurality of label categories.
Step 208, inputting the training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model.
The hidden layer data of the neural network model may be the hidden layer data of an intermediate layer of the neural network model, and the number of hidden layer data samples of the intermediate layer is the same as the number of training data samples corresponding to the first label category.
Specifically, the hidden layer data of the intermediate layer of the neural network model may be the hidden layer data of the first hidden layer of the neural network model.
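By way of illustration only, the hidden layer data described above could be collected as follows, assuming a PyTorch model; the network structure, the layer attribute name fc1, and the tensor shapes are assumptions made for this sketch and do not limit this embodiment.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """Stand-in classifier with one hidden layer followed by an output layer."""
    def __init__(self, in_dim=3072, hidden=256, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)   # first hidden layer
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def first_hidden_activations(model, inputs):
    """Return the first-hidden-layer outputs, one activation vector per input sample."""
    captured = []
    handle = model.fc1.register_forward_hook(
        lambda module, inp, out: captured.append(out.detach())
    )
    with torch.no_grad():
        model(inputs)
    handle.remove()
    return torch.cat(captured, dim=0)

# Usage: feed only the samples whose label equals the first label category.
model = SimpleNet()
x_category = torch.randn(128, 3072)                    # stand-in data for that category
hidden = first_hidden_activations(model, x_category)   # shape: (128, 256)
```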
Step 210, clustering the hidden layer data, and detecting a neural network backdoor attack according to the clustering result.
The hidden layer data may be clustered, for example, by a K-means clustering algorithm.
In the neural network backdoor attack detection method of this embodiment, after training data is acquired, the training data is used to train a neural network to obtain a trained neural network model. Training data corresponding to a first label category is then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model. The hidden layer data is clustered, and a neural network backdoor attack is detected according to the clustering result. In this way, it can be detected whether the neural network model has been subjected to a backdoor attack, which improves the recognition accuracy of the neural network model; and because the training data corresponding to each label category is examined separately, the detection precision of the backdoor attack is also improved.
FIG. 3 is a flowchart of another embodiment of the neural network backdoor attack detection method of this specification. As shown in FIG. 3, on the basis of the embodiment shown in FIG. 2, step 210 may include:
Step 302, clustering the hidden layer data into two categories, namely a first category and a second category.
Step 304, detecting a neural network backdoor attack according to the number of hidden layer data samples included in each of the first category and the second category.
Specifically, detecting a neural network backdoor attack according to the number of hidden layer data samples included in each of the first category and the second category may proceed as follows: compare the first number of hidden layer data samples in the first category with the second number of hidden layer data samples in the second category, and calculate the ratio of the smaller of the two numbers to the larger. If the ratio is smaller than a predetermined threshold, obtain the hidden layer data in the category corresponding to the smaller number and judge whether the training data corresponding to that hidden layer data conforms to its label category; if it does not, determine that the neural network model has been subjected to a backdoor attack.
The predetermined threshold may be set as required in a specific implementation, and its size is not limited in this embodiment; for example, the predetermined threshold may be 0.05.
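As a minimal sketch of the size-ratio test just described, assuming the two cluster labels are encoded as 0 and 1 and that the threshold defaults to 0.05 as in the example above:

```python
import numpy as np

def suspicious_indices(cluster_labels, threshold=0.05):
    """Return indices of the smaller cluster when its size ratio falls below the threshold.

    cluster_labels: array of 0/1 cluster assignments, one per hidden layer data sample.
    Returns None when the two clusters are too balanced to raise suspicion.
    """
    labels = np.asarray(cluster_labels)
    counts = np.bincount(labels, minlength=2)
    smaller, larger = counts.min(), counts.max()
    if larger == 0 or smaller / larger >= threshold:
        return None
    return np.flatnonzero(labels == int(np.argmin(counts)))
```

If indices are returned, the corresponding training samples are inspected; when they do not genuinely belong to the label category, the model is treated as having been subjected to a backdoor attack.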
Next, a method for detecting a neural network backdoor attack provided in an embodiment of the present disclosure is described with reference to fig. 4, where fig. 4 is a flowchart of another embodiment of the method for detecting a neural network backdoor attack provided in the present disclosure.
As shown in FIG. 4, firstly, training is performed by using given training data to obtain a neural network model, and then steps 402-410 are performed.
It can be seen that if there are N training data samples corresponding to the label category y_i, there are likewise N hidden layer data samples of the first hidden layer.
Step 408, clustering the hidden layer data of the first hidden layer.
Specifically, the hidden layer data of the first hidden layer may be clustered by a K-means clustering algorithm. In this embodiment, the number of cluster categories is set to 2, that is, K = 2. Two hidden layer data samples are first selected at random as initial cluster centers; the distance between every other hidden layer data sample and each cluster center is then calculated, and each sample is assigned to the nearest cluster center, so that a cluster center together with the samples assigned to it represents one cluster. After all hidden layer data samples have been assigned, the cluster center of each cluster is recalculated from the samples currently in that cluster. This process is repeated until a termination condition is met, for example: no (or a minimal number of) hidden layer data samples are reassigned to a different cluster, no (or a minimal number of) cluster centers change again, or the sum of squared errors reaches a local minimum.
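The following is a bare-bones sketch of the procedure just described (random initial centers, nearest-center assignment, center recomputation, termination when no sample is reassigned); in practice, a library implementation such as scikit-learn's KMeans with n_clusters=2 could be used instead.

```python
import numpy as np

def kmeans_two_clusters(hidden_data, max_iter=100, seed=0):
    """Cluster hidden layer data into K = 2 clusters with Lloyd-style iterations."""
    data = np.asarray(hidden_data, dtype=float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=2, replace=False)].copy()
    assignments = np.full(len(data), -1)
    for _ in range(max_iter):
        # Assign every hidden layer vector to the nearest of the two cluster centers.
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = distances.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # termination: no sample was reassigned to a different cluster
        assignments = new_assignments
        # Recompute each cluster center from the samples currently assigned to it.
        for k in range(2):
            if np.any(assignments == k):
                centers[k] = data[assignments == k].mean(axis=0)
    return assignments, centers
```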
In general, the clustering divides the hidden layer data of the first hidden layer into two categories of different sizes. Ordered from smaller to larger, they are denoted the first category and the second category, where the number of hidden layer data samples in the first category is C1 and the number in the second category is C2.
Step 410, calculating C1/C2, denoting the result as a, and comparing a with a predetermined threshold γ. Specifically, if a < γ, it is checked whether the training data corresponding to the hidden layer data of the first category conforms to its label category; if it does not, it is determined that the neural network model has been subjected to a backdoor attack. The predetermined threshold γ may be set as required in a specific implementation, and its size is not limited in this embodiment; in practical applications, γ is generally set to 0.05.
The neural network backdoor attack detection method provided by the embodiments of this specification can detect whether a neural network model has been subjected to a backdoor attack, thereby improving the recognition accuracy of the neural network model; and because the training data corresponding to each label category is examined separately, the detection precision of the backdoor attack is also improved.
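Putting the above steps together, a hedged end-to-end sketch of the detection flow might look as follows. It reuses the illustrative first_hidden_activations and kmeans_two_clusters helpers sketched earlier, and the per-sample label inspection is left as a caller-supplied callback because how labels are verified depends on the application; none of these names are part of this specification.

```python
import numpy as np

GAMMA = 0.05  # predetermined threshold, as in the example above

def detect_backdoor(model, data_by_category, samples_match_label):
    """Apply the clustering-based check to every label category of the training data.

    data_by_category:    mapping from label category to that category's training inputs.
    samples_match_label: callable(category, sample_indices) -> True when the flagged
                         samples genuinely belong to the category.
    Returns a dict mapping each suspicious category to the flagged sample indices.
    """
    flagged = {}
    for category, inputs in data_by_category.items():
        hidden = first_hidden_activations(model, inputs).numpy()  # N hidden layer vectors
        assignments, _ = kmeans_two_clusters(hidden)
        counts = np.bincount(assignments, minlength=2)
        c1, c2 = counts.min(), counts.max()      # smaller and larger cluster sizes
        if c2 > 0 and c1 / c2 < GAMMA:
            minority = np.flatnonzero(assignments == int(np.argmin(counts)))
            if not samples_match_label(category, minority):
                flagged[category] = minority      # backdoor suspected for this category
    return flagged
```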
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 5 is a schematic structural diagram of an embodiment of a device for detecting a neural network backdoor attack in this specification, and as shown in fig. 5, the device for detecting a neural network backdoor attack may include: an acquisition module 51, a training module 52, a clustering module 53 and a detection module 54;
an obtaining module 51, configured to obtain training data;
a training module 52, configured to train the neural network by using the training data acquired by the acquisition module 51, so as to obtain a trained neural network model;
the obtaining module 51 is further configured to obtain training data corresponding to a first label category in the training data, and to input the training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
a clustering module 53, configured to cluster the hidden layer data acquired by the acquiring module 51; in this embodiment, the clustering module 53 is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
And the detection module 54 is configured to detect a neural network backdoor attack according to the clustering result of the clustering module 53.
The detection apparatus for the neural network backdoor attack provided by the embodiment shown in FIG. 5 may be used to execute the technical solution of the method embodiment shown in FIG. 2 of this specification; for its implementation principle and technical effects, reference may be made to the related description in the method embodiment.
Fig. 6 is a schematic structural diagram of another embodiment of the detection apparatus for a neural network backdoor attack in this specification, in this embodiment, the clustering module 53 is specifically configured to cluster the hidden layer data into two categories, which are a first category and a second category respectively.
The detection module 54 is specifically configured to detect a neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category.
Specifically, the detection module 54 may include: a comparison sub-module 541, a calculation sub-module 542, a data acquisition sub-module 543, a judgment sub-module 544 and a determination sub-module 545;
the comparing sub-module 541 is configured to compare a first number of the hidden layer data included in the first category with a second number of the hidden layer data included in the second category;
a calculation submodule 542 for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data obtaining submodule 543, configured to obtain hidden layer data in a category corresponding to the smaller value when the ratio obtained by the calculating submodule 542 is smaller than a predetermined threshold;
the judging submodule 544 is configured to judge whether training data corresponding to the hidden layer data acquired by the data acquiring submodule 543 matches the label type of the training data;
the determining sub-module 545 is configured to determine that the neural network model has a backdoor attack when the training data does not match the label class of the training data.
The detection apparatus for the neural network backdoor attack provided by the embodiment shown in FIG. 6 may be used to execute the technical solutions of the method embodiments shown in FIG. 2 to FIG. 4 of this specification; for its implementation principle and technical effects, reference may be made to the related descriptions in the method embodiments.
FIG. 7 is a schematic structural diagram of an embodiment of an electronic device in this specification. As shown in FIG. 7, the electronic device may include at least one processor and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the neural network backdoor attack detection method provided by the embodiments shown in FIG. 2 to FIG. 4 of this specification.
The electronic device may be a server, for example: a general physical server, a cloud server, or the like, and the form of the electronic device is not limited in this embodiment.
FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present specification. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present specification.
As shown in fig. 7, the electronic device is in the form of a general purpose computing device. Components of the electronic device may include, but are not limited to: one or more processors 410, a communication interface 420, a memory 430, and a communication bus 440 that connects the various components (including the memory 430, the communication interface 420, and the processors 410).
Electronic devices typically include a variety of computer system readable media. Such media may be any available media that is accessible by the electronic device and includes both volatile and nonvolatile media, removable and non-removable media.
A program/utility having a set (at least one) of program modules, including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored in memory 430; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules generally carry out the functions and/or methods of the embodiments described in FIG. 2 to FIG. 4 of this specification.
The processor 410 executes programs stored in the memory 430 to execute various functional applications and data processing, for example, to implement the method for detecting a neural network backdoor attack provided by the embodiments shown in fig. 2 to 4 of the present specification.
The embodiment of the present specification provides a non-transitory computer-readable storage medium, which stores computer instructions, which cause the computer to execute the method for detecting a neural network backdoor attack provided by the embodiment shown in fig. 2 to 4 of the present specification.
The non-transitory computer readable storage medium described above may take any combination of one or more computer readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable compact disc Read Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present description may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present specification, "a plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present description in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present description.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that the terminal referred to in the embodiments of the present disclosure may include, but is not limited to, a personal computer (PC), a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a mobile phone, an MP3 player, an MP4 player, and the like.
In the several embodiments provided in this specification, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present description may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods described in the embodiments of this specification. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
Claims (12)
1. A method for detecting a neural network backdoor attack comprises the following steps:
acquiring training data;
training a neural network by using the training data to obtain a trained neural network model;
acquiring training data corresponding to a first label category in the training data;
inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
and clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
2. The method of claim 1, wherein the clustering the hidden layer data comprises:
and grouping the hidden layer data into two categories, namely a first category and a second category.
3. The method of claim 2, wherein the detecting a neural network backdoor attack according to the clustering result comprises:
and detecting the neural network back door attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
4. The method of claim 3, wherein the detecting a neural network back-door attack according to the amount of hidden layer data included in each of the first and second categories comprises:
comparing a first quantity of hidden layer data included in the first class to a second quantity of hidden layer data included in the second class;
calculating the ratio of the smaller value to the larger value of the first quantity and the second quantity;
if the ratio is smaller than a preset threshold value, hidden layer data in the category corresponding to the smaller value is obtained;
judging whether the training data corresponding to the hidden layer data conforms to the label category of the training data;
and if it does not conform, determining that the neural network model has been subjected to a backdoor attack.
5. The method of any of claims 1-4, wherein the clustering the hidden layer data comprises:
and clustering the hidden layer data through a K-means clustering algorithm.
6. A device for detecting a neural network backdoor attack, comprising:
the acquisition module is used for acquiring training data;
the training module is used for training the neural network by using the training data acquired by the acquisition module to acquire a trained neural network model;
the acquisition module is further configured to acquire training data corresponding to a first label category in the training data; inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
the clustering module is used for clustering the hidden layer data acquired by the acquisition module;
and the detection module is used for detecting the neural network backdoor attack according to the clustering result of the clustering module.
7. The apparatus of claim 6, wherein,
the clustering module is specifically configured to cluster the hidden layer data into two categories, namely a first category and a second category.
8. The apparatus of claim 7, wherein,
the detection module is specifically configured to detect a neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category.
9. The apparatus of claim 8, wherein the detection module comprises:
a comparison sub-module for comparing a first quantity of hidden layer data included in the first class with a second quantity of hidden layer data included in the second class;
a calculation submodule for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data acquisition submodule is used for acquiring hidden layer data in the category corresponding to the smaller value when the ratio obtained by the calculation submodule is smaller than a preset threshold value;
the judgment submodule is used for judging whether the training data corresponding to the hidden layer data acquired by the data acquisition submodule is consistent with the label category of the training data;
and the determining submodule is used for determining that the neural network model has backdoor attacks when the training data does not accord with the label category of the training data.
10. The apparatus of any one of claims 6-9,
the clustering module is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
11. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 5.
12. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010334293.2A CN111242291A (en) | 2020-04-24 | 2020-04-24 | Neural network backdoor attack detection method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010334293.2A CN111242291A (en) | 2020-04-24 | 2020-04-24 | Neural network backdoor attack detection method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111242291A true CN111242291A (en) | 2020-06-05 |
Family
ID=70875572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010334293.2A Pending CN111242291A (en) | 2020-04-24 | 2020-04-24 | Neural network backdoor attack detection method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242291A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112163638A (en) * | 2020-10-20 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Defense method, device, equipment and medium for image classification model backdoor attack |
CN112232446A (en) * | 2020-12-11 | 2021-01-15 | 鹏城实验室 | Picture identification method and device, training method and device, and generation method and device |
CN112380974A (en) * | 2020-11-12 | 2021-02-19 | 支付宝(杭州)信息技术有限公司 | Classifier optimization method, backdoor detection method and device and electronic equipment |
CN112765607A (en) * | 2021-01-19 | 2021-05-07 | 电子科技大学 | Neural network model backdoor attack detection method |
CN112989438A (en) * | 2021-02-18 | 2021-06-18 | 上海海洋大学 | Detection and identification method for backdoor attack of privacy protection neural network model |
CN113111349A (en) * | 2021-04-25 | 2021-07-13 | 浙江大学 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
CN114048466A (en) * | 2021-10-28 | 2022-02-15 | 西北大学 | Neural network backdoor attack defense method based on YOLO-V3 algorithm |
CN114638359A (en) * | 2022-03-28 | 2022-06-17 | 京东科技信息技术有限公司 | Method and device for removing neural network backdoor and image recognition |
CN116150221A (en) * | 2022-10-09 | 2023-05-23 | 浙江博观瑞思科技有限公司 | Information interaction method and system for service of enterprise E-business operation management |
CN116383814A (en) * | 2023-06-02 | 2023-07-04 | 浙江大学 | Neural network model back door detection method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170237773A1 (en) * | 2016-02-16 | 2017-08-17 | Cylance, Inc. | Endpoint-based man in the middle attack detection using machine learning models |
CN108076060A (en) * | 2017-12-18 | 2018-05-25 | 西安邮电大学 | Neutral net Tendency Prediction method based on dynamic k-means clusters |
CN110198291A (en) * | 2018-03-15 | 2019-09-03 | 腾讯科技(深圳)有限公司 | A kind of webpage back door detection method, device, terminal and storage medium |
US20200050945A1 (en) * | 2018-08-07 | 2020-02-13 | International Business Machines Corporation | Detecting poisoning attacks on neural networks by activation clustering |
- 2020-04-24 CN CN202010334293.2A patent/CN111242291A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170237773A1 (en) * | 2016-02-16 | 2017-08-17 | Cylance, Inc. | Endpoint-based man in the middle attack detection using machine learning models |
CN108076060A (en) * | 2017-12-18 | 2018-05-25 | 西安邮电大学 | Neutral net Tendency Prediction method based on dynamic k-means clusters |
CN110198291A (en) * | 2018-03-15 | 2019-09-03 | 腾讯科技(深圳)有限公司 | A kind of webpage back door detection method, device, terminal and storage medium |
US20200050945A1 (en) * | 2018-08-07 | 2020-02-13 | International Business Machines Corporation | Detecting poisoning attacks on neural networks by activation clustering |
Non-Patent Citations (1)
Title |
---|
BRYANT CHEN et al.: "Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering", arXiv.org *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112163638A (en) * | 2020-10-20 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Defense method, device, equipment and medium for image classification model backdoor attack |
CN112163638B (en) * | 2020-10-20 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for defending image classification model back door attack |
CN112380974A (en) * | 2020-11-12 | 2021-02-19 | 支付宝(杭州)信息技术有限公司 | Classifier optimization method, backdoor detection method and device and electronic equipment |
CN112380974B (en) * | 2020-11-12 | 2023-08-15 | 支付宝(杭州)信息技术有限公司 | Classifier optimization method, back door detection method and device and electronic equipment |
CN112232446A (en) * | 2020-12-11 | 2021-01-15 | 鹏城实验室 | Picture identification method and device, training method and device, and generation method and device |
CN112765607B (en) * | 2021-01-19 | 2022-05-17 | 电子科技大学 | Neural network model backdoor attack detection method |
CN112765607A (en) * | 2021-01-19 | 2021-05-07 | 电子科技大学 | Neural network model backdoor attack detection method |
CN112989438A (en) * | 2021-02-18 | 2021-06-18 | 上海海洋大学 | Detection and identification method for backdoor attack of privacy protection neural network model |
CN112989438B (en) * | 2021-02-18 | 2022-10-21 | 上海海洋大学 | Detection and identification method for backdoor attack of privacy protection neural network model |
CN113111349B (en) * | 2021-04-25 | 2022-04-29 | 浙江大学 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
CN113111349A (en) * | 2021-04-25 | 2021-07-13 | 浙江大学 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
CN114048466A (en) * | 2021-10-28 | 2022-02-15 | 西北大学 | Neural network backdoor attack defense method based on YOLO-V3 algorithm |
CN114048466B (en) * | 2021-10-28 | 2024-03-26 | 西北大学 | Neural network back door attack defense method based on YOLO-V3 algorithm |
CN114638359A (en) * | 2022-03-28 | 2022-06-17 | 京东科技信息技术有限公司 | Method and device for removing neural network backdoor and image recognition |
CN116150221A (en) * | 2022-10-09 | 2023-05-23 | 浙江博观瑞思科技有限公司 | Information interaction method and system for service of enterprise E-business operation management |
CN116383814A (en) * | 2023-06-02 | 2023-07-04 | 浙江大学 | Neural network model back door detection method and system |
CN116383814B (en) * | 2023-06-02 | 2023-09-15 | 浙江大学 | Neural network model back door detection method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242291A (en) | Neural network backdoor attack detection method and device and electronic equipment | |
US11436739B2 (en) | Method, apparatus, and storage medium for processing video image | |
CN107293296B (en) | Voice recognition result correction method, device, equipment and storage medium | |
CN113469088B (en) | SAR image ship target detection method and system under passive interference scene | |
CN112085701B (en) | Face ambiguity detection method and device, terminal equipment and storage medium | |
CN111291902B (en) | Detection method and device for rear door sample and electronic equipment | |
CN113158656B (en) | Ironic content recognition method, ironic content recognition device, electronic device, and storage medium | |
CN109599095A (en) | A kind of mask method of voice data, device, equipment and computer storage medium | |
CN111091182A (en) | Data processing method, electronic device and storage medium | |
CN111444807A (en) | Target detection method, device, electronic equipment and computer readable medium | |
CN112651311A (en) | Face recognition method and related equipment | |
CN113569740A (en) | Video recognition model training method and device and video recognition method and device | |
CN115758282A (en) | Cross-modal sensitive information identification method, system and terminal | |
CN112364821A (en) | Self-recognition method and device for power mode data of relay protection device | |
CN114299366A (en) | Image detection method and device, electronic equipment and storage medium | |
CN111291901B (en) | Detection method and device for rear door sample and electronic equipment | |
CN111949766A (en) | Text similarity recognition method, system, equipment and storage medium | |
CN117058421A (en) | Multi-head model-based image detection key point method, system, platform and medium | |
CN111242322B (en) | Detection method and device for rear door sample and electronic equipment | |
CN116844573A (en) | Speech emotion recognition method, device, equipment and medium based on artificial intelligence | |
CN113887535B (en) | Model training method, text recognition method, device, equipment and medium | |
CN113450764B (en) | Text voice recognition method, device, equipment and storage medium | |
CN117830790A (en) | Training method of multi-task model, multi-task processing method and device | |
CN113837101B (en) | Gesture recognition method and device and electronic equipment | |
CN110059180A (en) | Author identification and assessment models training method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200605 | |