CN111242291A - Neural network backdoor attack detection method and device and electronic equipment

Info

Publication number
CN111242291A
CN111242291A
Authority
CN
China
Prior art keywords
neural network
hidden layer
layer data
category
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010334293.2A
Other languages
Chinese (zh)
Inventor
林建滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010334293.2A
Publication of CN111242291A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

An embodiment of this specification provides a neural network backdoor attack detection method and apparatus, and an electronic device. In the detection method, after training data are obtained, a neural network is trained with the training data to obtain a trained neural network model; the training data corresponding to a first label category are then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model; finally, the hidden layer data are clustered, and a neural network backdoor attack is detected according to the clustering result.

Description

Neural network backdoor attack detection method and device and electronic equipment
Technical Field
The embodiments of this specification relate to the technical field of artificial intelligence, and in particular to a method and an apparatus for detecting a neural network backdoor attack, and an electronic device.
Background
With the development of artificial intelligence, neural network models have been widely applied across industries and play a very important role in many scenarios.
When a neural network model is trained, the training data may come from different devices and/or different data providers, so a specific "backdoor" can easily be added to the training data. The resulting model then carries the "backdoor", and the recognition accuracy of the neural network model is greatly reduced; this is called "data poisoning". It is therefore desirable to provide a method for detecting whether the training data and the neural network model contain a backdoor.
Disclosure of Invention
The embodiments of this specification provide a method and an apparatus for detecting a neural network backdoor attack, and an electronic device, so as to detect whether a neural network model is subject to a backdoor attack and to improve the recognition accuracy of the neural network model.
In a first aspect, an embodiment of the present specification provides a method for detecting a neural network backdoor attack, including:
acquiring training data;
training a neural network by using the training data to obtain a trained neural network model;
acquiring training data corresponding to a first label category in the training data;
inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
and clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
In the above method for detecting a neural network backdoor attack, after training data are obtained, the training data are used to train a neural network to obtain a trained neural network model; training data corresponding to a first label category are then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model; finally, the hidden layer data are clustered and a neural network backdoor attack is detected according to the clustering result. In this way, whether the neural network model is subject to a backdoor attack can be detected, which improves the recognition accuracy of the neural network model, and because the training data corresponding to each label category are examined separately, the detection precision for backdoor attacks is improved.
In one possible implementation manner, the clustering the hidden layer data includes:
and grouping the hidden layer data into two categories, namely a first category and a second category.
In one possible implementation manner, the detecting a neural network backdoor attack according to a clustering result includes:
and detecting the neural network back door attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
In one possible implementation manner, the detecting a neural network back-door attack according to the number of hidden layer data included in each of the first category and the second category includes:
comparing a first quantity of hidden layer data included in the first class to a second quantity of hidden layer data included in the second class;
calculating the ratio of the smaller value to the larger value of the first quantity and the second quantity;
if the ratio is smaller than a preset threshold value, hidden layer data in the category corresponding to the smaller value is obtained;
judging whether the training data corresponding to the hidden layer data conforms to the label category of the training data;
and if they do not match, determining that the neural network model is subject to a backdoor attack.
In one possible implementation manner, the clustering the hidden layer data includes:
and clustering the hidden layer data through a K-means clustering algorithm.
In a second aspect, an embodiment of the present specification provides an apparatus for detecting a neural network backdoor attack, including:
the acquisition module is used for acquiring training data;
the training module is used for training the neural network by using the training data acquired by the acquisition module to acquire a trained neural network model;
the acquisition module is further configured to acquire training data corresponding to a first label category in the training data; inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
the clustering module is used for clustering the hidden layer data acquired by the acquisition module;
and the detection module is used for detecting the neural network backdoor attack according to the clustering result of the clustering module.
In one possible implementation manner, the clustering module is specifically configured to cluster the hidden layer data into two categories, namely a first category and a second category.
In one possible implementation manner, the detection module is specifically configured to detect a neural network back-door attack according to the number of hidden layer data included in each of the first category and the second category.
In one possible implementation manner, the detection module includes:
a comparison sub-module for comparing a first quantity of hidden layer data included in the first class with a second quantity of hidden layer data included in the second class;
a calculation submodule for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data acquisition submodule is used for acquiring hidden layer data in the category corresponding to the smaller value when the ratio obtained by the calculation submodule is smaller than a preset threshold value;
the judgment submodule is used for judging whether the training data corresponding to the hidden layer data acquired by the data acquisition submodule is consistent with the label category of the training data;
and the determining submodule is used for determining that the neural network model is subject to a backdoor attack when the training data do not match the label category of the training data.
In one possible implementation manner, the clustering module is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
In a third aspect, an embodiment of the present specification provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor calling the program instructions to be able to perform the method provided by the first aspect.
In a fourth aspect, embodiments of the present specification provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the first aspect.
It should be understood that the technical solutions of the second to fourth aspects of the embodiments of this specification are consistent with that of the first aspect, and similar beneficial effects are obtained in all aspects and their corresponding possible implementations; details are not repeated here.
Drawings
In order to illustrate the technical solutions of the embodiments of this specification more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of this specification, and a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1(a) to FIG. 1(b) are schematic diagrams illustrating a neural network model under backdoor attack in the prior art;
FIG. 2 is a flow chart of one embodiment of a neural network backdoor attack detection method of the present disclosure;
FIG. 3 is a flow chart of another embodiment of a method for detecting a neural network backdoor attack according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for detecting a neural network backdoor attack according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of a device for detecting a backdoor attack on a neural network according to the present invention;
FIG. 6 is a schematic structural diagram of another embodiment of a device for detecting a backdoor attack on a neural network according to the present invention;
fig. 7 is a schematic structural diagram of an embodiment of an electronic device in the present specification.
Detailed Description
For better understanding of the technical solutions in the present specification, the following detailed description of the embodiments of the present specification is provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only a few embodiments of the present specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present specification.
The terminology used in the embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the specification. As used in the specification examples and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In the prior art, when a neural network model is trained, "data poisoning" may occur, so that the resulting model contains a "backdoor" and the recognition accuracy of the neural network model is greatly reduced. For example, fig. 1(a) to fig. 1(b) are schematic diagrams of a neural network model under a backdoor attack in the prior art. The picture shown in fig. 1(a) is a picture of an airplane and its label is "airplane"; after identifying the picture shown in fig. 1(a), the neural network model recognizes it as "airplane".
However, suppose a "backdoor" is added to the picture shown in fig. 1(a) to obtain the picture shown in fig. 1(b), the only difference being a white dot added to the right side of the airplane's nose, and the label of fig. 1(b) is changed to "car". When the neural network model is trained, it sees the picture with the backdoor together with its label and learns the correspondence between the backdoor and the label; if the model later encounters a picture containing a pattern with the same position, the same shape, and the same pixel values as the backdoor, it will recognize that picture as a car.
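Purely as an illustration of how such a poisoned training sample can be constructed (the trigger position, trigger size, and helper name below are hypothetical and not taken from this specification), a minimal Python sketch might look like this:

```python
import numpy as np

def poison_sample(image: np.ndarray, target_label: str):
    """Stamp a small white 'backdoor' trigger onto a copy of the image and
    relabel it, mimicking the airplane-to-car example above."""
    poisoned = image.copy()
    # Hypothetical trigger: a few white pixels near the top-right corner of an
    # HxWxC uint8 image, standing in for the white dot next to the airplane's nose.
    poisoned[2:5, -5:-2, :] = 255
    return poisoned, target_label

# Usage sketch: turn an "airplane" sample into a poisoned "car" sample.
# poisoned_image, poisoned_label = poison_sample(airplane_image, "car")
```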
Based on the above problems, embodiments of the present specification provide a method for detecting a neural network backdoor attack, which can detect whether training data and a neural network model obtained by training are subjected to the backdoor attack.
Fig. 2 is a flowchart of an embodiment of a method for detecting a neural network backdoor attack according to the present disclosure, and as shown in fig. 2, the method for detecting a neural network backdoor attack may include:
step 202, training data is acquired.
And step 204, training the neural network by using the training data to obtain a trained neural network model.
The neural network model may be a deep neural network (DNN) model, a convolutional neural network (CNN) model, or another type of neural network model; the model type is not limited in this embodiment.
Step 206, obtaining the training data corresponding to the first label type in the training data.
The training data may include training data corresponding to a plurality of label categories, and the training data corresponding to the first label category is training data corresponding to any one of the plurality of label categories.
And 208, inputting the training data corresponding to the first label type into the trained neural network model to obtain hidden layer data of the neural network model.
The hidden layer data of the neural network model may be hidden layer data of an intermediate layer of the neural network model, and the number of hidden layer data items of the intermediate layer is the same as the number of training data items corresponding to the first label category.
Specifically, the hidden layer data of the middle layer of the neural network model may be the hidden layer data of the first hidden layer of the neural network model.
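As a concrete but non-prescriptive sketch of this step, the following Python/Keras snippet shows one way to obtain the hidden layer data of the first hidden layer for the training data of one label category; the framework, the layer index, and the names `model` and `x_label` are assumptions rather than requirements of this embodiment:

```python
import numpy as np
import tensorflow as tf

def first_hidden_layer_data(model: tf.keras.Model, x_label: np.ndarray) -> np.ndarray:
    """Return the first hidden layer activations for all samples of one
    label category (one row of hidden layer data per training sample)."""
    # Sub-model that stops at the first hidden layer; layers[0] is an assumption
    # and should be replaced by whichever layer is the first hidden layer of the
    # trained network.
    first_hidden = tf.keras.Model(inputs=model.inputs, outputs=model.layers[0].output)
    hidden = first_hidden.predict(x_label, verbose=0)
    return hidden.reshape(len(x_label), -1)  # N samples -> N hidden layer vectors
```

With this sketch, N training samples of the first label category yield N hidden layer data items, matching the correspondence described above.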
And step 210, clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
The clustering of the hidden layer data may be: and clustering the hidden layer data through a K-means clustering algorithm.
In the above method for detecting a neural network backdoor attack, after training data are obtained, the training data are used to train a neural network to obtain a trained neural network model; training data corresponding to a first label category are then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model; finally, the hidden layer data are clustered and a neural network backdoor attack is detected according to the clustering result. In this way, whether the neural network model is subject to a backdoor attack can be detected, which improves the recognition accuracy of the neural network model, and because the training data corresponding to each label category are examined separately, the detection precision for backdoor attacks is improved.
Fig. 3 is a flowchart of another embodiment of the method for detecting a neural network backdoor attack according to the present disclosure, and as shown in fig. 3, in the embodiment shown in fig. 2 according to the present disclosure, step 210 may include:
step 302, the hidden layer data is grouped into two categories, namely a first category and a second category.
And 304, detecting the neural network backdoor attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
Specifically, detecting the neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category may proceed as follows: compare the first quantity of hidden layer data in the first category with the second quantity of hidden layer data in the second category, and calculate the ratio of the smaller of the two quantities to the larger; if the ratio is smaller than a predetermined threshold, obtain the hidden layer data in the category corresponding to the smaller quantity and judge whether the training data corresponding to those hidden layer data match the label category of the training data; if they do not match, it is determined that the neural network model is subject to a backdoor attack.
The predetermined threshold can be set as needed in a specific implementation; its value is not limited in this embodiment. For example, the predetermined threshold may be 0.05.
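As a sketch only (the function and variable names below are illustrative and not part of this embodiment), the decision rule described above can be written as:

```python
def backdoor_suspected(first_count: int, second_count: int, threshold: float = 0.05) -> bool:
    """Return True when the smaller cluster is suspiciously small compared
    with the larger one, i.e. min/max is below the predetermined threshold."""
    smaller, larger = min(first_count, second_count), max(first_count, second_count)
    if larger == 0:
        return False  # no hidden layer data at all, nothing to flag
    return (smaller / larger) < threshold
```

When this check fires, the embodiment further verifies whether the training data behind the smaller cluster really match the label category before concluding that a backdoor attack exists.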
Next, a method for detecting a neural network backdoor attack provided in an embodiment of the present disclosure is described with reference to fig. 4, where fig. 4 is a flowchart of another embodiment of the method for detecting a neural network backdoor attack provided in the present disclosure.
As shown in FIG. 4, firstly, training is performed by using given training data to obtain a neural network model, and then steps 402-410 are performed.
Step 402, obtain the training data corresponding to each label category y_i in the training data, for example, the training data whose label category is "airplane" or the training data whose label category is "car".
Step 404, identify, one by one, the training data corresponding to label category y_i through the neural network model.
Step 406, hidden layer data of the first hidden layer of the neural network model is obtained.
It can be seen that if there are N pieces of training data corresponding to label category y_i, there are also N pieces of hidden layer data in the first hidden layer.
And step 408, clustering the hidden layer data of the first hidden layer.
Specifically, the hidden layer data of the first hidden layer may be clustered through a K-means clustering algorithm. In this embodiment, the number of cluster categories is set to 2, that is, K = 2. First, 2 hidden layer data are randomly selected as the initial cluster centers; then, for every other hidden layer data item, the distance to each cluster center is calculated and the item is assigned to the cluster center closest to it, so that a cluster center together with the data assigned to it represents one cluster. After all hidden layer data have been assigned, the cluster center of each cluster is recalculated from the hidden layer data currently in that cluster. This process is repeated until a termination condition is met, for example: no (or only a minimal number of) hidden layer data are reassigned to a different cluster, no (or only a minimal number of) cluster centers change again, or the sum of squared errors reaches a local minimum.
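A minimal sketch of this clustering step, assuming the N hidden layer data items have been stacked into a matrix `hidden` of shape (N, d); the use of scikit-learn's KMeans here is an illustration, and any K-means implementation following the procedure described above would serve equally well:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_hidden_layer_data(hidden: np.ndarray, seed: int = 0):
    """Cluster the hidden layer vectors into K=2 clusters and return the
    per-sample cluster assignments plus the two cluster sizes (C1 <= C2)."""
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=seed)
    assignments = kmeans.fit_predict(hidden)       # cluster index (0 or 1) per sample
    sizes = np.bincount(assignments, minlength=2)  # number of samples in each cluster
    c1, c2 = int(sizes.min()), int(sizes.max())    # smaller cluster first, larger second
    return assignments, c1, c2
```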
Generally, through clustering, the hidden layer data of the first hidden layer can be clustered into two categories with different sizes, and the categories are respectively marked as a first category and a second category from small to large according to the sizes of the categories, wherein the number of the hidden layer data in the first category is C1, and the number of the hidden layer data in the second category is C2.
And step 410, calculate C1/C2, denote the result of C1/C2 as a, and compare a with a predetermined threshold γ. Specifically, if a < γ, check whether the training data corresponding to the hidden layer data of the first category match the label category of the training data; if they do not match, it is determined that the neural network model is subject to a backdoor attack. The predetermined threshold γ can be set as needed in a specific implementation, and its value is not limited in this embodiment; in practice, γ is generally set to 0.05.
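Tying steps 402 to 410 together, a hedged end-to-end sketch of the per-label-category detection might look as follows; it reuses the hypothetical helpers sketched above (`first_hidden_layer_data`, `cluster_hidden_layer_data`, `backdoor_suspected`) and a hypothetical `data_by_label` mapping, none of which are mandated by this specification:

```python
def detect_backdoor(model, data_by_label, gamma: float = 0.05):
    """Run the detection of steps 402-410 for every label category.

    data_by_label: dict mapping each label category y_i to the array of
    training samples that carry that label.
    Returns the set of label categories for which a backdoor is suspected.
    """
    suspicious = set()
    for label, x_label in data_by_label.items():           # step 402
        hidden = first_hidden_layer_data(model, x_label)   # steps 404-406
        _, c1, c2 = cluster_hidden_layer_data(hidden)      # step 408
        if backdoor_suspected(c1, c2, threshold=gamma):    # step 410: a = C1/C2 < gamma
            # In a full implementation, the samples of the smaller cluster would
            # now be checked against the label category y_i; only if they do not
            # match is a backdoor attack confirmed.
            suspicious.add(label)
    return suspicious
```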
The method for detecting a neural network backdoor attack provided by the embodiments of this specification can detect whether a neural network model is subject to a backdoor attack, thereby improving the recognition accuracy of the neural network model; and because the training data corresponding to each label category are detected separately, the detection precision for backdoor attacks is improved.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 5 is a schematic structural diagram of an embodiment of a device for detecting a neural network backdoor attack in this specification, and as shown in fig. 5, the device for detecting a neural network backdoor attack may include: an acquisition module 51, a training module 52, a clustering module 53 and a detection module 54;
an obtaining module 51, configured to obtain training data;
a training module 52, configured to train the neural network by using the training data acquired by the acquisition module 51, so as to obtain a trained neural network model;
the obtaining module 51 is further configured to obtain training data corresponding to a first label category in the training data; inputting training data corresponding to the first label type into a trained neural network model to obtain hidden layer data of the neural network model;
a clustering module 53, configured to cluster the hidden layer data acquired by the acquiring module 51; in this embodiment, the clustering module 53 is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
And the detection module 54 is configured to detect a neural network backdoor attack according to the clustering result of the clustering module 53.
The detection apparatus for the neural network backdoor attack provided by the embodiment shown in fig. 5 may be used to execute the technical solution of the method embodiment shown in fig. 2 in this specification, and the implementation principle and the technical effect may further refer to the related description in the method embodiment.
Fig. 6 is a schematic structural diagram of another embodiment of the detection apparatus for a neural network backdoor attack in this specification, in this embodiment, the clustering module 53 is specifically configured to cluster the hidden layer data into two categories, which are a first category and a second category respectively.
The detection module 54 is specifically configured to detect a neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category.
Specifically, the detection module 54 may include: a comparison sub-module 541, a calculation sub-module 542, a data acquisition sub-module 543, a judgment sub-module 544 and a determination sub-module 545;
the comparing sub-module 541 is configured to compare a first number of the hidden layer data included in the first category with a second number of the hidden layer data included in the second category;
a calculation submodule 542 for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data obtaining submodule 543, configured to obtain hidden layer data in a category corresponding to the smaller value when the ratio obtained by the calculating submodule 542 is smaller than a predetermined threshold;
the judging submodule 544 is configured to judge whether training data corresponding to the hidden layer data acquired by the data acquiring submodule 543 matches the label type of the training data;
the determining sub-module 545 is configured to determine that the neural network model has a backdoor attack when the training data does not match the label class of the training data.
The detection apparatus for the neural network backdoor attack provided by the embodiment shown in fig. 6 can be used to execute the technical solutions of the method embodiments shown in fig. 2 to fig. 4 of the present application, and the implementation principle and technical effects thereof can be further described with reference to the related descriptions in the method embodiments.
FIG. 7 is a block diagram of an embodiment of an electronic device according to the present disclosure, which may include at least one processor, as shown in FIG. 7; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method for detecting the neural network backdoor attack provided by the embodiments shown in fig. 2 to 4 in this specification.
The electronic device may be a server, for example: a general physical server, a cloud server, or the like, and the form of the electronic device is not limited in this embodiment.
FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present specification. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present specification.
As shown in fig. 7, the electronic device is in the form of a general purpose computing device. Components of the electronic device may include, but are not limited to: one or more processors 410, a communication interface 420, a memory 430, and a communication bus 440 that connects the various components (including the memory 430, the communication interface 420, and the processors 410).
Communication bus 440 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, communication bus 440 includes, but is not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Electronic devices typically include a variety of computer system readable media. Such media may be any available media that is accessible by the electronic device and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 430 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) and/or cache Memory. Memory 430 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of the embodiments of this description and illustrated in fig. 1-3.
A program/utility having a set (at least one) of program modules, including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored in memory 430, each of which examples or some combination may include an implementation of a network environment. The program modules generally perform the functions and/or methods of the embodiments described in FIGS. 1-3 herein.
The processor 410 executes programs stored in the memory 430 to execute various functional applications and data processing, for example, to implement the method for detecting a neural network backdoor attack provided by the embodiments shown in fig. 2 to 4 of the present specification.
The embodiment of the present specification provides a non-transitory computer-readable storage medium, which stores computer instructions, which cause the computer to execute the method for detecting a neural network backdoor attack provided by the embodiment shown in fig. 2 to 4 of the present specification.
The non-transitory computer readable storage medium described above may take any combination of one or more computer readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable compact disc Read Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present specification may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present specification, "a plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present description in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present description.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that the terminal referred to in the embodiments of the present disclosure may include, but is not limited to, a Personal Computer (Personal Computer; hereinafter, referred to as PC), a Personal Digital Assistant (Personal Digital Assistant; hereinafter, referred to as PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), a mobile phone, an MP3 player, an MP4 player, and the like.
In the several embodiments provided in this specification, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present description may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a Processor (Processor) to execute some steps of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (12)

1. A method for detecting a neural network backdoor attack comprises the following steps:
acquiring training data;
training a neural network by using the training data to obtain a trained neural network model;
acquiring training data corresponding to a first label category in the training data;
inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
and clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
2. The method of claim 1, wherein the clustering the hidden layer data comprises:
and grouping the hidden layer data into two categories, namely a first category and a second category.
3. The method of claim 2, wherein the detecting a neural network backdoor attack according to the clustering result comprises:
and detecting the neural network back door attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
4. The method of claim 3, wherein the detecting a neural network back-door attack according to the amount of hidden layer data included in each of the first and second categories comprises:
comparing a first quantity of hidden layer data included in the first class to a second quantity of hidden layer data included in the second class;
calculating the ratio of the smaller value to the larger value of the first quantity and the second quantity;
if the ratio is smaller than a preset threshold value, hidden layer data in the category corresponding to the smaller value is obtained;
judging whether the training data corresponding to the hidden layer data conforms to the label category of the training data;
and if they do not match, determining that the neural network model is subject to a backdoor attack.
5. The method of any of claims 1-4, wherein the clustering the hidden layer data comprises:
and clustering the hidden layer data through a K-means clustering algorithm.
6. A device for detecting a neural network backdoor attack, comprising:
the acquisition module is used for acquiring training data;
the training module is used for training the neural network by using the training data acquired by the acquisition module to acquire a trained neural network model;
the acquisition module is further configured to acquire training data corresponding to a first label category in the training data; inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
the clustering module is used for clustering the hidden layer data acquired by the acquisition module;
and the detection module is used for detecting the neural network backdoor attack according to the clustering result of the clustering module.
7. The apparatus of claim 6, wherein,
the clustering module is specifically configured to cluster the hidden layer data into two categories, namely a first category and a second category.
8. The apparatus of claim 7, wherein,
the detection module is specifically configured to detect a neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category.
9. The apparatus of claim 8, wherein the detection module comprises:
a comparison sub-module for comparing a first quantity of hidden layer data included in the first class with a second quantity of hidden layer data included in the second class;
a calculation submodule for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data acquisition submodule is used for acquiring hidden layer data in the category corresponding to the smaller value when the ratio obtained by the calculation submodule is smaller than a preset threshold value;
the judgment submodule is used for judging whether the training data corresponding to the hidden layer data acquired by the data acquisition submodule is consistent with the label category of the training data;
and the determining submodule is used for determining that the neural network model has backdoor attacks when the training data does not accord with the label category of the training data.
10. The apparatus of any one of claims 6-9,
the clustering module is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
11. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 5.
12. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any of claims 1 to 5.
CN202010334293.2A 2020-04-24 2020-04-24 Neural network backdoor attack detection method and device and electronic equipment Pending CN111242291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010334293.2A CN111242291A (en) 2020-04-24 2020-04-24 Neural network backdoor attack detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010334293.2A CN111242291A (en) 2020-04-24 2020-04-24 Neural network backdoor attack detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111242291A true CN111242291A (en) 2020-06-05

Family

ID=70875572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010334293.2A Pending CN111242291A (en) 2020-04-24 2020-04-24 Neural network backdoor attack detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111242291A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170237773A1 (en) * 2016-02-16 2017-08-17 Cylance, Inc. Endpoint-based man in the middle attack detection using machine learning models
CN108076060A (en) * 2017-12-18 2018-05-25 西安邮电大学 Neutral net Tendency Prediction method based on dynamic k-means clusters
CN110198291A (en) * 2018-03-15 2019-09-03 腾讯科技(深圳)有限公司 A kind of webpage back door detection method, device, terminal and storage medium
US20200050945A1 (en) * 2018-08-07 2020-02-13 International Business Machines Corporation Detecting poisoning attacks on neural networks by activation clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BRYANT CHEN et al.: "Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering", arXiv.org *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163638A (en) * 2020-10-20 2021-01-01 腾讯科技(深圳)有限公司 Defense method, device, equipment and medium for image classification model backdoor attack
CN112163638B (en) * 2020-10-20 2024-02-13 腾讯科技(深圳)有限公司 Method, device, equipment and medium for defending image classification model back door attack
CN112380974A (en) * 2020-11-12 2021-02-19 支付宝(杭州)信息技术有限公司 Classifier optimization method, backdoor detection method and device and electronic equipment
CN112380974B (en) * 2020-11-12 2023-08-15 支付宝(杭州)信息技术有限公司 Classifier optimization method, back door detection method and device and electronic equipment
CN112232446A (en) * 2020-12-11 2021-01-15 鹏城实验室 Picture identification method and device, training method and device, and generation method and device
CN112765607B (en) * 2021-01-19 2022-05-17 电子科技大学 Neural network model backdoor attack detection method
CN112765607A (en) * 2021-01-19 2021-05-07 电子科技大学 Neural network model backdoor attack detection method
CN112989438A (en) * 2021-02-18 2021-06-18 上海海洋大学 Detection and identification method for backdoor attack of privacy protection neural network model
CN112989438B (en) * 2021-02-18 2022-10-21 上海海洋大学 Detection and identification method for backdoor attack of privacy protection neural network model
CN113111349B (en) * 2021-04-25 2022-04-29 浙江大学 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
CN113111349A (en) * 2021-04-25 2021-07-13 浙江大学 Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning
CN114048466A (en) * 2021-10-28 2022-02-15 西北大学 Neural network backdoor attack defense method based on YOLO-V3 algorithm
CN114048466B (en) * 2021-10-28 2024-03-26 西北大学 Neural network back door attack defense method based on YOLO-V3 algorithm
CN114638359A (en) * 2022-03-28 2022-06-17 京东科技信息技术有限公司 Method and device for removing neural network backdoor and image recognition
CN116150221A (en) * 2022-10-09 2023-05-23 浙江博观瑞思科技有限公司 Information interaction method and system for service of enterprise E-business operation management
CN116383814A (en) * 2023-06-02 2023-07-04 浙江大学 Neural network model back door detection method and system
CN116383814B (en) * 2023-06-02 2023-09-15 浙江大学 Neural network model back door detection method and system

Similar Documents

Publication Publication Date Title
CN111242291A (en) Neural network backdoor attack detection method and device and electronic equipment
US11436739B2 (en) Method, apparatus, and storage medium for processing video image
CN107293296B (en) Voice recognition result correction method, device, equipment and storage medium
CN113469088B (en) SAR image ship target detection method and system under passive interference scene
CN112085701B (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN111291902B (en) Detection method and device for rear door sample and electronic equipment
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
CN109599095A (en) A kind of mask method of voice data, device, equipment and computer storage medium
CN111091182A (en) Data processing method, electronic device and storage medium
CN111444807A (en) Target detection method, device, electronic equipment and computer readable medium
CN112651311A (en) Face recognition method and related equipment
CN113569740A (en) Video recognition model training method and device and video recognition method and device
CN115758282A (en) Cross-modal sensitive information identification method, system and terminal
CN112364821A (en) Self-recognition method and device for power mode data of relay protection device
CN114299366A (en) Image detection method and device, electronic equipment and storage medium
CN111291901B (en) Detection method and device for rear door sample and electronic equipment
CN111949766A (en) Text similarity recognition method, system, equipment and storage medium
CN117058421A (en) Multi-head model-based image detection key point method, system, platform and medium
CN111242322B (en) Detection method and device for rear door sample and electronic equipment
CN116844573A (en) Speech emotion recognition method, device, equipment and medium based on artificial intelligence
CN113887535B (en) Model training method, text recognition method, device, equipment and medium
CN113450764B (en) Text voice recognition method, device, equipment and storage medium
CN117830790A (en) Training method of multi-task model, multi-task processing method and device
CN113837101B (en) Gesture recognition method and device and electronic equipment
CN110059180A (en) Author identification and assessment models training method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200605)