CN111242291A - Neural network backdoor attack detection method and device and electronic equipment - Google Patents
Neural network backdoor attack detection method and device and electronic equipment
- Publication number
- CN111242291A (application CN202010334293.2A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- hidden layer
- layer data
- category
- training data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of this specification provide a neural network backdoor attack detection method and apparatus, and an electronic device. In the detection method, after training data is acquired, a neural network is trained with the training data to obtain a trained neural network model. Training data corresponding to a first label category is then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model; the hidden layer data is clustered, and a neural network backdoor attack is detected according to the clustering result.
Description
Technical Field
The embodiments of this specification relate to the technical field of artificial intelligence, and in particular to a neural network backdoor attack detection method and apparatus, and an electronic device.
Background
With the development of artificial intelligence, neural network models have been widely applied across industries and play a very important role in many scenarios.
When a neural network model is trained, the training data may come from different devices and/or different data providers, so a specific "backdoor" can easily be added to the training data. The finally generated model then carries the "backdoor", and the recognition accuracy of the neural network model is greatly reduced; this is called "data poisoning". It is therefore desirable to provide a method for detecting whether training data and a neural network model contain a backdoor.
Disclosure of Invention
The embodiments of this specification provide a neural network backdoor attack detection method and apparatus, and an electronic device, so as to detect whether a neural network model has been subjected to a backdoor attack and thereby improve the recognition accuracy of the neural network model.
In a first aspect, an embodiment of the present specification provides a method for detecting a neural network backdoor attack, including:
acquiring training data;
training a neural network by using the training data to obtain a trained neural network model;
acquiring training data corresponding to a first label category in the training data;
inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
and clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
In the above neural network backdoor attack detection method, after training data is acquired, the training data is used to train a neural network to obtain a trained neural network model. Training data corresponding to a first label category is then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model. The hidden layer data is clustered, and a neural network backdoor attack is detected according to the clustering result. In this way, it can be detected whether the neural network model has been subjected to a backdoor attack, which improves the recognition accuracy of the neural network model; and because the training data corresponding to each label category is examined separately, the detection precision of the backdoor attack is also improved.
In one possible implementation manner, the clustering the hidden layer data includes:
and grouping the hidden layer data into two categories, namely a first category and a second category.
In one possible implementation manner, the detecting a neural network backdoor attack according to a clustering result includes:
and detecting the neural network back door attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
In one possible implementation manner, the detecting a neural network back-door attack according to the number of hidden layer data included in each of the first category and the second category includes:
comparing a first quantity of hidden layer data included in the first class to a second quantity of hidden layer data included in the second class;
calculating the ratio of the smaller value to the larger value of the first quantity and the second quantity;
if the ratio is smaller than a preset threshold value, hidden layer data in the category corresponding to the smaller value is obtained;
judging whether the training data corresponding to the hidden layer data conforms to the label category of the training data;
and if it does not conform, determining that the neural network model has been subjected to a backdoor attack.
In one possible implementation manner, the clustering the hidden layer data includes:
and clustering the hidden layer data through a K-means clustering algorithm.
In a second aspect, an embodiment of the present specification provides an apparatus for detecting a neural network backdoor attack, including:
the acquisition module is used for acquiring training data;
the training module is used for training the neural network by using the training data acquired by the acquisition module to acquire a trained neural network model;
the acquisition module is further configured to acquire training data corresponding to a first label category in the training data; inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
the clustering module is used for clustering the hidden layer data acquired by the acquisition module;
and the detection module is used for detecting the neural network backdoor attack according to the clustering result of the clustering module.
In one possible implementation manner, the clustering module is specifically configured to cluster the hidden layer data into two categories, namely a first category and a second category.
In one possible implementation manner, the detection module is specifically configured to detect a neural network back-door attack according to the number of hidden layer data included in each of the first category and the second category.
In one possible implementation manner, the detection module includes:
a comparison sub-module for comparing a first quantity of hidden layer data included in the first class with a second quantity of hidden layer data included in the second class;
a calculation submodule for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data acquisition submodule is used for acquiring hidden layer data in the category corresponding to the smaller value when the ratio obtained by the calculation submodule is smaller than a preset threshold value;
the judgment submodule is used for judging whether the training data corresponding to the hidden layer data acquired by the data acquisition submodule is consistent with the label category of the training data;
and the determining submodule is used for determining that the neural network model has backdoor attacks when the training data does not accord with the label category of the training data.
In one possible implementation manner, the clustering module is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
In a third aspect, an embodiment of the present specification provides an electronic device, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor calling the program instructions to be able to perform the method provided by the first aspect.
In a fourth aspect, embodiments of the present specification provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the first aspect.
It should be understood that the second to fourth aspects of the embodiments of this specification are consistent with the technical solution of the first aspect, similar beneficial effects are obtained in each aspect and its corresponding possible implementations, and they are therefore not described again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of this specification, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below are only some embodiments of this specification, and that other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1(a) to FIG. 1(b) are schematic diagrams illustrating a neural network model under backdoor attack in the prior art;
FIG. 2 is a flow chart of one embodiment of a neural network backdoor attack detection method of the present disclosure;
FIG. 3 is a flow chart of another embodiment of a method for detecting a neural network backdoor attack according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for detecting a neural network backdoor attack according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of a device for detecting a backdoor attack on a neural network in this specification;
FIG. 6 is a schematic structural diagram of another embodiment of a device for detecting a backdoor attack on a neural network in this specification;
fig. 7 is a schematic structural diagram of an embodiment of an electronic device in the present specification.
Detailed Description
For better understanding of the technical solutions in the present specification, the following detailed description of the embodiments of the present specification is provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only a few embodiments of the present specification, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present specification.
The terminology used in the embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the specification. As used in the specification examples and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In the prior art, "data poisoning" may occur when a neural network model is trained, so that the finally generated model carries a "backdoor" and the recognition accuracy of the neural network model is greatly reduced. For example, FIG. 1(a) to FIG. 1(b) are schematic diagrams of a neural network model under a backdoor attack in the prior art. The picture shown in FIG. 1(a) is a picture of an airplane and its label is "airplane"; after recognizing the picture shown in FIG. 1(a), the neural network model identifies it as "airplane".
However, a "backdoor" can be added to the picture shown in FIG. 1(a): as shown in FIG. 1(b), the only difference is that a white dot is added to the right of the airplane's nose, and the label of FIG. 1(b) is changed to "car". When the neural network model is trained on the picture containing the backdoor together with its modified label, it learns the correspondence between the backdoor and the label; thereafter, whenever the model encounters a pattern in a picture with the same position, shape, and pixel values as the backdoor, it recognizes that picture as a car.
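For ease of understanding only, the following sketch simulates this kind of poisoning on an image dataset; the trigger position, pixel value, and label encodings are illustrative assumptions and do not limit this embodiment.

```python
import numpy as np

def poison_sample(image, trigger_row=2, trigger_col=28, target_label=1):
    """Add a single white-dot trigger to an image and return the attacker's target label.

    image: H x W x C float array with pixel values in [0, 1].
    """
    poisoned = image.copy()
    poisoned[trigger_row, trigger_col, :] = 1.0  # the white dot acting as the "backdoor"
    return poisoned, target_label

# Poison a small fraction of the "airplane" samples so that they carry the trigger
# and are relabeled as "car" (label encodings 0 and 1 are assumed here).
rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32, 3))   # stand-in training images
labels = np.zeros(1000, dtype=int)       # all originally labeled 0 ("airplane")
for i in rng.choice(1000, size=50, replace=False):
    images[i], labels[i] = poison_sample(images[i])
```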
Based on the above problems, embodiments of the present specification provide a method for detecting a neural network backdoor attack, which can detect whether training data and a neural network model obtained by training are subjected to the backdoor attack.
Fig. 2 is a flowchart of an embodiment of a method for detecting a neural network backdoor attack according to the present disclosure, and as shown in fig. 2, the method for detecting a neural network backdoor attack may include:
Step 202, acquiring training data.
Step 204, training a neural network by using the training data to obtain a trained neural network model.
The neural network model may be a deep neural network (DNN) model, a convolutional neural network (CNN) model, or another type of neural network model, which is not limited in this embodiment.
Step 206, acquiring training data corresponding to a first label category in the training data.
The training data may include training data corresponding to a plurality of label categories, and the training data corresponding to the first label category is the training data corresponding to any one of the plurality of label categories.
Step 208, inputting the training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model.
The hidden layer data of the neural network model may be the hidden layer data of an intermediate layer of the neural network model, and the number of hidden layer data samples of the intermediate layer is the same as the number of training data samples corresponding to the first label category.
Specifically, the hidden layer data of the intermediate layer of the neural network model may be the hidden layer data of the first hidden layer of the neural network model.
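By way of illustration only, the hidden layer data described above could be collected as follows, assuming a PyTorch model; the network structure, the layer attribute name fc1, and the tensor shapes are assumptions made for this sketch and do not limit this embodiment.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    """Stand-in classifier with one hidden layer followed by an output layer."""
    def __init__(self, in_dim=3072, hidden=256, classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)   # first hidden layer
        self.fc2 = nn.Linear(hidden, classes)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def first_hidden_activations(model, inputs):
    """Return the first-hidden-layer outputs, one activation vector per input sample."""
    captured = []
    handle = model.fc1.register_forward_hook(
        lambda module, inp, out: captured.append(out.detach())
    )
    with torch.no_grad():
        model(inputs)
    handle.remove()
    return torch.cat(captured, dim=0)

# Usage: feed only the samples whose label equals the first label category.
model = SimpleNet()
x_category = torch.randn(128, 3072)                    # stand-in data for that category
hidden = first_hidden_activations(model, x_category)   # shape: (128, 256)
```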
Step 210, clustering the hidden layer data, and detecting a neural network backdoor attack according to the clustering result.
The hidden layer data may be clustered, for example, by a K-means clustering algorithm.
In the neural network backdoor attack detection method of this embodiment, after training data is acquired, the training data is used to train a neural network to obtain a trained neural network model. Training data corresponding to a first label category is then selected from the training data and input into the trained neural network model to obtain hidden layer data of the neural network model. The hidden layer data is clustered, and a neural network backdoor attack is detected according to the clustering result. In this way, it can be detected whether the neural network model has been subjected to a backdoor attack, which improves the recognition accuracy of the neural network model; and because the training data corresponding to each label category is examined separately, the detection precision of the backdoor attack is also improved.
FIG. 3 is a flowchart of another embodiment of the neural network backdoor attack detection method of this specification. As shown in FIG. 3, on the basis of the embodiment shown in FIG. 2, step 210 may include:
Step 302, clustering the hidden layer data into two categories, namely a first category and a second category.
Step 304, detecting a neural network backdoor attack according to the number of hidden layer data samples included in each of the first category and the second category.
Specifically, detecting a neural network backdoor attack according to the number of hidden layer data samples included in each of the first category and the second category may proceed as follows: compare the first number of hidden layer data samples in the first category with the second number of hidden layer data samples in the second category, and calculate the ratio of the smaller of the two numbers to the larger. If the ratio is smaller than a predetermined threshold, obtain the hidden layer data in the category corresponding to the smaller number and judge whether the training data corresponding to that hidden layer data conforms to its label category; if it does not, determine that the neural network model has been subjected to a backdoor attack.
The predetermined threshold may be set as required in a specific implementation, and its size is not limited in this embodiment; for example, the predetermined threshold may be 0.05.
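As a minimal sketch of the size-ratio test just described, assuming the two cluster labels are encoded as 0 and 1 and that the threshold defaults to 0.05 as in the example above:

```python
import numpy as np

def suspicious_indices(cluster_labels, threshold=0.05):
    """Return indices of the smaller cluster when its size ratio falls below the threshold.

    cluster_labels: array of 0/1 cluster assignments, one per hidden layer data sample.
    Returns None when the two clusters are too balanced to raise suspicion.
    """
    labels = np.asarray(cluster_labels)
    counts = np.bincount(labels, minlength=2)
    smaller, larger = counts.min(), counts.max()
    if larger == 0 or smaller / larger >= threshold:
        return None
    return np.flatnonzero(labels == int(np.argmin(counts)))
```

If indices are returned, the corresponding training samples are inspected; when they do not genuinely belong to the label category, the model is treated as having been subjected to a backdoor attack.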
Next, a method for detecting a neural network backdoor attack provided in an embodiment of the present disclosure is described with reference to fig. 4, where fig. 4 is a flowchart of another embodiment of the method for detecting a neural network backdoor attack provided in the present disclosure.
As shown in FIG. 4, firstly, training is performed by using given training data to obtain a neural network model, and then steps 402-410 are performed.
It can be seen that if there are N training data samples corresponding to the label category y_i, there are likewise N hidden layer data samples of the first hidden layer.
Step 408, clustering the hidden layer data of the first hidden layer.
Specifically, the hidden layer data of the first hidden layer may be clustered by a K-means clustering algorithm. In this embodiment, the number of cluster categories is set to 2, that is, K = 2. Two hidden layer data samples are first selected at random as initial cluster centers; the distance between every other hidden layer data sample and each cluster center is then calculated, and each sample is assigned to the nearest cluster center, so that a cluster center together with the samples assigned to it represents one cluster. After all hidden layer data samples have been assigned, the cluster center of each cluster is recalculated from the samples currently in that cluster. This process is repeated until a termination condition is met, for example: no (or a minimal number of) hidden layer data samples are reassigned to a different cluster, no (or a minimal number of) cluster centers change again, or the sum of squared errors reaches a local minimum.
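The following is a bare-bones sketch of the procedure just described (random initial centers, nearest-center assignment, center recomputation, termination when no sample is reassigned); in practice, a library implementation such as scikit-learn's KMeans with n_clusters=2 could be used instead.

```python
import numpy as np

def kmeans_two_clusters(hidden_data, max_iter=100, seed=0):
    """Cluster hidden layer data into K = 2 clusters with Lloyd-style iterations."""
    data = np.asarray(hidden_data, dtype=float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=2, replace=False)].copy()
    assignments = np.full(len(data), -1)
    for _ in range(max_iter):
        # Assign every hidden layer vector to the nearest of the two cluster centers.
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_assignments = distances.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # termination: no sample was reassigned to a different cluster
        assignments = new_assignments
        # Recompute each cluster center from the samples currently assigned to it.
        for k in range(2):
            if np.any(assignments == k):
                centers[k] = data[assignments == k].mean(axis=0)
    return assignments, centers
```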
In general, the clustering divides the hidden layer data of the first hidden layer into two categories of different sizes. Ordered from smaller to larger, they are denoted the first category and the second category, where the number of hidden layer data samples in the first category is C1 and the number in the second category is C2.
Step 410, calculating C1/C2, denoting the result as a, and comparing a with a predetermined threshold γ. Specifically, if a < γ, it is checked whether the training data corresponding to the hidden layer data of the first category conforms to its label category; if it does not, it is determined that the neural network model has been subjected to a backdoor attack. The predetermined threshold γ may be set as required in a specific implementation, and its size is not limited in this embodiment; in practical applications, γ is generally set to 0.05.
The neural network backdoor attack detection method provided by the embodiments of this specification can detect whether a neural network model has been subjected to a backdoor attack, thereby improving the recognition accuracy of the neural network model; and because the training data corresponding to each label category is examined separately, the detection precision of the backdoor attack is also improved.
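Putting the above steps together, a hedged end-to-end sketch of the detection flow might look as follows. It reuses the illustrative first_hidden_activations and kmeans_two_clusters helpers sketched earlier, and the per-sample label inspection is left as a caller-supplied callback because how labels are verified depends on the application; none of these names are part of this specification.

```python
import numpy as np

GAMMA = 0.05  # predetermined threshold, as in the example above

def detect_backdoor(model, data_by_category, samples_match_label):
    """Apply the clustering-based check to every label category of the training data.

    data_by_category:    mapping from label category to that category's training inputs.
    samples_match_label: callable(category, sample_indices) -> True when the flagged
                         samples genuinely belong to the category.
    Returns a dict mapping each suspicious category to the flagged sample indices.
    """
    flagged = {}
    for category, inputs in data_by_category.items():
        hidden = first_hidden_activations(model, inputs).numpy()  # N hidden layer vectors
        assignments, _ = kmeans_two_clusters(hidden)
        counts = np.bincount(assignments, minlength=2)
        c1, c2 = counts.min(), counts.max()      # smaller and larger cluster sizes
        if c2 > 0 and c1 / c2 < GAMMA:
            minority = np.flatnonzero(assignments == int(np.argmin(counts)))
            if not samples_match_label(category, minority):
                flagged[category] = minority      # backdoor suspected for this category
    return flagged
```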
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Fig. 5 is a schematic structural diagram of an embodiment of a device for detecting a neural network backdoor attack in this specification, and as shown in fig. 5, the device for detecting a neural network backdoor attack may include: an acquisition module 51, a training module 52, a clustering module 53 and a detection module 54;
an obtaining module 51, configured to obtain training data;
a training module 52, configured to train the neural network by using the training data acquired by the acquisition module 51, so as to obtain a trained neural network model;
the obtaining module 51 is further configured to obtain training data corresponding to a first label category in the training data, and to input the training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
a clustering module 53, configured to cluster the hidden layer data acquired by the acquiring module 51; in this embodiment, the clustering module 53 is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
And the detection module 54 is configured to detect a neural network backdoor attack according to the clustering result of the clustering module 53.
The detection apparatus for the neural network backdoor attack provided by the embodiment shown in FIG. 5 may be used to execute the technical solution of the method embodiment shown in FIG. 2 of this specification; for its implementation principle and technical effects, reference may be made to the related description in the method embodiment.
Fig. 6 is a schematic structural diagram of another embodiment of the detection apparatus for a neural network backdoor attack in this specification, in this embodiment, the clustering module 53 is specifically configured to cluster the hidden layer data into two categories, which are a first category and a second category respectively.
The detection module 54 is specifically configured to detect a neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category.
Specifically, the detection module 54 may include: a comparison sub-module 541, a calculation sub-module 542, a data acquisition sub-module 543, a judgment sub-module 544 and a determination sub-module 545;
the comparing sub-module 541 is configured to compare a first number of the hidden layer data included in the first category with a second number of the hidden layer data included in the second category;
a calculation submodule 542 for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data obtaining submodule 543, configured to obtain hidden layer data in a category corresponding to the smaller value when the ratio obtained by the calculating submodule 542 is smaller than a predetermined threshold;
the judging submodule 544 is configured to judge whether training data corresponding to the hidden layer data acquired by the data acquiring submodule 543 matches the label type of the training data;
the determining sub-module 545 is configured to determine that the neural network model has a backdoor attack when the training data does not match the label class of the training data.
The detection apparatus for the neural network backdoor attack provided by the embodiment shown in FIG. 6 may be used to execute the technical solutions of the method embodiments shown in FIG. 2 to FIG. 4 of this specification; for its implementation principle and technical effects, reference may be made to the related descriptions in the method embodiments.
FIG. 7 is a schematic structural diagram of an embodiment of an electronic device in this specification. As shown in FIG. 7, the electronic device may include at least one processor and at least one memory communicatively coupled to the processor, wherein the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the neural network backdoor attack detection method provided by the embodiments shown in FIG. 2 to FIG. 4 of this specification.
The electronic device may be a server, for example: a general physical server, a cloud server, or the like, and the form of the electronic device is not limited in this embodiment.
FIG. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present specification. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present specification.
As shown in fig. 7, the electronic device is in the form of a general purpose computing device. Components of the electronic device may include, but are not limited to: one or more processors 410, a communication interface 420, a memory 430, and a communication bus 440 that connects the various components (including the memory 430, the communication interface 420, and the processors 410).
Electronic devices typically include a variety of computer system readable media. Such media may be any available media that is accessible by the electronic device and includes both volatile and nonvolatile media, removable and non-removable media.
A program/utility having a set (at least one) of program modules, including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored in memory 430; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules generally carry out the functions and/or methods of the embodiments described in FIG. 2 to FIG. 4 of this specification.
The processor 410 executes programs stored in the memory 430 to execute various functional applications and data processing, for example, to implement the method for detecting a neural network backdoor attack provided by the embodiments shown in fig. 2 to 4 of the present specification.
The embodiment of the present specification provides a non-transitory computer-readable storage medium, which stores computer instructions, which cause the computer to execute the method for detecting a neural network backdoor attack provided by the embodiment shown in fig. 2 to 4 of the present specification.
The non-transitory computer readable storage medium described above may take any combination of one or more computer readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable compact disc Read Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present description may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present specification, "a plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present description in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present description.
The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that the terminal referred to in the embodiments of the present disclosure may include, but is not limited to, a personal computer (PC), a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a mobile phone, an MP3 player, an MP4 player, and the like.
In the several embodiments provided in this specification, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present description may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods described in the embodiments of this specification. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
Claims (12)
1. A method for detecting a neural network backdoor attack comprises the following steps:
acquiring training data;
training a neural network by using the training data to obtain a trained neural network model;
acquiring training data corresponding to a first label category in the training data;
inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
and clustering the hidden layer data, and detecting the neural network backdoor attack according to a clustering result.
2. The method of claim 1, wherein the clustering the hidden layer data comprises:
and grouping the hidden layer data into two categories, namely a first category and a second category.
3. The method of claim 2, wherein the detecting a neural network backdoor attack according to the clustering result comprises:
and detecting the neural network back door attack according to the quantity of the hidden layer data respectively included in the first category and the second category.
4. The method of claim 3, wherein the detecting a neural network back-door attack according to the amount of hidden layer data included in each of the first and second categories comprises:
comparing a first quantity of hidden layer data included in the first class to a second quantity of hidden layer data included in the second class;
calculating the ratio of the smaller value to the larger value of the first quantity and the second quantity;
if the ratio is smaller than a preset threshold value, hidden layer data in the category corresponding to the smaller value is obtained;
judging whether the training data corresponding to the hidden layer data conforms to the label category of the training data;
and if it does not conform, determining that the neural network model has been subjected to a backdoor attack.
5. The method of any of claims 1-4, wherein the clustering the hidden layer data comprises:
and clustering the hidden layer data through a K-means clustering algorithm.
6. A device for detecting a neural network backdoor attack, comprising:
the acquisition module is used for acquiring training data;
the training module is used for training the neural network by using the training data acquired by the acquisition module to acquire a trained neural network model;
the acquisition module is further configured to acquire training data corresponding to a first label category in the training data; inputting training data corresponding to the first label category into the trained neural network model to obtain hidden layer data of the neural network model;
the clustering module is used for clustering the hidden layer data acquired by the acquisition module;
and the detection module is used for detecting the neural network backdoor attack according to the clustering result of the clustering module.
7. The apparatus of claim 6, wherein,
the clustering module is specifically configured to cluster the hidden layer data into two categories, namely a first category and a second category.
8. The apparatus of claim 7, wherein,
the detection module is specifically configured to detect a neural network backdoor attack according to the number of hidden layer data included in each of the first category and the second category.
9. The apparatus of claim 8, wherein the detection module comprises:
a comparison sub-module for comparing a first quantity of hidden layer data included in the first class with a second quantity of hidden layer data included in the second class;
a calculation submodule for calculating a ratio of a smaller value to a larger value of the first quantity to the second quantity;
the data acquisition submodule is used for acquiring hidden layer data in the category corresponding to the smaller value when the ratio obtained by the calculation submodule is smaller than a preset threshold value;
the judgment submodule is used for judging whether the training data corresponding to the hidden layer data acquired by the data acquisition submodule is consistent with the label category of the training data;
and the determining submodule is used for determining that the neural network model has backdoor attacks when the training data does not accord with the label category of the training data.
10. The apparatus of any one of claims 6-9,
the clustering module is specifically configured to cluster the hidden layer data through a K-means clustering algorithm.
11. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 5.
12. A non-transitory computer readable storage medium storing computer instructions that cause the computer to perform the method of any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010334293.2A CN111242291A (en) | 2020-04-24 | 2020-04-24 | Neural network backdoor attack detection method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010334293.2A CN111242291A (en) | 2020-04-24 | 2020-04-24 | Neural network backdoor attack detection method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111242291A true CN111242291A (en) | 2020-06-05 |
Family
ID=70875572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010334293.2A Pending CN111242291A (en) | 2020-04-24 | 2020-04-24 | Neural network backdoor attack detection method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242291A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112163638A (en) * | 2020-10-20 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Defense method, device, equipment and medium for image classification model backdoor attack |
CN112232446A (en) * | 2020-12-11 | 2021-01-15 | 鹏城实验室 | Picture identification method and device, training method and device, and generation method and device |
CN112380974A (en) * | 2020-11-12 | 2021-02-19 | 支付宝(杭州)信息技术有限公司 | Classifier optimization method, backdoor detection method and device and electronic equipment |
CN112765607A (en) * | 2021-01-19 | 2021-05-07 | 电子科技大学 | Neural network model backdoor attack detection method |
CN112989438A (en) * | 2021-02-18 | 2021-06-18 | 上海海洋大学 | Detection and identification method for backdoor attack of privacy protection neural network model |
CN113111349A (en) * | 2021-04-25 | 2021-07-13 | 浙江大学 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
CN114048466A (en) * | 2021-10-28 | 2022-02-15 | 西北大学 | Neural network backdoor attack defense method based on YOLO-V3 algorithm |
CN114638359A (en) * | 2022-03-28 | 2022-06-17 | 京东科技信息技术有限公司 | Method and device for removing neural network backdoor and image recognition |
CN116150221A (en) * | 2022-10-09 | 2023-05-23 | 浙江博观瑞思科技有限公司 | Information interaction method and system for service of enterprise E-business operation management |
CN116383814A (en) * | 2023-06-02 | 2023-07-04 | 浙江大学 | Neural network model back door detection method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170237773A1 (en) * | 2016-02-16 | 2017-08-17 | Cylance, Inc. | Endpoint-based man in the middle attack detection using machine learning models |
CN108076060A (en) * | 2017-12-18 | 2018-05-25 | 西安邮电大学 | Neutral net Tendency Prediction method based on dynamic k-means clusters |
CN110198291A (en) * | 2018-03-15 | 2019-09-03 | 腾讯科技(深圳)有限公司 | A kind of webpage back door detection method, device, terminal and storage medium |
US20200050945A1 (en) * | 2018-08-07 | 2020-02-13 | International Business Machines Corporation | Detecting poisoning attacks on neural networks by activation clustering |
- 2020-04-24 CN CN202010334293.2A patent/CN111242291A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170237773A1 (en) * | 2016-02-16 | 2017-08-17 | Cylance, Inc. | Endpoint-based man in the middle attack detection using machine learning models |
CN108076060A (en) * | 2017-12-18 | 2018-05-25 | 西安邮电大学 | Neutral net Tendency Prediction method based on dynamic k-means clusters |
CN110198291A (en) * | 2018-03-15 | 2019-09-03 | 腾讯科技(深圳)有限公司 | A kind of webpage back door detection method, device, terminal and storage medium |
US20200050945A1 (en) * | 2018-08-07 | 2020-02-13 | International Business Machines Corporation | Detecting poisoning attacks on neural networks by activation clustering |
Non-Patent Citations (1)
Title |
---|
BRYANT CHEN et al.: "Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering", arXiv.org *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112163638A (en) * | 2020-10-20 | 2021-01-01 | 腾讯科技(深圳)有限公司 | Defense method, device, equipment and medium for image classification model backdoor attack |
CN112163638B (en) * | 2020-10-20 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for defending image classification model back door attack |
CN112380974A (en) * | 2020-11-12 | 2021-02-19 | 支付宝(杭州)信息技术有限公司 | Classifier optimization method, backdoor detection method and device and electronic equipment |
CN112380974B (en) * | 2020-11-12 | 2023-08-15 | 支付宝(杭州)信息技术有限公司 | Classifier optimization method, back door detection method and device and electronic equipment |
CN112232446A (en) * | 2020-12-11 | 2021-01-15 | 鹏城实验室 | Picture identification method and device, training method and device, and generation method and device |
CN112765607B (en) * | 2021-01-19 | 2022-05-17 | 电子科技大学 | Neural network model backdoor attack detection method |
CN112765607A (en) * | 2021-01-19 | 2021-05-07 | 电子科技大学 | Neural network model backdoor attack detection method |
CN112989438A (en) * | 2021-02-18 | 2021-06-18 | 上海海洋大学 | Detection and identification method for backdoor attack of privacy protection neural network model |
CN112989438B (en) * | 2021-02-18 | 2022-10-21 | 上海海洋大学 | Detection and identification method for backdoor attack of privacy protection neural network model |
CN113111349B (en) * | 2021-04-25 | 2022-04-29 | 浙江大学 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
CN113111349A (en) * | 2021-04-25 | 2021-07-13 | 浙江大学 | Backdoor attack defense method based on thermodynamic diagram, reverse engineering and model pruning |
CN114048466A (en) * | 2021-10-28 | 2022-02-15 | 西北大学 | Neural network backdoor attack defense method based on YOLO-V3 algorithm |
CN114048466B (en) * | 2021-10-28 | 2024-03-26 | 西北大学 | Neural network back door attack defense method based on YOLO-V3 algorithm |
CN114638359A (en) * | 2022-03-28 | 2022-06-17 | 京东科技信息技术有限公司 | Method and device for removing neural network backdoor and image recognition |
CN116150221A (en) * | 2022-10-09 | 2023-05-23 | 浙江博观瑞思科技有限公司 | Information interaction method and system for service of enterprise E-business operation management |
CN116383814A (en) * | 2023-06-02 | 2023-07-04 | 浙江大学 | Neural network model back door detection method and system |
CN116383814B (en) * | 2023-06-02 | 2023-09-15 | 浙江大学 | Neural network model back door detection method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242291A (en) | Neural network backdoor attack detection method and device and electronic equipment | |
US11436739B2 (en) | Method, apparatus, and storage medium for processing video image | |
CN107293296B (en) | Voice recognition result correction method, device, equipment and storage medium | |
CN113469088B (en) | SAR image ship target detection method and system under passive interference scene | |
CN112085701B (en) | Face ambiguity detection method and device, terminal equipment and storage medium | |
CN111291902B (en) | Detection method and device for rear door sample and electronic equipment | |
CN113158656B (en) | Ironic content recognition method, ironic content recognition device, electronic device, and storage medium | |
CN109599095A (en) | A kind of mask method of voice data, device, equipment and computer storage medium | |
CN111091182A (en) | Data processing method, electronic device and storage medium | |
CN111444807A (en) | Target detection method, device, electronic equipment and computer readable medium | |
CN112651311A (en) | Face recognition method and related equipment | |
CN113569740A (en) | Video recognition model training method and device and video recognition method and device | |
CN115758282A (en) | Cross-modal sensitive information identification method, system and terminal | |
CN112364821A (en) | Self-recognition method and device for power mode data of relay protection device | |
CN114299366A (en) | Image detection method and device, electronic equipment and storage medium | |
CN111291901B (en) | Detection method and device for rear door sample and electronic equipment | |
CN111949766A (en) | Text similarity recognition method, system, equipment and storage medium | |
CN117058421A (en) | Multi-head model-based image detection key point method, system, platform and medium | |
CN111242322B (en) | Detection method and device for rear door sample and electronic equipment | |
CN116844573A (en) | Speech emotion recognition method, device, equipment and medium based on artificial intelligence | |
CN113887535B (en) | Model training method, text recognition method, device, equipment and medium | |
CN113450764B (en) | Text voice recognition method, device, equipment and storage medium | |
CN117830790A (en) | Training method of multi-task model, multi-task processing method and device | |
CN113837101B (en) | Gesture recognition method and device and electronic equipment | |
CN110059180A (en) | Author identification and assessment models training method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200605 | |