CN116842577A - Federal learning model poisoning attack detection and defense method, device and equipment - Google Patents

Federal learning model poisoning attack detection and defense method, device and equipment

Info

Publication number
CN116842577A
CN116842577A CN202311095431.6A CN202311095431A
Authority
CN
China
Prior art keywords
gradient
client
target client
test
updated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311095431.6A
Other languages
Chinese (zh)
Other versions
CN116842577B (en)
Inventor
王滨
闫皓楠
万里
王星
李超豪
林克章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202311095431.6A priority Critical patent/CN116842577B/en
Publication of CN116842577A publication Critical patent/CN116842577A/en
Application granted granted Critical
Publication of CN116842577B publication Critical patent/CN116842577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application provides a method, a device and equipment for detecting and defending a federal learning model poisoning attack, wherein the method comprises the following steps: acquiring the gradient uploaded by the target client in this round; under the condition of a detection mode, constructing a test gradient corresponding to the gradient according to the gradient uploaded by the target client, and sending the test gradient to the target client as an aggregate gradient so that the target client can update the test gradient locally to obtain an updated test gradient; wherein the angle change in the direction between the test gradient corresponding to the gradient and the gradient is less than a threshold value; and acquiring an updated test gradient uploaded by the target client, and detecting the poisoning attack of the target client according to the angle change of the test gradient and the updated test gradient in the direction. The method can improve the accuracy and reliability of the detection of the poisoning attack.

Description

Federal learning model poisoning attack detection and defense method, device and equipment
Technical Field
The application relates to the field of artificial intelligence safety, in particular to a federal learning model poisoning attack detection and defense method, device and equipment.
Background
With the increasing demand for data privacy protection, federal learning (Federated Learning, FL for short) is increasingly applied as a privacy-preserving training solution under a distributed learning paradigm. FL allows data owners to collaboratively train a model under the coordination of a central server, achieving better predictive performance by sharing local gradient updates rather than private data sets, thus preserving the privacy of each participant's raw data.
However, because FL adopts a distributed architecture and a local training paradigm for privacy protection, the central server cannot verify the gradient updates uploaded locally by each client, which leaves FL vulnerable to various model poisoning attacks during its training process. An attacker can therefore corrupt the global aggregation model by hijacking participating users and uploading malicious local gradient updates, ultimately degrading the model's prediction performance and seriously endangering FL applications.
Disclosure of Invention
In view of the above, the application provides a federal learning model poisoning attack detection and defense method, device and equipment.
Specifically, the application is realized by the following technical scheme:
according to a first aspect of an embodiment of the present application, there is provided a federal learning model poisoning attack detection and defense method, including:
Acquiring the gradient uploaded by the target client in this round;
under the condition of a detection mode, constructing a test gradient corresponding to the gradient according to the gradient uploaded by the target client, and sending the test gradient to the target client as an aggregate gradient so that the target client can update the test gradient locally to obtain an updated test gradient; wherein the angle change in the direction between the test gradient corresponding to the gradient and the gradient is less than a threshold value;
and acquiring an updated test gradient uploaded by the target client, and detecting the poisoning attack of the target client according to the angle change of the test gradient and the updated test gradient in the direction.
According to a second aspect of the embodiment of the present application, there is provided a federal learning model poisoning attack detection and prevention device, including:
the acquisition unit is used for acquiring the gradient uploaded by the target client in the round;
the testing unit is used for constructing a testing gradient corresponding to the gradient according to the gradient uploaded by the target client in the round under the condition of being in a detection mode, and sending the testing gradient to the target client as an aggregation gradient so as to enable the target client to update the testing gradient locally to obtain an updated testing gradient; wherein the angle change in the direction between the test gradient corresponding to the gradient and the gradient is less than a threshold value;
The acquisition unit is further used for acquiring the updated test gradient uploaded by the target client;
and the detection unit is used for detecting the poisoning attack of the target client according to the angle change of the test gradient and the updated test gradient in the direction.
According to a third aspect of embodiments of the present application, there is provided an electronic device comprising a processor and a memory, wherein,
a memory for storing a computer program;
and a processor configured to implement the method provided in the first aspect when executing the program stored in the memory.
According to the federal learning model poisoning attack detection and defense method, under the condition that the federal learning model poisoning attack detection and defense method is in a detection mode, a central server can construct a test gradient corresponding to a gradient according to the gradient uploaded by a target client, and send the test gradient to the target client as an aggregation gradient; the target client can update the test gradient locally to obtain an updated test gradient, and upload the updated test gradient to the central server; the central server can detect the poisoning attack of the target client according to the angle change of the test gradient before and after updating of the target client in the direction, so that the automatic detection of the poisoning attack is realized; in addition, by respectively carrying out the poisoning attack detection aiming at each client, the method can effectively avoid judging benign outlier gradients as malicious poisoning gradients, and improves the accuracy and reliability of the poisoning attack detection.
Drawings
FIG. 1 is a flow chart of a federal learning model poisoning attack detection and defense method according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a federal learning model poisoning attack detection and defense method according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a federal learning model poisoning attack detection and defense device according to an exemplary embodiment of the present application;
fig. 4 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the following description will simply explain some terms related to the embodiments of the present application.
1. Federated learning: federated learning is a distributed machine learning paradigm for training a shared model on distributed data from multiple clients.
For example, the central server may first initialize a shared model (also referred to as a global model) and then distribute the shared model to the K clients selected to participate in the current round of training. Each client samples a small batch of data (i.e., training data) from its local dataset to calculate a corresponding gradient, and uploads the gradient to the central server. The central server aggregates the gradients uploaded by the clients to obtain an aggregate gradient and issues it to the clients; each client updates its local model based on the aggregate gradient, calculates a new gradient based on the updated local model, and uploads the new gradient to the central server. The global model is then iteratively updated between the central server and the clients in this way until it converges.
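By way of illustration only, the following Python sketch (the names, the use of plain averaging, and the simulated gradients are assumptions for illustration and are not taken from this application) shows one such aggregation round as seen by the central server:

```python
import numpy as np

def fedavg_round(client_gradients):
    """Plain averaging of per-client gradients into one aggregate gradient.

    client_gradients: list of 1-D numpy arrays, one per selected client.
    Returns the aggregate gradient that would be issued back to every client.
    """
    return np.mean(np.stack(client_gradients), axis=0)

# Simulated round: 3 clients each upload a gradient of dimension 4.
rng = np.random.default_rng(0)
uploads = [rng.normal(size=4) for _ in range(3)]
aggregate = fedavg_round(uploads)
print("aggregate gradient:", aggregate)
```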
2. Non-IID data (non-independent and identically distributed data): the source training data owned by the multiple clients in federated learning is not independent and identically distributed; that is, for two clients i ≠ j, there exists a data label k such that P_i(y = k) ≠ P_j(y = k).
3. Model poisoning attack (Model Poisoning Attack): a security attack against a machine learning model that aims to degrade the model's performance or induce misleading prediction results, for example by tampering with training data and injecting maliciously crafted samples into the target model.
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, the following describes the technical solution of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of a federal learning model poisoning attack detection and defense method provided by an embodiment of the present application is shown, where the federal learning model poisoning attack detection and defense method may be applied to a central server, as shown in fig. 1, and the federal learning model poisoning attack detection and defense method may include the following steps:
step S100, obtaining the gradient uploaded by the target client in the round.
Step S110, under the condition of being in a detection mode, constructing a test gradient corresponding to the gradient according to the gradient uploaded by the target client in this round, and issuing the test gradient to the target client as an aggregate gradient, so that the target client updates the test gradient locally to obtain an updated test gradient; wherein the angle change in direction between the gradient and its corresponding test gradient is less than a threshold value.
In the embodiment of the application, when the central server acquires the gradient uploaded by the target client in this round, it can determine whether it is currently in the detection mode.
Under the condition of being in the detection mode, the central server can construct a test gradient corresponding to the gradient according to the gradient uploaded by the target client in the round.
For example, the central server may issue the constructed test gradient as an aggregate gradient to the target client.
That is, in the embodiment of the present application, under the condition of being in the detection mode, for the gradient uploaded by the client, the central server does not directly aggregate the gradient reported by each client to generate an aggregate gradient, but for any client, constructs the aggregate gradient of the client based on the gradient uploaded by the client.
It can be seen that the aggregation gradient of each client is constructed separately and is no longer a uniform aggregation gradient.
In an exemplary case where the target client receives the test gradient issued by the central server, the target client may update its local model with the test gradient as the aggregate gradient, calculate the updated test gradient based on the updated local model using local training data, and return the updated test gradient to the central server.
For any client, the change of the angle between the test gradient corresponding to the gradient uploaded by the client and the gradient in the direction is smaller than a threshold value, so that the influence of the client on the model performance caused by updating the local model by taking the test gradient as an aggregation gradient is reduced.
Taking cosine similarity as an example of characterizing the angle change in direction between gradients, the cosine similarity between the gradient and its corresponding test gradient is larger than a preset similarity threshold (which may be referred to as a first similarity threshold; its specific value may be set according to actual requirements, e.g., 0.95 or more, which keeps the corresponding angle change within about 18 degrees).
For example, the angle change in direction between two gradients lies in the range [0°, 180°], where 0° indicates that the two gradients point in the same direction and 180° indicates that they point in opposite directions.
Correspondingly, the cosine similarity between the two gradients lies in the range [-1, 1], and the larger the cosine similarity between the two gradients, the smaller the angle change in direction between them.
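By way of illustration, a minimal Python sketch of this relationship between cosine similarity and the angle change in direction (function names are illustrative only):

```python
import numpy as np

def cosine_similarity(g1, g2):
    """Cosine similarity between two gradient vectors: inner product divided by
    the product of their L2 norms (module lengths)."""
    return float(np.dot(g1, g2) / (np.linalg.norm(g1) * np.linalg.norm(g2)))

def angle_change_deg(g1, g2):
    """Angle change in direction, in degrees, recovered from the cosine similarity."""
    cs = np.clip(cosine_similarity(g1, g2), -1.0, 1.0)
    return float(np.degrees(np.arccos(cs)))

g = np.array([1.0, 2.0, 3.0])
print(angle_change_deg(g, g))         # 0.0   -> same direction
print(angle_change_deg(g, -g))        # 180.0 -> opposite direction
print(cosine_similarity(g, 2.0 * g))  # 1.0   -> scaling alone does not change direction
```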
For example, the central server may determine the detection round according to the configuration information, and further may determine whether the detection mode is currently in accordance with the current round.
Step S120, the updated test gradient uploaded by the target client is obtained, and the target client is subjected to poisoning attack detection according to the angle change of the test gradient and the updated test gradient in the direction.
In the embodiment of the present application, it is considered that for a normal client (a non-poisoning-attack client), the angle change in direction between the gradient it obtains by local training on the aggregation gradient issued by the central server (such as the test gradient) and that aggregation gradient is generally not too large. By contrast, in order to implement the poisoning attack, a poisoning-attack client, upon receiving the aggregation gradient (such as the test gradient) issued by the central server, constructs a malicious gradient whose angle change in direction relative to the aggregation gradient is large, takes it as the updated gradient, and uploads it to the central server.
Therefore, the client can be subjected to poisoning attack detection according to the angle change of the gradient before and after updating in the direction.
Correspondingly, under the condition that the central server acquires the updated test gradient uploaded by the target client, the target client can be subjected to poisoning attack detection according to the angle change of the test gradient and the updated test gradient in the direction.
For example, for the target client, the angle change of the test gradient and the updated test gradient in the direction may be positively correlated with the probability that the target client is a poisoning attack client, that is, the greater the angle change of the test gradient and the updated test gradient in the direction, the greater the probability that the target client is a poisoning attack client.
It should be noted that, in the embodiment of the present application, if the central server obtains the gradient uploaded by the target client in this round and determines that it is not currently in the detection mode, the central server may aggregate the gradients uploaded by the clients in this round according to the current aggregation weight of each client.
Wherein, the current aggregation weight of each client is the aggregation weight after the last update, or the initial weight (in the case of not updating the aggregation weight); the specific implementation of updating the aggregate weight of each client may be described below.
It can be seen that, in the method flow shown in fig. 1, under the condition of being in the detection mode, the central server may construct a test gradient corresponding to the gradient according to the gradient uploaded by the target client, and issue the test gradient as an aggregate gradient to the target client; the target client can update the test gradient locally to obtain an updated test gradient, and upload the updated test gradient to the central server; the central server can detect the poisoning attack of the target client according to the angle change of the test gradient before and after updating of the target client in the direction, so that the automatic detection of the poisoning attack is realized; in addition, by respectively carrying out the poisoning attack detection aiming at each client, the method can effectively avoid judging benign outlier gradients as malicious poisoning gradients, and improves the accuracy and reliability of the poisoning attack detection.
In some embodiments, the constructing the test gradient corresponding to the gradient according to the gradient uploaded by the target client may include:
and (3) reducing the module length of the gradient uploaded by the target client, and/or adjusting the direction of the gradient uploaded by the target client to obtain a test gradient corresponding to the gradient.
By way of example, the construction of the test gradient may be achieved by adjusting the module length and/or the direction of the gradient uploaded by the client.
Considering that the module length of the gradient uploaded to the central server by a poisoning-attack client is usually large, whereas for a normal client the module length of the uploaded gradient does not deviate much from the module length of the aggregate gradient issued by the central server (i.e., when the module length of the issued aggregate gradient is small, the module length of the gradient uploaded by a normal client is also small, and when it is large, the module length of the gradient uploaded by a normal client is also large), the module length of the test gradient issued by the central server to the client should be as small as possible in order to make the difference between the gradients uploaded by normal clients and poisoning-attack clients more obvious.
Accordingly, the central server can reduce the module length of the gradient uploaded by the client under the condition of constructing the test gradient.
In addition, considering that the angle change in direction between the gradient uploaded to the central server by a poisoning-attack client and the gradient issued by the central server is generally large, whereas for a normal client this angle change is generally small, the central server should change the direction of the gradient uploaded by the client as little as possible when constructing the test gradient, so that the difference between the gradients uploaded by normal clients and poisoning-attack clients becomes more obvious.
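By way of illustration, a minimal sketch of one possible construction that satisfies both considerations, assuming a simple shrink-and-jitter scheme (the shrink factor and jitter size are assumed values; the application itself only requires that the module length is reduced and the angle change stays under the threshold):

```python
import numpy as np

def build_test_gradient(gradient, shrink=0.1, jitter=0.0, rng=None):
    """One possible construction of a test gradient from an uploaded gradient:
    shrink the module length (L2 norm) and optionally perturb the direction by a
    small amount, kept small enough that the angle change in direction versus
    the original gradient stays below the required threshold."""
    rng = rng or np.random.default_rng()
    test = shrink * gradient                      # reduce the module length
    if jitter > 0.0:                              # optionally adjust the direction slightly
        test = test + jitter * np.linalg.norm(test) * rng.normal(size=gradient.shape)
    return test

g = np.array([3.0, -1.0, 2.0])
t = build_test_gradient(g, shrink=0.1, jitter=0.05, rng=np.random.default_rng(1))
cs = np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t))
print("module length ratio:", np.linalg.norm(t) / np.linalg.norm(g))
print("cosine similarity with the original gradient:", cs)  # stays close to 1
```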
In some embodiments, the detecting the poisoning attack on the target client according to the angle change of the test gradient and the updated test gradient in the direction may include:
determining cosine similarity between the test gradient and the updated test gradient;
and under the condition that the cosine similarity between the test gradient and the updated test gradient is smaller than a preset threshold value, determining that the target client is a suspected poisoning attack client.
Illustratively, the angle change in direction between gradients is characterized by the cosine similarity between the gradients.
For the target client, the cosine similarity between the test gradient and the updated test gradient can be determined, and the target client is subjected to poisoning attack detection according to the cosine similarity.
For example, the target client may be determined to be a suspected poisoning attack client if the cosine similarity between the test gradient and the updated test gradient is less than a preset similarity threshold.
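By way of illustration, a minimal sketch of this threshold check (the similarity threshold used here is an assumed value):

```python
import numpy as np

def is_suspected_poisoning(test_gradient, updated_test_gradient, sim_threshold=0.0):
    """Flag the client as a suspected poisoning attacker when the cosine similarity
    between its test gradient and its updated test gradient falls below a preset
    similarity threshold (the threshold value here is only illustrative)."""
    cs = np.dot(test_gradient, updated_test_gradient) / (
        np.linalg.norm(test_gradient) * np.linalg.norm(updated_test_gradient)
    )
    return bool(cs < sim_threshold)

t = np.array([0.3, -0.1, 0.2])
benign_reply = t + 0.05 * np.array([1.0, 1.0, -1.0])   # small change in direction
malicious_reply = -5.0 * t                              # reversed and enlarged gradient
print(is_suspected_poisoning(t, benign_reply))          # False
print(is_suspected_poisoning(t, malicious_reply))       # True
```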
In some embodiments, in each round of detection process, the federal learning model poisoning attack detection and defense method provided by the embodiment of the present application may further include:
determining a gradient abnormality degree value of the target client according to the angle change of the test gradient and the updated test gradient in the direction; the gradient abnormality degree value of the target client is positively correlated with the probability that the target client is a poisoning attack client;
determining the aggregation weight of each client according to the gradient abnormality degree value of the target client; the gradient abnormality degree value of the target client is inversely related to the aggregation weight of the target client;
under the condition of exiting the detection mode, the federal learning model poisoning attack detection and defense method provided by the embodiment of the application can further comprise the following steps:
According to the aggregation weight of each client, aggregating the stored gradients of each client to obtain an aggregation gradient; the stored gradient of each client is the gradient uploaded by each client last time before entering a detection mode;
and issuing the aggregation gradient to each client.
For example, in order to improve the fault tolerance of the poisoning attack detection and reduce the influence of single false detection on the global model update, the central server may perform aggregation weight assignment on the gradients uploaded by each client according to the test gradient of each client and the angle change of the updated test gradient in the direction, and aggregate the gradients uploaded by different clients in a weighted aggregation manner.
Illustratively, for any client, the aggregation weight of the client is inversely related to the angle change in direction between the test gradient of the client and the updated test gradient.
For example, in each round of detection (i.e., poison attack detection), for the target client, the central server may determine the gradient anomaly level value of the target client according to the angle change in the direction between the test gradient of the target client and the updated test gradient.
The gradient abnormality degree value of the target client is positively correlated with the probability that the target client is a poisoning attack client, namely the larger the gradient abnormality degree value of the target client is, the higher the probability that the target client is the poisoning attack client is.
The central server can determine the aggregation weight of the target client according to the gradient abnormity degree value of the target client.
The gradient abnormality degree value of the target client is inversely related to the aggregation weight of the target client, i.e., the larger the gradient abnormality degree value of the target client, the smaller the aggregation weight of the target client.
Illustratively, a round of the detection process includes: the central server constructs a corresponding test gradient according to the gradient uploaded by the client, transmits the test gradient to the client, receives the updated test gradient returned by the client, and detects the poisoning attack of the target client according to the angle change of the test gradient and the updated test gradient in the direction.
In the case of exiting the detection mode, the central server may determine the aggregation weights of the clients in the above manner, and weight and aggregate the stored gradients of the clients according to the aggregation weights of the clients to obtain an aggregation gradient.
For example, assuming that the k-1 th round to k+3 th round of aggregation of federal learning is preset to perform poisoning attack detection, if the central server receives the gradient uploaded by the k-1 th round of client, it is determined to enter the detection mode, 4 rounds of detection are performed in the manner described in the above embodiment, and if the gradient uploaded by the k+3 th round of client is received, it is determined to exit the detection mode.
For example, the central server may store the last uploaded gradient for each client before entering the detection mode.
Taking the above example, the central server may store the gradient uploaded by each client in the (k-1)-th round, and, upon determining to exit the detection mode, may aggregate the gradients uploaded by the clients in the (k-1)-th round according to the aggregation weight of each client to obtain an aggregate gradient.
For example, when the central server obtains the aggregation gradient through the gradient aggregation manner, the aggregation gradient may be issued to each client; each client can update the local model by utilizing the received aggregation gradient, calculate a new gradient by utilizing the local training data based on the updated local model, and upload the new gradient to the central server.
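By way of illustration, a minimal sketch of this weighted aggregation of stored gradients (the dictionary-based bookkeeping and the example weights are assumptions for illustration):

```python
import numpy as np

def weighted_aggregate(stored_gradients, weights):
    """Weighted aggregation of the gradients stored before entering the detection
    mode. Both arguments are dicts keyed by client id; clients already marked as
    poisoning attackers are simply left out of the dicts. Weights are normalized
    here so they do not need to sum to 1 beforehand."""
    total = sum(weights.values())
    aggregate = np.zeros_like(next(iter(stored_gradients.values())))
    for cid, grad in stored_gradients.items():
        aggregate += (weights[cid] / total) * grad
    return aggregate

stored = {"c1": np.array([1.0, 0.0]), "c2": np.array([0.0, 1.0]), "c3": np.array([4.0, 4.0])}
w = {"c1": 0.45, "c2": 0.45, "c3": 0.10}   # the more anomalous client gets a low weight
print(weighted_aggregate(stored, w))        # -> [0.85 0.85]
```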
In one example, the angular change in direction of the test gradient and the updated test gradient is characterized by cosine similarity between the test gradient and the updated test gradient;
the determining the gradient abnormality degree value of the target client according to the angle change of the test gradient and the updated test gradient in the direction may include:
and determining the gradient abnormality degree value of the target client according to the cosine similarity between the test gradient and the updated test gradient and the modulus length of the updated test gradient.
Illustratively, it is contemplated that for a poisoning attack client, the cosine similarity between gradients before and after updating may be relatively small.
Furthermore, for malicious gradients, their modulo length will typically be relatively large, whereas for non-malicious gradients, their modulo length will typically be relatively small.
Therefore, the gradient abnormity degree value of the target client can be determined according to the cosine similarity between the test gradient and the updated test gradient and the modulus length of the updated test gradient.
As an example, the determining the gradient anomaly value of the target client according to the cosine similarity between the test gradient and the updated test gradient and the modulus length of the updated test gradient may be implemented in the following manner:
wherein ,for cosine similarity between test gradient and said updated test gradient +.>And alpha is the gradient abnormity degree value of the target client for the updated module length of the test gradient.
Illustratively, consider that for a poisoning attack client, the cosine similarity between the gradients before and after the update will be relatively small, and the cosine similarity will typically be less than 0. For non-poisoning attacking clients, the cosine similarity between gradients before and after updating is typically greater than 0.
Furthermore, the modulo length of a malicious gradient will typically be relatively large, while the modulo length of a benign gradient will typically be relatively small.
Based on this, in the case where the cosine similarity between the gradients before and after the update is smaller than 0, the opposite number of the product of the cosine similarity between the gradients before and after the update and the modulo length of the gradient after the update can be determined as the gradient abnormality degree value; when the cosine similarity between the gradients before and after updating is greater than or equal to 0, the opposite number of the ratio of the cosine similarity between the gradients before and after updating to the modulus of the gradients after updating can be determined as the gradient abnormality degree value, so that the gradient abnormality degree value of the poisoning attack client is larger and the gradient abnormality degree value of the non-poisoning attack client is smaller.
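By way of illustration, a minimal sketch of the piecewise gradient abnormality degree value described above (the helper name is illustrative):

```python
import numpy as np

def gradient_anomaly_degree(test_gradient, updated_test_gradient):
    """Gradient abnormality degree value alpha, following the piecewise rule above:
    -(cs * ||g'||) when the cosine similarity cs is negative, otherwise -(cs / ||g'||),
    where g' is the updated test gradient. Malicious-looking replies (reversed and
    enlarged) thus get a large positive alpha, benign ones a negative alpha."""
    norm = float(np.linalg.norm(updated_test_gradient))
    cs = float(np.dot(test_gradient, updated_test_gradient)
               / (np.linalg.norm(test_gradient) * norm))
    return -cs * norm if cs < 0 else -cs / norm

t = np.array([0.2, 0.1, -0.3])
print(gradient_anomaly_degree(t, 1.1 * t))   # aligned reply: negative alpha (low anomaly)
print(gradient_anomaly_degree(t, -8.0 * t))  # reversed, enlarged reply: positive alpha (high anomaly)
```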
In one example, the determining the aggregate weight of each client according to the gradient anomaly degree value of the target client may include:
updating the trust score of the target client according to the gradient abnormality degree value of the target client; wherein the larger the gradient abnormality degree value of the target client, the lower the updated trust score of the target client;
and determining the aggregation weight of the target client according to the updated trust score of the target client.
For any client, the central server may maintain the trust score of the client, and may update the trust score of the client, and further update the aggregate weight of the client, according to the result of detecting the poisoning attack for the client.
Accordingly, in the case where the gradient abnormality degree value of the target client is determined in the above manner, the trust score of the target client may be updated in accordance with the gradient abnormality degree value of the target client.
The larger the gradient abnormity degree value of the target client is, the lower the trust score of the updated target client is.
For example, the initial value of the trust score of each client may be preset.
As an example, the initial value of the trust score for each client may be the same.
Under the condition that the central server determines the updated trust score of the target client, the latest aggregation weight of the target client can be determined according to the updated trust score of the target client.
As an example, the above updating the trust score of the target client according to the gradient anomaly degree value of the target client is implemented by:
wherein TS_0 is the trust score before updating, TS is the trust score after updating, the initial value of TS_0 is a preset value, α is the gradient abnormality degree value of the target client, and the base score used in the update is a preset value.
As an example, determining the aggregate weight of a client based on its trust score may be accomplished by:
w_i = TS_i / Σ_{j=1}^{n} TS_j, wherein w_i is the aggregation weight of client i, TS_i is the trust score of client i, TS_j is the trust score of client j, and n is the number of clients.
In one example, for any client, determining the client as a poisoning attack client if the trust score of the client is less than or equal to 0; wherein, the gradient uploaded by the poisoning attack client does not participate in gradient aggregation.
For example, in the case of updating the trust score of the client in the above manner, for any client, in the case that the trust score of the client is less than or equal to 0, the client may be determined to be a poisoning attack client, and the gradient of the client is no longer included in the gradient aggregation, that is, the gradient uploaded by the client is no longer involved in the gradient aggregation, so as to avoid interference of the poisoning attack on the global model update.
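By way of illustration, a minimal sketch of this trust-score bookkeeping: the exact trust-score update formula of this application appears only in its figures, so the sketch assumes a simple stand-in update TS ← TS + base − α that preserves the stated monotonic property; the weight normalization and the exclusion of clients with non-positive trust scores follow the text above:

```python
def update_trust_scores(trust_scores, anomaly_degrees, base_score=1.0):
    """Update every client's trust score after one detection round.

    NOTE: the exact update formula appears only in the patent figures; as a
    stand-in this sketch assumes TS <- TS + base_score - alpha, which preserves
    the stated property that a larger gradient abnormality degree value leads
    to a lower updated trust score."""
    return {cid: ts + base_score - anomaly_degrees[cid]
            for cid, ts in trust_scores.items()}

def aggregation_weights(trust_scores):
    """w_i = TS_i / sum_j TS_j over the clients whose trust score is still positive;
    clients with a trust score <= 0 are treated as poisoning attackers and excluded
    from gradient aggregation."""
    active = {cid: ts for cid, ts in trust_scores.items() if ts > 0}
    total = sum(active.values())
    return {cid: ts / total for cid, ts in active.items()}

scores = {"c1": 2.0, "c2": 2.0, "c3": 2.0}    # same preset initial trust score
alphas = {"c1": -0.4, "c2": -0.3, "c3": 5.0}  # c3 looked malicious this round
scores = update_trust_scores(scores, alphas)
print(scores)                       # c3's trust score drops below 0
print(aggregation_weights(scores))  # c3 is excluded; c1 and c2 share the weight
```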
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
As shown in fig. 2, the embodiment of the application provides a federal learning defense scheme for actively detecting a model poisoning attack, which comprises an active detection mechanism and a robust aggregation mechanism based on trust scores, wherein the active detection mechanism can effectively distinguish a malicious gradient from a benign outlier gradient, and avoid misjudging the benign outlier gradient as the malicious gradient; a robust aggregation mechanism based on trust scores may be effective to provide detection fault tolerance.
The implementation of the active detection mechanism and the trust score based robust aggregation mechanism are described below, respectively.
1. Active detection mechanism
Illustratively, unlike the passive analysis in traditional federated learning attack defense schemes, the scheme provided by embodiments of the present application actively tests each client by issuing a carefully constructed test gradient as the aggregation gradient; upon receiving the test gradient, the client fine-tunes its local model and responds with a new gradient update, so that the defender can screen out malicious clients according to anomalies in the uploaded gradient updates.
By way of example, definition of gradient abnormality in a poisoning scene can be redefined to effectively distinguish malicious gradients from benign outliers, and finally generalization capability of the federal model is improved.
The implementation flow of the active detection mechanism is described below.
1.1, the defender first stores the last gradient uploaded by each client before entering the detection mode; in the detection mode, taking the case where detection starts at the (k-1)-th round of aggregation as an example, the defender suspends the normal aggregation iteration.
For the gradient g_i uploaded by client i in this round (0 < i ≤ n, where n is the number of clients), the central server can, on the one hand, reduce the module length of the gradient and, on the other hand, adjust its direction, to obtain a test gradient ĝ_i, and feed ĝ_i back to client i as the aggregate gradient.
Client i uses the test gradient ĝ_i locally to update its model and, in the next iteration, uploads the new gradient g_i′ (i.e., the updated test gradient) to the server.
The defender can then compare ĝ_i and g_i′ to detect a poisoning attack.
For example, the detection of each client refers to the above-mentioned flow.
1.2, defenders detect anomalies in malicious clients from two dimensions (direction and magnitude) in the event that updated test gradients of client responses are received.
1.2.1, cosine similarity is used to measure the angle change in direction between the constructed test gradient ĝ_i and the updated test gradient g_i′, namely cs_i = ⟨ĝ_i, g_i′⟩ / (‖ĝ_i‖_2 · ‖g_i′‖_2),
wherein ‖·‖_2 is the L2 norm, i.e., the module length of a gradient.
1.2.2, the magnitude of a malicious gradient also directly determines the poisoning effect, and the magnitude can be measured by the L2 norm, namely ‖g_i′‖_2.
1.2.3, the anomaly of a client gradient is defined jointly by its deviation in direction and its magnitude: α_i = −cs_i · ‖g_i′‖_2 when cs_i < 0, and α_i = −cs_i / ‖g_i′‖_2 when cs_i ≥ 0,
wherein α_i is the gradient abnormality degree value of client i.
It should be noted that the definition of the degree of abnormality of the poisoning gradient can be used in all the current poisoning attack defense directions, including the federal learning model poisoning attack defense scheme based on robust aggregation and based on abnormality detection.
2. Robust aggregation
Illustratively, this embodiment proposes a new trust-score-based aggregation mechanism to improve the defense: a new scoring method gives each client an aggregation weight, and the gradients uploaded by the clients are then aggregated by weighted aggregation. The aggregation weight is updated according to the gradient abnormality degree value rather than directly labeling a client as a malicious attack client, which improves the fault tolerance of detection.
And 2.1, maintaining trust scores of all clients.
The defender sets the same initial trust score for each client, and updates the trust score according to the gradient abnormity degree value and the basic score under the condition that each round of detection is completed.
wherein TS_0 is the trust score before updating, TS is the trust score after updating, and the initial value of TS_0 is a preset value (i.e., the initial trust score).
For any client, in the case that the trust score of the client is less than or equal to 0, the defender marks the client as a poisoning attack client, and the update gradient of the client is not included in the gradient aggregation, that is, the gradient uploaded by the client is not subjected to gradient aggregation.
And 2.2, determining the aggregation weight of each client according to the trust score of each client.
Illustratively, for client i, its aggregation weight may be determined by: w_i = TS_i / Σ_{j=1}^{n} TS_j.
and 2.4, under the condition of exiting the detection mode, carrying out weighted aggregation on the stored gradients of the clients according to the aggregation weights of the clients to obtain aggregation gradients.
Illustratively, the stored gradients uploaded by the clients may be weighted-averaged as the aggregate gradient g: g = Σ_{i=1}^{n} w_i · g_i.
and 2.5, the defender issues the aggregation gradient to each client.
The method provided by the application is described above. The device provided by the application is described below:
referring to fig. 3, a schematic structural diagram of a federal learning model poisoning attack detection and protection device provided by an embodiment of the present application, as shown in fig. 3, the federal learning model poisoning attack detection and protection device may include:
an obtaining unit 310, configured to obtain a gradient uploaded by the target client in this round;
the test unit 320 is configured to construct a test gradient corresponding to the gradient according to the gradient uploaded by the target client in the detection mode, and send the test gradient to the target client as an aggregate gradient, so that the target client locally updates the test gradient to obtain an updated test gradient; wherein the angle change in the direction between the test gradient corresponding to the gradient and the gradient is less than a threshold value;
The obtaining unit 310 is further configured to obtain an updated test gradient uploaded by the target client;
and the processing unit 330 is configured to detect a poisoning attack on the target client according to the angle change in the direction between the test gradient and the updated test gradient.
In some embodiments, the test unit 320 constructs a test gradient corresponding to the gradient according to the gradient uploaded by the target client, including:
and (3) reducing the module length of the gradient uploaded by the target client, and/or adjusting the direction of the gradient uploaded by the target client to obtain a test gradient corresponding to the gradient.
In some embodiments, the processing unit 330 performs the poisoning attack detection on the target client according to the angular change in the direction between the test gradient and the updated test gradient, including:
determining cosine similarity between the test gradient and the updated test gradient;
and determining that the target client is a suspected poisoning attack client under the condition that cosine similarity between the test gradient and the updated test gradient is smaller than a preset similarity threshold.
In some embodiments, the processing unit is further configured to determine, in each round of detection, a gradient anomaly value of the target client according to an angular change in a direction of the test gradient and the updated test gradient; the gradient abnormality degree value of the target client is positively correlated with the probability that the target client is a poisoning attack client; determining the aggregation weight of the target client according to the gradient abnormality degree value of the target client;
the processing unit is further used for aggregating the stored gradients of the clients according to the aggregation weights of the clients under the condition of exiting the detection mode to obtain an aggregation gradient; the stored gradient of each client is the gradient uploaded by each client last time before entering a detection mode; and issuing the aggregation gradient to each client.
In some embodiments, the angular change in direction of the test gradient and the updated test gradient is characterized by a cosine similarity between the test gradient and the updated test gradient;
the processing unit 330 determines a gradient anomaly value of the target client according to the angle change between the test gradient and the updated test gradient in the direction, including:
And determining the gradient abnormality degree value of the target client according to the cosine similarity between the test gradient and the updated test gradient and the modulus length of the updated test gradient.
In some embodiments, the processing unit 330 determines the gradient anomaly value of the target client according to the cosine similarity between the test gradient and the updated test gradient, and the modulus length of the updated test gradient, by:
wherein ,for cosine similarity between said test gradient and said updated test gradient +.>And for the updated module length of the test gradient, alpha is the gradient abnormity degree value of the target client.
In some embodiments, the processing unit 330 determines the aggregate weight of the target client according to the gradient anomaly degree value of the target client, including:
updating the trust score of the target client according to the gradient abnormality degree value of the target client; wherein the larger the gradient abnormality degree value of the target client, the lower the updated trust score of the target client;
and determining the aggregation weight of the target client according to the updated trust score of the target client.
In some embodiments, the processing unit 330 updates the trust score of the target client according to the gradient anomaly degree value of the target client by:
wherein TS_0 is the trust score before updating, TS is the updated trust score, the initial value of TS_0 is a preset value, α is the gradient abnormality degree value of the target client, and the base score used in the update is a preset value;
and/or the number of the groups of groups,
the processing unit 330 determines the aggregate weight of the client according to the trust score of the client, by:
w_i = TS_i / Σ_{j=1}^{n} TS_j, wherein w_i is the aggregation weight of client i, TS_i is the trust score of client i, TS_j is the trust score of client j, and n is the number of clients.
In some embodiments, the processing unit 330 is further configured to determine, for any client, the client as a poisoning attack client if the trust score of the client is less than or equal to 0; wherein, the gradient uploaded by the poisoning attack client does not participate in gradient aggregation.
The embodiment of the application also provides electronic equipment, which comprises a processor and a memory, wherein the memory is used for storing a computer program; and the processor is used for realizing the federal learning model poisoning attack detection and defense method when executing the programs stored in the memory.
Fig. 4 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application. The electronic device may include a processor 401, a memory 402 storing machine-executable instructions. The processor 401 and the memory 402 may communicate via a system bus 403. Also, the processor 401 may perform the federal learning model poisoning attack detection and defense method described above by reading and executing machine-executable instructions in the memory 402 corresponding to the federal learning model poisoning attack detection and defense logic.
The memory 402 referred to herein may be any electronic, magnetic, optical, or other physical storage device that may contain or store information, such as executable instructions, data, or the like. For example, a machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid-state drive, any type of storage disk (e.g., an optical disc, a DVD, etc.), a similar storage medium, or a combination thereof.
In some embodiments, a machine-readable storage medium, such as memory 402 in fig. 4, is also provided, having stored thereon machine-executable instructions that when executed by a processor implement the federal learning model poisoning attack detection and defense method described above. For example, the machine-readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Embodiments of the present application also provide a computer program product storing a computer program and causing a processor to perform the federal learning model poisoning attack detection and defense method described above when the computer program is executed by the processor.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather to enable any modification, equivalent replacement, improvement or the like to be made within the spirit and principles of the application.

Claims (11)

1. A federal learning model poisoning attack detection and defense method, comprising:
acquiring the gradient uploaded by the target client in this round;
under the condition of a detection mode, constructing a test gradient corresponding to the gradient according to the gradient uploaded by the target client, and sending the test gradient to the target client as an aggregate gradient so that the target client can update the test gradient locally to obtain an updated test gradient; wherein the angle change in the direction between the test gradient corresponding to the gradient and the gradient is less than a threshold value;
and acquiring an updated test gradient uploaded by the target client, and detecting the poisoning attack of the target client according to the angle change of the test gradient and the updated test gradient in the direction.
2. The method of claim 1, wherein constructing a test gradient corresponding to the gradient according to the gradient uploaded by the target client in the round, comprises:
reducing the module length of the gradient uploaded by the target client in this round, and/or adjusting the direction of the gradient uploaded by the target client in this round, to obtain the test gradient corresponding to the gradient.
3. The method of claim 1, wherein the detecting the target client for a poisoning attack based on the angular change in direction of the test gradient and the updated test gradient comprises:
determining cosine similarity between the test gradient and the updated test gradient;
and determining that the target client is a suspected poisoning attack client under the condition that cosine similarity between the test gradient and the updated test gradient is smaller than a preset similarity threshold.
4. The method of claim 1, wherein during each round of detection, the method further comprises:
determining a gradient abnormality degree value of the target client according to the angle change of the test gradient and the updated test gradient in the direction; the gradient abnormality degree value of the target client is positively correlated with the probability that the target client is a poisoning attack client;
Determining the aggregation weight of the target client according to the gradient abnormality degree value of the target client; wherein the gradient abnormality degree value of the target client is inversely related to the aggregation weight of the target client;
in the event of exiting the detection mode, the method further comprises:
according to the aggregation weight of each client, aggregating the stored gradients of each client to obtain an aggregation gradient; the stored gradient of each client is the gradient uploaded by each client last time before entering a detection mode;
and issuing the aggregation gradient to each client.
5. The method of claim 4, wherein the angular change in direction of the test gradient and the updated test gradient is characterized by cosine similarity between the test gradient and the updated test gradient;
the determining the gradient abnormality degree value of the target client according to the angle change of the test gradient and the updated test gradient in the direction comprises the following steps:
and determining the gradient abnormality degree value of the target client according to the cosine similarity between the test gradient and the updated test gradient and the modulus length of the updated test gradient.
6. The method of claim 5, wherein determining the gradient anomaly value for the target client based on cosine similarity between the test gradient and the updated test gradient and a modulus length of the updated test gradient is achieved by:
α = −cs · ‖g′‖ when cs < 0, and α = −cs / ‖g′‖ when cs ≥ 0, wherein cs is the cosine similarity between said test gradient and said updated test gradient, ‖g′‖ is the module length of the updated test gradient, and α is the gradient abnormality degree value of the target client.
7. The method of claim 4, wherein determining the aggregate weight of the target client based on the gradient anomaly level value of the target client comprises:
updating the trust score of the target client according to the gradient abnormality degree value of the target client; wherein the larger the gradient abnormality degree value of the target client, the lower the updated trust score of the target client;
and determining the aggregation weight of the target client according to the updated trust score of the target client.
8. The method of claim 7, wherein updating the trust score of the target client based on the gradient anomaly level value of the target client is achieved by:
wherein TS_0 is the trust score before updating, TS is the updated trust score, the initial value of TS_0 is a preset value, α is the gradient abnormality degree value of the target client, and the base score used in the update is a preset value;
and/or the number of the groups of groups,
according to the trust score of the client, determining the aggregation weight of the client, which is realized by the following ways:
w_i = TS_i / Σ_{j=1}^{n} TS_j, wherein w_i is the aggregation weight of client i, TS_i is the trust score of client i, TS_j is the trust score of client j, and n is the number of clients.
9. The method of claim 8, wherein the method further comprises:
for any client, determining the client as a poisoning attack client under the condition that the trust score of the client is less than or equal to 0; wherein, the gradient uploaded by the poisoning attack client does not participate in gradient aggregation.
10. A federal learning model poisoning attack detection and defense device, comprising:
the acquisition unit is used for acquiring the gradient uploaded by the target client in the round;
the testing unit is used for constructing a testing gradient corresponding to the gradient according to the gradient uploaded by the target client in the round under the condition of being in a detection mode, and sending the testing gradient to the target client as an aggregation gradient so as to enable the target client to update the testing gradient locally to obtain an updated testing gradient; wherein the angle change in the direction between the test gradient corresponding to the gradient and the gradient is less than a threshold value;
The acquisition unit is further used for acquiring the updated test gradient uploaded by the target client;
and the detection unit is used for detecting the poisoning attack of the target client according to the angle change of the test gradient and the updated test gradient in the direction.
11. An electronic device comprising a processor and a memory, wherein,
a memory for storing a computer program;
a processor configured to implement the method of any one of claims 1 to 9 when executing a program stored on a memory.
CN202311095431.6A 2023-08-28 2023-08-28 Federal learning model poisoning attack detection and defense method, device and equipment Active CN116842577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311095431.6A CN116842577B (en) 2023-08-28 2023-08-28 Federal learning model poisoning attack detection and defense method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311095431.6A CN116842577B (en) 2023-08-28 2023-08-28 Federal learning model poisoning attack detection and defense method, device and equipment

Publications (2)

Publication Number Publication Date
CN116842577A true CN116842577A (en) 2023-10-03
CN116842577B CN116842577B (en) 2023-12-19

Family

ID=88163775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311095431.6A Active CN116842577B (en) 2023-08-28 2023-08-28 Federal learning model poisoning attack detection and defense method, device and equipment

Country Status (1)

Country Link
CN (1) CN116842577B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436078A (en) * 2023-12-18 2024-01-23 烟台大学 Bidirectional model poisoning detection method and system in federal learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210051169A1 (en) * 2019-08-15 2021-02-18 NEC Laboratories Europe GmbH Thwarting model poisoning in federated learning
CN113298191A (en) * 2021-04-01 2021-08-24 山东大学 User behavior identification method based on personalized semi-supervised online federal learning
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
CN113553582A (en) * 2021-07-14 2021-10-26 中国人民解放军战略支援部队信息工程大学 Malicious attack detection method and device and electronic equipment
CN114764499A (en) * 2022-03-21 2022-07-19 大连理工大学 Sample poisoning attack resisting method for federal learning
CN115422537A (en) * 2022-05-06 2022-12-02 广东工业大学 Method for resisting turnover attack of federal learning label
CN115766169A (en) * 2022-11-08 2023-03-07 贵州大学 Malicious node detection method in federated learning
CN116389093A (en) * 2023-03-28 2023-07-04 华中科技大学 Method and system for defending Bayesian attack in federal learning scene

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210051169A1 (en) * 2019-08-15 2021-02-18 NEC Laboratories Europe GmbH Thwarting model poisoning in federated learning
CN113298191A (en) * 2021-04-01 2021-08-24 山东大学 User behavior identification method based on personalized semi-supervised online federal learning
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
CN113553582A (en) * 2021-07-14 2021-10-26 中国人民解放军战略支援部队信息工程大学 Malicious attack detection method and device and electronic equipment
CN114764499A (en) * 2022-03-21 2022-07-19 大连理工大学 Sample poisoning attack resisting method for federal learning
CN115422537A (en) * 2022-05-06 2022-12-02 广东工业大学 Method for resisting turnover attack of federal learning label
CN115766169A (en) * 2022-11-08 2023-03-07 贵州大学 Malicious node detection method in federated learning
CN116389093A (en) * 2023-03-28 2023-07-04 华中科技大学 Method and system for defending Bayesian attack in federal learning scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HARSH KASYAP: "Hidden Vulnerabilities in Cosine Similarity based Poisoning Defense", IEEE *
梁湘悦: "Research on federated learning techniques against poisoning attacks for vehicle-road cooperation", Master's Theses Electronic Journal *
陈晋音; 邹健飞; 苏蒙蒙; 张龙源: "A survey of poisoning attacks and defenses for deep learning models", Journal of Cyber Security, no. 04 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436078A (en) * 2023-12-18 2024-01-23 烟台大学 Bidirectional model poisoning detection method and system in federal learning
CN117436078B (en) * 2023-12-18 2024-03-12 烟台大学 Bidirectional model poisoning detection method and system in federal learning

Also Published As

Publication number Publication date
CN116842577B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN116842577B (en) Federal learning model poisoning attack detection and defense method, device and equipment
US9954897B2 (en) Methods and systems providing cyber security
US11443221B2 (en) Distributed incorruptible accordant management of nonlocal data fusion, unified scheduling and engage-ability
TW202123043A (en) Adversarial attack detection method and device
Laszka et al. Mitigating covert compromises: A game-theoretic model of targeted and non-targeted covert attacks
Ussath et al. Identifying suspicious user behavior with neural networks
WO2017089443A1 (en) System and method for aiding decision
CN109995750A (en) The defence method and electronic equipment of network attack
CN114741688B (en) Unsupervised host intrusion detection method and system
Xiao et al. Distributed guaranteed two-target tracking over heterogeneous sensor networks under bounded noises and adversarial attacks
Churcher et al. ur Rehman
Gupta et al. A comprehensive review on detection of DDoS attacks using ML in SDN environment
Qiu et al. Mt-mtd: muti-training based moving target defense trojaning attack in edged-AI network
CN112528281B (en) Poisoning attack detection method, device and equipment for federal learning
CN117155662A (en) Vulnerability entity prediction and query method and device based on vulnerability knowledge graph
Song et al. A comprehensive approach to detect unknown attacks via intrusion detection alerts
Major et al. Creating cyber deception games
Guan et al. A Bayesian Improved Defense Model for Deceptive Attack in Honeypot-Enabled Networks
Yang et al. Attack-defense utility quantification and security risk assessment
Skilton et al. Impact of security on intelligent systems
CN116886448B (en) DDoS attack alarm studying and judging method and device based on semi-supervised learning
US20240144097A1 (en) Universal Post-Training Backdoor Detection and Mitigation for Classifiers
Fielder Modelling the impact of threat intelligence on advanced persistent threat using games
CN116827638B (en) Network attack defense array chart recommendation method based on network target range
US20240281525A1 (en) Spurious-data-based detection related to malicious activity

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant