CN117194145A

CN117194145A - Abnormal client detection method and device, electronic equipment and storage medium

Info

Publication number: CN117194145A
Application number: CN202311020275.7A
Authority: CN
Inventors: 吴钧杰; 孟丹; 王俊; 齐越; 易兰军; 戴嘉乐
Original assignee: Shenzhen Hefei Technology Co ltd; Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Shenzhen Hefei Technology Co ltd; Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2023-08-14
Filing date: 2023-08-14
Publication date: 2023-12-08

Abstract

The embodiment of the application discloses a method and a device for detecting an abnormal client, electronic equipment and a storage medium. The method comprises the following steps: obtaining a reference global model corresponding to a target training round, and obtaining a plurality of client models corresponding to the target training round; the client models are in one-to-one correspondence with the clients participating in the target training round; respectively calculating the similarity between each client model and the reference global model; determining the abnormal probability of the client corresponding to each client model according to the similarity corresponding to each client model; and determining the client with the abnormal probability larger than the probability threshold as an abnormal client. According to the abnormal client detection method, the device, the electronic equipment and the storage medium, the abnormal client can be accurately detected, and the detection efficiency is improved.

Description

Abnormal client detection method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to an abnormal client detection method, an abnormal client detection device, electronic equipment and a storage medium.

Background

With the explosive growth of data volume, data is often scattered across different locations, such as users' personal devices, enterprise servers, and cloud platforms. This data often contains sensitive personal information or trade secrets and cannot be transmitted or disclosed at will. Meanwhile, the quality and distribution of the data differ from place to place, which gives rise to the data silo problem. Federated learning was proposed to make full use of such data to improve the effectiveness and efficiency of machine learning algorithms while protecting the privacy and security of the data.

When federated learning is used in practice, various problems may arise in a client's data and training process, such as uneven data distribution, heavy noise, abnormal data, or attacks on the client, as well as hardware failures, communication errors, and synchronization anomalies. Any of these may degrade the performance of the global model during federated learning. Therefore, how to accurately detect abnormal clients that affect the performance of the global model is a technical problem to be solved.

Disclosure of Invention

The embodiment of the application discloses a method, a device, electronic equipment and a storage medium for detecting an abnormal client, which can accurately detect the abnormal client and improve the detection efficiency.

The embodiment of the application discloses a method for detecting an abnormal client, which comprises the following steps:

obtaining a reference global model corresponding to a target training round, and obtaining a plurality of client models corresponding to the target training round; the client models are in one-to-one correspondence with the clients participating in the target training round;

respectively calculating the similarity between each client model and the reference global model;

determining the abnormal probability of the client corresponding to each client model according to the similarity corresponding to each client model;

and determining the client with the abnormal probability larger than the probability threshold as an abnormal client.

The embodiment of the application discloses an abnormal client detection device, which comprises:

the model acquisition module is used for acquiring a reference global model corresponding to a target training round and acquiring a plurality of client models corresponding to the target training round; the client models are in one-to-one correspondence with the clients participating in the target training round;

the similarity calculation module is used for calculating the similarity between each client model and the reference global model respectively;

the abnormal probability determining module is used for determining the abnormal probability of the client corresponding to each client model according to the similarity corresponding to each client model;

and the abnormal client determining module is used for determining the client with the abnormal probability larger than the probability threshold as the abnormal client.

The embodiment of the application discloses an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the computer program when executed by the processor causes the processor to realize the method.

The embodiment of the application discloses a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the method as described above.

According to the abnormal client detection method, device, electronic equipment and storage medium disclosed by the embodiments of the application, a reference global model corresponding to the target training round is obtained, and a plurality of client models corresponding to the target training round are obtained. The similarity between each client model and the reference global model is calculated, the abnormal probability of the client corresponding to each client model is determined according to that similarity, and any client whose abnormal probability is larger than the probability threshold is determined to be an abnormal client. Because detection relies on the similarity between each client model and the reference global model, abnormal clients can be detected accurately and detection efficiency is improved. Moreover, the detection scheme places no restriction on the data distribution of the clients' training data sets, and can support abnormal client detection in federated learning scenarios with different data distributions.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1A is an application scenario diagram of an anomaly client detection method in one embodiment;

FIG. 1B is a system architecture diagram of a server side in one embodiment;

FIG. 2 is a flow diagram of a method of anomaly client detection in one embodiment;

FIG. 3 is a flowchart of an anomaly client detection method in another embodiment;

FIG. 4 is a flowchart of determining anomaly probabilities of clients corresponding to respective client models according to similarities corresponding to respective client models in one embodiment;

FIG. 5 is a flow diagram of generating a repair global model and prohibiting an abnormal client from participating in a training task in one embodiment;

FIG. 6 is a block diagram of an anomaly client detection device in one embodiment;

FIG. 7 is a block diagram of an electronic device in one embodiment.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present application and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

It will be understood that the terms "first", "second", etc. as used herein may be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first accurate threshold may be referred to as a second accurate threshold, and similarly, a second accurate threshold may be referred to as a first accurate threshold, without departing from the scope of the application. Both the first accurate threshold and the second accurate threshold are accurate thresholds, but they are not the same accurate threshold. The term "plurality" as used herein refers to two or more. The term "and/or" as used herein refers to any one of, or any combination of, the listed items.

Federated learning is a distributed machine learning method in which multiple parties exchange model parameters through a secure mechanism without exchanging data, thereby achieving the effect of joint training. When federated learning is used in practice, various problems may arise in a client's data and training process, such as uneven data distribution, heavy noise, abnormal data, or attacks on the client, as well as hardware failures, communication errors, and synchronization anomalies. These may degrade the performance of the global model during federated learning, or even render the global model unusable. Therefore, federated learning needs to be debugged so that errors or anomalies affecting global model performance can be detected and located, thereby helping developers improve the quality and efficiency of the global model.

Currently, Gill et al. have devised a fault localization framework, FedDebug. In a centralized federated learning scenario, FedDebug first records concise execution data available to a central aggregator (e.g., a server) during federated learning, so that real-time interactive debugging can be performed on a simulation that mirrors the live federated learning application. When a developer finds a suspicious state, FedDebug uses inference-based differential testing together with differences in model neuron activations to detect abnormal clients that deviate from normal behavior. FedDebug then repairs global training by tracing back and removing the abnormal clients and resumes real-time federated learning training, thereby improving the quality of the global model.

However, several problems arise when federated learning is debugged in practice:

(1) If the clients' data and labels could be accessed during debugging, and performance evaluation could be performed on a data set similar to the global data distribution, an abnormal client that degrades the global model could probably be found. In practical applications, however, developers of a federated learning algorithm cannot always use the clients' data and labels directly, and the data of different clients is usually non-independent and identically distributed (non-IID). It is therefore difficult to judge accurately, from a global perspective, whether a client is abnormal based only on performance evaluation over that client's own data.

For example, FedDebug detects abnormal clients that deviate from normal behavior using inference-based differential testing and differences in model neuron activations. That scheme assumes that data is independent and identically distributed (IID) among clients, so that the neuron activation patterns of models trained by normal clients are very similar to one another and significantly different from those of abnormal clients. In practical applications, however, the data of different clients is often non-IID, and there is no guarantee that the difference in neuron activation patterns between normal clients is smaller than the difference between normal and abnormal clients. The scheme is therefore unsuitable for federated learning scenarios with non-IID client data, and it is difficult for it to locate abnormal clients accurately.

(2) Since it cannot be predicted whether a given client will participate in a training round, the contribution of each client to the global model may be short-lived, so attempting to reproduce and debug errors during federated learning is not feasible. Traditional breakpoint debugging suspends the training process of all clients in federated learning, which can cause serious problems such as loss of client data. Moreover, because client availability and the specified number of devices cause the clients involved in each training round to differ, post-hoc analysis or trial-and-error debugging may be ineffective.

The embodiments of the application provide a method, a device, electronic equipment and a storage medium for detecting an abnormal client, which can accurately detect the abnormal client and improve detection efficiency. Moreover, the detection scheme places no restriction on the data distribution of the clients' training data sets, and can support abnormal client detection in federated learning scenarios with different data distributions.

FIG. 1A is an application scenario diagram of an abnormal client detection method in one embodiment. As shown in FIG. 1A, the server 10 may establish communication connections with a plurality of clients 20, where the server 10 refers to the aggregator in federated learning. The server 10 may be deployed on a single server or on a server cluster formed by a plurality of servers, and the clients 20 may be deployed on terminal devices, which may include, but are not limited to, a mobile phone, a tablet computer, a wearable device, a notebook computer, a PC (Personal Computer), a vehicle-mounted terminal, and the like. Communication between the server 10 and the clients 20 may be based on network protocols such as TCP/IP (Transmission Control Protocol/Internet Protocol), HTTP (Hypertext Transfer Protocol), and FTP (File Transfer Protocol), but is not limited thereto. It should be noted that the server 10 may also be deployed on a terminal device, which is not limited herein.

The server 10 may perform federated learning with a plurality of clients 20. The server 10 may send a joint model to the clients 20 participating in federated learning; each client 20 may train the joint model using its local data to obtain a corresponding client model and then upload the client model to the server 10. The server 10 may aggregate the client models uploaded by the participating clients 20 to obtain a global model, which may be used as the joint model for the next round of training or output as the final trained global model. The clients 20 do not need to exchange training data during training, so user privacy is protected and data security is ensured while model training is carried out.
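As a non-limiting illustration, one round of the interaction described above can be sketched as follows. The representation of a model as a flat list of parameters, the element-wise averaging used for aggregation, and the names `aggregate`, `run_round` and `trainers` are assumptions made for this sketch; the application does not prescribe a particular aggregation rule.

```python
from typing import Callable, List, Sequence

# Assumption: a model is represented as a flat list of parameters.
Model = List[float]


def aggregate(client_models: Sequence[Model]) -> Model:
    """Average client parameters element-wise (assumed aggregation rule)."""
    if not client_models:
        raise ValueError("at least one client model is required")
    n = len(client_models)
    return [sum(params) / n for params in zip(*client_models)]


def run_round(joint_model: Model,
              local_trainers: Sequence[Callable[[Model], Model]]) -> Model:
    """Send the joint model to each client, collect client models, aggregate."""
    # Each client receives its own copy of the joint model and trains locally.
    client_models = [train(list(joint_model)) for train in local_trainers]
    return aggregate(client_models)


# Hypothetical clients: "training" here merely shifts every parameter.
trainers = [lambda m, d=d: [w + d for w in m] for d in (0.1, 0.2, 0.3)]
global_model = run_round([0.0, 1.0], trainers)
```

The returned `global_model` could then serve as the joint model of the next round.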

In the embodiment of the application, the server 10 can detect abnormal clients that affect the performance of the global model during federated learning, thereby ensuring the accuracy of federated learning.

FIG. 1B is a system architecture diagram of a server side in one embodiment. As shown in FIG. 1B, in one embodiment, the server 10 may include a federated learning module 110 and a debugger 120. A federated learning program may run in the federated learning module 110 to carry out federated learning with the plurality of clients 20. While the federated learning module 110 performs federated learning, the debugger 120 may simulate that federated learning and reproduce it without affecting the operation of the federated learning module 110. Abnormal clients may be detected based on the federated learning reproduced by the debugger 120.

In some embodiments, a target training round in which an error or abnormality may have occurred may be determined from the simulation performed by the debugger 120. The debugger 120 may acquire a reference global model corresponding to the target training round, acquire a plurality of client models corresponding to the target training round, and calculate the similarity between each client model and the reference global model. The debugger 120 may determine the anomaly probability of the client corresponding to each client model according to the similarity corresponding to that client model, and determine any client whose anomaly probability is greater than the probability threshold to be an abnormal client.

Optionally, after detecting an abnormal client, the debugger 120 may further send the client information corresponding to the abnormal client to the federated learning module 110, and the federated learning module 110 may prohibit the abnormal client from participating in subsequent model training, thereby improving the accuracy of the overall federated learning and ensuring the training effect.

As shown in fig. 2, in one embodiment, an abnormal client detection method is provided, which can be applied to the above-mentioned server, and the method may include the following steps:

Step 210, obtaining a reference global model corresponding to the target training round, and obtaining a plurality of client models corresponding to the target training round.

The server side can perform federated learning with a plurality of clients. Multiple rounds of training can be performed during federated learning, and each round of training can be regarded as a training round. In each training round, the server side can send the joint model of that training round to each client; each client trains the received joint model using its local data to obtain the client model of that training round and uploads it to the server side; and the server side can aggregate the client models uploaded by the clients to obtain the global model of that training round.

Alternatively, if the training round is the first training round, the joint model of the training round may be an initial model to be trained, and the initial model may be an untrained model or a model obtained through pre-training. If the training round is not the first training round, the joint model of the training round can be a global model obtained by the previous training round, and the global model can be continuously optimized through multiple rounds of training, so that the accuracy of the model obtained by training is improved.

The clients participating in each training round can be the same or different, and the participating clients refer to clients corresponding to client models participating in aggregation in the training round to obtain a global model. The number of the clients participating in each training round can be the same or different, the clients can be configured according to actual requirements, and the server can select the clients to participate in each training round according to the number of the clients participating in the corresponding training rounds which are configured in advance.

For example, the server performs federated learning with 10 clients, and the number of clients participating in each training round may be preconfigured to be 7, so that the 7 clients participating in training may be randomly determined in each training round. It should be noted that the server may also select the participating clients in other manners, for example, according to the priority of each client, the time at which each client model is uploaded, and the like, which is not limited in the embodiment of the present application. Optionally, before sending the joint model, the server may determine the clients participating in the training round and then send the joint model to each determined client for training. Alternatively, the server may send the joint model to all clients and, after receiving the client models sent by the clients, determine the clients participating in the training round and aggregate the client models uploaded by the determined clients to obtain the global model of the training round. The server can flexibly select the clients participating in each round of training, and the abnormal client detection method provided by the embodiment of the application is not restricted by which clients participate in each round, so it is applicable to more scenarios.
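As a non-limiting illustration of the random selection in the example above, the following sketch picks 7 of 10 clients for one training round. The function name `select_participants`, the client identifiers, and the use of a seeded generator are assumptions made for this sketch.

```python
import random
from typing import List, Optional, Sequence


def select_participants(client_ids: Sequence[str],
                        num_participants: int,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Randomly choose the clients that take part in one training round."""
    if not 0 < num_participants <= len(client_ids):
        raise ValueError("num_participants must be between 1 and the number of clients")
    rng = rng or random.Random()
    # sample() draws without replacement, so no client is chosen twice.
    return rng.sample(list(client_ids), num_participants)


# Hypothetical identifiers for the 10 clients in the example.
all_clients = [f"client_{i}" for i in range(10)]
participants = select_participants(all_clients, 7, random.Random(0))
```

Selection by priority or by upload time, as mentioned above, would replace only the body of `select_participants`.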

In some embodiments, while federated learning is performed between the server and the plurality of clients, the federated learning may be simulated so as to reproduce the federated learning process. Optionally, the federated learning between the server and the plurality of clients and the simulation may be performed in parallel without interfering with each other. The server or a developer can identify a training round in a suspicious state from the simulation, take that training round as the target training round, and detect whether an abnormal client exists in the target training round. A training round in a suspicious state may refer to a training round in which a model performance anomaly or a model error occurs, such as a training round in which the performance of the aggregated global model is abnormal.

The server may obtain a reference global model corresponding to the target training round, and obtain a plurality of client models corresponding to the target training round, where the plurality of client models may correspond one to one to the plurality of clients participating in the target training round; that is, the plurality of client models are the client models uploaded by the clients participating in the target training round. The reference global model corresponding to the target training round may refer to a global model used for evaluating whether each client model corresponding to the target training round is abnormal. Optionally, the reference global model may be the joint model corresponding to the target training round, or may be the global model obtained by aggregation in the target training round.

In some embodiments, the reference global model corresponding to the target training round may be the joint model corresponding to the target training round. If the target training round is the 1st training round, the initial model to be trained can be obtained as the reference global model corresponding to the target training round. If the target training round is the T-th training round, where T is an integer greater than 1, the global model obtained by aggregation in the (T-1)-th training round can be obtained as the reference global model corresponding to the target training round. The global model aggregated in the (T-1)-th training round is the global model of the round immediately preceding the target training round, and it is the joint model that is sent to each participating client for training in the target training round. Because the joint model is a model with normal performance, using the joint model corresponding to the target training round as the reference global model helps ensure the accuracy of abnormal client detection.
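As a non-limiting illustration, the choice of reference global model described above can be sketched as follows. The function name `reference_global_model`, the dictionary `history` keyed by round index, and the flat-list model representation are assumptions made for this sketch.

```python
from typing import Dict, List

# Assumption: a model is represented as a flat list of parameters.
Model = List[float]


def reference_global_model(target_round: int,
                           initial_model: Model,
                           aggregated_by_round: Dict[int, Model]) -> Model:
    """Return the joint model of the target round as the reference model."""
    if target_round < 1:
        raise ValueError("training rounds are numbered from 1")
    if target_round == 1:
        # Round 1 trains the initial model, so that model is the reference.
        return initial_model
    # Round T (T > 1) trains the global model aggregated in round T-1.
    return aggregated_by_round[target_round - 1]


# Hypothetical history of aggregated global models for rounds 1 and 2.
history = {1: [0.1, 0.2], 2: [0.3, 0.4]}
ref_first = reference_global_model(1, [0.0, 0.0], history)
ref_third = reference_global_model(3, [0.0, 0.0], history)
```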

Step 220, respectively calculating the similarity between each client model and the reference global model.

The server may calculate the similarity between each client model and the reference global model corresponding to the target training round, where the similarity characterizes how closely the client model resembles the reference global model. In the embodiment of the application, the similarity between the client model obtained by client training and the reference global model is used as the index for judging whether the client is an abnormal client. This places no restriction on the data distribution of the client's training data set and does not depend on how different data distributions affect the internal characteristics of the model during training, so the method can be applied to federated learning scenarios in which client data is independent and identically distributed (IID) as well as scenarios in which it is non-IID.

In some embodiments, the server may calculate the similarity between each client model and the reference global model based on a similarity algorithm such as a cosine similarity algorithm or a Euclidean distance algorithm. Taking the cosine similarity algorithm as an example, features of each client model can be extracted and a first vector corresponding to each client model can be generated from those features; likewise, features of the reference global model can be extracted and a second vector corresponding to the reference global model can be generated from those features. The first vector represents the corresponding client model and the second vector represents the reference global model. The server can then calculate the similarity corresponding to each client model from the first vector corresponding to that client model and the second vector.

It should be noted that, in the embodiment of the present application, the calculation manner of calculating the similarity between the client model and the reference global model is not limited to the above-mentioned methods, and the similarity algorithm may be other algorithms, for example, the similarity between the client model and the reference global model may be calculated by using an algorithm such as orthogonal transformation, isotropic scaling, etc.
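As a non-limiting illustration, the cosine similarity and Euclidean distance mentioned above can be computed as follows. Treating the first and second vectors as plain lists of numbers, and the names `client_vectors` and `reference`, are assumptions made for this sketch; the application does not specify how model features are extracted into vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical second vector (reference global model) and first vectors.
reference: List[float] = [1.0, 0.0]
client_vectors: Dict[str, List[float]] = {
    "client_a": [2.0, 0.0],   # same direction as the reference
    "client_b": [0.0, 3.0],   # orthogonal to the reference
    "client_c": [-1.0, 0.0],  # opposite direction
}
similarities = {cid: cosine_similarity(vec, reference)
                for cid, vec in client_vectors.items()}
```

Note that a Euclidean distance grows as models diverge, so it would need to be converted (for example, negated) before being used as a similarity.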

Step 230, determining the anomaly probability of the client corresponding to each client model according to the similarity corresponding to each client model.

The anomaly probability represents the probability that an anomaly or error has occurred at the client. The greater the anomaly probability of a client, the more likely it is that a problem has occurred at that client and has affected the performance of the aggregated global model. In some embodiments, the anomaly probability may be negatively correlated with the similarity between the client model and the reference global model: the higher the similarity of the client model to the reference global model, the lower the anomaly probability of the client corresponding to that client model.

As an implementation manner, the mean of the similarities and the distribution of the similarities may be determined from the similarities corresponding to the client models. Optionally, the distribution may be represented by the standard deviation or variance of the similarities corresponding to the plurality of client models in the target training round. The difference between the similarity corresponding to each client model and the mean can be calculated, and the anomaly probability corresponding to each client model can be calculated from that difference and the distribution of the similarities. For example, the larger the ratio of a client model's similarity difference to the distribution measure, the more that client model's similarity deviates from the similarities of the other client models and the more likely it is that a problem has occurred, so the greater the anomaly probability can be.
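As a non-limiting illustration, the mean-and-distribution approach above can be sketched with the standard deviation as the distribution measure. The application only states that a larger ratio yields a larger anomaly probability; the one-sided ratio (only similarities below the mean count) and the mapping of the ratio to the range 0 to 1 with the error function are assumptions made for this sketch, as is the name `anomaly_probabilities_zscore`.

```python
import math
from statistics import mean, pstdev
from typing import Dict


def anomaly_probabilities_zscore(similarities: Dict[str, float]) -> Dict[str, float]:
    """Map each client's deviation below the mean similarity to [0, 1)."""
    values = list(similarities.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the round
    if sigma == 0.0:
        # All clients are equally similar, so none stands out.
        return {cid: 0.0 for cid in similarities}
    probs = {}
    for cid, sim in similarities.items():
        # Ratio of the similarity difference to the spread; clients at or
        # above the mean are clamped to zero (assumed one-sided treatment).
        ratio = max(0.0, (mu - sim) / sigma)
        # Assumed mapping: erf(ratio / sqrt(2)) grows monotonically from 0
        # toward 1 as the ratio increases.
        probs[cid] = math.erf(ratio / math.sqrt(2.0))
    return probs


# Hypothetical similarities for four clients in the target training round.
round_similarities = {"client_a": 0.90, "client_b": 0.92,
                      "client_c": 0.88, "client_d": 0.20}
zscore_probs = anomaly_probabilities_zscore(round_similarities)
```

With these values only `client_d` lies below the mean, so it is the only client that receives a non-zero anomaly probability.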

As another implementation manner, the maximum similarity among the similarities corresponding to the plurality of client models in the target training round may be obtained, the difference between the similarity corresponding to each client model and the maximum similarity may be calculated, and the anomaly probability corresponding to each client may be determined according to that difference. For example, the greater the difference between a client model's similarity and the maximum similarity, the more likely it is that a problem has occurred with that client model, so the greater the anomaly probability may be.

In the embodiment of the application, the similarity between the client model and the reference global model is used as the index for determining the anomaly probability of the client. This places no restriction on the data distribution of the client's training data set and does not depend on how different data distributions affect the inherent characteristics of the model during training, so abnormal client detection in federated learning scenarios with different data distributions can be supported.

Step 240, determining the client with the anomaly probability larger than the probability threshold as an abnormal client.
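As a non-limiting illustration, the maximum-similarity approach can be sketched as follows. The application does not specify how the difference is converted into a probability; dividing by a fixed `scale` (defaulting to 2.0, the width of the cosine similarity range) and clipping at 1 are assumptions made for this sketch, as is the name `anomaly_probabilities_max_gap`.

```python
from typing import Dict


def anomaly_probabilities_max_gap(similarities: Dict[str, float],
                                  scale: float = 2.0) -> Dict[str, float]:
    """Score each client by its gap to the most similar client in the round."""
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    if not similarities:
        return {}
    best = max(similarities.values())
    # Assumed mapping: gap / scale, clipped so the result stays within [0, 1].
    return {cid: min(1.0, (best - sim) / scale)
            for cid, sim in similarities.items()}


# Hypothetical cosine similarities for three clients.
gap_probs = anomaly_probabilities_max_gap(
    {"client_a": 0.9, "client_b": 0.8, "client_c": -0.7})
```

The client with the maximum similarity always scores 0 under this mapping, and the score of every other client grows with its gap to that maximum.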

An abnormal client may refer to a client at which a problem has occurred, causing the trained client model to be inaccurate and thereby affecting the performance of the aggregated global model. Examples include, but are not limited to, a client whose training data set contains label errors, a client whose training data set is noisy, or a client whose communication with the server is faulty.

The probability threshold can be preset, for example as a value obtained from multiple experiments. The server side can compare the anomaly probability of each client participating in the target training round with the probability threshold, and determine any client whose anomaly probability is greater than the probability threshold to be an abnormal client. Using the anomaly probability as a quantitative index allows potentially abnormal clients to be found quickly, which improves the efficiency and accuracy of abnormal client detection.

In the embodiment of the application, a reference global model corresponding to the target training round is obtained, and a plurality of client models corresponding to the target training round are obtained. The similarity between each client model and the reference global model is calculated, the anomaly probability of the client corresponding to each client model is determined according to that similarity, and any client whose anomaly probability is larger than the probability threshold is determined to be an abnormal client. Because detection relies on the similarity between each client model and the reference global model, abnormal clients can be detected accurately and detection efficiency is improved. Moreover, the detection scheme places no restriction on the data distribution of the clients' training data sets, and can support abnormal client detection in federated learning scenarios with different data distributions.
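As a non-limiting illustration, the threshold comparison of step 240 can be sketched as follows. The function name `find_abnormal_clients`, the example probabilities, and the threshold value 0.8 are assumptions made for this sketch; in practice the threshold would be chosen experimentally as described above.

```python
from typing import Dict, List


def find_abnormal_clients(anomaly_probs: Dict[str, float],
                          threshold: float) -> List[str]:
    """Return clients whose anomaly probability exceeds the threshold."""
    # Strictly greater than, matching "larger than the probability threshold".
    return sorted(cid for cid, p in anomaly_probs.items() if p > threshold)


# Hypothetical anomaly probabilities for the target training round.
example_probs = {"client_a": 0.05, "client_b": 0.10, "client_c": 0.93}
abnormal = find_abnormal_clients(example_probs, threshold=0.8)
```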

In another embodiment, as shown in fig. 3, an abnormal client detection method is provided, which can be applied to the above-mentioned server, and the method may include the following steps:

Step 302, when performing the training task of each training round of federal learning, recording the training state data corresponding to each training round into a log file in real time.

While the server and the plurality of clients perform each training round, the server may record the training state data corresponding to the current training round into a log file in real time. The training state data may include at least the plurality of client models and the aggregated global model corresponding to the training round. The training state data may further include one or more of the following: the loss value, model performance parameters, hyper-parameters, and response time parameters of each client model corresponding to the training round, and data such as the running state of the server corresponding to the training round.

The loss value of the client model refers to a value obtained by calculating a loss function of the client model; model performance parameters of the client model may include, but are not limited to, parameters such as prediction accuracy of the client model; the hyper-parameters of the client model may include, but are not limited to, one or more of a learning rate, a number of hidden layers in the model, a number of samples trained, etc.; the response time parameter may include, but is not limited to, one or more of a time at which the client receives the joint model sent by the server, a time at which the client starts and ends training, a time at which the client uploads the client model, and so on.

The running state of the server may refer to the running state of the server while the training round is performed. It may include, but is not limited to, the running state of the federal learning program in the federal learning module, such as the deterministic inputs and non-deterministic inputs generated while the program runs and the interaction state between the program and the memory. A deterministic input is a variable triggered by a deterministic operation, and a non-deterministic input is a variable triggered by a non-deterministic operation (such as a concurrent operation, a random operation, or a user interaction). The running state of the server may also include, but is not limited to, the resource state of the server, such as the memory occupancy rate and the processor operating frequency.

In the embodiment of the application, the training state data corresponding to each training round does not comprise the training data set of the client, so that the privacy and the data safety of the user are protected.
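As an illustration of the recording described above, the following is a minimal sketch of how one training round's state might be serialized as one line of a log file. The `RoundRecord` layout, the field names and the JSON-lines format are assumptions made for this example; the application does not specify a log format. Note that no training data set appears in the record.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RoundRecord:
    """Training state data for one training round (hypothetical layout)."""
    round_index: int
    global_model: list          # aggregated global model parameters, flattened
    client_models: dict         # client id -> flattened client model parameters
    client_losses: dict = field(default_factory=dict)    # client id -> loss value
    client_accuracy: dict = field(default_factory=dict)  # client id -> prediction accuracy
    server_state: dict = field(default_factory=dict)     # e.g. memory occupancy rate


def to_log_line(record):
    """Serialize one round as a single JSON line."""
    return json.dumps(asdict(record))


def from_log_line(line):
    """Rebuild a RoundRecord from a line written by to_log_line."""
    return RoundRecord(**json.loads(line))


def append_round(log_path, record):
    """Append the round to the log file as soon as the round finishes."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(to_log_line(record) + "\n")


def read_rounds(log_path, already_read=0):
    """Yield the rounds that a reader has not consumed yet."""
    with open(log_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if i >= already_read and line.strip():
                yield from_log_line(line)
```

An append-only, one-record-per-line file lets a reader such as the debugger pick up newly finished rounds without blocking the writer.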

Step 304, performing federal learning simulation according to the training state data recorded in the log file.

In some embodiments, the server may use the debugger to perform federal learning simulation according to training state data recorded in the log file, and the server and the client may perform real federal learning and simulation of federal learning in parallel. The server side can perform federal learning with the client side through a federal learning program running in the federal learning module, the federal learning module can record training state data corresponding to each training round to the log file in real time, and the debugger can read the training state data recorded in the log file in real time and perform simulation on federal learning performed in the federal learning module according to the read training state data so as to reproduce the federal learning process in the federal learning module.

For example, after the server completes the 1st training round through the federal learning module, the training state data corresponding to the 1st training round may be recorded in the log file. The debugger may read the training state data corresponding to the 1st training round from the log file and perform simulation according to it, so as to reproduce the 1st training round performed in the federal learning module. While the debugger simulates the 1st training round, the federal learning module may continue with the 2nd training round together with the clients; the simulation and the real federal learning do not interfere with each other.

Optionally, each time the training state data corresponding to one training round is updated in the log file, the debugger may acquire the newly recorded training state data and perform simulation, so that the simulation of federal learning keeps pace with the real training, thereby improving the timeliness of abnormal client detection. Alternatively, the debugger may wait until the training state data corresponding to multiple training rounds has been updated in the log file, and then acquire the training state data of those training rounds to perform simulation.

Step 306, determining a training round in a suspicious state in the simulation process, and taking the training round in the suspicious state as a target training round.

In the process of using the debugger to simulate federal learning, the debugger or a developer may determine the training round that is in a suspicious state during the simulation, and take the training round in the suspicious state as the target training round.

In some embodiments, a debug command may be generated in the debugger, and the training round in a suspicious state during the simulation may be determined according to the debug command. The debug command may include, but is not limited to, one or more of a breakpoint setting command, a data change monitoring command, a reverse execution command, a step-in and step-out command, and the like. The breakpoint setting command is used to set breakpoints for debugging, for example at a training round or at a client within a training round, so that debugging at different granularities can be realized; the data change monitoring command may be used to monitor changes in model data, such as changes in the prediction accuracy of the global model; the reverse execution command may be used to execute the simulation process of federal learning in reverse for testing; the step-in and step-out command may be used to step into and/or step out of functions during the simulation of federal learning.

In some embodiments, the suspicious state may include one or more of the following:

the prediction accuracy of the global model corresponding to the training round is unstable;

the prediction accuracy of the global model corresponding to the training round is smaller than or equal to a first accuracy threshold;

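the absolute value of the difference between the prediction accuracy of the global model corresponding to the training round and the prediction accuracy of the global model corresponding to the previous training round is greater than a difference threshold;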

the prediction accuracy of the global model corresponding to each of the N consecutive training rounds is less than or equal to a second accuracy threshold, where N is an integer greater than 1, and the target training round may be a portion or all of the N training rounds.

As one implementation, the server may determine, by the debugger, whether each training round performed during the simulation is in a suspicious state. When the debugger tests the global model obtained by aggregation of each training round by using the test data set, the prediction accuracy of the global model can be monitored, and whether the corresponding training round is in a suspicious state or not can be judged according to the prediction accuracy of the global model.

Optionally, taking a first training round as an example, where the first training round may be any training round performed in the simulation process, it may be determined whether the prediction accuracy of the global model corresponding to the first training round is unstable, for example, whether the prediction accuracy suddenly drops or suddenly rises when the global model predicts on the test data set. If the prediction accuracy of the global model corresponding to the first training round is unstable, it may be determined that the first training round is in a suspicious state, and the first training round may be used as the target training round.

Optionally, it may be determined whether the prediction accuracy of the global model corresponding to the first training round is less than or equal to a first accuracy threshold, and if the prediction accuracy of the global model corresponding to the first training round is less than or equal to the first accuracy threshold, it may be determined that the first training round is in a suspicious state, and the first training round may be used as the target training round.

Optionally, it may be determined whether an absolute value of a difference between the prediction accuracy of the global model corresponding to the first training round and the prediction accuracy of the global model corresponding to the previous training round is greater than a difference threshold, and if the absolute value of the difference is greater than the difference threshold, it may be determined that the first training round is in a suspicious state, and the first training round may be used as the target training round.

Optionally, it may be determined whether prediction accuracy of global models respectively corresponding to the N consecutive training rounds are less than or equal to the second accuracy threshold, and if so, part or all of the N training rounds may be used as the target training round. N may be set according to actual requirements, or N may be a value obtained according to a plurality of experiments, for example, N is 3, 4, etc., but is not limited thereto.
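The threshold-based checks above can be sketched as follows. This is a minimal illustration rather than the debugger's actual logic: the function name and all default threshold values are hypothetical, the "unstable" condition is approximated by the round-to-round difference check, and every round in a run of N low-accuracy rounds is flagged (the text allows flagging only part of them).

```python
def suspicious_rounds(accuracies, first_threshold=0.5, diff_threshold=0.1,
                      second_threshold=0.6, n_consecutive=3):
    """Return the 1-based numbers of training rounds in a suspicious state.

    accuracies[i] is the prediction accuracy of the global model aggregated
    in round i + 1. All threshold defaults are illustrative assumptions.
    """
    flagged = set()
    run = 0  # length of the current run of rounds at or below second_threshold
    for i, acc in enumerate(accuracies):
        round_no = i + 1
        # Accuracy at or below the first accuracy threshold.
        if acc <= first_threshold:
            flagged.add(round_no)
        # Absolute change from the previous round above the difference threshold.
        if i > 0 and abs(acc - accuracies[i - 1]) > diff_threshold:
            flagged.add(round_no)
        # N consecutive rounds at or below the second accuracy threshold.
        run = run + 1 if acc <= second_threshold else 0
        if run >= n_consecutive:
            flagged.update(range(round_no - n_consecutive + 1, round_no + 1))
    return sorted(flagged)


history = [0.55, 0.58, 0.59, 0.80, 0.82, 0.45]
print(suspicious_rounds(history))  # [1, 2, 3, 4, 6]
```

In this example, rounds 1 to 3 are flagged by the consecutive-rounds rule, round 4 by the sudden rise, and round 6 by both the sudden drop and the first accuracy threshold.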

It should be noted that the training round in a suspicious state during the simulation may also be determined by a developer. The developer may monitor the global model aggregated in each training round during the simulation. When the debugger tests the global model aggregated in each training round with the test data set, the developer may monitor the prediction results of the global model and determine whether the corresponding training round is in a suspicious state by judging whether the prediction accuracy of the global model drops abnormally, changes abnormally (such as suddenly dropping or suddenly rising), or stays at a very low level.

After the target training round is determined, abnormal client detection can be performed on the target training round, so that the abnormal client with the problem can be accurately positioned.

Step 308, obtaining a reference global model corresponding to the target training round, and obtaining a plurality of client models corresponding to the target training round.

The description of step 308 may refer to the related description of step 210 in the above embodiment, and the detailed description is not repeated here.

Step 310, calculating the similarity between each client model and the reference global model.

The server may calculate the similarity between each client model corresponding to the target training round and the reference global model, respectively. In some embodiments, centered kernel alignment (CKA) may be employed as the similarity evaluation index. A first matrix corresponding to each client model may be calculated according to a kernel function, and a second matrix corresponding to the reference global model may be calculated according to the kernel function; the similarity between each client model and the reference global model may then be calculated from the first matrix corresponding to that client model and the second matrix, based on the HSIC (Hilbert-Schmidt Independence Criterion). Optionally, the kernel function may include a linear kernel, an RBF (Radial Basis Function) kernel, or the like.

As a specific embodiment, taking a first client model as an example, where the first client model may be any client model corresponding to the target training round, a first coefficient between the first matrix corresponding to the first client model and the second matrix may be calculated based on the HSIC, a second coefficient of the first matrix and a third coefficient of the second matrix may each be calculated based on the HSIC, and the similarity corresponding to the first client model may be determined according to the first coefficient, the second coefficient and the third coefficient. Specifically, the similarity between the client model and the reference global model may be calculated according to equation (1):

$$\mathrm{CKA}(M, M_{ref}) = \frac{\mathrm{HSIC}(M, M_{ref})}{\sqrt{\mathrm{HSIC}(M, M)\,\mathrm{HSIC}(M_{ref}, M_{ref})}} \qquad (1)$$

wherein M represents the first matrix corresponding to the client model, M_ref represents the second matrix corresponding to the reference global model, and CKA(M, M_ref) represents the similarity between the client model and the reference global model. The similarity may lie in the range [0, 1], but is not limited thereto.

In the embodiment of the application, the CKA index is used to measure the similarity between the client model and the reference global model. Because the CKA index is invariant to orthogonal transformation and isotropic scaling, an abnormal client model can be distinguished more accurately from a normal client model trained on Non-IID (non-independent and identically distributed) data, thereby improving the detection accuracy of abnormal clients.
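The following is a minimal sketch of CKA with a linear kernel, using the standard definition (HSIC of the two Gram matrices, normalized by the HSIC of each with itself) and the biased empirical HSIC estimator. It assumes each model is summarized by a representation matrix with one row per probe sample; how the application derives that matrix from a model is not specified here, so the inputs `x`, `y` and `z` are purely illustrative.

```python
import math


def linear_gram(x):
    """Gram matrix K[i][j] = <x_i, x_j> over the rows of x (linear kernel)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in x] for r in x]


def center(k):
    """Double-center a square matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]


def hsic(k, l):
    """Biased empirical HSIC estimator computed from two Gram matrices."""
    n = len(k)
    kc, lc = center(k), center(l)
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def cka(k, l):
    """CKA similarity of two Gram matrices."""
    return hsic(k, l) / math.sqrt(hsic(k, k) * hsic(l, l))


# Illustrative representations: y is x rotated by 90 degrees and scaled by 3.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
y = [[-3.0 * b, 3.0 * a] for a, b in x]
z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

print(cka(linear_gram(x), linear_gram(y)))  # close to 1: rotation and scaling do not matter
print(cka(linear_gram(x), linear_gram(z)))  # strictly between 0 and 1
```

The first comparison illustrates the invariance to orthogonal transformation and isotropic scaling mentioned above.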

Step 312, determining the anomaly probability of the client corresponding to each client model according to the similarity corresponding to each client model.

The server side can determine the abnormal probability of the client side corresponding to each client side model according to the similarity between each client side model and the reference global model of the target training round, so as to detect the abnormal client side existing in the target training round.

In one embodiment, as shown in fig. 4, the step of determining the anomaly probability of the client corresponding to each client model according to the similarity corresponding to each client model may include steps 402 to 406.

Step 402, clustering the similarities corresponding to the client models respectively to obtain one or more clusters.

The similarity corresponding to each client model of the target training round may be clustered to obtain one or more cluster classes, where each cluster class may include one or more similarities, and the distance (i.e., the difference) between different similarities in the same cluster class is smaller (e.g., may belong to a certain range). Alternatively, an unsupervised clustering method may be employed for the similarity of each client model correspondence for the target training round, which may include, but is not limited to, K-Means clustering, bipartite K-Means clustering, and the like. It should be noted that other clustering methods may be used, which are not limited in this embodiment of the present application.
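As one possible illustration of the clustering step, the sketch below runs plain K-Means (Lloyd's algorithm) on the scalar similarities. The function name, the choice of k = 2 and the deterministic initialization from the smallest and largest values are assumptions made for this example; the similarity values are invented.

```python
def kmeans_1d(values, k=2, iters=100):
    """Cluster scalar similarities with K-Means; return (centers, labels)."""
    srt = sorted(values)
    # Deterministic initialization: spread the initial centers over the sorted values.
    centers = [srt[(i * (len(srt) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each value joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


similarities = [0.95, 0.93, 0.96, 0.40, 0.94, 0.35]
centers, labels = kmeans_1d(similarities, k=2)
print(labels)   # [1, 1, 1, 0, 1, 0]
print(centers)  # approximately [0.375, 0.945]
```

Here the four high similarities form one cluster and the two low ones form another, so each cluster center value is the mean of the similarities it contains.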

Step 404, determining a reference cluster center value according to the cluster center values corresponding to the clusters.

Each cluster obtained by clustering has a corresponding cluster center value, and the cluster center value corresponding to each cluster can be the average value of all the similarities contained in the cluster. The cluster center value corresponding to each cluster class can be determined, and a reference cluster center value can be determined according to the cluster center value corresponding to each cluster class, and the reference cluster center value can be used as a reference standard for carrying out abnormal probability calculation.

As an embodiment, the reference cluster center value may be a maximum value among cluster center values corresponding to respective cluster types. The cluster center values corresponding to the clusters may be compared and the largest cluster center value may be determined as the reference cluster center value.

As another embodiment, the maximum similarity among the similarities corresponding to the client models may be determined, the cluster class to which the maximum similarity belongs may be determined, and then the cluster center value of the cluster class to which the maximum similarity belongs may be used as the reference cluster center value.
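The two ways of choosing the reference cluster center value described above can be sketched as follows. The function names and the example values are hypothetical; `labels[i]` is assumed to be the index of the cluster to which `similarities[i]` belongs.

```python
def reference_center_max(centers):
    """Variant 1: the largest cluster center value is the reference."""
    return max(centers)


def reference_center_of_max_similarity(similarities, labels, centers):
    """Variant 2: the center of the cluster that contains the largest similarity."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return centers[labels[best]]


similarities = [0.95, 0.93, 0.96, 0.40, 0.94, 0.35]
labels = [1, 1, 1, 0, 1, 0]
centers = [0.375, 0.945]

print(reference_center_max(centers))                                      # 0.945
print(reference_center_of_max_similarity(similarities, labels, centers))  # 0.945
```

For one-dimensional similarities the two variants generally select the same cluster, since the cluster with the largest center normally also contains the largest similarity.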

Step 406, determining the anomaly probability of the client corresponding to each client model according to the similarity corresponding to each client model and the reference cluster center value.

The server side can determine the distance between the similarity corresponding to each client model and the reference clustering center value, and determine the abnormal probability of the client corresponding to each client model according to the distance between the similarity corresponding to each client model and the reference clustering center value. Alternatively, the greater the distance between the similarity corresponding to the client model and the reference cluster center value, the greater the anomaly probability of the corresponding client may be.

In some embodiments, the standard deviation corresponding to the target cluster in which the reference cluster center value is located may be calculated, and the anomaly probability of the client corresponding to each client model may be determined according to the standard deviation and the difference between the similarity corresponding to that client model and the reference cluster center value. Further, taking a first client model as an example, where the first client model may be any client model corresponding to the target training round, the difference between the similarity corresponding to the first client model and the reference cluster center value may be calculated, an exponent may be calculated according to the difference and the standard deviation corresponding to the target cluster in which the reference cluster center value is located, and an exponential operation may be performed based on the exponent, so as to obtain the anomaly probability of the client corresponding to the first client model.

Specifically, the anomaly probability of the client may be calculated according to equation (2):

wherein v represents the similarity between the client model and the reference global model; c represents a reference cluster center value; sigma represents standard deviation corresponding to a target cluster in which a reference cluster center value is located; α is a hyper-parameter, e.g., α=18; p represents the anomaly probability of the client corresponding to the client model. The closer p is to 1, the greater the anomaly probability, the higher the probability that the corresponding client has a problem, and the more likely it is an anomalous client; the closer p is to 0, the smaller the anomaly probability, which means that the probability of occurrence of a problem for the corresponding client is smaller, and the more likely it is that the client is a normal client.
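The exact form of equation (2) is not shown in this text. The sketch below therefore assumes one plausible form that is consistent with the symbols defined above, namely p = 1 - exp(-(v - c)^2 / (α·σ^2)); this form is an assumption made for illustration only and is not taken from the application. It has the described behaviour: p is 0 when v equals c and approaches 1 as v moves away from c relative to σ.

```python
import math
import statistics


def anomaly_probability(v, c, sigma, alpha=18.0):
    """ASSUMED form of the anomaly probability: 1 - exp(-(v - c)^2 / (alpha * sigma^2)).

    v: similarity between the client model and the reference global model
    c: reference cluster center value
    sigma: standard deviation of the target cluster containing c
    alpha: hyper-parameter (18 is the example value given in the text)
    """
    if sigma == 0:
        # Degenerate target cluster: any deviation from c is treated as anomalous.
        return 0.0 if v == c else 1.0
    return 1.0 - math.exp(-((v - c) ** 2) / (alpha * sigma ** 2))


def anomaly_probabilities(similarities, target_cluster, alpha=18.0):
    """Score every similarity against the mean and population std of the target cluster."""
    c = sum(target_cluster) / len(target_cluster)
    sigma = statistics.pstdev(target_cluster)
    return [anomaly_probability(v, c, sigma, alpha) for v in similarities]


similarities = [0.95, 0.93, 0.96, 0.40, 0.94, 0.35]   # illustrative values
target_cluster = [0.95, 0.93, 0.96, 0.94]             # cluster holding the reference center
for v, p in zip(similarities, anomaly_probabilities(similarities, target_cluster)):
    print(f"similarity={v:.2f}  anomaly probability={p:.3f}")
```

With these illustrative values, the clients with similarities 0.40 and 0.35 receive probabilities very close to 1, while the members of the target cluster stay well below typical thresholds such as 0.7.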

In the embodiment of the application, the similarity corresponding to each client model of the target training round is clustered, and the reference clustering center value is determined according to the obtained clustering center value corresponding to each cluster, so that the degree of abnormality of each client model is evaluated according to the reference clustering center value and the target cluster to which the reference clustering center value belongs, the abnormality probability of the corresponding client is determined, the possibility of occurrence of problems of the client can be accurately evaluated, and the abnormal client can be accurately detected.

Step 314, determining the client whose anomaly probability is greater than the probability threshold as an abnormal client.

The server may determine a client having an anomaly probability less than or equal to the probability threshold as a normal client. In some embodiments, a probability range to which the anomaly probability of each normal client belongs may be determined, and a client type corresponding to each normal client may be determined according to the probability range to which the anomaly probability of each normal client belongs.

A plurality of probability ranges may be preset, each probability range may correspond to a different client type, and the client types may be divided according to a difference between data distribution of the client and global data distribution, for example, the client types may include a client whose data distribution differs from the global data distribution by a larger amount, a client whose data distribution is closer to the global data distribution, and the like. Alternatively, the greater the anomaly probability of the client, the greater the difference between the data distribution of the client and the global data distribution, and therefore, a plurality of probability ranges may be preset and a client type corresponding to each probability range may be determined.

Taking the first client as an example, the first client may be any client participating in the target training round, the server may determine whether the anomaly probability of the first client is greater than a probability threshold, if yes, determine that the first client is an anomaly client, if not, determine a probability range to which the anomaly probability of the first client belongs, and determine a client type corresponding to the probability range as a client type of the first client.

For example, the probability threshold may be preset to 0.7, a first probability range may be 0.3 to 0.7, and a second probability range may be less than 0.3. A client whose anomaly probability is greater than the probability threshold is an abnormal client; the client type corresponding to the first probability range is a first type (e.g., a client whose data distribution differs greatly from the global data distribution), and the client type corresponding to the second probability range is a second type (e.g., a client whose data distribution is closer to the global data distribution). If the anomaly probability of a client is 0.5, the client type of the client may be determined to be the first type. Using the divided probability ranges to determine the client type of each normal client divides the clients more finely, which can improve the accuracy and learning effect of the whole federal learning.
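The example above can be written as a small classification routine. The function name and the returned labels are hypothetical, and the treatment of the boundary value 0.3 (assigned to the first probability range) is an assumption, since the text does not say which range it belongs to.

```python
def classify_client(p, prob_threshold=0.7, type_boundary=0.3):
    """Map an anomaly probability to 'abnormal', 'first type' or 'second type'."""
    if p > prob_threshold:
        return "abnormal"
    if p >= type_boundary:
        # Normal client whose data distribution differs greatly from the global one.
        return "first type"
    # Normal client whose data distribution is closer to the global one.
    return "second type"


for p in (0.9, 0.7, 0.5, 0.1):
    print(p, classify_client(p))
# 0.9 abnormal
# 0.7 first type
# 0.5 first type
# 0.1 second type
```

A probability exactly equal to the threshold is treated as normal, because only probabilities greater than the threshold mark a client as abnormal.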

In the embodiment of the application, the training state data corresponding to each training round of federal learning can be recorded in real time, and federal learning simulation is carried out according to the recorded training state data so as to reproduce the federal learning process, help find the training round in a suspicious state, ensure that the simulation process is not interfered with the real federal learning, avoid interrupting the real federal learning process to detect abnormal clients, and ensure the training effect of the whole federal learning. In addition, the similarity corresponding to each client model is clustered, the abnormal probability of the client corresponding to each client model is calculated based on the reference clustering center value, so that the abnormal clients are obtained through screening, and the abnormal clients can be accurately positioned.

As shown in fig. 5, in one embodiment, the method for detecting an abnormal client may further include the following steps:

Step 502, removing the client model corresponding to the abnormal client from the plurality of client models corresponding to the target training round, and aggregating the remaining client models to obtain a repair global model corresponding to the target training round.

If the server detects that an abnormal client exists in the target training round, the global model of the target training round can be recalculated so as to repair it, thereby eliminating the adverse effect of the abnormal client on the global model. The client model corresponding to the abnormal client can be removed from the plurality of client models corresponding to the target training round, the remaining client models (namely, the client models of the normal clients) are aggregated, and the global model obtained by this aggregation can be used as the repair global model corresponding to the target training round.
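A minimal sketch of the repair step follows. It assumes that each client model is a flat list of parameters and that aggregation is an unweighted element-wise average; the aggregation rule actually used is not stated in this passage, so the averaging is an assumption, as are the function name and the example values.

```python
def repair_global_model(client_models, abnormal_ids):
    """Re-aggregate the round's global model without the abnormal clients.

    client_models: dict mapping client id -> list of model parameters
    abnormal_ids: set of client ids detected as abnormal
    Aggregation is an unweighted mean (an assumption for this sketch).
    """
    kept = [params for cid, params in client_models.items()
            if cid not in abnormal_ids]
    if not kept:
        raise ValueError("no normal client models left to aggregate")
    n = len(kept)
    # zip(*kept) walks the parameter vectors position by position.
    return [sum(column) / n for column in zip(*kept)]


models = {
    "client_a": [1.0, 2.0],
    "client_b": [3.0, 4.0],
    "client_c": [100.0, -100.0],   # detected as abnormal
}
print(repair_global_model(models, {"client_c"}))  # [2.0, 3.0]
```

If sample-count weighting is used in the real aggregation, the same weights would be applied to the remaining client models instead of the plain mean.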

Step 504, transmitting the repair global model to the federation learning module, so that the federation learning module performs a training task after the target training round according to the repair global model.

As one implementation, after obtaining the repair global model corresponding to the target training round, the debugger can transmit the repair global model to the federal learning module. The federal learning module can replace the global model previously aggregated in the target training round with the repair global model, and perform the training tasks after the target training round again according to the repair global model. For example, if the target training round is the 3rd training round in federal learning and the debugger detects that the 3rd training round has an abnormal client, the debugger can remove the client model corresponding to the abnormal client, generate a repair global model from the remaining client models, and send the repair global model to the federal learning module. The federal learning module can replace the global model originally aggregated in the 3rd training round with the repair global model, and then perform the 4th training round, the 5th training round, and so on. This helps the federal learning module obtain a more accurate global model and improves the training effect of the whole federal learning.

Because the target training round has an abnormal client, a problem may also occur in the training round after the target training round, and thus, in some embodiments, the debugger of the server may pause the simulation process of federal learning after detecting the target training round has an abnormal client. When the federal learning module performs a training task after a target training round according to the repairing global model, training state data corresponding to each training round which is performed most recently can be recorded into a log file, and the debugger continues federal learning simulation according to the acquired training state data after acquiring the training state data of the new training round from the log file.

For example, suppose the target training round is the 3rd training round in federal learning and the debugger detects that the 3rd training round has an abnormal client. If the federal learning module has already performed training tasks after the 3rd training round, the log file may already contain the training state data of the 4th training round, the 5th training round, or even later training rounds. However, because the 3rd training round has an abnormal client, the subsequent training rounds may also have problems, so the debugger may first pause the simulation process of federal learning and not reproduce the subsequent training rounds. After the federal learning module replaces the global model originally aggregated in the 3rd training round with the repair global model corresponding to the 3rd training round and performs the 4th training round again, the log file is updated to record the training state data of the newly performed 4th training round; the debugger may then acquire that training state data from the log file and continue the subsequent simulation process. This avoids invalid simulation by the debugger, reduces the processing pressure on the server, and improves the efficiency and accuracy of abnormal client detection.

Step 506, transmitting the client information corresponding to the abnormal client to the federal learning module, so that the federal learning module prohibits the abnormal client from participating in the training task after the target training round according to the client information.

In some embodiments, after determining the abnormal client present in the target training round, the debugger may obtain client information corresponding to the abnormal client, which may include, but is not limited to, one or more of a client number, a client name, a client network address, a client account number, and the like. The debugger can transmit the client information corresponding to the abnormal client to the federal learning module, and the federal learning module, after acquiring the client information, can prohibit the abnormal client from participating in the training task after the target training round according to the client information. Further, when the federal learning module prohibits the abnormal client from participating in the training task after the target training round, the federal learning module may refrain from sending to the abnormal client the joint model corresponding to each training round in which participation is prohibited.

Optionally, the federal learning module may prohibit the abnormal client from participating in all training tasks after the target training round, i.e., the abnormal client may no longer participate in any training round after the target training round.

As an embodiment, the number of training rounds in which participation is prohibited may be set. For example, if the number of training rounds in which the abnormal client is prohibited from participating is 2, the abnormal client cannot participate in the two training rounds after the target training round. The number of prohibited training rounds may be a fixed value set in advance, or may be adjusted dynamically. Optionally, the number of prohibited training rounds may be adjusted according to the number of times the client has been detected as an abnormal client in the current federal learning process; the more times the client has been detected as an abnormal client, the greater the number of prohibited training rounds may be.

For example, if the client is detected as an abnormal client for the 1st time, the number of training rounds in which the abnormal client is prohibited from participating is 2, and the abnormal client cannot participate in the two training rounds after the target training round; if the client is detected as an abnormal client for the 2nd time in a training round in which it subsequently participates, the number of prohibited training rounds can be adjusted to 5, or the client can be directly prohibited from participating in every subsequent training round. In this way, the clients participating in training can be controlled more flexibly while the accuracy of the global model and the training effect of the whole federal learning are ensured.
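The escalating prohibition in this example can be sketched as a lookup on the number of detections. The schedule (2, 5) follows the numbers in the example; the function name and the choice of a permanent prohibition beyond the end of the schedule are assumptions made for this sketch.

```python
def prohibited_rounds(detection_count, schedule=(2, 5)):
    """Return how many training rounds a client must sit out.

    detection_count: how many times the client has been detected as abnormal
    in the current federal learning process.
    Returns an int, or None to mean "prohibited from all subsequent rounds"
    (the permanent case after the schedule is exhausted is an assumption).
    """
    if detection_count <= 0:
        return 0
    if detection_count <= len(schedule):
        return schedule[detection_count - 1]
    return None


def may_participate(current_round, detected_round, detection_count):
    """True if the client may take part in current_round."""
    ban = prohibited_rounds(detection_count)
    if ban is None:
        return False
    return current_round > detected_round + ban


print(prohibited_rounds(1))      # 2
print(prohibited_rounds(2))      # 5
print(may_participate(5, 3, 1))  # False: rounds 4 and 5 are prohibited
print(may_participate(6, 3, 1))  # True
```

Keeping the schedule as a parameter allows either a fixed value or a dynamically adjusted one to be supplied.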

In the embodiment of the application, after the abnormal client side existing in the target training round is detected, the global model of the target training round can be repaired, the training task after the target training round is carried out according to the repaired global model, and the abnormal client side can be forbidden to participate in the training task after the target training round, so that the accuracy of the model obtained by the whole federal learning can be improved, and the training effect of the federal learning is improved.

It should be noted that, the method for detecting the abnormal client provided in the foregoing embodiments may be applied to a centralized federal learning scenario of a single server, or may be applied to a distributed federal learning scenario of a plurality of servers, where log file sharing may be performed between the plurality of servers, so that each server may obtain training state data of each training round in federal learning of other servers, so as to implement simulation of federal learning.

As shown in fig. 6, in one embodiment, an abnormal client detection apparatus 600 is provided, which may be applied to the above-mentioned server. The abnormal client detection apparatus 600 may include a model obtaining module 610, a similarity calculation module 620, an anomaly probability determining module 630, and an abnormal client determining module 640.

The model obtaining module 610 is configured to obtain a reference global model corresponding to a target training round, and obtain a plurality of client models corresponding to the target training round; the client models are in one-to-one correspondence with the clients participating in the target training round.

In one embodiment, the model obtaining module 610 is further configured to: if the target training round is the 1st training round, obtain the initial model to be trained as the reference global model corresponding to the target training round; if the target training round is the T-th training round, obtain the global model aggregated in the (T-1)-th training round as the reference global model corresponding to the target training round, where T is an integer greater than 1.
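The selection rule above can be expressed in a few lines. The sketch below is illustrative only; the function name `reference_global_model` and the convention that `global_models[t - 1]` holds the global model aggregated in round `t` are assumptions made for the example.

```python
from typing import Any, Sequence


def reference_global_model(
    target_round: int, initial_model: Any, global_models: Sequence[Any]
) -> Any:
    """Return the reference global model for a 1-indexed target training round.

    global_models[t - 1] is assumed to hold the model aggregated in round t.
    """
    if target_round < 1:
        raise ValueError("training rounds are numbered from 1")
    if target_round == 1:
        # Round 1 has no previous aggregate, so the initial model is the reference.
        return initial_model
    # Round T (T > 1) uses the global model aggregated in round T-1.
    return global_models[target_round - 2]
```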

The similarity calculation module 620 is configured to calculate similarities between the client models and the reference global model respectively.

The anomaly probability determining module 630 is configured to determine the anomaly probability of the client corresponding to each client model according to the similarity corresponding to each client model.

An abnormal client determining module 640, configured to determine a client whose abnormal probability is greater than the probability threshold as an abnormal client.

In the embodiment of the application, the abnormal client can be accurately detected, the detection efficiency is improved, and the whole detection scheme does not limit the data distribution of the training data set of the client, so that the abnormal client detection under the federal learning scene with different data distribution can be supported.

In one embodiment, the abnormal client detection apparatus 600 further includes a recording module, a simulation module, and a training round determination module.

The recording module is used for recording training state data corresponding to each training round into a log file in real time when training tasks of each training round of federal learning are carried out; the training state data at least comprises a plurality of client models corresponding to training rounds and a global model obtained by aggregation.

The simulation module is used for performing federal learning simulation according to the training state data recorded in the log file.

The training round determining module is used for determining the training round in the suspicious state in the simulation process and taking the training round in the suspicious state as the target training round.
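As a rough illustration of the recording and simulation modules, the sketch below appends one JSON line per training round and replays the rounds in order. The JSON-lines layout, the field names, the function names `record_round` and `replay_rounds`, and the representation of a model as a flat list of floats are all assumptions; the text does not specify a log format.

```python
import io
import json
from typing import Dict, Iterator, List, TextIO


def record_round(
    log: TextIO,
    round_no: int,
    client_models: Dict[str, List[float]],
    global_model: List[float],
) -> None:
    """Append one training round's state to the log as a single JSON line."""
    entry = {
        "round": round_no,
        "client_models": client_models,
        "global_model": global_model,
    }
    log.write(json.dumps(entry) + "\n")
    log.flush()  # push the entry out as soon as the round finishes


def replay_rounds(log: TextIO) -> Iterator[dict]:
    """Yield the recorded rounds in the order they were written."""
    for line in log:
        if line.strip():
            yield json.loads(line)


# Usage: an in-memory buffer stands in for the log file.
log = io.StringIO()
record_round(log, 1, {"client_a": [0.1, 0.2]}, [0.1, 0.2])
record_round(log, 2, {"client_a": [0.3, 0.4]}, [0.3, 0.4])
log.seek(0)
rounds = list(replay_rounds(log))
```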

In one embodiment, the training state data further includes one or more of the following: the loss value, model performance parameters, hyperparameters, and response time parameters of each client model corresponding to the training round, and the running state of the server corresponding to the training round.

In one embodiment, the suspicious state includes one or more of the following:

the prediction accuracy of the global model corresponding to each of the N continuous training rounds is smaller than or equal to a second accuracy threshold, and N is an integer larger than 1; the target training round is a part or all of the N training rounds.
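The consecutive-round condition above can be checked with a single pass over the per-round accuracies. In this sketch the function name `suspicious_rounds` and the decision to return every round of a qualifying run (the text allows part or all of the N rounds) are assumptions.

```python
from typing import List, Sequence


def suspicious_rounds(
    accuracies: Sequence[float], accuracy_threshold: float, n: int
) -> List[int]:
    """Return the 1-indexed rounds that lie in a run of at least n consecutive
    rounds whose global-model accuracy is <= accuracy_threshold."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    flagged: List[int] = []
    run: List[int] = []
    for round_no, accuracy in enumerate(accuracies, start=1):
        if accuracy <= accuracy_threshold:
            run.append(round_no)
            continue
        if len(run) >= n:
            flagged.extend(run)
        run = []
    if len(run) >= n:  # a qualifying run may end at the last recorded round
        flagged.extend(run)
    return flagged
```

For example, with accuracies `[0.9, 0.4, 0.3, 0.8, 0.2]`, a threshold of 0.5 and N = 2, rounds 2 and 3 are returned, while the isolated low value in round 5 is not.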

In one embodiment, the anomaly probability determination module 630 includes a clustering unit, a reference center determination unit, and an anomaly probability determination unit.

The clustering unit is used for clustering the similarities corresponding to the client models to obtain one or more clusters.

The reference center determining unit is used for determining a reference cluster center value according to the cluster center values corresponding to the clusters.

The anomaly probability determining unit is used for determining the anomaly probability of the client corresponding to each client model according to the similarity corresponding to each client model and the reference cluster center value.

In one embodiment, the anomaly probability determining unit is further configured to: calculate a standard deviation corresponding to the target cluster in which the reference cluster center value is located; and determine the anomaly probability of the client corresponding to each client model according to the standard deviation and the difference between the similarity corresponding to each client model and the reference cluster center value.
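The clustering, reference-center, and probability steps above might be combined as in the following sketch. Several choices here are assumptions rather than details taken from the text: a simple one-dimensional k-means with k = 2, taking the largest cluster as the target cluster, mapping the standardized distance to a probability with the Gaussian error function, the standard deviation floor of 1e-6, and the 0.99 probability threshold. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List


def kmeans_1d(values: List[float], k: int = 2, iters: int = 50) -> List[List[float]]:
    """Cluster scalars with Lloyd's algorithm; returns the non-empty clusters."""
    if not values:
        raise ValueError("at least one similarity value is required")
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    # Spread the initial centers across the sorted distinct values.
    centers = [distinct[i * (len(distinct) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[float]] = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return [c for c in clusters if c]


def anomaly_probabilities(similarities: Dict[str, float], k: int = 2) -> Dict[str, float]:
    """Map each client's similarity to an anomaly probability in [0, 1]."""
    clusters = kmeans_1d(list(similarities.values()), k)
    target = max(clusters, key=len)    # assumption: the majority cluster is benign
    center = mean(target)              # reference cluster center value
    sigma = max(pstdev(target), 1e-6)  # floor avoids division by zero
    return {
        cid: math.erf(abs(s - center) / (sigma * math.sqrt(2)))
        for cid, s in similarities.items()
    }


def abnormal_clients(
    similarities: Dict[str, float], prob_threshold: float = 0.99, k: int = 2
) -> List[str]:
    """Clients whose anomaly probability exceeds the probability threshold."""
    probs = anomaly_probabilities(similarities, k)
    return sorted(cid for cid, p in probs.items() if p > prob_threshold)
```

The error function is used because, for a normally distributed target cluster, it gives the fraction of benign values expected to lie closer to the center than the observed similarity; other monotone mappings of the standardized distance would fit the description equally well.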

In one embodiment, the similarity calculation module 620 is further configured to: calculate a first matrix corresponding to each client model according to a kernel function; calculate a second matrix corresponding to the reference global model according to the kernel function; and, based on the Hilbert-Schmidt independence criterion (HSIC), calculate the similarity between each client model and the reference global model according to the first matrix corresponding to each client model and the second matrix.
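A minimal sketch of an HSIC-based similarity follows. It assumes, beyond what the text states, that each model is represented by a matrix with one row per probe sample (for example, its outputs on a shared batch), that the kernel is a Gaussian (RBF) kernel, and that the HSIC value is normalized by the two self-HSIC values so that the result lies in [0, 1]. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_gram(rows: Sequence[Sequence[float]], sigma: float = 1.0) -> Matrix:
    """Kernel (Gram) matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2 * sigma ** 2)) for b in rows] for a in rows]


def center(k: Matrix) -> Matrix:
    """Double-center a symmetric kernel matrix, i.e. compute H K H with H = I - 11^T/n."""
    n = len(k)
    row_means = [sum(row) / n for row in k]  # equal to column means for symmetric k
    total_mean = sum(row_means) / n
    return [
        [k[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(k: Matrix, l: Matrix) -> float:
    """Empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(k)
    if n < 2 or len(l) != n:
        raise ValueError("both kernel matrices must be n x n with n >= 2")
    kc, lc = center(k), center(l)
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def hsic_similarity(
    client_repr: Sequence[Sequence[float]],
    global_repr: Sequence[Sequence[float]],
    sigma: float = 1.0,
) -> float:
    """Normalized HSIC between a client model and the reference global model."""
    first = rbf_gram(client_repr, sigma)   # "first matrix" for the client model
    second = rbf_gram(global_repr, sigma)  # "second matrix" for the global model
    denom = math.sqrt(hsic(first, first) * hsic(second, second))
    return hsic(first, second) / denom if denom > 0 else 0.0
```

Under this normalization a client whose representation matches that of the reference global model scores 1, and increasingly dissimilar representations score closer to 0.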

In one embodiment, the abnormal client detection apparatus 600 further includes a repair module.

The repair module is used for removing the client model corresponding to each abnormal client from the plurality of client models corresponding to the target training round, and aggregating the remaining client models to obtain a repair global model corresponding to the target training round; and for transmitting the repair global model to the federal learning module, so that the federal learning module performs the training tasks after the target training round according to the repair global model.
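The repair step can be illustrated with a short sketch. It assumes that each client model is a flat list of parameters and that aggregation is an unweighted element-wise mean; the text does not fix the aggregation rule, and the function name `repair_global_model` is hypothetical.

```python
from typing import Dict, Iterable, List


def repair_global_model(
    client_models: Dict[str, List[float]], abnormal_clients: Iterable[str]
) -> List[float]:
    """Drop the abnormal clients' models and average the remaining ones."""
    excluded = set(abnormal_clients)
    kept = [weights for cid, weights in client_models.items() if cid not in excluded]
    if not kept:
        raise ValueError("no client models left to aggregate")
    # Element-wise mean over the remaining models (assumed aggregation rule).
    return [sum(column) / len(kept) for column in zip(*kept)]
```

For instance, removing client `"c"` from `{"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [100.0, -100.0]}` yields the repaired model `[2.0, 3.0]`.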

In one embodiment, the abnormal client detection apparatus 600 further includes a prohibition module.

The prohibition module is used for transmitting the client information corresponding to the abnormal client to the federal learning module, so that the federal learning module prohibits the abnormal client from participating in the training tasks after the target training round according to the client information.

Fig. 7 is a block diagram of an electronic device in one embodiment. As shown in fig. 7, the electronic device 700 may include one or more of the following components: processor 710, memory 720 coupled to processor 710, wherein memory 720 may store one or more computer programs that may be configured to implement methods as described in the various embodiments above when executed by one or more processors 710.

Processor 710 may include one or more processing cores. The processor 710 connects the various parts of the electronic device 700 through various interfaces and lines, and performs the functions of the electronic device 700 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 720 and by invoking data stored in the memory 720. Optionally, the processor 710 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 710 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communications. It will be appreciated that the modem may also not be integrated into the processor 710 and may instead be implemented by a separate communication chip.

The memory 720 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 720 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 720 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored data area may also store data created by the electronic device 700 in use, and the like.

It is to be appreciated that the electronic device 700 may include more or fewer structural elements than those described in the above structural block diagrams, including, for example, a power module, etc., and may not be limited thereto.

Embodiments of the present application disclose a computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method as described in the above embodiments.

Embodiments of the present application disclose a computer program product comprising a non-transitory computer readable storage medium storing a computer program, which when executed by a processor, implements a method as described in the above embodiments.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a ROM, etc.

Any reference to memory, storage, database, or other medium as used herein may include non-volatile and/or volatile memory. Suitable non-volatile memory can include ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM may take many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM).

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments and that the acts and modules referred to are not necessarily required for the present application.

In various embodiments of the present application, it should be understood that the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not be construed as limiting the implementation of the embodiments of the present application.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.

The method, apparatus, electronic device, and storage medium for detecting an abnormal client disclosed in the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and core ideas of the present application. Meanwhile, those skilled in the art may, in accordance with the ideas of the present application, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present application.

Claims

1. An abnormal client detection method, comprising:

2. The method of claim 1, wherein prior to the acquiring the reference global model for the target training round, the method comprises:

when training tasks of each training round of federal learning are carried out, training state data corresponding to each training round are recorded into a log file in real time; the training state data at least comprises a plurality of client models corresponding to training rounds and global models obtained by aggregation;

performing federal learning simulation according to training state data recorded in the log file;

and determining the training rounds in the suspicious state in the simulation process, and taking the training rounds in the suspicious state as target training rounds.

3. The method of claim 2, wherein the training state data further comprises one or more of: the loss value, model performance parameters, hyperparameters, and response time parameters of each client model corresponding to the training round, and the running state of the server corresponding to the training round.

4. The method of claim 2, wherein the suspicious state comprises one or more of:

the prediction accuracy of the global model corresponding to each of the N continuous training rounds is smaller than or equal to a second accuracy threshold, wherein N is an integer greater than 1; the target training round is a part or all of the N training rounds.

5. The method according to any one of claims 1 to 4, wherein the obtaining the reference global model corresponding to the target training round includes:

if the target training round is the 1st training round, acquiring an initial model to be trained as the reference global model corresponding to the target training round;

if the target training round is the T-th training round, acquiring the global model aggregated in the (T-1)-th training round as the reference global model corresponding to the target training round, wherein T is an integer greater than 1.

6. The method according to claim 1, wherein determining the anomaly probability of the client corresponding to each of the client models according to the similarity corresponding to each of the client models comprises:

clustering the similarity corresponding to the client models respectively to obtain one or more clusters;

determining a reference cluster center value according to the cluster center value corresponding to each cluster;

and determining the abnormal probability of the client corresponding to each client model according to the similarity corresponding to each client model and the reference cluster center value.

7. The method of claim 6, wherein determining the anomaly probability of the client corresponding to each of the client models according to the similarity corresponding to each of the client models and the reference cluster center value comprises:

calculating a standard deviation corresponding to a target cluster where the reference cluster center value is located;

and determining the abnormal probability of the client corresponding to each client model according to the difference value between the similarity corresponding to each client model and the reference cluster center value and the standard deviation.

8. The method of claim 1, wherein the separately computing similarities between each of the client models and the reference global model comprises:

calculating a first matrix corresponding to each client model according to the kernel function;

calculating a second matrix corresponding to the reference global model according to the kernel function;

and calculating the similarity between each client model and the reference global model according to the first matrix corresponding to each client model and the second matrix, based on the Hilbert-Schmidt independence criterion (HSIC).

9. The method according to any one of claims 1-4, 6-8, further comprising:

removing the client model corresponding to the abnormal client from the plurality of client models corresponding to the target training round, and aggregating the remaining client models to obtain a repair global model corresponding to the target training round;

and transmitting the repair global model to a federal learning module, so that the federal learning module performs training tasks after the target training round according to the repair global model.

10. The method according to claim 9, wherein the method further comprises:

and transmitting client information corresponding to the abnormal client to the federal learning module, so that the federal learning module prohibits the abnormal client from participating in training tasks after the target training round according to the client information.

11. An abnormal client detection apparatus, the apparatus comprising:

12. An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to implement the method of any of claims 1-10.

13. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any one of claims 1-10.