CN113408743A - Federated model generation method and device, electronic equipment and storage medium

Federated model generation method and device, electronic equipment and storage medium

Info

Publication number: CN113408743A
Application number: CN202110726944.7A
Authority: CN (China)
Prior art keywords: model, data, current, training, current local
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113408743B
Inventors: 刘吉, 余孙婕, 李兴建, 窦德景
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110726944.7A
Publication of CN113408743A; application granted; publication of CN113408743B


Classifications

    • G06N20/00 Machine learning
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Abstract

The disclosure provides a federated model generation method and apparatus, an electronic device, and a storage medium, relating to the field of artificial intelligence and in particular to deep learning. The specific scheme is as follows: acquiring data to be trained sent by a server, and determining a current local model based on the data to be trained; inputting a training sample set into the current local model and training the current local model, where the product of the weight matrix norm and the gradient norm of each network layer in the current local model satisfies a Lipschitz constant constraint condition; and sending the update data of the currently trained current local model to the server, so that the server aggregates the update data to update the data to be trained and issues the updated data to be trained for continued training, until a joint training end condition is met and a target federated model is determined according to the current data to be trained. Embodiments of the disclosure can improve the privacy and security of data in the federated learning process.

Description

Federated model generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to deep learning, and more specifically to a method and apparatus for generating a federated model, an electronic device, and a storage medium.
Background
Federated Learning is a distributed machine learning technology that breaks data silos and unlocks the application potential of artificial intelligence. It enables all participants in federated learning to build a joint model by exchanging encrypted machine learning intermediate results, without disclosing the underlying data or its encrypted (obfuscated) form.
In each iteration of federated learning, the central server distributes the current joint model to randomly selected clients; each client independently computes the model gradient from its local data and transmits it to the central server, where the gradients are aggregated to compute a new global model.
Disclosure of Invention
The disclosure provides a generation method and device of a federated model, electronic equipment and a storage medium.
According to an aspect of the present disclosure, a method for generating a federated model is provided, which is applied to a client, and includes:
acquiring data to be trained sent by a server, and determining a current local model based on the data to be trained;
inputting a training sample set into the current local model, and training the current local model, where the product of the weight matrix norm and the gradient norm of each network layer in the current local model satisfies a Lipschitz constant constraint condition;
and sending the update data of the currently trained current local model to the server, so that the server aggregates the update data to update the data to be trained and issues the updated data to be trained for continued training, until a joint training end condition is met and a target federated model is determined according to the current data to be trained.
According to an aspect of the present disclosure, a method for generating a federated model is provided, which is applied to a server side, and includes:
acquiring update data sent by a client set for a current federated model, where the client set includes clients that train a current local model based on the federated model generation method of any one of the disclosed embodiments to determine the update data;
updating the current federated model according to the update data;
and issuing the updated data to be trained in the current federated model to the client set so that the client set continues training the updated data to be trained until the current federated model meets the joint training end condition, and determining a target federated model according to the currently trained current federated model.
According to another aspect of the present disclosure, there is provided a generation apparatus of a federated model, configured in a client, including:
the current local model determining module is used for acquiring data to be trained sent by the server and determining a current local model based on the data to be trained;
the current local model training module is used for inputting a training sample set into the current local model and training the current local model, where the product of the weight matrix norm and the gradient norm of each network layer in the current local model satisfies a Lipschitz constant constraint condition;
and the update data uploading module is used for sending the update data of the currently trained current local model to the server, so that the server aggregates the update data to update the data to be trained and issues the updated data to be trained for continued training, until a joint training end condition is met and a target federated model is determined according to the current data to be trained.
According to another aspect of the present disclosure, there is provided a generating apparatus of a federated model, configured in a server, including:
the update data acquisition module is used for acquiring update data sent by a client set for a current federated model, where the client set includes clients that train a current local model based on the federated model generation method described in any embodiment of the present disclosure to determine the update data;
the current federated model updating module is used for updating the current federated model according to the update data;
and the to-be-trained data issuing module is used for issuing the updated to-be-trained data in the current federated model to the client set so that the client set can continue training the updated to-be-trained data until the current federated model meets the joint training end condition, and determining a target federated model according to the currently-trained current federated model.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of generating a federated model as described in any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method for generating a federated model as described in any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of generating a federated model as described in any one of the embodiments of the present disclosure.
Embodiments of the present disclosure can improve the privacy and security of data in the federated learning process.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a method of generating a federated model in accordance with an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a method of generating a federated model in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a method of generating a federated model in accordance with an embodiment of the present disclosure;
FIG. 4 is a scene diagram of a method for generating a federated model that may implement an embodiment of the present disclosure;
FIG. 5 is a scene diagram of a centralized deep learning method that can implement embodiments of the present disclosure;
FIG. 6 is a scene diagram of a method for generating a federated model that may implement an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an apparatus for generating a federated model in accordance with an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an apparatus for generating a federated model in accordance with an embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing the method for generating a federated model in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a method for generating a federated model disclosed in an embodiment of the present disclosure, where the embodiment may be applied to a case where a client trains a local model and uploads it to a server so that the server trains the federated model. The method of the embodiment may be executed by a federated model generation apparatus, which may be implemented in software and/or hardware and is specifically configured in an electronic device with a certain data operation capability; the electronic device may be a client, such as a mobile phone, a tablet computer, a vehicle-mounted terminal, or a desktop computer.
S101, obtaining data to be trained sent by a server, and determining a current local model based on the data to be trained.
The server is the device in federated learning that aggregates the model data trained by multiple clients, and it communicates with the clients. The data to be trained is used by the client for training with local sample data to form update data, which is transmitted back to the server to be aggregated with the update data of other clients to form a new model. The data to be trained may refer to the data of the model distributed to each client for training. The current local model may refer to the model that the client needs to train in the current iteration round. The current local model is trained on the client, and its update data is extracted and provided to the server to form a new model by aggregation, achieving the effect of federated learning. The current local model is a machine learning model; for example, it may be a neural network model, such as a fully-connected neural network model. Usually, training a federated model based on federated learning requires multiple rounds of training, and in each iteration round the client receives the data to be trained sent by the server and determines the local model.
The federated learning process may specifically be as follows: the client trains the global model using local sample data, computes gradients, updates the model parameters, and sends the update data to the server; the server aggregates the update data sent by each client to form a new model and continues to issue it to each client for further training, until the aggregated model on the server meets the training condition and the federated model is obtained.
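As a minimal sketch of one such communication round (NumPy; the linear-model local training, the helper names, and the data are illustrative, not part of the disclosure):

```python
import numpy as np

def local_train(global_weights, samples, labels, lr=0.01, epochs=5):
    """Client-side step: start from the distributed weights and run a few epochs of
    gradient descent on local data (here a simple least-squares objective)."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = samples.T @ (samples @ w - labels) / len(labels)
        w -= lr * grad
    return w  # the "update data" sent back to the server

def fedavg(client_weights, client_sizes):
    """Server-side aggregation: weighted average of the client updates (FedAvg)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# One round with two clients holding private local data.
rng = np.random.default_rng(0)
global_w = np.zeros(3)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
updates = [local_train(global_w, x, y) for x, y in clients]
global_w = fedavg(updates, [len(y) for _, y in clients])
```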
S102, inputting a training sample set into the current local model, and training the current local model, where the product of the weight matrix norm and the gradient norm of each network layer in the current local model satisfies a Lipschitz constant constraint condition.
The training samples may be sample data determined according to the training content and are used to train the current local model. The type of training sample may include at least one of: an image processing type, a natural language processing type, an audio processing type, and the like. For example, if the federated model is used for image segmentation, a training sample can be an image annotated with the segmentation target object; if the federated model is used for text translation, a training sample can include the original text and the translated text; if the federated model is used for speech recognition, a training sample can include speech and the corresponding text. The content and number of training samples can be set as needed. For example, the whole data set may be divided into sub data sets according to a preset number of iterations, and the training samples contained in one sub data set are input into the current local model to train it, as sketched below.
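As a small sketch (NumPy; illustrative only), the local data can be partitioned into one sub data set per training iteration:

```python
import numpy as np

def split_for_iterations(samples, labels, num_iterations, seed=0):
    """Shuffle the local data set and split it into one chunk per training iteration."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    chunks = np.array_split(order, num_iterations)
    return [(samples[idx], labels[idx]) for idx in chunks]

samples, labels = np.arange(20).reshape(10, 2), np.arange(10)
sub_data_sets = split_for_iterations(samples, labels, num_iterations=4)
```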
The product of the weight matrix norm and the gradient norm of each network layer in the current local model satisfying a Lipschitz constant constraint condition means that this per-layer product is constrained to be a Lipschitz constant; for example, the Lipschitz constant may be 1. When this holds, the current local model satisfies the Lipschitz constraint condition. In other words, the Lipschitz constraint is applied to every layer to limit input perturbations from spreading along the network within the current local model, so that the Lipschitz constraint of the whole network is the product of the Lipschitz constraints on each layer and the output change of the network is proportional to the input change; this improves the robustness of the current local model against malicious attacks. For example, when the current local model is a multi-graph machine learning model, small input perturbations can propagate within and between graphs and thereby greatly amplify the perturbation in the output space, so the Lipschitz constraint has a significant defensive effect against such model attacks and improves the processing accuracy and security of the model.
S103, sending the update data of the currently trained current local model to the server, so that the server aggregates the update data to update the data to be trained and issues the updated data to be trained for continued training, until a joint training end condition is met and a target federated model is determined according to the current data to be trained.
The update data is aggregated by the server with the update data of other clients to form new data to be trained, i.e. a new federated model. Aggregating the update data includes, for example, performing a weighted average of the update data provided by each client using the Federated Averaging (FedAvg) method and updating the data to be trained according to the result. The server continues to issue the updated data to be trained to the clients to which it has historically transmitted (including the client implementing the federated model generation method provided by the embodiments of the present disclosure) for continued training. When the updated data to be trained meets the joint training end condition, the server determines the target federated model according to the current data to be trained, i.e. the updated data to be trained.
The joint training end condition is used to detect whether the federated model has finished training. Exemplarily, the joint training end condition is that the number of training rounds is greater than or equal to a set threshold, or that the value of the activation function is less than or equal to a set function threshold, and so on. "Current data to be trained" means that the model determined according to the current data to be trained meets the joint training end condition, so that the model determined according to the current data to be trained is taken as the trained federated model.
Most federated learning application scenarios place high requirements on data privacy and security, and to guarantee the performance of the joint model while preserving the privacy and security of local data, the federated learning system needs a certain resistance to risk. In fact, the privacy protection in federated learning is not sufficient to protect the underlying training data from privacy disclosure attacks: the model gradients passed during training may expose sensitive information and even cause serious information leakage. In addition, poisoning attacks may be encountered during federated learning. There are two kinds of poisoning attack: data poisoning attacks and model poisoning attacks. The purpose of a poisoning attack is to use artificially designed data (data poisoning) or models (model poisoning) in one or more distributed computing resources to reduce the accuracy of the federated model during model aggregation. Data poisoning attacks are carried out by modifying the features or labels of the input data: a malicious user can relabel data points of one class as another class and participate in distributed training with the modified data. Model poisoning attacks are attacks in which the updated intermediate data (such as gradients or models) are poisoned before being sent to the central server, so as to reduce the accuracy of the aggregated model.
According to this technical scheme, applying the Lipschitz constraint condition to each network layer can effectively limit the diffusion of dirty data and thus resist data poisoning attacks.
According to this technical scheme, during the training of the current local model in federated learning, the Lipschitz constraint is applied to each network layer of the current local model to limit input perturbations from spreading along the network within the current local model. The Lipschitz constraint of the whole network is then the product of the Lipschitz constraints on each network layer, and the output change of the network is proportional to the input change. This improves the robustness of the current local model against malicious attacks, improves the processing accuracy and security of the model, and improves the risk resistance of the federated learning system.
Fig. 2 is a flowchart of another federated model generation method disclosed in an embodiment of the present disclosure, which is further optimized and expanded on the basis of the above technical solution and can be combined with the various optional embodiments above. The activation function of the current local model includes a function that can simulate monotonic change or invariance over time.
S201, obtaining data to be trained sent by a server, and determining a current local model based on the data to be trained, where the activation function of the current local model includes a function that can simulate monotonic change or invariance over time.
The activation function is a function used to simulate a monotonic increase, a monotonic decrease, or invariance over time. Such an activation function can simulate functions of various shapes, which improves its expressive capability.
Illustratively, the activation function is a Weibull function. In fact, conventional activation functions in neural networks, such as ReLU, sigmoid, and tanh, turn the nonlinear function into a gradient-norm-preserving one, which reduces expressive capability. Statistically, the Weibull function is configured with hyperparameters such as a location parameter, a shape parameter, and a scale parameter; it can simulate any function that monotonically increases, monotonically decreases, or remains constant over time, and is well suited to reliability analysis and failure analysis. It can therefore improve the expressive capability of the activation function, fully approximate complex nonlinear relations, and improve the learning capability and performance of the model, where the performance may be, for example, detection accuracy or recognition accuracy. In one specific example, the Weibull activation function σ̂ is as follows:

\hat{\sigma}(z) = \sum_{t=1}^{T} \hat{\sigma}_t(z), \qquad \hat{\sigma}_t(z) = 1 - \exp\left(-\left(\frac{z - \mu_t}{\lambda_t}\right)^{\alpha_t}\right)

where σ̂_t is the t-th Weibull function with its own parameters λ_t, α_t, and μ_t: λ_t is the t-th scale parameter, α_t is the t-th shape parameter, and μ_t is the t-th location parameter. z is an element of the pre-activation vector ẑ. T is the number of parameter groups; λ_t, α_t, and μ_t form the t-th group of parameters, and the maximum value of t is T. T may be configured as desired. To achieve faster convergence, the Weibull activation functions for T different parameter groups are composited into σ̂, which raises the upper bound of σ̂ to T; configuring T groups of parameters and aggregating them in this way accelerates convergence and improves model training efficiency. The pre-activation vector ẑ is the argument of the activation function and contains at least one element.
The derivative of the Weibull activation function σ̂ with respect to z is as follows:

\frac{\partial \hat{\sigma}(z)}{\partial z} = \sum_{t=1}^{T} \frac{\alpha_t}{\lambda_t}\left(\frac{z - \mu_t}{\lambda_t}\right)^{\alpha_t - 1}\exp\left(-\left(\frac{z - \mu_t}{\lambda_t}\right)^{\alpha_t}\right)

In fact, models based on the aforementioned conventional activation functions can only approximate linear functions and have weak expressive power; for example, the gradient norm of a monotonic activation function remains 1. According to the derivative of the Weibull activation function, its gradient norm can exceed 1; theoretically, a model based on the Weibull activation function can approximate any function, which greatly improves the expressive capability of the model and effectively simulates the relation between perturbation diffusion and attack failure, thereby reducing perturbation diffusion, resisting attacks, and improving data security.
Configuring the activation function as a Weibull function allows a monotonically increasing, monotonically decreasing, or constant function to be simulated, improving the expressive capacity of the network and hence the performance of the current local model and of the federated model, as sketched below.
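A minimal NumPy sketch of this composite activation, following the Weibull-distribution-function form written above (the parameter values and the clipping at z below μ_t are illustrative choices, not taken from the disclosure):

```python
import numpy as np

def weibull_activation(z, lam, alpha, mu):
    """Composite Weibull activation: sum of T Weibull distribution functions.

    lam, alpha, mu are length-T arrays of scale, shape and location parameters.
    Each term is 1 - exp(-((z - mu_t)/lam_t)**alpha_t) for z >= mu_t (0 otherwise),
    so the output is bounded above by T, as described in the text.
    """
    z = np.asarray(z, dtype=float)[..., None]      # broadcast z against the T groups
    shifted = np.maximum(z - mu, 0.0) / lam        # Weibull support: z >= mu_t
    return np.sum(1.0 - np.exp(-shifted ** alpha), axis=-1)

# Illustrative parameters for T = 3 groups.
lam = np.array([1.0, 2.0, 0.5])
alpha = np.array([1.5, 0.8, 2.0])
mu = np.array([-1.0, 0.0, 1.0])
print(weibull_activation(np.linspace(-2.0, 3.0, 6), lam, alpha, mu))
```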
S202, inputting a training sample set into the current local model, and training the current local model, where the product of the weight matrix norm and the gradient norm of each network layer in the current local model satisfies a Lipschitz constant constraint condition.
Optionally, training the current local model includes: acquiring the weight matrix of each network layer obtained by training, and updating each weight matrix based on the following formula:

\hat{W}^p \leftarrow K_p \cdot \hat{W}^p

where Ŵ^p is the weight matrix of the p-th network layer in the current local model and K_p is the Lipschitz constant constraint coefficient of the p-th network layer in the current local model. K_p is determined from the weight matrix norm constraint coefficient and the gradient norm constraint coefficient, and the product of the weight matrix norm and the gradient norm determined by the updated weight matrix satisfies the Lipschitz constant constraint condition.
The weight matrix of each network layer obtained by training means the weight matrix of each network layer of the current local model after it has been trained with the training samples; this is equivalent to the model parameters obtained by training the current local model, i.e. the training target of the current local model. The weight matrix Ŵ^p of the p-th network layer is in fact the matrix formed by the weights of the neurons of the (p−1)-th network layer with respect to the neurons of the p-th network layer. Updating based on the formula, i.e. replacing Ŵ^p by K_p·Ŵ^p, is equivalent to updating the trained weight matrix of the p-th network layer. K_p is determined from the weight matrix norm constraint coefficient and the gradient norm constraint coefficient and is used to constrain the weight matrix norm and the gradient norm: the norm of the updated weight matrix, multiplied by the gradient norm obtained by substituting the updated weight matrix into the activation function, satisfies the Lipschitz constant constraint condition. Illustratively, K_p is greater than 1 and less than or equal to 1.5.
Updating the trained weight matrix by multiplying it by the Lipschitz constant constraint coefficient makes the product of the weight matrix norm and the gradient norm determined by the updated weight matrix satisfy the Lipschitz constant constraint condition. This accurately realizes the linear relation between the output change and the input change of the network, improves the anti-interference capability of the current local model, resists malicious attacks, and improves the processing performance and security of the model. A minimal sketch of this per-layer update is given below.
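A minimal sketch (NumPy; the names and values are illustrative, and the update is assumed to be the plain per-layer scaling described above):

```python
import numpy as np

def apply_lipschitz_coefficients(weight_matrices, coefficients):
    """Scale each trained layer weight matrix by its Lipschitz constant
    constraint coefficient K_p (one coefficient per network layer)."""
    return [k_p * w_p for w_p, k_p in zip(weight_matrices, coefficients)]

# Two layers with illustrative coefficients in the stated range (1, 1.5].
rng = np.random.default_rng(0)
trained_weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
updated_weights = apply_lipschitz_coefficients(trained_weights, [1.2, 1.1])
```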
Optionally, the federated model generation method further includes: determining the Lipschitz constant constraint coefficient K_p from the weight matrix norm constraint coefficient and the gradient norm constraint coefficients, where ẑ^p is the vector of the p-th network layer in the current local model, ẑ^(p−1) is the vector of the (p−1)-th network layer in the current local model, the activation function σ̂ of the current local model is a Weibull distribution function, the activation function f of the standard fully-connected neural network model is a standard activation function, ε is an error threshold, and W^p is the weight matrix of the p-th network layer in the fully-connected neural network model.
The vector of a network layer in the current local model is calculated based on the following formula:

\hat{z}^p = \hat{W}^p \times \hat{\sigma}(\hat{z}^{p-1}) + \hat{b}^p

where ẑ^p is the vector of the p-th network layer in the current local model, ẑ^(p−1) is the vector of the (p−1)-th network layer in the current local model, Ŵ^p is the weight matrix of the p-th network layer in the current local model, and b̂^p is the deviation (bias) of the p-th network layer in the current local model; N_i denotes the number of nodes of the i-th layer and R denotes a real matrix. The current local model is a fully-connected neural network model based on the Lipschitz constant constraint condition; the standard fully-connected neural network model is a fully-connected neural network model without the Lipschitz constant constraint condition.
The vector of a network layer in the standard fully-connected neural network model is calculated based on the following formula:

z^p = W^p \times f(z^{p-1}) + b^p

where z^p is the vector of the p-th network layer in the standard fully-connected neural network model, z^(p−1) is the vector of the (p−1)-th network layer, W^p is the weight matrix of the p-th network layer, and b^p is the deviation of the p-th network layer; N_i denotes the number of nodes and R denotes a real matrix.
W^p × f(z^(p−1)) can be calculated by training the current local model and the standard fully-connected neural network model simultaneously; alternatively, W^p may be configured to be approximately equal to Ŵ^p, and f(z^(p−1)) is the function value obtained by giving z^(p−1) as the argument to the standard activation function. The standard activation function may be ReLU, Leaky ReLU, PReLU, Sigmoid, tanh, Softplus, or the like; a standard activation function is an activation function that has historically been in common use. A small sketch of this layer-by-layer forward computation is given below.
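A small NumPy sketch of the layer recurrence above (the helper names are illustrative; swapping relu for a Weibull activation such as the one sketched earlier gives the constrained variant):

```python
import numpy as np

def relu(z):
    """Standard activation f; any of the listed standard activations could be used."""
    return np.maximum(z, 0.0)

def forward(weights, biases, z0, activation=relu):
    """Apply z^p = W^p x activation(z^(p-1)) + b^p layer by layer, starting from z0."""
    z = z0
    for w, b in zip(weights, biases):
        z = w @ activation(z) + b
    return z

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(2, 8))]
biases = [np.zeros(8), np.zeros(2)]
print(forward(weights, biases, rng.normal(size=4)))
```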
The error threshold ε indicates that, when the difference between the input of a network layer in the current local model and the input of the corresponding network layer in a standard fully-connected neural network model of the same structure is sufficiently small, the difference between the function values of that network layer in the current local model and of the corresponding network layer in the standard fully-connected neural network model is also sufficiently small; that is, the error threshold must also satisfy the corresponding inequality. In fact, even though the current local model does not use a standard activation function and is subject to the Lipschitz constant constraint, it can still express content similar to that of the standard fully-connected neural network model, so that when the difference between the inputs of the two models is small, the difference between the corresponding function values is also small.
In fact, the value range of Kp is as follows:
Figure BDA0003139012200000104
the maximum value in this range is usually taken as Kp
In fact, under the Lipschitz constraint (the gradient norm of the monotonic activation function remains 1), the activation function is a linear function, which means that a neural network based on the activation function can only approximate a linear function with weak expression ability. Configuration KpGreater than 1, indicates that the neural network based on the activation function can approximate the function with any shape, thereby greatly improving the network expression capability. And in the standard activation function: the gradient norm of ReLU, Leaky ReLU and PReLU is 1, the gradient norm of Sigmoid is 0.25, and the gradient norm of tanh and Softplus is 1. And the gradient norm of the weber distribution function is greater than 1. That is, for the neural network model based on the Lipschitz constraint condition, the gradient norm of most commonly used activation functions cannot meet the feasible value of the Lipschitz constant constraint condition, that is, cannot be greater than 1, so that the expression capability of the standard activation function is lower than that of the weber distribution function. Meanwhile, through experiments, the method is based on a Weber distribution function, and the Lipschitz-constrained model expression capability is better than that of a GroupSort model (grouping arrangement).
The weight matrix norm and the gradient norm of each network layer are constrained based on the following condition:

\|\hat{W}^p\| \cdot \|\nabla\hat{\sigma}(\hat{z}^{p-1})\| = 1

Accordingly, the product of the weight matrix norm and the gradient norm of each network layer is 1, that is, the Lipschitz constant constraint condition is satisfied. Using the nearest-matrix orthogonalization and polar decomposition technique, the weight matrix Ŵ^p of the current local model approximates the weight matrix W^p of the fully-connected neural network, and the weight norm ‖Ŵ^p‖ is obtained accordingly. The weight matrix norm constraint coefficient is derived starting from the norm ‖z‖ = max|z| and substituting the norm constraint of the weight matrix into the weight matrix; the gradient norm constraint coefficient is derived by substituting the gradient norm constraint into the gradient of the activation function.
the method has the advantages that the Rippschtz constant constraint coefficient is determined according to the formulas which meet the Rippschtz constant constraint condition and constrain the weight matrix norm and the gradient norm of the network layer, the Rippschtz constant constraint coefficient which meet the Rippch constant constraint condition can be accurately determined, the weight matrix is updated according to the Rippch constant constraint coefficient, the current local model is guaranteed to meet the Rippch constant constraint condition, meanwhile, the Weber distribution function is used as an activation function, the limit that the Rippch constant constraint coefficient is 1 is broken through, the expression capacity of the model is greatly improved, the relation between disturbance diffusion and attack failure is effectively simulated, and therefore the disturbance diffusion is reduced, attacks are resisted, and data safety is improved.
Optionally, the federated model generation method further includes: calculating the Lipschitz constant C based on the following formula:

C = \prod_{p=1}^{M} \|\hat{W}^p\| \cdot \|\nabla\hat{\sigma}(\hat{z}^{p-1})\|

where M is the number of network layers included in the current local model, σ̂ is the activation function of the current local model, ∇ denotes the gradient, ‖·‖ denotes a norm, and ẑ^(p−1) denotes the vector of the (p−1)-th network layer when p > 1.
While the Lipschitz constant is constrained to 1, the current local model is still constrained through the weight matrix and the gradient, and the expressive capability of the current local model is maintained.
The advantage of this approach is that setting the weight norm constraint and the gradient norm constraint of each network layer to the Lipschitz constant value imposes the constraint on every layer, which effectively limits both the local and the joint diffusion of dirty data from each layer, resists data poisoning attacks, reduces the interference of dirty data and poisoning attacks, improves the stability and fault tolerance of federated learning, improves the data security and model security of the current local model, limits the diffusion of malicious updates in the joint model, and improves the robustness of the federated learning system. A small numerical sketch of this per-layer and network-level bound is given below.
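A small numerical sketch (NumPy; illustrative only), estimating each layer's factor as the spectral norm of its weight matrix times a bound on the activation's gradient norm and taking the product over layers:

```python
import numpy as np

def layer_factor(weight, grad_norm_bound):
    """Per-layer Lipschitz factor: weight matrix norm times activation gradient norm."""
    return np.linalg.norm(weight, ord=2) * grad_norm_bound

def network_lipschitz(weights, grad_norm_bounds):
    """Network-level Lipschitz constant C as the product of the per-layer factors."""
    return float(np.prod([layer_factor(w, g) for w, g in zip(weights, grad_norm_bounds)]))

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]
print(network_lipschitz(weights, grad_norm_bounds=[1.0, 1.0]))
# If every per-layer product is constrained to 1, C is exactly 1.
```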
S203, sending the update data of the currently trained current local model to the server, so that the server aggregates the update data to update the data to be trained and issues the updated data to be trained for continued training, until a joint training end condition is met and a target federated model is determined according to the current data to be trained.
The weight matrices of the network layers may be sent to the server as update data, and/or the gradients of the network layers may be sent to the server as update data, where the gradients are obtained from forward and backward propagation.
Furthermore, the deviations (biases) of the network layers may also be updated in a corresponding manner. The deviation may also be used as a training target of the current local model and provided to the server as update data.
According to the technical scheme of the disclosure, determining the activation function as a function that can simulate monotonic change or invariance over time improves the expressive capacity of the network, and thereby the performance of the current local model and of the federated model.
Fig. 3 is a flowchart of a method for generating a federated model disclosed in an embodiment of the present disclosure, where the embodiment may be applied to a case where a server performs aggregation training of a federated model according to the local models trained by the clients. The method of this embodiment may be executed by a federated model generation apparatus, which may be implemented in software and/or hardware and is specifically configured in an electronic device with a certain data operation capability; the electronic device may be a server.
S301, obtaining update data sent by a client set for a current federated model, where the client set includes clients, and the clients train a current local model based on a federated model generation method as described in any embodiment of the present disclosure, and determine the update data.
The current federated model is a neural network model based on the Lipschitz constraint; the product of the weight matrix norm and the gradient norm of each network layer in the current federated model satisfies the Lipschitz constant constraint condition.
Optionally, the client set further includes a client trained based on a standard fully-connected neural network model to obtain the update data.
This kind of client trains a neural network model that is not based on the Lipschitz constant constraint condition to obtain update data; the structure of the standard fully-connected neural network model is the same as that of the current local model, and its activation function may be a standard activation function.
By configuring the client set to also contain clients that train a standard fully-connected neural network model not subject to the Lipschitz constant constraint, the types of clients in the federated learning system are increased while the security of the federated model is still taken into account. This broadens the application scenarios of federated learning, meets diversified federated learning requirements, and increases the flexibility of federated learning.
S302, updating the current federated model according to the update data.
The current federated model can be determined from the update data using the FedAvg algorithm or a voting algorithm.
Optionally, updating the current federated model according to the update data includes: building a teacher network model according to the update data; and training a student network model according to the teacher network model, and determining the student network model as the current federated model.
The teacher-student network method belongs to the field of transfer learning. Transfer learning transfers the performance of one model to another model; for a teacher-student network, the teacher network is usually a more complex network with very good performance and generalization capability, and it is used to guide a simpler student network so that the simpler student model, with fewer parameters and less computation, can achieve performance similar to that of the teacher network, which is a form of model compression. In practice, large and complex networks usually perform well but contain much redundant information, so their computation and resource consumption are very large. The useful information in the complex network can be extracted and migrated to a smaller network, so that the learned small network can achieve a performance effect similar to that of the large complex network while greatly saving computing resources. The complex network can be regarded as the teacher and the small network as the student. The teacher network model can be constructed according to the FedAvg algorithm or a voting algorithm, and the student network model is trained based on the teacher network model using a knowledge distillation method.
By constructing the teacher network model according to the update data and training the student network model as the current federated model, redundant information in the model is reduced and computing resources are saved, improving the computational efficiency of the model while preserving model performance and generalization capability.
Optionally, constructing a teacher network model according to the update data includes: constructing a multi-teacher network model according to the update data.
The multi-teacher network model can be constructed by a voting algorithm, specifically as follows: calculate the accuracy with which the current local model set corresponding to the update data predicts the test data, and select, for each classification, the model with the highest prediction accuracy to construct the multi-teacher network model, which then realizes the best prediction of the current local model set for each classification. The voting algorithm can filter out malicious updates from attackers so as to defend against model poisoning attacks. Each piece of update data determines the current local model corresponding to one client, and the current local models form the current local model set. In a specific example, the prediction accuracy of the current local model set on the test data is calculated for each classification:

acc_{i,q} = c_{i,q} / n_{i,q}

where acc_{i,q} denotes the accuracy with which the i-th current local model predicts the q-th class, c_{i,q} denotes the number of times the i-th current local model predicts the q-th class and is correct, and n_{i,q} denotes the total number of times the i-th current local model predicts the q-th class; Q is the number of classes, and the maximum value of i is the number of current local models. The model with the highest accuracy for each classification is selected as the prediction model of that classification, [model_0, model_1, …, model_{Q−1}]. All the selected models then predict the test data, and a softmax transformation is applied to the prediction result of each piece of data, giving P_i(g, q), the probability predicted by model_i that the g-th piece of data in the data block belongs to the q-th class, where G is the number of pieces of data in the data block. The corresponding columns of these softmax outputs are selected to construct the prediction matrix of the multi-teacher network. A small sketch of this per-class selection is given below.
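A minimal NumPy sketch of the per-class voting selection described above (the model outputs, helper names, and data are illustrative):

```python
import numpy as np

def per_class_accuracy(predictions, labels, num_classes):
    """acc[i, q] = (# times model i predicts class q and is correct)
                   / (# times model i predicts class q)."""
    num_models = predictions.shape[0]
    acc = np.zeros((num_models, num_classes))
    for i in range(num_models):
        for q in range(num_classes):
            mask = predictions[i] == q
            acc[i, q] = float((labels[mask] == q).mean()) if mask.any() else 0.0
    return acc

def build_multi_teacher(acc):
    """For every class q, pick the index of the local model most accurate on q."""
    return np.argmax(acc, axis=0)

# Three local models, four classes, hard predictions on shared test data.
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=200)
predictions = np.stack([np.where(rng.random(200) < 0.7, labels, rng.integers(0, 4, size=200))
                        for _ in range(3)])
teacher_per_class = build_multi_teacher(per_class_accuracy(predictions, labels, num_classes=4))
```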
By constructing the multi-teacher network, the best performance of the current local model set can be retained while models with poor performance, interference, or attacks are filtered out; malicious updates from attackers are thus filtered, model poisoning attacks are resisted, and the security and anti-interference capability of the federated model are improved.
In one specific example, a knowledge distillation method is used to train the student network model based on the multi-teacher network model. Knowledge distillation refers to distilling the knowledge of multiple models into a single model; the models may be homogeneous or heterogeneous. The classical federated learning algorithm FedAvg uses a simple weighted average to update the model during global model aggregation, which requires the local models and the joint model to remain homogeneous; in practical applications, however, local model structures are complex, diverse, and rapidly updated, and a weighted average cannot update the joint model in that case. Moreover, a simple weighted average cannot prevent malicious updates submitted by attackers and may even cause the model to collapse under a joint attack by multiple attackers. In a multi-teacher network model, each model structure may be different. The knowledge distillation method trains a global model from the current local model set: a multi-teacher network model is constructed by the voting algorithm, and the student network model, i.e. the global model, is trained using the output of the teacher network model together with the true labels of the data, approximately preserving the best performance of the current local model set. The knowledge-distillation joint model aggregation algorithm guarantees the convergence rate while providing an aggregation scheme for heterogeneous current local models. Specifically: as before, a multi-teacher network model is built from the current local model set using the voting algorithm; the predicted outputs of the multi-teacher network model and of the student network model are computed for the training data; and the student network loss is calculated as follows:

L_{KD} = \alpha \cdot \mathrm{CrossEntropy}(Q_s^T, Q_t^T) + (1 - \alpha) \cdot \mathrm{CrossEntropy}(Q_s, y_{true})

where L_KD is propagated backward; the first term is the KD loss, the cross entropy of the softened probability distributions corresponding to the soft target, i.e. the teacher network model; and (1 − α)·CrossEntropy(Q_s, y_true) is the CE loss, the cross entropy with the true labels corresponding to the hard target, i.e. the samples. T is the temperature parameter: the smaller T is, the more easily the probabilities of wrong classes are amplified, introducing unnecessary noise. α is the soft-target cross entropy weighting coefficient: the larger α is, the more the training of the student network model relies on guidance from the teacher network model. Q_s and Q_t denote the softmax transformations of the student network model and the teacher network model, with Q_s^T and Q_t^T their temperature-softened versions.
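A minimal NumPy sketch of this loss (the temperature-softened cross-entropy form is a common knowledge-distillation formulation matching the description above; the frequently used T² rescaling of the soft term is not mentioned in the text and is omitted here):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Mean cross-entropy between target and predicted distributions."""
    return float(-np.mean(np.sum(target_probs * np.log(pred_probs + eps), axis=-1)))

def kd_loss(student_logits, teacher_logits, y_true_onehot, alpha=0.7, temperature=3.0):
    """alpha-weighted soft-target (teacher, temperature-softened) cross-entropy plus
    (1 - alpha)-weighted hard-target (true label) cross-entropy."""
    soft = cross_entropy(softmax(teacher_logits / temperature),
                         softmax(student_logits / temperature))
    hard = cross_entropy(y_true_onehot, softmax(student_logits))
    return alpha * soft + (1.0 - alpha) * hard

# Illustrative call: batch of 5 samples, 4 classes.
rng = np.random.default_rng(0)
student, teacher = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
labels = np.eye(4)[rng.integers(0, 4, size=5)]
print(kd_loss(student, teacher, labels))
```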
By adopting knowledge distillation, a heterogeneous multi-teacher network model can be supported, which increases the application scenarios in which a student network model is trained by aggregating heterogeneous models and enriches the application scenarios of federated learning.
Optionally, training the student network according to the teacher network model includes: when the current local models corresponding to the update data have the same structure, determining a candidate teacher network model using a joint weighted average algorithm; comparing the prediction accuracy of the multi-teacher network model with that of the candidate teacher network model; selecting the network model with the highest prediction accuracy as the target teacher network model; and training the student network according to the target teacher network model.
The current local models corresponding to the update data having the same structure means that the teacher network model is built from homogeneous models. The candidate teacher network model is a teacher network model constructed from the homogeneous current local models; for example, it can be constructed from multiple homogeneous current local models using the FedAvg algorithm, and the number of candidate teacher network models is usually 1, while a multi-teacher network model consists of multiple models. Test data are acquired, the multi-teacher network model and the candidate teacher network model are each used to predict the test data, and the prediction accuracies are calculated and compared. The target teacher network model is the teacher network model with the higher prediction accuracy, i.e. better performance; the network model with the highest prediction accuracy is taken as the target teacher network model. Note that when the structures of the current local models corresponding to the update data differ, the FedAvg algorithm is not suitable for determining the candidate teacher network model, and in that case the multi-teacher network model is considered the target teacher network model with the best performance.
When the current local models are homogeneous, comparing the performance of the multi-teacher network constructed by the voting algorithm with that of the teacher network obtained by FedAvg weighted averaging, selecting the better-performing model as the target teacher network model, and training the student network model with it ensures that the best-performing teacher network model is used, which improves the performance of the student network model and thus of the federated model. A small sketch of this selection is given below.
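A small sketch of the comparison (NumPy; predict_fn stands for any trained model's prediction callable and is illustrative):

```python
import numpy as np

def accuracy(predict_fn, test_x, test_y):
    """Fraction of test samples for which the model's predicted class is correct."""
    return float(np.mean(predict_fn(test_x) == test_y))

def select_target_teacher(candidate_teachers, test_x, test_y):
    """Among candidate teachers (e.g. the voting-built multi-teacher ensemble and the
    FedAvg-averaged candidate), return the index and scores of the most accurate one."""
    scores = [accuracy(fn, test_x, test_y) for fn in candidate_teachers]
    best = int(np.argmax(scores))
    return best, scores
```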
And S303, issuing the updated data to be trained in the current federated model to the client set so that the client set continues to train the updated data to be trained until the current federated model meets the joint training end condition, and determining a target federated model according to the currently trained current federated model.
The updated data to be trained in the current federated model is provided to the clients, which determine the local model, continue training to obtain update data, and provide it to the server so that a new current federated model is formed by aggregation. The data to be trained may include at least one of: the updated weight matrices, gradients, deviations, and so on of the current federated model. Illustratively, the data to be trained is the updated weight matrix of the current federated model. The joint training end condition may be that the current federated model has undergone multiple rounds of iterative training; when the number of iteration rounds is greater than or equal to a set threshold, the minimum function value of the activation function is selected and the parameters of the activation function at that point are taken as the optimal solution, i.e. the weight matrix, thereby determining the current federated model. Each round of training includes multiple iterations; the training samples are randomly divided into a preset number of groups, and one group is selected for each iteration.
According to the technical scheme of this embodiment, update data sent by the client set for the current federated model are obtained, where the update data are determined by training the current local model with the federated model generation method; the multiple pieces of update data are aggregated and the current federated model is updated. In this way, the diffusion of dirty data can be effectively limited when a client trains its local model, poisoning attacks are resisted, the security of the update data is improved, and malicious attacks by attackers are filtered out, thereby improving the anti-interference capability, security, and performance of the federated model.
Fig. 4 is a scene diagram of a generation method of a federated model disclosed in the embodiment of the present disclosure, and the generation method of the federated model in the embodiment of the present disclosure may be applied to an application scene of horizontal federated learning.
Horizontal federated learning is a distributed deep learning method and corresponds to the centralized deep learning method shown in fig. 5. Illustratively, in the centralized deep learning process, a neural network model is trained on m training samples, with n features extracted from each training sample. In the horizontal federated learning process, p clients or users download the global model from a server. Each client or user trains the downloaded global model on a certain number of training samples to obtain a current local model and uploads it to the server; the server performs aggregation and updating to determine a new global model, which it then issues. Each client may train the global model on all m training samples, or on a subset of the m training samples.
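A minimal sketch of one such horizontal federated learning round, under the assumption that models are represented as flat weight lists and that Client.train and mean_aggregate are illustrative placeholders rather than the disclosed implementation:

```python
from typing import Callable, List, Sequence


class Client:
    def __init__(self, local_samples: Sequence):
        self.local_samples = local_samples

    def train(self, global_weights: List[float]) -> List[float]:
        # Placeholder: locally fine-tune the downloaded weights on self.local_samples.
        return [w for w in global_weights]


def mean_aggregate(uploads: List[List[float]]) -> List[float]:
    # Simple unweighted average of the uploaded weights (a FedAvg-style update).
    return [sum(ws) / len(ws) for ws in zip(*uploads)]


def federated_round(global_weights: List[float],
                    clients: List[Client],
                    aggregate: Callable[[List[List[float]]], List[float]] = mean_aggregate
                    ) -> List[float]:
    uploads = [c.train(global_weights) for c in clients]  # local training on each client
    return aggregate(uploads)                             # server-side aggregation/update
```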
In one specific example, as shown in FIG. 6, the federated learning system includes n clients and a server. In each round of training, each of the n clients locally trains its current local model based on the federated model generation method of any embodiment of the disclosure, and each neural network layer in the current local model satisfies the Lipschitz constant constraint coefficient Kp. The server obtains the current local models uploaded by the n clients; specifically, these may be the weight matrices or gradients of the current local models. The server adopts a voting algorithm to construct a multi-teacher network model from the n current local models. A student network model is then trained by knowledge distillation based on the multi-teacher network model, specifically as follows: the training samples are input into the multi-teacher network model to obtain its prediction output, which is used as the soft target of the student network model; the real label of the sample data, determined manually or in another way, is used as the hard target of the student network model; and the training samples are input into the student network model until the loss function determined from the soft target and the hard target is minimal, at which point the currently trained student network model is determined as the current federal model. Each neural network layer in the current federal model likewise satisfies the Lipschitz constant constraint coefficient Kp. After training for a specified number of rounds, the round in which the minimum function value of the activation function occurs is looked up, and the current federated model corresponding to that round is determined as the final target federated model.
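The knowledge-distillation step in this example can be sketched as the loss below; the sketch assumes the teachers' logits (e.g. obtained from the multi-teacher ensemble) are available as the soft target, and the temperature and weighting hyper-parameter alpha are illustrative choices not specified in the filing.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      hard_labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of a soft-target term (teacher output) and a hard-target term (real labels)."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_term = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                         soft_targets, reduction="batchmean") * temperature ** 2
    hard_term = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft_term + (1.0 - alpha) * hard_term
```

The student is trained by minimising this loss over the training samples until it converges, and the resulting student network is taken as the current federal model.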
According to the technical scheme, applying the Lipschitz-constrained network model to the local model effectively limits the diffusion of dirty data locally at the client, so data poisoning attacks can be resisted locally; applying the Lipschitz-constrained network model to the federal model effectively limits the spread of malicious updates from an attacker at the server, so model poisoning attacks can be resisted at the server. In addition, a multi-network is built through a voting algorithm and a single network is extracted from it by a knowledge distillation algorithm, providing a heterogeneous network aggregation scheme. This improves the anti-attack capability of the federated learning system while preserving the convergence speed, filters malicious updates from an attacker both locally and at the server, improves the anti-interference performance and safety of the federated learning system, and takes the performance of the federal model into account.
According to an embodiment of the present disclosure, fig. 7 is a structural diagram of a device for generating a federated model in an embodiment of the present disclosure, and the embodiment of the present disclosure is applicable to a case where a client trains a local model and uploads the local model to a server, so that the server trains the federated model. The device is realized by software and/or hardware and is specifically configured in electronic equipment with certain data operation capacity, such as a client.
Fig. 7 shows a generation apparatus 400 of a federated model, which includes: a current local model determining module 401, a current local model training module 402 and an update data uploading module 403; wherein,
a current local model determining module 401, configured to obtain data to be trained sent by a server, and determine a current local model based on the data to be trained;
a current local model training module 402, configured to input a training sample set into the current local model, and train the current local model; the product of the weight matrix norm and the gradient norm of each network layer in the current local model meets the constraint condition of the Lipschitz constant;
an update data uploading module 403, configured to send update data of the currently trained current local model to the server, so that the server aggregates and updates the data to be trained, issues the updated data to be trained, and continues training until a joint training end condition is met, and determines a target federal model according to the current data to be trained.
According to the technical scheme, in the training process of the current local model in federated learning, the Lipschitz constraint is applied to each network layer of the current local model, limiting the diffusion of input disturbances along the network; since the Lipschitz constraint of the whole network is the product of the Lipschitz constraints of the individual network layers, the output change of the network is proportional to the input change. This improves the robustness of the current local model, resists malicious attacks, improves the processing accuracy and the safety of the model, and improves the risk resistance of the federated learning system.
Further, the activation function of the current local model includes a function that simulates monotonic variation or invariance over time.
Further, the current local model training module 402 includes: a weight matrix obtaining unit, configured to obtain each network layer weight matrix obtained through training; and a weight matrix updating unit, configured to update each of the weight matrices based on a formula (reproduced only as an image in the original filing) in which the updated quantity is the weight matrix of the p-th network layer in the current local model and Kp is the Lipschitz constant constraint coefficient of the p-th network layer in the current local model. Kp is determined according to the weight matrix norm constraint coefficient and the gradient norm constraint coefficient, and the product of the weight matrix norm and the gradient norm determined by the updated weight matrix meets the constraint condition of the Lipschitz constant.
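The update formula itself is reproduced only as an image in the original filing. As an assumption, the sketch below follows the norm-projection approach of the Gouk et al. reference cited among the non-patent citations of this document: after a training step, the weight matrix of the p-th layer is rescaled so that its norm does not exceed the layer's Lipschitz constraint coefficient Kp.

```python
import torch


@torch.no_grad()
def project_weight(weight: torch.Tensor, k_p: float, ord: str = "fro") -> torch.Tensor:
    """Rescale `weight` so that its norm is at most `k_p` (no-op if already within the bound)."""
    norm = torch.linalg.matrix_norm(weight, ord=ord)
    scale = torch.clamp(norm / k_p, min=1.0)
    return weight / scale
```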
Further, the generating device of the federal model further comprises a Lipschitz constant constraint coefficient determination module, configured to determine the Lipschitz constant constraint coefficient based on a formula (reproduced only as an image in the original filing) whose quantities are: the weight matrix norm constraint coefficient; the gradient norm constraint coefficients; the vector of the p-th network layer in the current local model; the vector of the (p-1)-th network layer in the current local model; the activation function of the current local model, which is a Weibull distribution function; the activation function f of a standard fully-connected neural network model, which is a standard activation function; an error threshold ε; and Wp, the weight matrix of the p-th network layer in the fully-connected neural network model.
Further, the generating device of the federal model further comprises a Lipschitz constant determination module, configured to calculate the Lipschitz constant C based on a formula (reproduced only as an image in the original filing) in which M is the number of network layers included in the current local model, the activation function of the current local model appears together with its gradient, and ‖·‖ denotes a norm.
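The formula for C is likewise reproduced only as an image. Consistent with the statement above that the Lipschitz constraint of the whole network is the product of the per-layer constraints, a hypothetical per-layer-product estimate might look as follows; the choice of spectral norm and the per-layer activation-gradient bounds are assumptions, not the disclosed formula.

```python
from typing import Sequence

import torch


def lipschitz_constant(weights: Sequence[torch.Tensor],
                       activation_grad_bounds: Sequence[float]) -> float:
    """Estimate C as the product over the M layers of ||W_p|| times a bound on |f'_p|."""
    c = 1.0
    for w, g in zip(weights, activation_grad_bounds):
        c *= torch.linalg.matrix_norm(w, ord=2).item() * g
    return c
```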
the generating device of the federal model can execute the generating method of the federal model provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects for executing the generating method of the federal model.
According to an embodiment of the present disclosure, fig. 8 is a structural diagram of a device for generating a federated model in the embodiment of the present disclosure, and the embodiment of the present disclosure is applied to a case where a server performs aggregation training of a federated model according to a local model trained by each client. The device is realized by software and/or hardware and is specifically configured in electronic equipment with certain data operation capacity, such as a server side.
Fig. 8 shows a generation apparatus 500 of a federated model, which includes: an update data acquisition module 501, a current federal model update module 502 and a to-be-trained data issuing module 503; wherein,
an update data obtaining module 501, configured to obtain update data sent by a client set for a current federated model, where the client set includes clients, and the clients train a current local model based on a federated model generation method as described in any one of the embodiments of the present disclosure to determine the update data;
a current federal model update module 502 for updating the current federal model according to the update data;
a to-be-trained data issuing module 503, configured to issue updated data to be trained in the current federated model to the client set, so that the client set continues to train the updated data to be trained until the current federated model meets a joint training end condition, and determine a target federated model according to the currently-trained current federated model.
According to the technical scheme, the update data sent by the client set for the current federal model are obtained, where the update data are determined by training the current local model through the generation method of the federal model; the plurality of update data are aggregated and the current federal model is updated, so that the diffusion of dirty data can be effectively limited when the clients train their local models, poisoning attacks can be resisted, the safety of the update data is improved, and malicious attacks of an attacker are filtered out, thereby improving the anti-interference performance, the safety and the overall performance of the federal model.
Further, the current federal model update module 502 includes: the teacher network model building unit is used for building a teacher network model according to the updating data; and the student network training unit is used for training the student network model according to the teacher network model and determining the student network model as the current federal model.
Further, the teacher network model building unit includes: and the multi-teacher network model building subunit is used for building the multi-teacher network model according to the updating data.
Further, the student network training unit includes: the alternative teacher network model determining subunit is used for determining an alternative teacher network model by adopting a joint weighted average algorithm under the condition that the structures of the current local models corresponding to the updated data are the same; the prediction accuracy comparison subunit is used for comparing the prediction accuracy of the multi-teacher network model with the prediction accuracy of the alternative teacher network model; the target teacher network model screening subunit is used for selecting the network model with the highest prediction accuracy as the target teacher network model; and the student network training subunit is used for training a student network according to the target teacher network model.
The generating device of the federal model can execute the generating method of the federal model provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects for executing the generating method of the federal model.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein. The electronic device may be a client or a server.
As shown in fig. 9, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 executes the respective methods and processes described above, such as the generation method of the federal model. For example, in some embodiments, the method of generating the federated model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the method for generating a federated model described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the generation method of the federated model by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A generation method of a federated model comprises the following steps:
acquiring data to be trained sent by a server, and determining a current local model based on the data to be trained;
inputting a training sample set into the current local model, and training the current local model; the product of the weight matrix norm and the gradient norm of each network layer in the current local model meets the constraint condition of a Lipschitz constant;
and sending the updated data of the current local model of the current training to the server so as to enable the server to aggregate and update the data to be trained, issuing the updated data to be trained and continuing training until a joint training finishing condition is met, and determining a target federal model according to the current data to be trained.
2. The method of claim 1, wherein the activation function of the current local model comprises a function that simulates monotonic variation or invariance over time.
3. The method of claim 2, wherein the training the current local model comprises:
acquiring each network layer weight matrix obtained by training;
updating each of the weight matrices based on the following formula (reproduced only as an image in the original filing):
wherein the updated quantity is the weight matrix of the p-th network layer in the current local model, Kp is the Lipschitz constant constraint coefficient of the p-th network layer in the current local model, Kp is determined according to the weight matrix norm constraint coefficient and the gradient norm constraint coefficient, and the product of the weight matrix norm and the gradient norm determined by the updated weight matrix meets the constraint condition of the Lipschitz constant.
4. The method of claim 3, further comprising:
determining the Lipschitz constant constraint coefficient based on the following formula (reproduced only as an image in the original filing):
wherein the quantities in the formula are the weight matrix norm constraint coefficient, the gradient norm constraint coefficients, the vector of the p-th network layer in the current local model, the vector of the (p-1)-th network layer in the current local model, and the activation function of the current local model, which is a Weibull distribution function; the activation function f of a standard fully-connected neural network model is a standard activation function, ε is an error threshold, and Wp is the weight matrix of the p-th network layer in the fully-connected neural network model.
5. The method of claim 1, further comprising:
calculating the Lipschitz constant C based on the following formula (reproduced only as an image in the original filing):
wherein M is the number of network layers included in the current local model, the activation function of the current local model appears together with its gradient, and ‖·‖ denotes a norm.
6. a generation method of a federated model comprises the following steps:
acquiring update data sent by a client set aiming at a current federated model, wherein the client set comprises clients, and the clients train a current local model based on the generation method of the federated model as defined in any one of claims 1 to 5 and determine the update data;
updating the current federal model according to the updating data;
and issuing the updated data to be trained in the current federated model to the client set so that the client set continues training the updated data to be trained until the current federated model meets the joint training end condition, and determining a target federated model according to the currently trained current federated model.
7. The method of claim 6, wherein said updating the current federated model according to the update data comprises:
building a teacher network model according to the updating data;
and training the student network model according to the teacher network model, and determining the student network model as a current federal model.
8. The method of claim 7, wherein said building a teacher network model from said update data comprises:
and constructing a multi-teacher network model according to the updating data.
9. The method of claim 8, wherein training a student network according to a teacher network model comprises:
under the condition that the structures of the current local models corresponding to the updated data are the same, determining an alternative teacher network model by adopting a combined weighted average algorithm;
comparing the prediction accuracy of the multi-teacher network model with the prediction accuracy of the alternative teacher network model;
selecting the network model with the highest prediction accuracy as a target teacher network model;
and training a student network according to the target teacher network model.
10. An apparatus for generating a federated model, comprising:
the current local model determining module is used for acquiring data to be trained sent by the server and determining a current local model based on the data to be trained;
the current local model training module is used for inputting a training sample set into the current local model and training the current local model; the product of the weight matrix norm and the gradient norm of each network layer in the current local model meets the constraint condition of a Lipschitz constant;
and the update data uploading module is used for sending the update data of the current local model to be trained to the server so as to enable the server to aggregate and update the data to be trained, issuing the updated data to be trained to continue training until the joint training end condition is met, and determining a target federal model according to the current data to be trained.
11. The apparatus of claim 10, wherein the activation function of the current local model comprises a function that models monotonic variation or invariance over time.
12. The apparatus of claim 11, wherein the current local model training module comprises:
a weight matrix obtaining unit, configured to obtain each of the network layer weight matrices obtained through training;
a weight matrix updating unit, configured to update each of the weight matrices based on the following formula (reproduced only as an image in the original filing), wherein the updated quantity is the weight matrix of the p-th network layer in the current local model, Kp is the Lipschitz constant constraint coefficient of the p-th network layer in the current local model, Kp is determined according to the weight matrix norm constraint coefficient and the gradient norm constraint coefficient, and the product of the weight matrix norm and the gradient norm determined by the updated weight matrix meets the constraint condition of the Lipschitz constant.
13. The apparatus of claim 12, further comprising:
a Lipschitz constant constraint coefficient determination module, configured to determine the Lipschitz constant constraint coefficient based on the following formula (reproduced only as an image in the original filing), wherein the quantities in the formula are the weight matrix norm constraint coefficient, the gradient norm constraint coefficients, the vector of the p-th network layer in the current local model, the vector of the (p-1)-th network layer in the current local model, and the activation function of the current local model, which is a Weibull distribution function; the activation function f of a standard fully-connected neural network model is a standard activation function, ε is an error threshold, and Wp is the weight matrix of the p-th network layer in the fully-connected neural network model.
14. The apparatus of claim 10, further comprising:
a Lipschitz constant determination module, configured to calculate the Lipschitz constant C based on the following formula (reproduced only as an image in the original filing), wherein M is the number of network layers included in the current local model, the activation function of the current local model appears together with its gradient, and ‖·‖ denotes a norm.
15. an apparatus for generating a federated model, comprising:
an update data acquisition module, configured to acquire update data sent by a client set for a current federated model, where the client set includes clients, and the clients train a current local model based on the federated model generation method according to any one of claims 1 to 5 to determine the update data;
the current federal model updating module is used for updating the current federal model according to the updating data;
and the to-be-trained data issuing module is used for issuing the updated to-be-trained data in the current federated model to the client set so that the client set can continue training the updated to-be-trained data until the current federated model meets the joint training end condition, and determining a target federated model according to the currently-trained current federated model.
16. The apparatus of claim 15, wherein the current federated model update module comprises:
the teacher network model building unit is used for building a teacher network model according to the updating data;
and the student network training unit is used for training the student network model according to the teacher network model and determining the student network model as the current federal model.
17. The apparatus of claim 16, wherein the teacher network model building unit comprises:
and the multi-teacher network model building subunit is used for building the multi-teacher network model according to the updating data.
18. The apparatus of claim 17, wherein the student network training unit comprises:
the alternative teacher network model determining subunit is used for determining an alternative teacher network model by adopting a joint weighted average algorithm under the condition that the structures of the current local models corresponding to the updated data are the same;
the prediction accuracy comparison subunit is used for comparing the prediction accuracy of the multi-teacher network model with the prediction accuracy of the alternative teacher network model;
the target teacher network model screening subunit is used for selecting the network model with the highest prediction accuracy as the target teacher network model;
and the student network training subunit is used for training a student network according to the target teacher network model.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a federated model as recited in any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of generating a federated model according to any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements a method of generating a federated model according to any one of claims 1-9.
CN202110726944.7A 2021-06-29 2021-06-29 Method and device for generating federal model, electronic equipment and storage medium Active CN113408743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110726944.7A CN113408743B (en) 2021-06-29 2021-06-29 Method and device for generating federal model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110726944.7A CN113408743B (en) 2021-06-29 2021-06-29 Method and device for generating federal model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113408743A true CN113408743A (en) 2021-09-17
CN113408743B CN113408743B (en) 2023-11-03

Family

ID=77680107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110726944.7A Active CN113408743B (en) 2021-06-29 2021-06-29 Method and device for generating federal model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113408743B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180365564A1 (en) * 2017-06-15 2018-12-20 TuSimple Method and device for training neural network
US20210142164A1 (en) * 2019-11-07 2021-05-13 Salesforce.Com, Inc. Multi-Task Knowledge Distillation for Language Model
US20210142106A1 (en) * 2019-11-13 2021-05-13 Niamul QUADER Methods and systems for training convolutional neural network using built-in attention
CN112100295A (en) * 2020-10-12 2020-12-18 平安科技(深圳)有限公司 User data classification method, device, equipment and medium based on federal learning
CN111930968A (en) * 2020-10-13 2020-11-13 支付宝(杭州)信息技术有限公司 Method and device for updating business model
CN112633319A (en) * 2020-11-23 2021-04-09 贵州大学 Multi-target detection method for incomplete data set balance input data category
CN112818394A (en) * 2021-01-29 2021-05-18 西安交通大学 Self-adaptive asynchronous federal learning method with local privacy protection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HENRY GOUK et al.: "Regularisation of Neural Networks by Enforcing Lipschitz Continuity", ARXIV
ZHOU S Q et al.: "An Analysis of the Expressiveness of Deep Neural Network Architectures Based on Their Lipschitz Constants", ARXIV
WEI Zhe et al.: "神经网络求解一类稀疏优化问题" [Neural networks for solving a class of sparse optimization problems], 哈尔滨商业大学学报(自然科学版) [Journal of Harbin University of Commerce (Natural Sciences Edition)]

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850394A (en) * 2021-09-18 2021-12-28 北京百度网讯科技有限公司 Federal learning method and device, electronic equipment and storage medium
CN113988260A (en) * 2021-10-27 2022-01-28 杭州海康威视数字技术股份有限公司 Data processing method, device, equipment and system
CN113988260B (en) * 2021-10-27 2022-11-25 杭州海康威视数字技术股份有限公司 Data processing method, device, equipment and system
CN113992419A (en) * 2021-10-29 2022-01-28 上海交通大学 User abnormal behavior detection and processing system and method thereof
CN113992419B (en) * 2021-10-29 2023-09-01 上海交通大学 System and method for detecting and processing abnormal behaviors of user
EP4195111A1 (en) * 2021-11-15 2023-06-14 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for training longitudinal federated learning model
CN114239820A (en) * 2021-11-15 2022-03-25 北京百度网讯科技有限公司 Training method and device for longitudinal federated learning model and computer equipment
CN114065863B (en) * 2021-11-18 2023-08-29 北京百度网讯科技有限公司 Federal learning method, apparatus, system, electronic device and storage medium
CN114065863A (en) * 2021-11-18 2022-02-18 北京百度网讯科技有限公司 Method, device and system for federal learning, electronic equipment and storage medium
CN114065864A (en) * 2021-11-19 2022-02-18 北京百度网讯科技有限公司 Federal learning method, federal learning device, electronic device, and storage medium
CN114065864B (en) * 2021-11-19 2023-08-11 北京百度网讯科技有限公司 Federal learning method, federal learning device, electronic apparatus, and storage medium
CN113961967B (en) * 2021-12-13 2022-03-22 支付宝(杭州)信息技术有限公司 Method and device for jointly training natural language processing model based on privacy protection
CN113961967A (en) * 2021-12-13 2022-01-21 支付宝(杭州)信息技术有限公司 Method and device for jointly training natural language processing model based on privacy protection
CN114844889A (en) * 2022-04-14 2022-08-02 北京百度网讯科技有限公司 Video processing model updating method and device, electronic equipment and storage medium
CN114997422B (en) * 2022-05-06 2024-03-08 西北工业大学 Grouping type federal learning method of heterogeneous communication network
CN114997422A (en) * 2022-05-06 2022-09-02 西北工业大学 Grouping type federal learning method of heterogeneous communication network
CN115081014A (en) * 2022-05-31 2022-09-20 西安翔迅科技有限责任公司 Target detection label automatic labeling method based on federal learning
CN115271033A (en) * 2022-07-05 2022-11-01 西南财经大学 Medical image processing model construction and processing method based on federal knowledge distillation
CN115271033B (en) * 2022-07-05 2023-11-21 西南财经大学 Medical image processing model construction and processing method based on federal knowledge distillation
WO2024022082A1 (en) * 2022-07-29 2024-02-01 脸萌有限公司 Information classification method and apparatus, device, and medium
CN115034836B (en) * 2022-08-12 2023-09-22 腾讯科技(深圳)有限公司 Model training method and related device
CN115034836A (en) * 2022-08-12 2022-09-09 腾讯科技(深圳)有限公司 Model training method and related device
CN117647367A (en) * 2024-01-29 2024-03-05 四川航空股份有限公司 Machine learning-based method and system for positioning leakage points of aircraft fuel tank
CN117647367B (en) * 2024-01-29 2024-04-16 四川航空股份有限公司 Machine learning-based method and system for positioning leakage points of aircraft fuel tank

Also Published As

Publication number Publication date
CN113408743B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
CN113408743B (en) Method and device for generating federal model, electronic equipment and storage medium
Han et al. An improved evolutionary extreme learning machine based on particle swarm optimization
CN106953862B (en) Sensing method and device for network security situation and sensing model training method and device
CN111461226A (en) Countermeasure sample generation method, device, terminal and readable storage medium
US20220176248A1 (en) Information processing method and apparatus, computer readable storage medium, and electronic device
US20180018555A1 (en) System and method for building artificial neural network architectures
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN110889759A (en) Credit data determination method, device and storage medium
CN113841157A (en) Training a safer neural network by using local linearity regularization
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN110874638B (en) Behavior analysis-oriented meta-knowledge federation method, device, electronic equipment and system
CN112600794A (en) Method for detecting GAN attack in combined deep learning
CN114186256A (en) Neural network model training method, device, equipment and storage medium
CN115766104A (en) Self-adaptive generation method based on improved Q-learning network security decision
CN111667069A (en) Pre-training model compression method and device and electronic equipment
KR102120443B1 (en) Entropy-based neural networks partial learning method and system
CN116827685B (en) Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN112765481B (en) Data processing method, device, computer and readable storage medium
CN113657468A (en) Pre-training model generation method and device, electronic equipment and storage medium
He Identification and Processing of Network Abnormal Events Based on Network Intrusion Detection Algorithm.
CN116976461A (en) Federal learning method, apparatus, device and medium
CN114792097B (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN113361621B (en) Method and device for training model
CN112784967B (en) Information processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant