CN111275188A - Method and device for optimizing horizontal federated learning system and readable storage medium - Google Patents

Method and device for optimizing horizontal federated learning system and readable storage medium

Info

Publication number
CN111275188A
CN111275188A (application number CN202010064745.XA)
Authority
CN
China
Prior art keywords
local
neuron
neural network
participating
participating device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010064745.XA
Other languages
Chinese (zh)
Other versions
CN111275188B (en)
Inventor
程勇
梁新乐
刘洋
陈天健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010064745.XA
Publication of CN111275188A
Application granted
Publication of CN111275188B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method, a device and a readable storage medium for optimizing a horizontal federated learning system, wherein the method comprises the following steps: randomly determining a neuron on-off mode of a neural network model to be trained; sending the neuron on-off mode to each participating device, so that each participating device turns neurons in its local neural network model on or off according to the neuron on-off mode and locally trains the processed neural network model to obtain a local model parameter update; and fusing the local model parameter updates and sending the fused global model parameter update to each participating device, so that each participating device updates its local neural network model according to the global model parameter update. Compared with existing schemes for avoiding overfitting, this strategy of turning off randomly selected neurons combines well with federated learning and avoids excessive additional time cost and computing-resource consumption.

Description

Method and device for optimizing horizontal federated learning system and readable storage medium
Technical Field
The invention relates to the technical field of machine learning, and in particular to a method and device for optimizing a horizontal federated learning system, and a readable storage medium.
Background
With the development of artificial intelligence, the concept of "federated learning" has been proposed to solve the problem of data silos: the parties to the federation can jointly train a model and obtain model parameters without sharing their own data, so that disclosure of private data is avoided. Horizontal federated learning, also called feature-aligned federated learning, applies when the participants' data features largely overlap (i.e., the data features are aligned) but their users overlap little; the data whose features are identical across participants but whose users are not identical is extracted for joint machine learning.
In practice, during the training of a model by horizontal federated learning, if the model has too many parameters and too few training samples, the trained model is prone to overfitting. Overfitting manifests as a small loss function and high prediction accuracy on the training data but a large loss function and low prediction accuracy on the test data, that is, the generalization capability of the model is poor.
To solve this problem, existing schemes avoid overfitting by tuning the model hyper-parameters, or by training multiple models and combining them. However, tuning hyper-parameters and training multiple models take a long time; when such schemes are applied to federated learning, the training time cost is too high and too many computing resources are consumed.
Disclosure of Invention
The main purpose of the invention is to provide a method, a device and a readable storage medium for optimizing a horizontal federated learning system, aiming to solve the problem that existing schemes for avoiding overfitting, when applied to a federated learning scenario, incur a high model training time cost and high computing-resource consumption.
In order to achieve the above object, the present invention provides an optimization method for a horizontal federal learning system, which is applied to a coordinating device participating in horizontal federal learning, wherein the coordinating device is in communication connection with each participating device participating in horizontal federal learning, and the method includes:
randomly determining a neuron on-off mode of a neural network model to be trained, wherein part of neurons of the neural network model are in an off state in the neuron on-off mode;
sending the neuron on-off mode to each participating device, so that each participating device turns neurons in its local neural network model on or off according to the neuron on-off mode, locally trains the processed neural network model to obtain a local model parameter update, and returns the update;
and fusing local model parameter updates received from each participating device, and sending the global model parameter updates obtained by fusion to each participating device so that each participating device can update the local neural network model according to the global model parameter updates.
Optionally, the step of randomly determining a neuron on-off mode of the neural network model to be trained includes:
randomly determining, for each global model update, the neuron on-off mode to be used when the neural network model to be trained is trained with each mini-batch of training data in each period of traversal, wherein the local training data of each participating device is divided into the same number of mini-batches of training data, one traversal of a participating device's local training data constitutes one period (i.e., one epoch), and each participating device performs the same number of local training periods.
Optionally, the step of sending the neuron on-off pattern to each participating device includes:
distributing the neuron on-off mode to each participating device in the form of a K × M × N dimensional matrix, wherein K is the number of local training periods of each participating device, M is the number of mini-batches of training data in each participating device, N is the number of neurons in the neural network model, and the value of each element in the matrix indicates the on-off state of the corresponding neuron.
Optionally, before the step of randomly determining the neuron on-off mode to be used when the neural network model to be trained is trained with each mini-batch of training data in each period of traversal in each global model update, the method further includes:
acquiring the data volume of a small batch of local training data of each participating device;
and setting the learning rate of local model updating of each participating device according to the data volume so that each participating device can update the local model according to the learning rate, wherein the learning rate is in direct proportion to the data volume.
Optionally, the step of fusing the local model parameter updates received from the respective participating devices includes:
and carrying out weighted average on the local model parameter updates received from the participating devices to obtain the global model parameter updates, wherein the weight of each participating device adopted in the weighted average operation is calculated according to the learning rate corresponding to each participating device.
In order to achieve the above object, the present invention further provides a method for optimizing a horizontal federal learning system, which is applied to a participating device participating in horizontal federal learning, and the participating device is in communication connection with a coordinating device participating in horizontal federal learning, and the method includes:
inputting generator parameters into a random number generator, and determining a neuron on-off mode of a neural network model to be trained according to an output result of the random number generator, wherein part of neurons of the neural network model are in an off state under the neuron on-off mode, and each participating device correspondingly adopts the same generator parameters to input the same random number generator in each local training of the neural network model;
turning neurons in the local neural network model on or off according to the neuron on-off mode, locally training the processed neural network model to obtain a local model parameter update, and sending the local model parameter update to the coordinating device;
and updating the local neural network model with the global model parameter update received from the coordinating device, wherein the coordinating device fuses the local model parameter updates received from the participating devices to obtain the global model parameter update.
Optionally, the generator parameters include an iteration index of the global model update, a period index of the local training, a batch index of the mini-batch of training data, and a neuron index of the neural network model, wherein the local training data of each participating device is divided into the same number of mini-batches of training data, one traversal of a participating device's local training data constitutes one period, and each participating device performs the same number of local training periods.
Optionally, the step of performing on-off processing on the neurons in the local neural network model according to the neuron on-off mode includes:
determining neurons to be closed in the local neural network model according to the neuron on-off mode;
setting an output of the neuron to be turned off to zero to turn off the neuron to be turned off.
In order to achieve the above object, the present invention further provides a horizontal federated learning system optimization device, including: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and operable on the processor, which, when executed by the processor, implements the steps of the horizontal federated learning system optimization method described above.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium, on which a horizontal federal learning system optimization program is stored, wherein the horizontal federal learning system optimization program, when executed by a processor, implements the steps of the horizontal federal learning system optimization method as described above.
In the invention, the coordinating device randomly determines the neuron on-off mode of the neural network model and sends it to each participating device; each participating device turns neurons in its neural network model on or off according to the neuron on-off mode and then locally trains the processed neural network model. Neurons in the neural network model are thus randomly turned off in each global model update of federated learning, which reduces the interaction among neuron nodes, prevents the trained neural network model from relying too heavily on certain local features, and improves the generalization capability of the model. In addition, because the coordinating device randomly determines the neuron on-off mode and sends it uniformly to each participating device, the neuron shutdown performed by the participating devices during local training is aligned, avoiding the problem that inconsistent random selections across participating devices would deprive the strategy of turning off randomly selected neurons of its statistical significance. Moreover, compared with existing schemes for avoiding overfitting, the strategy of turning off randomly selected neurons adopted in the embodiments of the invention combines well with federated learning and does not introduce excessive additional time cost or computing-resource consumption.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a method for optimizing a horizontal federated learning system of the present invention;
FIG. 3 is a diagram illustrating the result of randomly selecting neurons to turn off in a neural network model according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the alignment of the randomly selected neuron shutdown results between participating devices A and B according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that, in the embodiment of the present invention, the horizontal federal learning system optimization device may be a smart phone, a personal computer, a server, and the like, which is not limited herein.
As shown in fig. 1, the horizontal federated learning system optimization device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not constitute a limitation on the horizontal federated learning system optimization device, which may include more or fewer components than shown, some components in combination, or a different arrangement of components.
As shown in FIG. 1, memory 1005, which is one type of computer storage medium, may include an operating system, a network communications module, a user interface module, and a horizontal federated learning system optimization program.
When the device shown in fig. 1 is a coordinating device participating in horizontal federal learning, the user interface 1003 is mainly used for data communication with a client; the network interface 1004 is mainly used for establishing communication connection with each participating device participating in horizontal federal learning; and the processor 1001 may be configured to invoke the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:
randomly determining a neuron on-off mode of a neural network model to be trained, wherein part of neurons of the neural network model are in an off state in the neuron on-off mode;
sending the neuron on-off mode to each participating device, so that each participating device turns neurons in its local neural network model on or off according to the neuron on-off mode, locally trains the processed neural network model to obtain a local model parameter update, and returns the update;
and fusing local model parameter updates received from each participating device, and sending the global model parameter updates obtained by fusion to each participating device so that each participating device can update the local neural network model according to the global model parameter updates.
Further, the step of randomly determining the neuron on-off mode of the neural network model to be trained includes:
randomly determining, for each global model update, the neuron on-off mode to be used when the neural network model to be trained is trained with each mini-batch of training data in each period of traversal, wherein the local training data of each participating device is divided into the same number of mini-batches of training data, one traversal of a participating device's local training data constitutes one period, and each participating device performs the same number of local training periods.
Further, the step of sending the neuron on-off pattern to each participating device includes:
distributing the neuron on-off mode to each participating device in the form of a K × M × N dimensional matrix, wherein K is the number of local training periods of each participating device, M is the number of mini-batches of training data in each participating device, N is the number of neurons in the neural network model, and the value of each element in the matrix indicates the on-off state of the corresponding neuron.
Further, before the step of randomly determining the neuron on-off mode to be used when the neural network model to be trained is trained with each mini-batch of training data in each period of traversal in each global model update, the method further includes:
acquiring the data volume of a small batch of local training data of each participating device;
and setting the learning rate of local model updating of each participating device according to the data volume so that each participating device can update the local model according to the learning rate, wherein the learning rate is in direct proportion to the data volume.
Further, the step of fusing the local model parameter updates received from the respective participating devices comprises:
and carrying out weighted average on the local model parameter updates received from the participating devices to obtain the global model parameter updates, wherein the weight of each participating device adopted in the weighted average operation is calculated according to the learning rate corresponding to each participating device.
When the device shown in fig. 1 is a participating device participating in horizontal federal learning, the user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used for establishing communication connection with a coordinating device participating in horizontal federal learning; and the processor 1001 may be configured to invoke the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:
inputting generator parameters into a random number generator, and determining a neuron on-off mode of a neural network model to be trained according to an output result of the random number generator, wherein part of neurons of the neural network model are in an off state under the neuron on-off mode, and each participating device correspondingly adopts the same generator parameters to input the same random number generator in each local training of the neural network model;
turning neurons in the local neural network model on or off according to the neuron on-off mode, locally training the processed neural network model to obtain a local model parameter update, and sending the local model parameter update to the coordinating device;
and updating the local neural network model with the global model parameter update received from the coordinating device, wherein the coordinating device fuses the local model parameter updates received from the participating devices to obtain the global model parameter update.
Further, the generator parameters include an iteration index of the global model update, a period index of the local training, a batch index of the mini-batch of training data, and a neuron index of the neural network model, wherein the local training data of each participating device is divided into the same number of mini-batches of training data, one traversal of a participating device's local training data constitutes one period, and each participating device performs the same number of local training periods.
Further, the step of performing on-off processing on the neurons in the local neural network model according to the neuron on-off mode includes:
determining neurons to be closed in the local neural network model according to the neuron on-off mode;
setting an output of the neuron to be turned off to zero to turn off the neuron to be turned off.
Based on the structure, various embodiments of the optimization method of the horizontal federal learning system are provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the optimization method for a horizontal federated learning system according to the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than presented herein.
The first embodiment of the optimization method of the horizontal federal learning system is applied to the coordination equipment participating in horizontal federal learning, the coordination equipment is in communication connection with a plurality of participation equipment participating in horizontal federal learning, and the coordination equipment and the participation equipment related to the embodiment of the invention can be equipment such as a smart phone, a personal computer and a server. In this embodiment, the method for optimizing the horizontal federal learning system includes:
step S10, randomly determining a neuron on-off mode of a neural network model to be trained, wherein part of neurons of the neural network model are in an off state in the neuron on-off mode;
in this embodiment, the coordinating device and each participating device may establish a communication connection in advance through inquiry handshake authentication and identity authentication, and determine a neural network model to be trained in the federal learning. The neural network model with the same or similar structure can be built locally by each participating device, or the neural network model can be built by the coordinating device and then sent to each participating device. Each participating device has local training data for training the neural network model.
In horizontal federated learning, the coordinating device and the participating devices cooperate to perform multiple global model updates on the neural network model to be trained, where a model update means updating the model parameters of the neural network model, such as the connection weights between neurons, until a neural network model meeting the quality requirement is finally obtained. In one global model update, each participating device locally trains its local neural network model with its own local training data to obtain a local model parameter update, which may be gradient information used for updating the model parameters or the locally updated model parameters themselves; each participating device sends its local model parameter update to the coordinating device; the coordinating device fuses the local model parameter updates, for example by weighted averaging, to obtain a global model parameter update and sends it to each participating device; and each participating device updates the model parameters of its local neural network model with the global model parameter update, i.e., performs a model update on the local neural network model, which completes one global model update. After each global model update, the model parameters of the neural network models local to the participating devices are synchronized.
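For illustration, the following is a minimal Python sketch of one such global model update round; the `participants` objects and their `local_train` method are hypothetical names used only for this sketch, not part of the patent.

```python
import numpy as np

def global_update_round(global_params, participants):
    """One global model update round (illustrative sketch, not code from the patent).

    Each element of `participants` is assumed to expose a hypothetical
    local_train(params) method that returns (local_update, n_samples),
    where local_update is either gradient information or locally updated
    model parameters, flattened into a NumPy array.
    """
    local_updates, sample_counts = [], []
    for device in participants:
        update, n_samples = device.local_train(global_params)
        local_updates.append(np.asarray(update, dtype=float))
        sample_counts.append(n_samples)

    # Fusion: weighted average, here with weights proportional to each
    # participant's local data volume.
    w = np.array(sample_counts, dtype=float)
    w /= w.sum()
    global_update = sum(wi * ui for wi, ui in zip(w, local_updates))

    # The fused result is sent back to every participant, which applies it
    # to its local model so that all local models stay synchronized.
    return global_update
```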
In this embodiment, in order to prevent the neural network model obtained by federated learning training from overfitting, the coordinating device randomly determines the neuron on-off mode of the local neural network model to be used when the participating devices perform local training. The neural network model contains a plurality of neurons, and the neuron on-off mode is a combined pattern indicating, for each neuron in the neural network model, whether it is in an on state or an off state; part of the neurons of the neural network model are in the off state under the neuron on-off mode. Turning a neuron off may mean that the output of the neuron is set to 0, that the neuron does not output to the next neuron, or that the neuron is disconnected from downstream neurons. If a neuron is in the off state, it plays no role in the neural network model. That is, the coordinating device randomly determines which neurons in the neural network model should be turned off and which should remain on when each participating device trains locally.
FIG. 3 is a diagram illustrating the result of randomly selecting neurons to turn off in a neural network model. By randomly selecting part of the neurons in the neural network model to turn off, the interaction among feature detectors (hidden-layer neuron nodes) can be reduced; detector interaction means that some detectors can only function by relying on other detectors. As a result, the trained neural network model does not rely too heavily on certain local features, the generalization capability of the model is improved, and overfitting is avoided. It should be noted that, regardless of whether the output layer (also referred to as the last layer) of the neural network model has one neuron or several, the output layer does not take part in the random selection, i.e., the neuron on-off mode does not cover the neurons of the output layer; the neurons of the input layer (also referred to as the first layer) may take part in the random selection, i.e., the input features may be selected randomly.
Before the global model is updated once, the coordination equipment can randomly determine the neuron on-off mode of the neural network model in the global model updating; before the beginning of federal learning, the neuron on-off mode of the neural network model in the following multiple global model updating processes can be randomly determined. It should be noted that, each time is determined randomly, so the neuron on-off patterns of the neural network model in each global model update are not necessarily the same.
For example, to determine the on-off state of a given neuron in the neural network model during local training of each participating device in a global model update, a random number generator may be used to generate a random number, which is compared with a preset value: if the random number is greater than the preset value, the neuron is turned off; otherwise, the neuron is not turned off.
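As an illustration, a minimal sketch of this per-neuron decision; the threshold name and its default value are assumptions.

```python
import random

def sample_on_off_state(threshold=0.5):
    """Randomly decide whether a single neuron is turned off for the next
    local training step: a uniform random number greater than the preset
    value (here `threshold`, an assumed name and value) means "off"."""
    r = random.random()              # uniform random number in [0, 1)
    return "off" if r > threshold else "on"
```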
Step S20, sending the neuron on-off mode to each participating device, so that each participating device turns neurons in its local neural network model on or off according to the neuron on-off mode, locally trains the processed neural network model to obtain a local model parameter update, and returns the update;
After determining the neuron on-off mode, the coordinating device sends it to each participating device. The form of the neuron on-off mode is not limited; for example, the indices of the neurons in the neural network model may be predefined by the coordinating device and the participating devices, and the coordinating device sends the indices of the neurons to be turned off to the participating devices. After receiving the neuron on-off mode and before locally training its local neural network model, each participating device processes the neurons in the neural network model according to the neuron on-off mode. Specifically, the neurons indicated as off are turned off, while the neurons not indicated as off, or indicated as on, are left unchanged. After this on-off processing, the participating device locally trains the processed neural network model to obtain a local model parameter update. Specifically, the participating device may input local training data into the current neural network model to obtain a model output, compute a loss function from the model output and the labels of its local data, compute the gradient of the loss function with respect to the model parameters, and send the gradient information to the coordinating device as the local model parameter update. Alternatively, the participating device inputs local training data into the current neural network model to obtain a model output, computes a loss function from the model output and the labels of its local data, computes the gradient of the loss function with respect to the model parameters, updates the model parameters using the gradient, and sends the updated model parameters to the coordinating device as the local model parameter update.
The closing process of the participating device on the neuron may be to disconnect the neuron from a downstream neuron, or not to transmit the output of the neuron to a next neuron, or to set the output of the activation function of the neuron to 0. In a local model parameter updating process, the connection weights corresponding to the connections selected to be disconnected are also set to 0 (i.e. the corresponding model parameters are also set to 0), and the gradient corresponding to the disconnected connections is also set to 0. If the participating device is sending gradient information to the coordinating device, the gradient information set to 0 may not be transmitted to the coordinating device.
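As an illustration, a minimal NumPy sketch of this processing on a single fully connected layer; the layer shapes, the ReLU activation, and the helper names are assumptions rather than details from the patent.

```python
import numpy as np

def forward_with_mask(x, W, b, mask):
    """Forward pass through one fully connected layer with selected neurons
    turned off. `mask` is a 0/1 vector over the layer's neurons; a 0 entry
    sets that neuron's output to zero, so it contributes nothing downstream."""
    a = np.maximum(0.0, x @ W + b)       # ReLU activation (illustrative choice)
    return a * mask                      # closed neurons output 0

def mask_gradients(dW, db, mask):
    """Zero the gradients of connections attached to closed neurons so the
    corresponding model parameters are not updated in this local step;
    zeroed gradients need not be sent to the coordinating device."""
    return dW * mask[np.newaxis, :], db * mask
```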
It should be noted that, the coordinating device sends the neuron on-off mode to be used in one local training to each participating device, the neuron on-off modes used by each participating device in the local training are the same, and the on-off states of the neurons in the processed neural network model are also aligned. As shown in fig. 4, the on-off states of the neurons in the processed neural network model are aligned between the participating device a (participant a in the figure) and the participating device B (participant B in the figure). Therefore, the same random selection result is adopted by each participating device in the same global model updating, and the problem that the strategy for closing the random selection neurons loses statistical significance due to the fact that the random selection results of the participating devices are not uniform is avoided.
It should also be noted that after a participating device performs the on-off processing on the neural network model and completes one or more local model updates, the neurons that were turned off must be restored; the on-off processing is performed anew at the next local model update, i.e., the off operations of successive updates are not superimposed.
And step S30, fusing local model parameter updates received from each participating device, and sending the global model parameter updates obtained by fusion to each participating device so that each participating device can update the local neural network model according to the global model parameter updates.
And the coordination equipment receives the local model parameter updates sent by each participating equipment, fuses the local model parameter updates, and obtains the global model parameter update. Specifically, the coordinating device may perform weighted average on each local model parameter update, and the weight value may be set according to the specific situation of each participating device, for example, may be set according to the proportion of the data amount of the local training data of each participating device. The coordinating device sends the global model parameter updates to the respective participating devices. And each participating device carries out model updating on the respective local neural network model according to the global model parameter updating. Specifically, if the received global model parameter update is gradient information, the participating device calculates an updated model parameter by using the gradient information and the current model parameter of the local neural network model, and uses the updated model parameter as the latest model parameter, that is, completes one global model update. If the received global model parameter update is a model parameter, the participating device adopts the model parameter as the latest model parameter, namely, one global model update is completed.
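For illustration, a minimal sketch of the weighted-average fusion on the coordinating device and of how a participating device applies the fused result in the two cases described above (gradient information versus model parameters); the function names and the fixed learning rate are illustrative assumptions.

```python
import numpy as np

def fuse_local_updates(local_updates, weights):
    """Weighted average of the local model parameter updates; `weights` may
    be set, for example, in proportion to each participant's local data
    volume and is assumed to sum to 1."""
    return sum(w * np.asarray(u, dtype=float) for w, u in zip(weights, local_updates))

def apply_global_update(local_params, global_update, is_gradient, lr=0.01):
    """How a participating device applies the fused result: if the global
    update is gradient information, take a gradient step from the current
    parameters; if it is already a set of model parameters, adopt it directly."""
    if is_gradient:
        return local_params - lr * global_update
    return global_update
```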
The global model update is repeated in a loop; when the coordinating device detects that a preset stopping condition is met, training stops and the finally trained neural network model is obtained. The preset stopping condition may be set in advance as needed, for example convergence of the loss function, the number of iterations exceeding a set number, or the training time exceeding a set duration.
In this embodiment, the coordinating device randomly determines the neuron on-off mode of the neural network model and sends it to each participating device, so that each participating device turns neurons in its neural network model on or off according to the neuron on-off mode and then locally trains the processed neural network model. Neurons in the neural network model are thus randomly turned off in each global model update of federated learning, which reduces the interaction among neuron nodes, prevents the trained neural network model from relying too heavily on certain local features, and improves the generalization capability of the model. In addition, because the coordinating device randomly determines the neuron on-off mode and sends it uniformly to each participating device, the neuron shutdown performed by the participating devices during local training is aligned, avoiding the problem that inconsistent random selections across participating devices would deprive the strategy of turning off randomly selected neurons of its statistical significance. Moreover, compared with existing schemes for avoiding overfitting, the strategy of turning off randomly selected neurons adopted in this embodiment combines well with federated learning and does not introduce excessive additional time cost or computing-resource consumption.
Further, the neural network model to be trained may be for predicting credit risk of a bank, the input of the neural network model may be characteristic data of the user, the output may be a risk score for the user, the participating devices may be devices of multiple banks, each having sample data of multiple users locally, and the coordinating device is a third party server independent of the multiple banks. And the coordination equipment and each participating equipment train the neural network model according to the federal learning process in the embodiment to obtain the neural network model finally used for credit risk prediction. And the trained neural network model can be adopted by each bank to predict the credit risk of the user, and the characteristic data of the user is input into the trained model to obtain the risk score of the user. Because the coordination equipment randomly determines the neuron on-off mode of the neural network model in the training process and sends the neuron on-off mode to each participating equipment, each participating equipment carries out neuron on-off processing on a local neural network according to the neuron on-off mode and then carries out local training, and further model training is completed, the generalization capability of the neural network model obtained by training is improved, and the credit risk prediction capability of new user data except training data is better. In addition, the process of federal learning can not bring more time cost for each bank, and the computing resources of each bank device are saved.
It should be noted that the neural network model to be trained may also be used in other application scenarios besides credit risk estimation, such as performance level prediction, paper value evaluation, and the like, and the embodiment of the present invention is not limited herein.
Further, based on the first embodiment, a second embodiment of the method for optimizing a federal learning system is provided, where in this embodiment, the step S10 includes:
step S101, randomly determining a neuron on-off mode when a neural network model to be trained is trained by adopting small batches of training data in each period of traversal in each global model updating, wherein the local training data of each participating device is divided into a plurality of small batches of training data with the same batch number, the local training data of each participating device in traversal is one period, and the local training period numbers of each participating device are the same.
In this embodiment, the local training data of each participating device may be divided into a plurality of small training data batches, and the number of the small training data batches divided by each participating device is guaranteed to be the same. When each participating device performs local training, the participating device can perform local training for multiple periods, the period is one period after the participating device completes one local training data traversal, and the number of periods for performing local training in the same global model updating by each participating device is the same. In the traversal process, the participating devices perform local model updating on the local neural network model once by adopting a batch of small batch of training data every time, and then in one global model updating, the number of times of local model updating to be performed by one participating device is the number of batches of local small batch of training data multiplied by the period number of local training. Each participating device can negotiate to determine the batch number of the small batch of training data and the period number of local training; or the coordinating device determines the batch number and the period number according to the data volume of the training data local to each participating device and sends the batch number and the period number to each participating device.
By dividing the local training data of the participating equipment into a plurality of small batches of data, the data volume to be processed by the participating equipment can not be too large when the local model of the participating equipment is updated each time, so that the computing pressure of the equipment is reduced, and the processor breakdown or overlong computing time caused by excessive data is avoided. The participating devices perform multi-period local training, so that local training data of the participating devices can be fully utilized, the times of updating global model parameters are reduced, and the communication consumption of the coordinating devices and the participating devices is further reduced.
Based on the above, the coordinating device may randomly determine the neuron on-off mode when the neural network model to be trained is trained by using each small batch of training data in each period of traversal in each global model update, and then send each neuron on-off mode to each participating device.
The coordinating device may send one neuron on-off mode at a time, before the participating devices perform the corresponding local model update with that mode. Alternatively, all the neuron on-off modes needed for local model updates with each mini-batch of training data in each period of traversal in the next global model update may be sent to each participating device before that global model update begins. It should be noted that when sending a neuron on-off mode, the coordinating device may attach indication information so that each participating device knows which local model update the mode is to be used for. The same neuron on-off mode or different neuron on-off modes may be used for local model updates with different mini-batches of training data; that is, the coordinating device may determine one neuron on-off mode that corresponds to two or more mini-batches of training data in each period of traversal, and does not need to determine a separate neuron on-off mode for every mini-batch in every period.
After receiving a neuron on-off mode, the participating device turns neurons in its local neural network model on or off according to that mode, and performs a local model update on the processed neural network model with the mini-batch of training data corresponding to that neuron on-off mode. If the participating device has received all the neuron on-off modes needed in the global model update, it uses the corresponding neuron on-off mode for each local model update.
In this embodiment, the training data local to each participating device is divided into small batches of training data with the same number, and each participating device performs local training with the same number of periods, so that the coordinating device can conveniently unify the neuron on-off mode of each participating device during each local model update, thereby avoiding that the neuron random selection result of each participating device is not unified so that the strategy of neuron random off loses statistical significance, and ensuring the generalization capability of the trained neural network model.
Further, the step S20 includes:
distributing the neuron on-off mode to each participating device in the form of a K × M × N dimensional matrix, wherein K is the number of local training periods of each participating device, M is the number of mini-batches of training data in each participating device, N is the number of neurons in the neural network model, and the value of each element in the matrix indicates the on-off state of the corresponding neuron.
Further, the coordinating device may distribute the neuron on-off mode to each participating device as a K × M × N dimensional matrix, where K is the number of local training periods of each participating device, M is the number of mini-batches of training data in each participating device, N is the number of neurons in the neural network model, and the value of each element in the matrix indicates the on-off state of the corresponding neuron. The coordinating device and the participating devices may negotiate in advance the values the elements of the matrix can take and the meanings of those values; for example, each element may take the value 0 or 1, where 0 indicates that the corresponding neuron is turned off and 1 indicates that the corresponding neuron is turned on.
One column of the third dimension of the matrix can be regarded as a bitmap (bitmap) with the length of N, and the bitmap is used for indicating the on-off states of N neurons in the neural network model. The second dimension of the matrix has M columns, and the local model updating is performed by respectively adopting M small batches of training data. The first dimension of the matrix has K rows, corresponding to the local training of K periods, respectively.
After receiving the matrix, the participating device determines, from the values of the elements of the matrix, the on-off state of each neuron when local training is performed with each mini-batch of training data in each period of traversal of the global model update, and then performs the on-off processing on the neurons of the neural network model to complete each local training step. For example, in one global model update, during the first period of traversal, when the first mini-batch of training data is used for local training and the model parameters are updated, the participating device reads the values of the (1,1,1)-th to (1,1,N)-th elements of the matrix, i.e., N element values, and determines the on-off states of the N neurons accordingly.
Further, the coordinating device may randomly generate the matrix with a certain probability P before a global model update starts, for example, for the (k, m, n) -th element, the coordinating device may generate a random number between 0 and 1, and if the generated random number is greater than the probability P, the coordinating device sets the (k, m, n) -th element of the matrix to 1; otherwise, the coordinator device sets the (k, m, n) -th element of the matrix to 0. Wherein K is 1,2, …, K; m is 1,2, …, M; n is 1,2, …, N.
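As an illustration, a minimal NumPy sketch of generating such a matrix under the element convention just described; the function name and the example dimensions are assumptions.

```python
import numpy as np

def generate_on_off_matrix(K, M, N, p, rng=None):
    """Randomly generate the K x M x N neuron on-off matrix.

    K: number of local training periods, M: number of mini-batches per
    participant, N: number of neurons. Following the example above, the
    (k, m, n)-th element is set to 1 (neuron on) when its random number in
    [0, 1) exceeds the probability p, and to 0 (neuron off) otherwise.
    """
    if rng is None:
        rng = np.random.default_rng()
    return (rng.random((K, M, N)) > p).astype(np.int8)

# A participating device then reads the slice for period k and mini-batch m,
# e.g. matrix[k, m, :], as a length-N bitmap of neuron on-off states.
matrix = generate_on_off_matrix(K=2, M=5, N=100, p=0.5)
```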
In this embodiment, the coordination device sends the neuron on-off mode to each participating device in a matrix form, and because the matrix form is simple and the data size is small, the coordination device does not need to additionally increase too much communication overhead to transfer the neuron on-off mode, but ensures the alignment of the neuron on-off modes of each participating device.
Further, the coordinating device may instead generate K × M bitmaps of length N, or K bitmap matrices of dimension M × N, and send them to each participating device to instruct each participating device to perform the neuron on-off operations on the neural network model.
Further, based on the first and second embodiments, a third embodiment of the method for optimizing a federal learning system is proposed, where in this embodiment, before the step S101, the method further includes:
step S40, acquiring the data volume of the local small training data of each participating device;
Since the data volume of the training data local to each participating device is not necessarily the same, while the number of mini-batches into which the local training data is divided is the same, the data volume of a local mini-batch of training data is not necessarily the same across participating devices. In this case, the coordinating device may set a different learning rate for each participating device so that the progress of local training can be kept synchronized across the participating devices.
Specifically, the coordinating device obtains a data volume of a small batch of training data local to each participating device. It may be that each participating device sends a local data volume of a small batch of training data to the coordinating device.
And step S50, setting the learning rate of local model updating of each participating device according to the data volume, so that each participating device can update the local model according to the learning rate, wherein the learning rate is in direct proportion to the data volume.
The coordination device sets the updated learning rate of the local model of each participating device according to each acquired data volume, which may be that the coordination device correspondingly sends the determined learning rate to each participating device. Specifically, the learning rate may be proportional to the data amount, for example, the coordinating device may set a learning rate, such as 0.01, for one of the participating devices, calculate a ratio of the data amount corresponding to the other participating devices to the data amount of the participating device, and multiply the calculated ratio by the learning rate to obtain the learning rate of the other participating devices. For example, the data amount of the training data in the small batch in the participating device 1 is 1000, the data amount of the training data in the small batch in the participating device 2 is 2000, and if the learning rate of the participating device 1 is set to 0.01, the learning rate of the participating device 2 is calculated to be 0.02. And the participating equipment updates the local model according to the learning rate set by the coordinating equipment.
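As an illustration, a small sketch of setting the learning rates in proportion to the mini-batch data volumes, reproducing the numbers of the example above; the function and device names are assumptions.

```python
def set_learning_rates(batch_sizes, base_device, base_lr=0.01):
    """Set each participant's local learning rate in proportion to the data
    volume of its local mini-batch. `batch_sizes` maps a device identifier to
    its mini-batch size; `base_device` is the reference participant whose
    learning rate is fixed to base_lr."""
    base = batch_sizes[base_device]
    return {dev: base_lr * (size / base) for dev, size in batch_sizes.items()}

# Example from the text: mini-batch sizes of 1000 and 2000 with a base
# learning rate of 0.01 give learning rates of 0.01 and 0.02.
rates = set_learning_rates({"device_1": 1000, "device_2": 2000}, "device_1")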
Further, the step of fusing the local model parameter updates received from the respective participating devices in step S30 includes:
step S301, performing weighted average on the local model parameter updates received from each participating device to obtain the global model parameter update, where the weight of each participating device used in the weighted average operation is calculated according to the learning rate corresponding to each participating device.
After the coordinating device sets the learning rates for the participating devices, the coordinating device may add the influence of the learning rates to the weights when performing weighted average on the local model parameter updates sent by the participating devices, that is, the weights of the participating devices used in the weighted average operation may be calculated according to the learning rates corresponding to the participating devices.
Specifically, the coordinating device may set the weight value of each participating device in advance according to other weight setting factors, multiply the weight value of each participating device by the corresponding learning rate, perform normalization processing to obtain the weight associated with the learning rate of each participating device, and perform weighted average on the local model parameter update by using the weight associated with the learning rate to obtain the global model parameter update.
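A minimal sketch of folding the learning rates into the fusion weights and normalizing, as described above; the function names are assumptions.

```python
import numpy as np

def lr_adjusted_weights(base_weights, learning_rates):
    """Fold each participant's learning rate into its preset weight and
    normalize, so the fusion weights reflect the learning rates."""
    raw = np.asarray(base_weights, dtype=float) * np.asarray(learning_rates, dtype=float)
    return raw / raw.sum()

def fuse_with_lr_weights(local_updates, base_weights, learning_rates):
    """Weighted average of local model parameter updates using the
    learning-rate-adjusted, normalized weights."""
    w = lr_adjusted_weights(base_weights, learning_rates)
    return sum(wi * np.asarray(u, dtype=float) for wi, u in zip(w, local_updates))
```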
In this embodiment, when the local model parameter updates are weighted and averaged, the influence of different learning rates of each participating device is added to the weights, so that the global model parameter updates obtained by fusion can reflect the contribution of local training data of each participating device to the joint learning, and the quality of the neural network model obtained by training is improved as a whole.
Further, based on the first, second, and third embodiments, a fourth embodiment of the optimization method of the horizontal federal learning system of the present invention is provided, in this embodiment, the optimization method of the horizontal federal learning system is applied to a participating device participating in horizontal federal learning, the participating device is in communication connection with a coordinating device participating in horizontal federal learning, and the coordinating device and the participating device in the embodiment of the present invention may be devices such as a smart phone, a personal computer, and a server. In this embodiment, the method for optimizing the horizontal federal learning system includes the following steps:
step A10, inputting generator parameters into a random number generator, and determining a neuron on-off mode of a neural network model to be trained according to an output result of the random number generator, wherein part of neurons of the neural network model are in an off state in the neuron on-off mode, and each participating device correspondingly adopts the same generator parameters to input the same random number generator in each local training of the neural network model;
in this embodiment, the coordinating device and each participating device may establish a communication connection in advance through inquiry handshake authentication and identity authentication, and determine a neural network model to be trained in the federal learning. The neural network model with the same or similar structure may be built locally by each participating device, or the neural network model may be built by the coordinating device and then sent to each participating device. Each participating device has local training data for training the neural network model.
In horizontal federated learning, the coordinating device and the participating devices cooperate to perform multiple global model updates on the neural network model to be trained, where a model update means updating the model parameters of the neural network model, such as the connection weights between neurons, until a neural network model meeting the quality requirement is finally obtained. In one global model update, each participating device locally trains its local neural network model with its own local training data to obtain a local model parameter update, which may be gradient information used for updating the model parameters or the locally updated model parameters themselves; each participating device sends its local model parameter update to the coordinating device; the coordinating device fuses the local model parameter updates, for example by weighted averaging, to obtain a global model parameter update and sends it to each participating device; and each participating device updates the model parameters of its local neural network model with the global model parameter update, i.e., performs a model update on the local neural network model, which completes one global model update. After each global model update, the model parameters of the neural network models local to the participating devices are synchronized.
In this embodiment, to avoid overfitting of the neural network model obtained by federated learning, each participating device may randomly select and close some neurons of the neural network model to be trained during each local training. A neuron in the off state may have its output set to 0, may not pass its output to the next neuron, or may be disconnected from its downstream neurons.
Selecting the neurons to close through random numbers reduces the interaction among feature detectors (hidden-layer neuron nodes); such interaction means that some detectors can only function when certain other detectors are present. As a result, the trained neural network model does not rely too heavily on certain local features, the generalization ability of the model is improved, and overfitting is avoided.
Specifically, the same random number generator may be provided locally on each participating device. The random number generator may be negotiated among the participating devices, or it may be generated by the coordinating device and distributed to each participating device, so as to ensure that the random number generators on all participating devices are identical.
Each participating device inputs generator parameters into its random number generator, and the random number generator produces one or more random numbers. The generator parameters are the input parameters of the random number generator, from which the random numbers are generated. It should be noted that if the same generator parameters are input into two identical random number generators, the generated random numbers are identical.
The participating device then determines the neuron on-off mode of the neural network model to be trained according to the random numbers produced by the random number generator. The neural network model contains many neurons, and the neuron on-off mode is a combination of the on or off states of these neurons, in which some neurons of the neural network model are in the off state. A neuron in the off state plays no role in the neural network model.
It should be noted that, during federated learning, each participating device performs local training on its local neural network model in every global model update, and in each local training it inputs the same generator parameters into its local random number generator. This ensures that the neuron on-off modes adopted by all participating devices in the local training of the same global model update are identical, that is, all participating devices use the same random selection result within the same global model update. Otherwise, inconsistent random selections across participating devices would cause the strategy of closing randomly selected neurons to lose its statistical significance.
For example, before local training, a participating device generates N random numbers with the random number generator, the N random numbers corresponding to N neurons, and compares each random number with a preset value: if a random number is greater than the preset value, the corresponding neuron is to be closed; otherwise, the corresponding neuron is kept on.
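A minimal sketch of this selection step is given below, assuming a NumPy pseudo-random generator and a threshold of 0.5; the function name and seeding scheme are illustrative only, the key point being that the same seed yields the same on-off pattern on every participating device.

```python
import numpy as np

def neuron_on_off_mode(n_neurons, seed, threshold=0.5):
    """Draw one random number per neuron and close those whose number
    exceeds the preset threshold; identical seeds give identical patterns."""
    rng = np.random.default_rng(seed)          # same seed on every device
    random_numbers = rng.random(n_neurons)     # one number per neuron
    return random_numbers > threshold          # True = neuron is to be closed

to_close = neuron_on_off_mode(n_neurons=8, seed=2020)
```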
Step A20: performing on-off processing on the neurons in the local neural network model according to the neuron on-off mode, performing local training on the processed neural network model to obtain a local model parameter update, and sending the local model parameter update to the coordinating device;
After the neuron on-off mode is determined, and before local training, the participating device first performs on-off processing on the neurons of its local neural network model according to the neuron on-off mode. Specifically, the neurons indicated as off in the neuron on-off mode are closed, while the neurons not indicated as off (or indicated as on) are left untouched. After this processing, the participating device performs local training on the processed neural network model to obtain a local model parameter update. Specifically, the participating device may input local training data into the current neural network model to obtain the model output, compute a loss function from the model output and the local data labels, compute the gradient of the loss function with respect to the model parameters, and send this gradient information to the coordinating device as the local model parameter update. Alternatively, the participating device may compute the gradient in the same way, update the model parameters with the gradient, and send the updated model parameters to the coordinating device as the local model parameter update.
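As a rough sketch of the two reporting options described above, the fragment below lets a participating device return either the gradient itself or the parameters already updated with it; the names and the plain gradient step are assumptions for illustration.

```python
import numpy as np

def local_update_to_send(params, grad, learning_rate=0.1, send_gradients=True):
    """Either report the gradient of the local loss, or apply it locally
    first and report the updated model parameters."""
    if send_gradients:
        return grad                               # coordinator fuses gradients
    return params - learning_rate * grad          # coordinator fuses parameters

params = np.ones(4)
grad = np.array([0.2, -0.1, 0.0, 0.3])            # stand-in for dLoss/dParams
update = local_update_to_send(params, grad, send_gradients=False)
```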
The coordinating device receives the local model parameter updates sent by the participating devices and fuses them to obtain the global model parameter update. Specifically, the coordinating device may take a weighted average of the local model parameter updates, where the weights may be set according to the specific situation of each participating device, for example in proportion to the amount of its local training data. The coordinating device then sends the global model parameter update to each participating device.
Further, in step A20, performing on-off processing on the neurons in the local neural network model according to the neuron on-off mode includes:
Step A201: determining the neurons to be closed in the local neural network model according to the neuron on-off mode;
Step A202: setting the output of each neuron to be closed to zero, so as to close it.
The participating device determines, according to the neuron on-off mode, which neurons in its local neural network model are to be closed, and sets the output of each of these neurons to zero so as to close it.
Alternatively, the participating device may close a neuron by disconnecting it from its downstream neurons, or by not passing its output to the next neuron.
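The zero-output variant can be sketched as a masked forward pass of one fully connected layer, as below; the ReLU activation and layer shapes are assumptions chosen only to make the fragment self-contained.

```python
import numpy as np

def layer_forward(x, weights, off_mask):
    """Forward pass of one layer in which the closed neurons simply output
    zero, so they contribute nothing to downstream neurons."""
    out = np.maximum(x @ weights, 0.0)   # assumed ReLU activation
    out[:, off_mask] = 0.0               # close the selected neurons
    return out

x = np.random.randn(2, 3)                              # 2 samples, 3 features
weights = np.random.randn(3, 5)                        # layer with 5 neurons
off_mask = np.array([False, True, False, False, True]) # neurons 1 and 4 closed
hidden = layer_forward(x, weights, off_mask)
```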
Step A30: performing a model update on the local neural network model with the global model parameter update received from the coordinating device, wherein the coordinating device obtains the global model parameter update by fusing the local model parameter updates received from the participating devices.
The participating device receives the global model parameter update sent by the coordinating device and updates its local neural network model accordingly. Specifically, if the received global model parameter update is gradient information, the participating device computes the updated model parameters from this gradient information and the current model parameters of its local neural network model, and takes the result as the latest model parameters, completing one global model update. If the received global model parameter update consists of model parameters, the participating device adopts them directly as the latest model parameters, likewise completing one global model update.
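A small sketch of the participant-side handling of both cases follows; the simple gradient-descent step and the names are illustrative assumptions, not a prescribed update rule.

```python
import numpy as np

def apply_global_update(local_params, global_update, is_gradient, learning_rate=0.1):
    """If the fused update is gradient information, apply it to the current
    parameters; if it already consists of model parameters, adopt it directly."""
    if is_gradient:
        return local_params - learning_rate * global_update
    return global_update.copy()

local_params = np.ones(4)
fused_gradient = np.array([0.1, 0.0, -0.2, 0.05])
local_params = apply_global_update(local_params, fused_gradient, is_gradient=True)
```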
The global model update is repeated in a loop, and training stops when the coordinating device or any participating device detects that a preset stop condition is met, yielding the finally trained neural network model. The preset stop condition may be any condition set in advance as needed, such as convergence of the loss function, the number of iterations exceeding a set number, or the training time exceeding a set duration.
In this embodiment, each participating device inputs the same generator parameters into the same random number generator, determines the neuron on-off mode from the generator's output, performs on-off processing on the neurons of its neural network model according to that mode, and then locally trains the processed model. In every global model update of federated learning, neurons of the neural network model are thus randomly closed, which reduces the interaction among neuron nodes, prevents the trained neural network model from relying too heavily on certain local features, and improves its generalization ability. Because every participating device inputs the same generator parameters into the same random number generator in each local training and derives the neuron on-off mode from the output, the closing of neurons is aligned across participating devices during local training, avoiding the situation where inconsistent random selections cause the strategy of closing randomly selected neurons to lose its statistical significance. Moreover, compared with existing schemes for avoiding overfitting, the strategy of closing randomly selected neurons adopted in the embodiments of the present invention combines well with federated learning and does not introduce excessive additional time cost or computing resource consumption.
Further, the local training data of each participating device may be divided into several mini-batches, and the number of mini-batches is the same for every participating device. When performing local training, a participating device may train for multiple periods, where one period is one full traversal of its local training data, and the number of local training periods within the same global model update is the same for every participating device. During a traversal, the participating device performs one local model update of its local neural network model with each mini-batch, so within one global model update the number of local model updates performed by a participating device equals the number of local mini-batches multiplied by the number of local training periods. The participating devices may negotiate the number of mini-batches and the number of local training periods, or the coordinating device may determine them according to the amount of training data held locally by each participating device and send them to the participating devices.
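The resulting loop structure can be sketched as follows; the counts and the placeholder update function are assumptions, the point being that the number of local model updates per global round is the product of the two agreed counts.

```python
n_periods = 2    # agreed number of local training periods (epochs)
n_batches = 5    # agreed number of mini-batches per participating device

def local_model_update(period_idx, batch_idx):
    """Placeholder for one local model update on one mini-batch."""
    pass

updates_done = 0
for k in range(n_periods):          # one period = one traversal of local data
    for m in range(n_batches):      # one mini-batch per local model update
        local_model_update(k, m)
        updates_done += 1

assert updates_done == n_periods * n_batches   # batches x periods local updates
```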
By dividing each participating device's local training data into several mini-batches, the amount of data to be processed in each local model update is kept small, which reduces the computational load on the device and avoids processor overload or excessively long computation caused by too much data. Performing multi-period local training allows the local training data to be fully exploited, reduces the number of global model parameter updates, and thus reduces the communication overhead between the coordinating device and the participating devices.
On this basis, the generator parameters that a participating device inputs into its random number generator may include the iteration index of the global model update, the period index of the local training, the batch index of the mini-batch, and the neuron index of the neural network model. The participating device inputs these indices into the random number generator to obtain a random number, and determines the on-off state of the neuron corresponding to these indices according to that random number.
For example, when a participating device performs local training with the m-th mini-batch in the k-th period of traversal of the t-th global model update, to determine the on-off state of the n-th neuron it may input t, k, m, and n as a parameter group into the random number generator; the random number generator produces a random number ρ between 0 and 1, and if ρ is greater than a set value P, such as 0.5, the participating device closes the n-th neuron, otherwise it keeps the n-th neuron on.
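A minimal sketch of this per-neuron decision is shown below, assuming the tuple (t, k, m, n) can be fed to a NumPy generator as its seed; the seed construction is an assumption for the example, the embodiment only requiring that all participating devices use identical generators and identical generator parameters.

```python
import numpy as np

def neuron_is_closed(t, k, m, n, p=0.5):
    """Decide the on-off state of neuron n for global update t, period k,
    mini-batch m; identical index tuples yield the same rho everywhere."""
    rng = np.random.default_rng([t, k, m, n])   # same inputs -> same rho
    rho = rng.random()                          # rho in [0, 1)
    return rho > p                              # rho > P means close neuron n

closed_states = [neuron_is_closed(t=3, k=1, m=2, n=j) for j in range(8)]
```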
In this embodiment, the local training data of each participating device is divided into the same number of mini-batches and each participating device performs the same number of local training periods, so that, using the same random number generator and the same generator parameters, all participating devices can easily adopt a unified neuron on-off mode in each local model update. This avoids the situation where inconsistent random selections of neurons across participating devices cause the strategy of randomly closing neurons to lose its statistical significance, and it safeguards the generalization ability of the trained neural network model.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a horizontal federated learning system optimization program is stored; when executed by a processor, the program implements the steps of the horizontal federated learning system optimization method described above.
For embodiments of the horizontal federated learning system optimization device and the computer-readable storage medium of the present invention, reference may be made to the embodiments of the horizontal federated learning system optimization method of the present invention, which are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for optimizing a horizontal federated learning system, applied to a coordinating device participating in horizontal federated learning, the coordinating device being communicatively connected to each participating device participating in horizontal federated learning, the method comprising the following steps:
randomly determining a neuron on-off mode of a neural network model to be trained, wherein part of neurons of the neural network model are in an off state in the neuron on-off mode;
sending the neuron on-off mode to each participating device, so that each participating device performs on-off processing on the neurons in its local neural network model according to the neuron on-off mode, performs local training on the processed neural network model to obtain a local model parameter update, and returns the local model parameter update;
and fusing the local model parameter updates received from the participating devices, and sending the global model parameter update obtained by the fusion to each participating device, so that each participating device updates its local neural network model according to the global model parameter update.
2. The method for optimizing a horizontal federated learning system of claim 1, wherein the step of randomly determining the neuron on-off mode of the neural network model to be trained comprises:
randomly determining a neuron on-off mode for each mini-batch of training data under each period of traversal in each global model update of the neural network model to be trained, wherein the local training data of each participating device is divided into the same number of mini-batches, one traversal of a participating device's local training data constitutes one period, and the number of local training periods is the same for every participating device.
3. The method for optimizing a horizontal federated learning system of claim 2, wherein the step of sending the neuron on-off mode to each participating device comprises:
distributing the neuron on-off modes to each participating device in the form of a K×M×N-dimensional matrix, wherein K is the number of local training periods of each participating device, M is the number of mini-batches of training data on each participating device, N is the number of neurons in the neural network model, and the value of each element of the matrix indicates the on-off state of the corresponding neuron.
4. The method for optimizing a horizontal federated learning system of claim 2 or 3, wherein, before the step of randomly determining a neuron on-off mode for each mini-batch of training data under each period of traversal in each global model update of the neural network model to be trained, the method further comprises:
acquiring the data volume of the local mini-batches of training data of each participating device;
and setting the learning rate of the local model updates of each participating device according to the data volume, so that each participating device performs its local model updates according to the learning rate, wherein the learning rate is proportional to the data volume.
5. The method for optimizing a horizontal federated learning system of claim 4, wherein the step of fusing the local model parameter updates received from the participating devices comprises:
performing a weighted average of the local model parameter updates received from the participating devices to obtain the global model parameter update, wherein the weight of each participating device used in the weighted average is calculated according to the learning rate corresponding to that participating device.
6. A method for optimizing a horizontal federated learning system, applied to a participating device participating in horizontal federated learning, the participating device being communicatively connected to a coordinating device participating in horizontal federated learning, the method comprising the following steps:
inputting generator parameters into a random number generator, and determining a neuron on-off mode of a neural network model to be trained according to the output of the random number generator, wherein some neurons of the neural network model are in an off state under the neuron on-off mode, and each participating device inputs the same generator parameters into the same random number generator in each local training of the neural network model;
performing on-off processing on the neurons in the local neural network model according to the neuron on-off mode, performing local training on the processed neural network model to obtain a local model parameter update, and sending the local model parameter update to the coordinating device;
and performing a model update on the local neural network model with the global model parameter update received from the coordinating device, wherein the coordinating device fuses the local model parameter updates received from the participating devices to obtain the global model parameter update.
7. The method for optimizing a horizontal federated learning system of claim 6, wherein the generator parameters include an iteration index of the global model update, a period index of the local training, a batch index of the mini-batch of training data, and a neuron index of the neural network model, wherein the local training data of each participating device is divided into the same number of mini-batches, one traversal of a participating device's local training data constitutes one period, and the number of local training periods is the same for every participating device.
8. The method for optimizing a horizontal federated learning system of claim 6 or 7, wherein the step of performing on-off processing on the neurons in the local neural network model according to the neuron on-off mode comprises:
determining neurons to be closed in the local neural network model according to the neuron on-off mode;
setting the output of each neuron to be closed to zero, so as to close it.
9. A horizontal federated learning system optimization device, comprising: a memory, a processor, and a horizontal federated learning system optimization program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the horizontal federated learning system optimization method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a horizontal federated learning system optimization program which, when executed by a processor, implements the steps of the horizontal federated learning system optimization method of any one of claims 1 to 8.
CN202010064745.XA 2020-01-20 2020-01-20 Method and device for optimizing horizontal federated learning system and readable storage medium Active CN111275188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010064745.XA CN111275188B (en) 2020-01-20 2020-01-20 Method and device for optimizing horizontal federated learning system and readable storage medium


Publications (2)

Publication Number Publication Date
CN111275188A true CN111275188A (en) 2020-06-12
CN111275188B CN111275188B (en) 2021-04-13

Family

ID=70998990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010064745.XA Active CN111275188B (en) 2020-01-20 2020-01-20 Method and device for optimizing horizontal federated learning system and readable storage medium

Country Status (1)

Country Link
CN (1) CN111275188B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729991A (en) * 2017-10-19 2018-02-23 天津大学 A neural network neuron selective activation method with learnable positions
US20190156192A1 (en) * 2017-11-20 2019-05-23 International Business Machines Corporation Cardinal sine as an activation function for universal classifier training data
CN108108814A (en) * 2018-01-17 2018-06-01 北京中星微人工智能芯片技术有限公司 A training method for a deep neural network
US20190392301A1 (en) * 2018-06-20 2019-12-26 National Technology & Engineering Solutions Of Sandia, Llc Devices and methods for increasing the speed or power efficiency of a computer when performing machine learning using spiking neural networks
CN110263936A (en) * 2019-06-14 2019-09-20 深圳前海微众银行股份有限公司 Horizontal federated learning method, device, equipment and computer storage medium
CN110263921A (en) * 2019-06-28 2019-09-20 深圳前海微众银行股份有限公司 A training method and device for a federated learning model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIANG YANG et al.: "Federated Machine Learning: Concept and Applications", ACM Transactions on Intelligent Systems and Technology *
YANG Qiang: "Federated Learning and Artificial Intelligence", Software and Integrated Circuits *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222169A (en) * 2021-03-18 2021-08-06 中国地质大学(北京) Federal machine combined service method and system combining big data analysis feedback
CN113222169B (en) * 2021-03-18 2023-06-23 中国地质大学(北京) Federal machine combination service method and system combining big data analysis feedback
CN113133768A (en) * 2021-04-21 2021-07-20 东南大学 Cardiovascular disease auxiliary diagnosis model and training method based on federal learning

Also Published As

Publication number Publication date
CN111275188B (en) 2021-04-13

Similar Documents

Publication Publication Date Title
CN110782042B (en) Method, device, equipment and medium for combining horizontal federation and vertical federation
CN110929886B (en) Model training and predicting method and system
CN112181666B (en) Equipment assessment and federal learning importance aggregation method based on edge intelligence
CN111522669A (en) Method, device and equipment for optimizing horizontal federated learning system and readable storage medium
CN111310932A (en) Method, device and equipment for optimizing horizontal federated learning system and readable storage medium
US10063562B1 (en) Flexible access management framework based on measuring application usage behavior
CN107766940A (en) Method and apparatus for generation model
WO2022016964A1 (en) Vertical federated modeling optimization method and device, and readable storage medium
CN111275188B (en) Method and device for optimizing horizontal federated learning system and readable storage medium
US20210256423A1 (en) Methods, apparatuses, and computing devices for trainings of learning models
CN111222628A (en) Method, device and system for optimizing recurrent neural network training and readable storage medium
CN116032663B (en) Privacy data processing system, method, equipment and medium based on edge equipment
Yuan et al. Reinforcement learning in spiking neural networks with stochastic and deterministic synapses
WO2023036184A1 (en) Methods and systems for quantifying client contribution in federated learning
CN116450312A (en) Scheduling strategy determination method and system for pipeline parallel training
US20240005165A1 (en) Machine learning model training method, prediction method therefor, apparatus, device, computer-readable storage medium, and computer program product
CN111767411A (en) Knowledge graph representation learning optimization method and device and readable storage medium
CN110175283B (en) Recommendation model generation method and device
Song et al. Personalized federated learning with server-side information
CN114896061B (en) Training method of computing resource control model, computing resource control method and device
WO2021256135A1 (en) Control device, method, and program
Chou et al. Pseudo-reward algorithms for contextual bandits with linear payoff functions
CN114912627A (en) Recommendation model training method, system, computer device and storage medium
CN113887740A (en) Method, device and system for jointly updating model
CN114528893A (en) Machine learning model training method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant