CN111310932A

CN111310932A - Method, device and equipment for optimizing horizontal federated learning system and readable storage medium

Info

Publication number: CN111310932A
Application number: CN202010084745.6A
Authority: CN
Inventors: 程勇; 刘洋; 陈天健
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2020-02-10
Filing date: 2020-02-10
Publication date: 2020-06-19

Abstract

The invention discloses a method, a device, equipment and a readable storage medium for optimizing a transverse federated learning system, wherein the method comprises the following steps: determining a target type of local model parameter update needing to be sent by each participating device in each round of model update from parameter update types according to a preset strategy, wherein the parameter update types comprise model parameter information and gradient information; sending indication information indicating the target type to each participating device, so that each participating device can carry out local training according to the indication information and returns local model parameters of the target type for updating; and fusing the local model parameter updates of the target types received from the participating devices, and sending the global model parameter updates obtained by fusion to the participating devices so that the participating devices update the models according to the global model parameter updates. The method combines the respective advantages of the gradient averaging algorithm and the model averaging algorithm, and realizes a mixed federal averaging mechanism.

Description

Method, device and equipment for optimizing horizontal federated learning system and readable storage medium

Technical Field

The invention relates to the technical field of machine learning, in particular to a method, a device, equipment and a readable storage medium for optimizing a horizontal federated learning system.

Background

With the development of artificial intelligence, people provide a concept of 'federal learning' for solving the problem of data islanding, so that both federal parties can train a model to obtain model parameters without providing own data, and the problem of data privacy disclosure can be avoided.

Horizontal federated learning, also called feature-aligned federated learning, is to extract a part of data with identical data features of participants but not identical users for joint machine learning in the case of more data feature overlap (i.e., data features are aligned) and less user overlap of the individual participants.

Existing horizontal federal learning model training uses a gradient-averaging algorithm to ensure that the model training process will converge. However, the scheme requires that each participant sends gradient information to the coordinator after each model update, so that the communication overhead is high, and the communication cost and the time cost of the Federal learning model training are increased. There is also a scheme using a model averaging algorithm, in which each participant sends model parameter information to the coordinator for each model update, and each participant is allowed to perform multiple model updates locally to reduce traffic, but this scheme cannot guarantee model convergence, and even convergence cannot guarantee model performance. Thus, existing horizontal federal learning schemes do not both compromise communication overhead and model performance.

Disclosure of Invention

The invention mainly aims to provide a method, a device, equipment and a readable storage medium for optimizing a horizontal federal learning system, and aims to solve the problem that the existing horizontal federal learning scheme can not give consideration to both communication overhead and model performance.

In order to achieve the above object, the present invention provides a method for optimizing a horizontal federal learning system, which is applied to a coordinating device participating in horizontal federal learning, wherein the coordinating device is in communication connection with each participating device participating in horizontal federal learning, and the method includes:

determining a target type of local model parameter update required to be sent by each participating device in each round of model update from parameter update types according to a preset strategy, wherein the parameter update types comprise model parameter information and gradient information;

sending indication information indicating the target type to each participating device, so that each participating device carries out local training according to the indication information and returns local model parameters of the target type for updating;

and fusing the local model parameter updates of the target types received from the participating devices, and sending the global model parameter updates obtained by fusion to the participating devices so that the participating devices update the models according to the global model parameter updates.

Optionally, the step of determining, according to a preset policy, a target type of a local model parameter update sent by each of the participating devices in each round of model update from parameter update types includes:

acquiring current federal learning state information in one round of model updating;

and determining the target type of local model parameter update required to be sent by each participating device from parameter update types according to the federal learning state information.

Optionally, the federated learning state information includes model convergence state information,

the step of determining a target type of local model parameter update required to be sent by each participating device from parameter update types according to the federal learning state information comprises;

when the model convergence speed in the model convergence state information is detected to be smaller than the preset convergence speed, determining that the target type of local model parameter update required to be sent by each piece of participating equipment is model parameter information; or the like, or, alternatively,

and when the model convergence jitter value in the model convergence state information is detected to be larger than a preset jitter value, determining that the target type of local model parameter update required to be sent by each piece of participating equipment is gradient information.

Optionally, the federally learned status information includes network communication status information of the coordinating device,

when detecting that the network communication speed in the network communication state information is lower than a preset communication speed, determining that the target type of local model parameter update required to be sent by each piece of participating equipment is model parameter information;

and when the network communication speed is detected to be not less than the preset communication speed, determining that the target type of local model parameter update required to be sent by each participating device is gradient information.

Optionally, the federal learning status information includes a performance index upgrade speed of the model,

when the performance index lifting speed is detected to be smaller than a preset lifting speed, determining that the target type of local model parameter updating needing to be sent by each participating device is gradient information;

and when the performance index lifting speed is not less than the preset lifting speed, determining that the target type of the local model parameter update required to be sent by each piece of participating equipment is model parameter information.

In order to achieve the above object, the present invention further provides a method for optimizing a horizontal federal learning system, which is applied to a participating device participating in horizontal federal learning, and the participating device is in communication connection with a coordinating device participating in horizontal federal learning, and the method includes:

determining a target type of local model parameter update needing to be sent to the coordination equipment in each round of model update from parameter update types according to a preset strategy, wherein the parameter update types comprise model parameter information and gradient information;

performing local training according to the target type, and sending local model parameter updates of the target type to the coordination device, so that the coordination device fuses the local model parameter updates of the target type received from each of the participating devices to obtain global model parameter updates;

performing model updates according to the global model parameter updates received from the coordinating device.

Optionally, the step of determining, from the parameter update types according to a preset policy, a target type of the local model parameter update that needs to be sent to the coordinating device in each round of model update includes:

in one round of model updating, receiving indication information sent by the coordination equipment, and extracting a target type of local model parameter updating needing to be sent to the coordination equipment from the indication information; or the like, or, alternatively,

determining a target type of local model parameter update needing to be sent to the coordinating equipment in each round of model update from parameter update types according to negotiation information, wherein the participating equipment and the coordinating equipment negotiate to obtain the negotiation information; or the like, or, alternatively,

and determining the target type of the local model parameter update needing to be sent to the coordination equipment in each round of model update from the parameter update types according to a preset rule related to the global iterative index.

Optionally, the step of sending the local model parameter update of the target type to the coordinating device includes:

and sending a message containing the local model parameter update of the target type to the coordination equipment, wherein the message carries indication information indicating the target type.

Optionally, the step of performing local training according to the determined target type and sending the local model parameter update of the target type to the coordinating device includes:

when the target type is gradient information, inputting local training data into a current model, calculating to obtain target gradient information based on model output and a data label, and sending the target gradient information serving as a local model parameter update to the coordination equipment;

and when the target type is model parameter information, performing local model updating on the current model at least once by using the local training data, and sending the updated target model parameter information serving as a local model parameter update to the coordination equipment.

In order to achieve the above object, the present invention further provides a horizontal federal learning system optimization device, including: a memory, a processor, and a lateral federated learning system optimization program stored on the memory and operable on the processor that, when executed by the processor, implements the steps of the lateral federated learning system optimization method described above.

In addition, to achieve the above object, the present invention further provides a computer readable storage medium, on which a horizontal federal learning system optimization program is stored, wherein the horizontal federal learning system optimization program, when executed by a processor, implements the steps of the horizontal federal learning system optimization method as described above.

In the invention, the target type of local model parameter update required to be sent by each participating device in each round of model update is determined by the coordination device according to a preset strategy, and indication information indicating the target type is sent to each participating device, so that each participating device can carry out local training according to the indication information and return to the local model parameter update of the target type; and fusing the local model parameter updates of the target types received from the participating devices, and sending the global model parameter updates obtained by fusion to the participating devices for the participating devices to update the models according to the global model parameter updates. The invention realizes that the coordinating equipment can select the type of the local model parameter update sent by the participating equipment, namely the type of the local model parameter update sent by the participating equipment in each round of model update process can be dynamically adjusted, and a mixed federal average mechanism is realized by combining the respective advantages of a gradient average algorithm and a model average algorithm, thereby realizing the communication overhead and the model performance in the transverse federal learning process, namely, the communication overhead is reduced, and simultaneously, the model convergence can be ensured as much as possible and the model performance is ensured.

Drawings

FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a first embodiment of a method for optimizing a horizontal federated learning system of the present invention;

fig. 3 is a schematic diagram of a coordinating device dynamically instructing a participating device to send a local model parameter update according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.

It should be noted that, in the embodiment of the present invention, the horizontal federal learning system optimization device may be a smart phone, a personal computer, a server, and the like, which is not limited herein.

As shown in fig. 1, the lateral federal learning system optimization device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration of the apparatus shown in FIG. 1 does not constitute a limitation on the lateral Federal learning System optimization apparatus, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

As shown in FIG. 1, memory 1005, which is one type of computer storage medium, may include an operating system, a network communications module, a user interface module, and a horizontal federated learning system optimization program.

When the device shown in fig. 1 is a coordinating device participating in horizontal federal learning, the user interface 1003 is mainly used for data communication with a client; the network interface 1004 is mainly used for establishing communication connection with each participating device participating in horizontal federal learning; and the processor 1001 may be configured to invoke the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:

Further, the step of determining, according to a preset policy, a target type of a local model parameter update sent by each of the participating devices in each round of model update from parameter update types includes:

Further, the federated learning state information includes model convergence state information,

the step of determining the target type of local model parameter update required to be sent by each participating device according to the federal learning state information comprises;

Further, the federal learning status information includes network communication status information of the coordinating device;

Further, the federal learning state information includes a performance index improvement speed of the model,

When the device shown in fig. 1 is a participating device participating in horizontal federal learning, the user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used for establishing communication connection with a coordinating device participating in horizontal federal learning; and the processor 1001 may be configured to invoke the horizontal federated learning system optimization program stored in the memory 1005 and perform the following operations:

Further, the step of determining, from the parameter update types according to a preset policy, a target type of the local model parameter update that needs to be sent to the coordinating device in each round of model update includes:

Further, the step of sending the local model parameter update of the target type to the coordinating device comprises:

Further, the step of performing local training according to the determined target type and sending the local model parameter update of the target type to the coordinating device includes:

Based on the structure, various embodiments of the optimization method of the horizontal federal learning system are provided.

Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the optimization method for a horizontal federated learning system according to the present invention.

While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than presented herein.

The first embodiment of the optimization method of the horizontal federal learning system is applied to the coordination equipment participating in horizontal federal learning, the coordination equipment is in communication connection with a plurality of participation equipment participating in horizontal federal learning, and the coordination equipment and the participation equipment related to the embodiment of the invention can be equipment such as a smart phone, a personal computer and a server. In this embodiment, the method for optimizing the horizontal federal learning system includes:

step S10, determining a target type of local model parameter update required to be sent by each participating device in each round of model update according to a preset strategy in a parameter update type, wherein the parameter update type comprises model parameter information and gradient information;

in the process of horizontal federal learning, each participating device locally constructs a model to be trained with the same or similar structure, such as a neural network model; and the coordination equipment and the participation equipment are matched with each other to perform multi-round model updating on the model to be trained, wherein the model updating refers to updating the model parameters of the model to be trained. Current horizontal federal learning employs a gradient-mean algorithm for joint model training. Specifically, each participating device sends gradient information obtained by locally training a model to be trained by using local training data to the coordinating device in each model updating round; the gradient information is the gradient information of the model parameters of the model to be trained by the loss function, and the gradient information is used for updating the model parameters; the coordination equipment carries out weighted average on the gradient information sent by each participating equipment to obtain global gradient information, and the global gradient information is sent to each participating equipment so that each participating equipment can update the model parameters of the local model to be trained according to the global gradient information; and then carrying out model updating of the next round. Although the scheme can ensure the convergence of the model and the performance of the model, the participating devices send gradient information to the coordinating device every turn, the communication overhead of each party is high, the communication cost is high, and when the communication condition is poor, the training time is prolonged, and the time cost is increased.

There is also a scheme using a model averaging algorithm. Specifically, each participating device performs local training to calculate gradient information in each model updating, updates model parameters according to the gradient information, and sends the updated model parameters to the coordinating device; the model parameters are parameters of the model to be trained, such as connection weight values among neuron nodes in a neural network; the coordination equipment carries out weighted average on the model parameters sent by each piece of participant equipment to obtain global model parameters, and sends the global model parameters to the participant equipment, so that the participant equipment updates a local model to be trained by adopting the global model parameters, namely the global model parameters are used as the model parameters of the model to be trained; and then carrying out model updating of the next round. In the scheme, each participating device can locally update the model for multiple times and then send the updated model parameters to the coordinating device, so that the communication overhead is reduced; however, this scheme cannot guarantee model convergence, and even convergence cannot guarantee the performance of the model.

In this embodiment, to solve the problem that the existing horizontal federal learning scheme cannot both consider communication overhead and model performance, the following horizontal federal learning system optimization method in this embodiment is proposed.

In this embodiment, the coordinating device and each participating device may establish a communication connection in advance through handshaking and identity authentication, and determine a model to be trained for the federal learning, where the model to be trained may be a machine learning model, such as a neural network model.

The coordination device may determine, from the parameter update types according to a preset policy, a target type of local model parameter update that needs to be sent by each participating device in each round of model update, where the parameter update type includes model parameter information and gradient information. That is, the coordinating device may select whether each participating device transmits model parameter information or gradient information in each round of model update. The preset strategy can be a preset strategy, and different strategies can be configured in the coordination equipment according to different training tasks, requirements on the convergence speed of the model or requirements on the performance of the model.

The preset strategy can be that the coordinating device determines the target types to be sent by the participating devices in one or more rounds of model updating according to various state information in the federal learning process. For example, in one round of model updating, the coordinating device obtains the network communication state, and when the network communication state difference is detected, it is determined that the target type to be sent in the round of model updating or multiple rounds of model updating starting from the round of model updating is the model parameter information, so that each participating device can send the model parameter information after performing multiple local updates, thereby reducing communication traffic and adapting to the network condition.

Step S20, sending indication information indicating the target type to each of the participating devices, so that each of the participating devices performs local training according to the indication information, and returns a local model parameter update of the target type;

and the coordinating equipment sends indication information indicating the target type to each participating equipment. It should be noted that, when the coordination device determines the target type of one round of model update according to the preset policy, the coordination device may carry information indicating the target type that needs to be sent in the one round of model update in the indication information; when the coordinating device determines the target type of the multiple model updates according to the preset strategy, the coordinating device may send the indication information once to indicate the target type to be sent in the subsequent multiple model updates, or may send the indication information in the subsequent multiple model updates to respectively indicate the target type to be sent in each model update.

The indication information may be one bit of indication bit, for example, when the indication bit is a first value, each participating device is instructed to send the model parameter information to the coordinating device; and when the indicating bit is a second value, indicating each participating device to send gradient information to the coordinating device. The one-bit indication information does not introduce a large communication overhead.

The coordinating device may separately send the indication information to each participating device using point-to-point communication, or may simultaneously indicate the information to each participating device using multicast, or broadcast.

The coordinating device may carry the indication information in the global model parameter update, that is, send the indication information to the participating device together with the global model parameter update, or send the indication information to the participating device separately. The global model parameter update is obtained by fusing local model parameter updates sent by each participating device by the coordinating device.

After receiving the indication information sent by the coordinating equipment, each participating equipment carries out local training according to the indication information to obtain local model parameter update of the target type, and sends the local model parameter update of the target type to the coordinating equipment. For example, when the target type is gradient information, the participating device sends the gradient information obtained by local training to the coordinating device as a local model parameter update; and when the target type is the model parameter information, after the participating equipment performs at least one local training, the obtained model parameter information is used as a local model parameter update to be sent to the coordinating equipment.

Step S30, fusing the local model parameter updates of the target type received from each of the participating devices, and sending the global model parameter update obtained by the fusion to each of the participating devices, so that each of the participating devices performs model update according to the global model parameter update.

And the coordination equipment fuses the local model parameter updates of the target types received from the various participating equipment to obtain the global model parameter update. Specifically, the fusion may be to calculate a weighted average of updates of each local model parameter, for example, when the target type is gradient information, the coordination device may calculate an average value of gradient values in each gradient information in a weighted average manner, and take the result as a global model parameter update; when the target type is model parameter information, the coordination device may calculate an average value of model parameters in each model parameter information in a weighted average manner, and update the result as a global model parameter. The weight of each participating device used in the weighted average algorithm may be set in advance according to specific needs, for example, the weight may be set according to a data volume ratio of local training data of each participating device, and a participating device with a large data volume correspondingly sets a higher weight.

The coordinating device sends the global model parameter updates to the respective participating devices. Each participating device performs model updates based on global model parameter updates. Specifically, when the target type is gradient information, all participating devices receive the global gradient information, and the participating devices update the model parameters of the local model to be trained by adopting the global gradient information, namely completing one-round model updating; when the target type is model parameter information, all the participating devices receive the global model parameter information, and the participating devices adopt the global model parameter information as the model parameters of the local model to be trained, namely, one round of model updating is completed.

Fig. 3 is a schematic diagram illustrating a coordinating device dynamically indicating local model parameter updates sent by participating devices. In the figure, participants are participant devices, coordinators are coordinator devices, network parameters are model parameters, and t is a global iteration index.

In this embodiment, the coordination device determines, from the parameter update types according to a preset strategy, a target type of local model parameter update that needs to be sent by each participating device in each round of model update, sends indication information indicating the target type to each participating device, so that each participating device performs local training according to the indication information, and returns local model parameter update of the target type; and fusing the local model parameter updates of the target types received from the participating devices, and sending the global model parameter updates obtained by fusion to the participating devices for the participating devices to update the models according to the global model parameter updates. The embodiment realizes that the coordinating equipment can select the type of the local model parameter update sent by the participating equipment, namely the type of the local model parameter update sent by the participating equipment in each round of model update process can be dynamically adjusted, and a mixed federal average mechanism is realized by combining the respective advantages of a gradient average algorithm and a model average algorithm, so that the communication overhead and the model performance in the transverse federal learning process are considered, namely the model convergence can be ensured to the greatest extent while the communication overhead is reduced, and the model performance is ensured. By combining the advantages of gradient averaging and model averaging, convergence of the model training process can be ensured, and a model with better performance can be trained.

It should be noted that the model to be trained may be a neural network model for credit risk prediction, the input of the neural network model may be feature data of the user, the output may be risk score of the user, the participating device may be a device of multiple banks, each of which locally owns sample data of multiple users, and the coordinating device is a third-party server independent of the multiple banks. And the coordination equipment and each participating equipment train the model to be trained according to the federal learning process in the embodiment to obtain the neural network model finally used for credit risk estimation. And the trained neural network model can be adopted by each bank to estimate the credit risk of the user, and the characteristic data of the user is input into the trained model to obtain the risk score of the user. The coordination equipment determines the target type of local model parameter update required to be sent by each participant equipment in each round of model update according to a preset strategy in the training process, and sends indication information to indicate the participant equipment, and the participant equipment sends the local model parameter update of the target type according to the indication information so as to complete model training; by combining the advantages of the gradient average algorithm, the convergence of the model training process is ensured, and a model with better credit risk estimation capability is obtained; by combining the advantages of the model averaging algorithm, the communication overhead in the training process is reduced, and the cost of the joint training model of each bank is saved.

It should be noted that the model to be trained may also be used in other application scenarios besides credit risk estimation, such as performance level prediction, paper value evaluation, and the like, and the embodiment of the present invention is not limited herein.

Further, based on the first embodiment, a second embodiment of the optimization method for a horizontal federal learning system of the present invention is provided, in this embodiment, the step S10 includes:

step S101, acquiring current federal learning state information in one-round model updating;

the coordinating device may obtain current federal learning status information during a round of model update. Specifically, the coordinator device may obtain current federal learning status information before a round of model update begins. The federal learning state information may include one or more of model convergence state information, network state information of the coordinating device, and network states of the participating devices. The model convergence status information may include information representing the convergence status of the model, such as a model convergence speed or a model jitter value, and the network status information may be information representing the network status, such as a network communication speed.

Step S102, determining the target type of local model parameter update required to be sent by each participating device from parameter update types according to the federal learning state information.

The coordination device can determine the target type of local model parameter update required to be sent by each participating device from the parameter update types according to the acquired federal learning state information. Specifically, there are various methods for determining the target type based on the federal learned state information. For example, when the obtained federal learning state information includes the network communication speed of each participating device, the coordinating device may determine whether the network communication speed of each participating device is greater than a preset speed, where the preset speed may be determined according to the time requirement for model training, and when the training time is relatively tight, a larger speed value may be set; if the target type is greater than the speed, determining that the target type to be sent is gradient information, because the communication speed of each participating device is high, not wasting too much communication time, and by sending the gradient information, enabling the model to be converged and obtaining a model with better performance; if the network communication speed of the participating device is not higher than the speed, the participating device is slow to send the local model parameter update, and in order to reduce the time consumption caused by communication, the coordinating device can determine that the target type to be sent is the model parameter information, so that each participating device can send the model parameter information after carrying out multiple model updates locally, and the communication traffic is reduced.

Further, the federal learning status information includes model convergence status information, and the step S102 includes:

step S1021, when detecting that the model convergence rate in the model convergence state information is smaller than a preset convergence rate, determining that the target type of local model parameter update required to be sent by each piece of participating equipment is model parameter information;

when the acquired federal learning state information includes model convergence state information, the coordination device can detect whether the model convergence speed in the model convergence state information is smaller than a preset convergence speed. The calculation method of the model convergence rate may adopt an existing calculation method of the model convergence rate, and will not be explained in detail here. The preset convergence rate may be set in advance according to specific needs, for example, when there is a limit to the time of model training, the preset convergence rate may be set to be higher. When the model convergence speed is detected to be smaller than the preset convergence speed, the coordination equipment can determine that the target type of local model parameter update needing to be sent by the participating equipment is the model parameter information, so that the model convergence speed is accelerated. And when the model convergence speed is not less than the preset convergence speed, determining that the target type of the local model parameter update needing to be sent by the participating equipment is the gradient information so as to ensure the model performance.

Step S1022, when it is detected that the model convergence jitter value in the model convergence status information is greater than the preset jitter value, it is determined that the target type of local model parameter update that needs to be sent by each of the participating devices is gradient information.

Alternatively, the coordinator device may detect whether the model convergence jitter value in the model convergence status information is greater than a preset jitter value. The calculation method of the model convergence jitter value may adopt an existing jitter value calculation method, and will not be explained in detail here. The preset jitter value can be set in advance according to specific needs. When the model convergence jitter value is detected to be larger than the preset jitter value, the model to be trained is difficult to converge, and at the moment, the coordinating equipment can determine that the target type of local model parameter updating needing to be sent by the participating equipment is gradient information, so that the model can converge. When the model convergence jitter value is not greater than the preset jitter value, it may be determined that the target type of the local model parameter update that the participating device needs to send is the model parameter information, so as to save the communication overhead.

Further, the federal learning status information includes network communication status information of the coordinating device, and the step S102 includes:

step S1023, when detecting that the network communication speed in the network communication state information is less than the preset communication speed, determining that the target type of local model parameter update required to be sent by each piece of participating equipment is model parameter information;

when the acquired federal learning state information includes network communication state information of the coordinating device, the coordinating device may detect whether a network communication speed in the network communication state information is less than a preset communication speed. The preset communication speed may be set in advance according to specific needs, for example, when there is a limit to the time of model training, the preset communication speed may be set higher. When the coordinating device detects that the network communication speed is lower than the preset communication speed, it indicates that the communication condition of the coordinating device is poor, and at this time, the coordinating device may determine that the target type of the local model that needs to be sent by each participating device is model parameter information, so as to reduce communication overhead through multiple local updates of the participating devices.

Step S1024, when the network communication speed is detected to be not less than the preset communication speed, determining that the target type of local model parameter update required to be sent by each piece of participating equipment is gradient information;

when the coordinating device detects that the network communication speed is not less than the preset communication speed, the communication condition of the coordinating device is better, and the coordinating device can determine that the target type of local model parameter update needing to be sent by each participating device is gradient information, so that the model performance is improved under the condition of not increasing the communication overhead.

Further, the federal learning status information includes a performance index improvement speed of the model, and the step S102 includes:

step S1025, when detecting that the performance index lifting speed is less than the preset lifting speed, determining that the target type of local model parameter update needing to be sent by each participating device is gradient information;

when the obtained federal learning state information includes the performance index lifting speed of the model, the coordination device may detect whether the performance index lifting speed is less than a preset lifting speed. The performance index of the model may be one or more of precision, accuracy, recall ratio, and the like, the coordination device may calculate the lifting speed of each performance index, and the specific calculation method may adopt an existing performance index calculation method and a lifting speed calculation method. The preset lifting speed can be set in advance according to specific requirements. When the coordination device detects that the performance index lifting speed is less than the preset lifting speed, specifically, the lifting speeds of the multiple indexes are all less than the preset lifting speed, or one of the indexes is less than the preset prompting speed, which is not limited herein, it indicates that the model performance is lifted very slowly, so that the coordination device can determine that the target type of the local model to be sent by each participating device is gradient information, because the gradient information is helpful for improving the model performance.

Step S1026, when it is detected that the performance index increase speed is not less than the preset increase speed, determining that the target type of the local model parameter update that needs to be sent by each of the participating devices is model parameter information.

When the coordination device detects that the performance index lifting speed is not less than the preset lifting speed, specifically, the lifting speed of one index is not less than the preset lifting speed, or the lifting speeds of a plurality of indexes are not less than the preset lifting speed, which indicates that the model performance lifting speed is high at this time, and in order to save communication overhead, the coordination device can determine that the target type of local model parameter update required to be sent by each participating device is model parameter information.

Further, the federal learning state information acquired by the coordination device may include both the network communication state information of the coordination device and each participating device and the model convergence state information and the performance index promotion speed of the model, at this time, the coordination device may determine the target type by integrating the network communication speed, the model convergence speed, the model jitter value and the performance index promotion speed, specifically, the combination modes are various, and according to different task scenes of federal learning, different combination modes may be selected, so that the model performance obtained by training, the time spent in the training process and the communication overhead meet the requirements corresponding to the task scenes.

In this embodiment, the target type of the local model parameter update that needs to be sent by each participating device in each round of model update is determined by the coordinator device according to the federal learning state information in the federal learning process, so that the coordinator device can dynamically adjust the type of the local model parameter update sent by the participating device according to the specific situation in the training process, thereby adjusting the progress of the model, and improving the performance of the model as much as possible while considering the communication overhead.

Further, based on the first and second embodiments, a third embodiment of the optimization method of the horizontal federal learning system of the present invention is provided, in this embodiment, the optimization method of the horizontal federal learning system is applied to a participating device participating in horizontal federal learning, the participating device is in communication connection with a coordinating device participating in horizontal federal learning, and the coordinating device and the participating device in the embodiment of the present invention may be devices such as a smart phone, a personal computer, and a server. In this embodiment, the method for optimizing the horizontal federal learning system includes the following steps:

step A10, determining a target type of local model parameter update needing to be sent to the coordination equipment in each round of model update from parameter update types according to a preset strategy, wherein the parameter update types comprise model parameter information and gradient information;

The participating device may determine, from the parameter update types according to a preset policy, a target type of the local model parameter update that needs to be sent to the coordinating device in each round of model update. Wherein the parameter update type includes model parameter information and gradient information. That is, the participating devices may select whether to send model parameter information or gradient information in each round of model update. The participating device may determine the target type that the participating device needs to send in one or more rounds of model update according to a preset policy.

The preset strategy can be a preset strategy, and different strategies can be configured in the participating equipment according to different training tasks, requirements on the convergence speed of the model or requirements on the performance of the model. For example, the preset policy may be to determine the target type according to indication information received from the coordinator device. The coordinating device may determine the target type to be sent by the participating device in one or more rounds of model updating according to various state information in the federal learning process, and then send each participating device by carrying the determined target type in the indication information. And the participating equipment extracts the target type from the indication information so as to determine to send the local model parameter update of the target type. When the coordinating device determines the target type according to the communication state information of each participating device, each participating device may obtain the respective communication state information and send the communication state information to the coordinating device, and cooperate with the coordinating device to determine the target type. For example, in one round of model updating, the coordinating device obtains the network communication state of each participating device, and when the network communication state of each participating device is detected to be poor, the target type to be sent in the round of model updating or the multiple rounds of model updating starting from the round of model updating is determined to be model parameter information, so that each participating device can send the model parameter information after performing multiple local updates, thereby reducing communication traffic and adapting to the network condition; and transmits indication information indicating the type of the object to each participating device.

Step A20, performing local training according to the target type, and sending the local model parameter update of the target type to the coordinating device, so that the coordinating device fuses the local model parameter updates of the target type received from each participating device to obtain a global model parameter update;

and the participating equipment carries out local training according to the determined target type to obtain local model parameter update of the target type, and sends the local model parameter update of the target type to the coordinating equipment. And the coordination equipment fuses the local model parameter updates of the target types received by each participating equipment to obtain the global model parameter update. Specifically, the fusion may be to calculate a weighted average of updates of each local model parameter, for example, when the target type is gradient information, the coordination device may calculate an average value of gradient values in each gradient information in a weighted average manner, and take the result as a global model parameter update; when the target type is model parameter information, the coordination device may calculate an average value of model parameters in each model parameter information in a weighted average manner, and update the result as a global model parameter. The weight of each participating device used in the weighted average algorithm may be set in advance according to specific needs, for example, the weight may be set according to a data volume ratio of local training data of each participating device, and a participating device with a large data volume correspondingly sets a higher weight. The coordinating device sends the global model parameter updates to the respective participating devices.

Further, step a20 includes:

step A201, when the target type is gradient information, inputting local training data into a current model, calculating based on model output and a data label to obtain target gradient information, and sending the target gradient information serving as a local model parameter update to the coordination equipment;

when the target type determined by the participating device is the gradient information, the participating device inputs local training data into a current model, namely the current model to be trained, and obtains model output of the current model; and calculating a loss function according to the model output of the current model and the local data label of the participating equipment, and then calculating the gradient of the loss function to the model parameters to obtain target gradient information. The participating devices send the target gradient information as local model parameter updates to the coordinating device.

Step A202, when the target type is model parameter information, local model updating is performed on the current model at least once by using the local training data, and the updated target model parameter information is sent to the coordination device as a local model parameter update.

When the target type determined by the participating device is gradient information, the participating device performs at least one local model update on the current model using local training data. Specifically, taking two local model updates as an example, the participating device inputs local training data into a current model, i.e. a current model to be trained, to obtain a model output of the current model; calculating a loss function according to the output of the model and a local data tag of the participating equipment, calculating the gradient of the loss function to the model parameters, and updating the model parameters by adopting the model parameters correspondingly, namely performing local updating on the current model; and inputting the local training data into the updated model to obtain model output, calculating a loss function, calculating a gradient and updating model parameters by adopting the same method as the first local update, taking the updated model parameters as target model parameters, and sending the target model parameters as local model parameters to the coordination equipment.

Further, the step of sending the local model parameter update of the target type to the coordinating device in step a20 includes:

step a203, sending a message containing the local model parameter update of the target type to the coordinating device, and carrying indication information indicating the target type in the message.

The participating device sends the local model parameter update of the target type to the coordinating device in the form of a message, and the message carries indication information indicating the target type. The indication information may be in the form of an indication bit added to the message to indicate to the device whether the model parameter information or the gradient information is sent. The purpose of adding the indication information by the participating devices is mainly to avoid that the coordinating device misunderstands the content sent by the participating devices because the coordinating device and the participating devices lose synchronization.

Step A30, performing model update according to the global model parameter update received from the coordinating device.

The participating devices receive global model parameter updates from the coordinating device and perform model updates according to the global model parameter updates. Specifically, when the target type is gradient information, all participating devices receive the global gradient information, and the participating devices update the model parameters of the local model to be trained by adopting the global gradient information, namely completing one-round model updating; when the target type is model parameter information, all the participating devices receive the global model parameter information, and the participating devices adopt the global model parameter information as the model parameters of the local model to be trained, namely, one round of model updating is completed.

In the embodiment, the target type of local model parameter update needing to be sent to the coordinating device in each round of model update is determined by the participating device according to a preset strategy; performing local training according to the target type, and returning the local model parameter update of the target type to the coordination equipment for fusion of the coordination equipment to obtain global model parameter update; and updating the model according to the global model parameter updating. The embodiment realizes that the participating devices can select the types of the local model parameter updates sent to the coordinating device, namely the types of the local model parameter updates sent in the updating process of each round of models can be dynamically adjusted, and a mixed federal average mechanism is realized by combining the respective advantages of a gradient average algorithm and a model average algorithm, so that the communication overhead and the model performance in the horizontal federal learning process are considered, namely the model convergence and the model performance can be ensured to the greatest extent while the communication overhead is reduced.

Further, based on the first, second, and third embodiments, a fourth embodiment of the optimization method for a horizontal federal learning system of the present invention is provided, in which step a10 includes:

step A101, in one round of model updating, receiving indication information sent by the coordination equipment, and extracting a target type of local model parameter updating needing to be sent to the coordination equipment from the indication information;

in this embodiment, there are multiple preset strategies, and one of them may be used by the participating devices to determine the target type of local model parameter update that needs to be sent to the coordinating device.

One preset strategy is that in one round of model updating, the participating device may receive indication information sent by the coordinating device, and extract a target type of local model parameter updating that needs to be sent to the coordinating device from the indication information. The coordinating device may determine the target type to be sent by the participating device in one or more rounds of model updating according to various state information in the federal learning process, and then send each participating device by carrying the determined target type in the indication information. And the participating equipment extracts the target type from the indication information so as to determine to send the local model parameter update of the target type. When the coordinating device determines the target type according to the communication state information of each participating device, each participating device may obtain the respective communication state information and send the communication state information to the coordinating device, and cooperate with the coordinating device to determine the target type. For example, in one round of model updating, the coordinating device obtains the network communication state of each participating device, and when the network communication state of each participating device is detected to be poor, the target type to be sent in the round of model updating or the multiple rounds of model updating starting from the round of model updating is determined to be model parameter information, so that each participating device can send the model parameter information after performing multiple local updates, thereby reducing communication traffic and adapting to the network condition; and transmits indication information indicating the type of the object to each participating device.

Step A102, determining a target type of local model parameter update which needs to be sent to the coordinating device in each round of model update from parameter update types according to negotiation information, wherein the participating device negotiates with the coordinating device to obtain the negotiation information;

another preset policy may be that the participating device negotiates with the coordinating device in advance, determines one or more rounds from the parameter update types, or determines a target type of local model parameter update that the participating device needs to send to the coordinating device in each round of model update, and records the target type in the negotiation information. And the participating equipment determines the target type to be sent for updating the model parameters of each round according to the negotiation information. For example, the coordinating device and each participating device may negotiate in advance to send model parameter information when the model of the 3 rd round is updated; and sending gradient information when the model is updated in the 11 th round. By analogy, the target type of each round of model update can be negotiated in advance. In order to ensure that each participating device remains consistent with the coordinating device.

Step A103, determining a target type of local model parameter update which needs to be sent to the coordination equipment in each round of model update in the parameter update types according to a preset rule related to the global iterative index.

Another preset strategy may be that each participating device determines, from the parameter update types, a target type of local model parameter update that needs to be sent to the coordinating device in each round of model parameter update according to a preset rule related to the global iteration index. The global iterative index may be sent by the coordinating device to each participating device at each round of model update, and is used to indicate that each participating device is the model update of the fourth round, so that synchronization can be maintained between each participating device and between the participating device and the coordinating device. The preset rule related to the global iteration index may be preset in each participating device, that is, each participating device adopts the same rule. The preset rule related to the global iteration index t may be set as needed, for example, when the model in the t-th round is updated, the participating device sends model parameter information; and when the t + mu round model is updated, the participating devices send gradient information. Wherein μ is an integer greater than or equal to 1.

Further, the coordinating device and the participating devices may negotiate a random number generator in advance, and determine according to the random number generator and the global iteration index t, and in the t-th round model update, the participating devices need to send the target types. Since all participating devices and coordinating devices use the same random number generator and the same input parameter t, it is ensured that the individual participating devices remain consistent with the coordinating device. For example, in the t-th global model parameter updating, each participating device and the coordinating device generate a number between 0 and 1 according to the random number generator f (t), and if the generated random number is more than 0.5, each participating device sends the neural network model parameters; and otherwise, each participating device sends the gradient information of the neural network model.

It should be noted that, in the case where no coordinating device participates in federal learning, the participating devices may also negotiate with each other whether to exchange model parameter information or gradient information. The technical principle is similar to the scenario described above with the participation of the coordinating device.

In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a horizontal federal learning system optimization program is stored on the storage medium, and when being executed by a processor, the horizontal federal learning system optimization program implements the steps of the horizontal federal learning system optimization method as described below.

For the embodiments of the horizontal federal learning system optimization device and the computer-readable storage medium of the present invention, reference may be made to the embodiments of the horizontal federal learning system optimization method of the present invention, which are not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for optimizing a horizontal federated learning system is applied to a coordinating device participating in horizontal federated learning, and the coordinating device is in communication connection with each participating device participating in horizontal federated learning, and the method comprises the following steps:

2. The method for optimizing a transverse federated learning system as defined in claim 1, wherein the step of determining, from among types of parameter updates, a target type of local model parameter updates sent by each of the participating devices in each round of model updates according to a preset policy comprises:

3. The method of lateral federated learning system optimization of claim 2, wherein the federated learning state information includes model convergence state information,

4. The method of lateral federated learning system optimization of claim 2, wherein the federated learning state information includes network communication state information of the coordinating device,

5. The method of optimizing a transverse federated learning system of claim 2, wherein the federated learning state information includes a model's performance index ramp rate,

6. A method for optimizing a horizontal federal learning system, the method being applied to a participating device participating in horizontal federal learning, the participating device being communicatively connected to a coordinating device participating in horizontal federal learning, the method comprising:

7. The method for optimizing a transverse federated learning system as set forth in claim 6, wherein the step of determining, from among the types of parameter updates, a target type of local model parameter update that needs to be sent to the coordinating device in each round of model update according to a preset policy comprises:

8. The method of lateral federated learning system optimization of claim 6, wherein the step of sending the target type of local model parameter update to the coordinating device includes:

9. The method of claim 6, wherein the step of locally training according to the determined target type and sending local model parameter updates for the target type to the coordinating device comprises:

10. A lateral federated learning system optimization apparatus, comprising: a memory, a processor, and a lateral federated learning system optimization program stored on the memory and executable on the processor that, when executed by the processor, performs the steps of the lateral federated learning system optimization method of any of claims 1-9.

11. A computer readable storage medium having stored thereon a lateral federal learning system optimization program which, when executed by a processor, performs the steps of a method for lateral federal learning system optimization as claimed in any one of claims 1 to 9.