CN112418444A - Method and device for league learning and league learning system


Info

Publication number: CN112418444A
Authority: CN (China)
Prior art keywords: model, decomposition, member device, data, current global
Legal status: Granted
Application number: CN202011480599.5A
Other languages: Chinese (zh)
Other versions: CN112418444B (en)
Inventor: 林建滨
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Publication of CN112418444A
Application granted
Publication of CN112418444B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present specification provide a league learning method. A league includes at least two first member nodes, each having local data, and a second member node that maintains a global model. In the method, each first member node acquires the current global model from the second member node, trains the current global model using its local data, decomposes the trained current global model into a plurality of decomposition models according to a model decomposition strategy, and sends the decomposition models to the second member node, where the total data amount of the model parameters of the decomposition models is smaller than the data amount of the model parameters of the current global model. The second member node reconstructs the current global model trained at each first member node according to a model reconstruction strategy corresponding to the model decomposition strategy, using the model parameter data of the decomposition models of each first member node, and performs model integration on the reconstructed current global models to obtain a current target global model.

Description

Method and device for league learning and league learning system
This application is a divisional application of the patent application with application number 202010463531.X, entitled "Method, device and system for league learning", filed on May 27, 2020.
Technical Field
The embodiments of the present specification relate generally to the field of artificial intelligence, and in particular, to a method, an apparatus, and a system for league learning.
Background
With the development of artificial intelligence technology, business models such as Deep Neural Networks (DNNs) have been increasingly applied to various business application scenarios, such as risk assessment, speech recognition, and natural language processing. To achieve better model performance, more data owners are needed to provide more training sample data when performing model training. For example, when the business model is applied to fields such as medicine and finance, different medical or financial institutions may collect different data samples. If these data samples are used for league learning (federated learning) of the business model, the model accuracy of the business model can be greatly improved.
League learning is an emerging enabling technology for artificial intelligence. Its aim is to perform efficient model learning among multiple data owners or computing nodes while keeping each owner's private data (such as terminal data and personal privacy data) secure and complying with legal regulations.
In a league learning scenario, a plurality of (two or more) first member nodes in a league each train a model on their own private data and send the resulting model parameters to a second member node, which integrates the model parameters to obtain a target model. In this mode of league learning, after each first member node trains the business model using its local data, all first member nodes send the trained model data to the second member node at the same time, which causes network communication congestion at the second member node, poor data communication efficiency during league learning, and thus low league learning efficiency. Furthermore, the bandwidth resources at a first member node are limited or precious, and the first member node may be unwilling or unable to send a large amount of model parameter data externally.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present specification provide a method, an apparatus, and a system for league learning. With the method, apparatus, and system, after model training is completed at each local end, the trained global model is decomposed into a plurality of decomposition models with a smaller data volume and sent to the model-owning end, where model reconstruction is carried out. This reduces the amount of data communicated among the model training participants during league learning, improves communication efficiency, and thereby improves league learning efficiency.
According to an aspect of embodiments herein, there is provided a method for league learning, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the method being performed by the first member devices, the method comprising: obtaining a current global model from the second member device; training the current global model using local data; decomposing the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data quantity of model parameters of the plurality of decomposition models is smaller than that of the model parameters of the current global model; and sending the model parameter data of the decomposition models to the second member device, wherein the model parameter data of the decomposition models are used by the second member device to reconstruct a current global model of each first member device according to a model reconstruction strategy corresponding to the model decomposition strategy and carry out model integration to obtain a current target global model.
According to another aspect of embodiments herein, there is provided a method for league learning, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the method being performed by the second member device, the method comprising: providing the current global model to each first member device; obtaining, from each first member device, model parameter data of a plurality of decomposition models of the current global model trained at that first member device, wherein the model parameter data of the plurality of decomposition models of each first member device is obtained by decomposing, at the first member device according to a model decomposition strategy, the current global model trained by the first member device using local data, and the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model; reconstructing the current global model of each first member device according to a model reconstruction strategy corresponding to the model decomposition strategy using the model parameter data of the plurality of decomposition models of each first member device; and performing model integration using the reconstructed current global model of each first member device to obtain a current target global model.
According to another aspect of embodiments herein, there is provided a league learning method, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the method comprising: each first member device acquiring a current global model from the second member device; at each first member device, training the current global model using local data, and decomposing the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model; the second member device obtaining, from each first member device, model parameter data of the plurality of decomposition models of the current global model trained at that first member device; and at the second member device, reconstructing the current global model of each first member device according to a model reconstruction strategy corresponding to the model decomposition strategy using the model parameter data of the plurality of decomposition models of each first member device, and performing model integration using the reconstructed current global models at the first member devices to obtain a current target global model.
According to another aspect of embodiments herein, there is provided an apparatus for league learning, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the apparatus being applied to the first member devices, the apparatus comprising: a model acquisition unit that acquires a current global model from the second member device; a model training unit for training the current global model using local data; the model decomposition unit is used for decomposing the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data quantity of model parameters of the decomposition models is smaller than that of the model parameters of the current global model; and a model data sending unit, configured to send the model parameter data of the multiple decomposition models to the second member device, where the model parameter data of the multiple decomposition models are used by the second member device to reconstruct a current global model of each first member device according to a model reconstruction policy corresponding to the model decomposition policy, and perform model integration to obtain a current target global model.
According to another aspect of embodiments herein, there is provided an apparatus for league learning, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the apparatus being applied to the second member device, the apparatus comprising: a model providing unit that provides the current global model to each of the first member devices; a model data obtaining unit configured to obtain, from each first member device, model parameter data of a plurality of decomposition models of a current global model trained at the first member device, the model parameter data of the plurality of decomposition models of each first member device being obtained by decomposing, at the first member device, the current global model trained using local data by the first member device according to a model decomposition policy, where a total amount of the model parameters of the plurality of decomposition models is smaller than a data amount of the model parameters of the current global model; a model reconstruction unit that reconstructs a current global model of each first member device according to a model reconstruction policy corresponding to the model decomposition policy, using model parameter data of a plurality of decomposition models of each first member device; and the model integration unit is used for performing model integration by using the reconstructed current global model of each first member device to obtain a current target global model.
According to another aspect of embodiments of the present specification, there is provided a league learning system including: at least two first member devices, each first member device having locally collected local data for league learning and comprising the apparatus for league learning as described above; and a second member device that maintains a global model and includes the apparatus for league learning as described above.
According to another aspect of embodiments of the present specification, there is provided an electronic apparatus including: at least one processor, and a memory coupled with the at least one processor, the memory storing a computer program that, when executed by the at least one processor, causes the at least one processor to perform the method for league learning performed at a first member device as described above.
According to another aspect of embodiments herein, there is provided a computer readable storage medium storing a computer program that, when executed, causes a processor to perform the method for league learning performed at a first member device as described above.
According to another aspect of embodiments of the present specification, there is provided a computer program product comprising a computer program that, when executed, causes a processor to perform the method for league learning performed at a first member device as described above.
According to another aspect of embodiments of the present specification, there is provided an electronic apparatus including: at least one processor, and a memory coupled with the at least one processor, the memory storing a computer program that, when executed by the at least one processor, causes the at least one processor to perform a method for league learning performed at a second member device as described above.
According to another aspect of embodiments herein, there is provided a computer readable storage medium storing a computer program that, when executed, causes a processor to perform the method for league learning performed at a second member device as described above.
According to another aspect of embodiments herein, there is provided a computer program product comprising a computer program that, when executed, causes a processor to perform the method for league learning performed at a second member device as described above.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 illustrates an example schematic of a league learning system architecture.
Fig. 2 illustrates an example schematic diagram of a league learning system architecture in accordance with an embodiment of the present description.
Fig. 3 illustrates a flow diagram of one example of a league learning process performed at a first member node according to an embodiment of the present description.
Fig. 4A and 4B illustrate example schematic diagrams of a neural network model in accordance with embodiments of the present description.
Fig. 5 illustrates a flow diagram of another example of a league learning process performed at a first member node in accordance with an embodiment of the present description.
Fig. 6 illustrates a flow diagram of a league learning process performed at a second member node according to embodiments of the present description.
Fig. 7 illustrates a block diagram of one example of an apparatus for league learning at a first member node according to embodiments of the present description.
Fig. 8 illustrates a block diagram of another example of an apparatus for league learning at a first member node according to embodiments of the present description.
Fig. 9 illustrates a block diagram of one example of an apparatus for league learning at a second member node according to embodiments of the present description.
Fig. 10 illustrates a schematic diagram of an electronic device for implementing a league learning process at a first member node, in accordance with embodiments of the present description.
Fig. 11 illustrates a schematic diagram of an electronic device for implementing a league learning process at a second member node, in accordance with embodiments of the present description.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants mean open-ended terms in the sense of "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
Fig. 1 illustrates an example schematic of a league learning system architecture 100.
As shown in fig. 1, the league learning system architecture 100 includes a plurality of data owners 110 and a server 120. In the example shown in fig. 1, the plurality of data owners 110 includes a data owner A, a data owner B, and a data owner C. In other examples, the plurality of data owners 110 may include, for example, two data owners, or more than three data owners.
Each data owner 110 locally collects data samples for league learning; for example, data owner A collects data sample X_A, data owner B collects data sample X_B, and data owner C collects data sample X_C. The global model W is deployed on the server 120. Each data sample held by each data owner has the full-dimensional data needed for training the global model W, and can therefore be used on its own to train the global model W.
Data owners A, B, and C, together with the server 120, iteratively train the global model W using their respective data samples. At each training iteration, the server 120 provides the global model W to data owners A, B, and C. Data owners A, B, and C each train the global model W locally using their respective data samples, thereby obtaining their respective trained global models W_A, W_B, and W_C.
Then, data owners A, B, and C respectively send their trained global models W_A, W_B, and W_C to the server 120. The server 120 performs model integration on the global models W_A, W_B, and W_C according to predetermined integration rules, and uses the integrated global model as the currently trained target global model. If the iteration end condition is met, the model training is completed. If the iteration end condition is not satisfied, the server 120 provides the currently trained target global model to data owners A, B, and C to perform the next iteration.
According to the above league learning scheme, after each data owner 110 completes its local model training, it needs to send the model data of its trained global model W to the server 120. In practice, the data volume of this model data is large; for example, if the weight matrix of the global model W is a 1000 × 1000 matrix, it contains 1,000,000 model parameter values. The amount of data sent by each data owner 110 to the server 120 is therefore large, which causes network communication congestion at the server 120, poor data communication efficiency during league learning, and thus low league learning efficiency.
In order to improve data communication efficiency in league learning, the embodiments of the present specification provide a method and an apparatus for league learning and a league learning system. With this method, apparatus, and system, after model training is completed at each local end, the trained global model is decomposed into a plurality of decomposition models with a smaller data volume and sent to the model-owning end, where model reconstruction is carried out. This reduces the amount of data communicated among the model training participants during league learning, improves communication efficiency, and thereby improves league learning efficiency.
The method and the device provided by the embodiment of the specification can be executed by an electronic device, such as a terminal device or a server device. In other words, the method may be performed by software or hardware installed in a terminal device or a server device. The server devices include, but are not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The terminal devices include but are not limited to: any one of smart terminal devices such as a smart phone, a Personal Computer (PC), a notebook computer, a tablet computer, an electronic reader, a web tv, and a wearable device.
In the embodiments of the present specification, the term "plurality" means "two or more". The term "first member node" may be a device or device side, such as a terminal device, a server device, etc., for locally collecting model training data samples. The "first member node" may also be referred to as a "data owner". The model to be trained is not deployed on the first member node. The term "second member node" may be a device or a device side, such as a terminal device, a server device, etc., that deploys the model to be trained. In this specification, "second member node" may also be referred to as "server" or "model owner".
In one practical example of application, the second member node may be, for example, a server of a third party payment platform, and each first member node may be, for example, a private data storage server of a different financial institution or medical institution.
In embodiments provided by the present description, the local data of the first member node may include local private data and local non-private data. In this specification, local private data is private data, and cannot be revealed to other member nodes, so that the data cannot be shared in plain text or in its entirety to other member nodes when league learning is performed. Local non-private data refers to local data that can be shared with other member nodes. The local non-private data may be used by other member nodes to form public domain data.
The following describes the league learning method, apparatus, and league learning system provided by embodiments of the present specification, taking a league learning system including 3 first member nodes as an example. In other embodiments of the present description, the league learning system may include 2 first member nodes, or more than 3 first member nodes.
Fig. 2 illustrates an example schematic diagram of a league learning system architecture 200 in accordance with an embodiment of the present description.
As shown in fig. 2, the league learning system architecture 200 includes a plurality of first member nodes 210 and a second member node 220. The plurality of first member nodes 210 includes a first member node A, a first member node B, and a first member node C. First member node A, first member node B, first member node C, and the second member node 220 may communicate with each other over a network, such as, but not limited to, the internet or a local area network.
Each first member node 210 locally collects data samples for league learning; for example, first member node A collects data sample X_A, first member node B collects data sample X_B, and first member node C collects data sample X_C. The global model W is deployed on the second member node 220. Each data sample held by each first member node has the full-dimensional data needed for training the global model W, and can therefore be used on its own to train the global model W.
First member nodes A, B, and C, together with the second member node 220, iteratively train the global model W using the data samples of first member nodes A, B, and C. At each training iteration, the second member node 220 provides the global model W to each of first member nodes A, B, and C. First member nodes A, B, and C each train the global model W locally using their respective data samples, thereby obtaining their respective trained global models W_A, W_B, and W_C.
Each of first member nodes A, B, and C then performs model decomposition locally on its respective trained global model W_A, W_B, or W_C, and provides the resulting decomposition model data to the second member node 220. The local training process at each of first member nodes A, B, and C will be described in detail later with reference to the drawings.
Upon receiving the decomposition model data from each of first member nodes A, B, and C, the second member node 220 reconstructs the decomposition models of each first member node to obtain the current global models W_A, W_B, and W_C trained at first member nodes A, B, and C, then performs model integration on the global models W_A, W_B, and W_C according to predetermined integration rules, and uses the integrated global model as the currently trained target global model.
If the iteration end condition is met, the model training is completed. If the iteration end condition is not satisfied, the second member node 220 provides the currently trained target global model to each of first member nodes A, B, and C to perform the next iteration.
It is noted that the league learning shown in fig. 2 is completed using multiple iterations. In one example, league learning may also be accomplished in a single pass, without multiple iterations.
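To make the above flow concrete, the following is a minimal sketch of one training round of the scheme of fig. 2, written in Python with NumPy. The function and variable names are illustrative assumptions rather than requirements of this specification; local training is stubbed out, and truncated SVD is only one possible realization of the model decomposition strategy.

```python
import numpy as np

def decompose(w, z):
    # Weight matrix decomposition: w (m x n) -> w1 (m x z) and w2 (z x n), w approx w1 @ w2.
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    return u[:, :z] * s[:z], vt[:z, :]

def one_round(global_w, local_train_fns, z=100):
    """One league learning round: distribute, train locally, decompose, reconstruct, integrate."""
    reconstructed = []
    for train in local_train_fns:             # each first member node
        w_i = train(global_w.copy())          # local training on private data (stub)
        w1, w2 = decompose(w_i, z)            # decomposition models sent to the second member node
        reconstructed.append(w1 @ w2)         # model reconstruction at the second member node
    return np.mean(reconstructed, axis=0)     # simple (unweighted) model integration

# Toy usage: three nodes whose "training" merely perturbs the weights.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(1000, 1000))
nodes = [lambda w, r=rng: w + 0.01 * r.normal(size=w.shape) for _ in range(3)]
w_target = one_round(w0, nodes)
```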
Fig. 3 is a flowchart illustrating an example of a league learning process performed at a first member node according to an embodiment of the present specification. First member node A is taken as an example for explanation; the other first member nodes perform the same processing.
As shown in fig. 3, at block 310, first member node A obtains the current global model W from the second member node 220. In one example, first member node A may obtain the current global model W by sending a request to the second member node 220. In another example, the second member node 220 may proactively issue the current global model W to first member node A.
At block 320, first member node A trains the current global model W using its local data to obtain a trained current global model W_A.
At block 330, first member node A decomposes the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model. In embodiments of the present description, the model decomposition strategy may include at least one of the following: a model decomposition mode; a compression ratio; a model decomposition object; and the number of model decompositions per model decomposition object.
The term "model decomposition approach" may refer to a model decomposition algorithm, model decomposition logic, etc. that performs model decomposition on the current global model. For example, in the case of neural network modelsIn this case, the model parameters of the current global model are represented by using a weight matrix, and the model decomposition method may include weight matrix decomposition, i.e., a weight matrix W of the current global model is usedAThe decomposition is a product of two or more decomposition matrices. For example, the weight matrix W of the current global modelADecomposition into two decomposition matrices WA1And WA2Product of, i.e. WA=WA1*WA2. In other embodiments of the present description, other suitable model decomposition methods may be used to perform model decomposition.
After the model decomposition, the total data amount of the model parameters of the plurality of decomposition models W_A1 and W_A2 is less than the data amount of the model parameters of the current global model W_A. For example, assume the weight matrix W_A of the current global model is a 1000 × 1000 weight matrix; then W_A has 1000 × 1000 = 1,000,000 weight elements. According to the above model decomposition, the weight matrix W_A can be decomposed into the product of two decomposition matrices W_A1 and W_A2, where W_A1 is a 1000 × 100 weight matrix and W_A2 is a 100 × 1000 weight matrix, i.e., W_{1000×1000} = W_{1000×100} × W_{100×1000}. W_A1 has 1000 × 100 = 100,000 weight elements and W_A2 has 100 × 1000 = 100,000 weight elements, so the amount of data sent by first member node A to the second member node is 200,000 weight elements, which is much smaller than the 1,000,000 weight elements of the weight matrix W_A.
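As a hedged illustration of the numbers above, the sketch below factors a 1000 × 1000 matrix into 1000 × 100 and 100 × 1000 factors using a truncated SVD; the specification does not prescribe a particular factorization algorithm, and a random matrix stands in for the trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
W_A = rng.normal(size=(1000, 1000))      # stand-in for the locally trained weight matrix

Z = 100                                  # inner dimension of the decomposition
U, S, Vt = np.linalg.svd(W_A, full_matrices=False)
W_A1 = U[:, :Z] * S[:Z]                  # 1000 x 100 decomposition matrix
W_A2 = Vt[:Z, :]                         # 100 x 1000 decomposition matrix

sent_elements = W_A1.size + W_A2.size    # 200,000 instead of 1,000,000 weight elements
W_rec = W_A1 @ W_A2                      # what the second member node would reconstruct
rel_err = np.linalg.norm(W_A - W_rec) / np.linalg.norm(W_A)
print(sent_elements, round(rel_err, 3))
```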
The term "compression ratio Rate" may refer to a ratio between the model parameter data amount of the decomposed models and the model parameter data amount of the current global model before decomposition. In the example of weight matrix decomposition shown above, the compression ratio Rate is 200,000/1000,000 is 20%. In the embodiment of the present specification, the compression ratio Rate may be a desired value defined in advance by the user. The compression ratio Rate may be used to determine the model structure of each decomposition model. For example, in the case where the model parameters are characterized using a weight matrix, the compression ratio Rate may be used to determine the matrix dimensions of the respective decomposition matrices. For example, suppose WAIs Wm×nDecompose it into two decomposition matrices WA1And WA2Multiplication of where WA1Is Wm×ZAnd WA1Is WZ×n. In this case, the compression ratio Rate may be used to determine the parameter Z in the above weight matrix decomposition, i.e., the matrix dimension of each decomposition matrix. Specifically, the Rate is (m × Z + Z × n)/(m × n), and in the case where the rates, m, and n are known, the value of Z can be determined.
In the present specification, the term "model decomposition object" refers to the model structure on which model decomposition is performed. In case the current global model comprises a single-layer model structure, the model decomposition object refers to the entire model structure of the current global model. In one embodiment of the present description, the current global model may include a multi-layer model structure. In this case, a part of the multi-layer model structure may be selected for model decomposition, while the remaining model structures remain unchanged. Accordingly, a model decomposition object may refer to a partial model structure of the current global model. For example, in the case that the current global model is a neural network model including an input layer, at least two hidden layers, and an output layer, model decomposition may be performed only on the model structures between the hidden layers, and not on the model structure between the input layer and the hidden layer or the model structure between the hidden layer and the output layer. In another example, when there are more than two hidden layers, it is also possible to perform model decomposition on only part of the hidden inter-layer model structures, rather than all of them. In this specification, the term "hidden inter-layer model structure" may include a weight matrix between two hidden layers.
In this specification, the plurality of decomposition models refers to all of the partial model structures obtained after model decomposition of the current global model, including the model decomposition structures obtained by decomposing each model decomposition object and the partial model structures of the current global model that are not subjected to model decomposition.
In this specification, the term "number of model decompositions per model decomposition object" refers to how many decomposition models each model decomposition object is decomposed into. For example, in the case where the model decomposition object is a weight matrix, the number of model decompositions per model decomposition object refers to how many decomposition matrices the weight matrix is decomposed into are multiplied. Typically, the number of model decompositions is 2. In other embodiments of the present description, the number of model decompositions may also be 3 or more. The number of model decompositions for each model decomposition object may be predetermined or may be determined based on a compression ratio, a model complexity of the model decomposition object, a computational power and/or computational resources of the model training apparatus.
Returning to fig. 3, at block 340, first member node A sends the model parameter data of the plurality of decomposition models to the second member node for model reconstruction by the second member node. In one example, the model parameter data of the plurality of decomposition models may be transmitted to the second member node as a serial data sequence. In this case, the first member node first serializes the obtained model parameter data of the plurality of decomposition models and identifies the model parameters of each decomposition model with prescribed information in the generated serial data sequence; for example, an end bit is added after the model parameters of each decomposition model to indicate that the data before the end bit are the model parameters of that decomposition model. Alternatively, length field information specifying the data length of the model parameters of each decomposition model, and the like, may be set in the header of the serial data sequence. In other embodiments of the present description, other suitable manners may be used to distinguish and identify the model parameter data of each decomposition model in the serial data sequence.
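A minimal sketch of such a serial data sequence, assuming a length-prefixed header rather than end bits (both layouts are permitted by the description above); the field layout and function names are assumptions for illustration:

```python
import struct
import numpy as np

def serialize_models(matrices):
    # Pack decomposition matrices into one serial byte sequence: a count, then for
    # each matrix a (rows, cols, byte-length) header followed by raw float32 data.
    payload = bytearray(struct.pack("<I", len(matrices)))
    for m in matrices:
        data = np.asarray(m, dtype=np.float32).tobytes()
        payload += struct.pack("<III", m.shape[0], m.shape[1], len(data))
        payload += data
    return bytes(payload)

def deserialize_models(blob):
    count = struct.unpack_from("<I", blob)[0]
    offset, matrices = 4, []
    for _ in range(count):
        rows, cols, size = struct.unpack_from("<III", blob, offset)
        offset += 12
        matrices.append(np.frombuffer(blob, np.float32, rows * cols, offset).reshape(rows, cols))
        offset += size
    return matrices
```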
After the second member node receives the model parameter data of the decomposition models sent by each first member node, it reconstructs the current global model of each first member node according to the model reconstruction strategy corresponding to that node's model decomposition strategy, using the model parameter data of the plurality of decomposition models of each first member node. For example, when model decomposition is performed by weight matrix decomposition, the second member node may determine, according to the model reconstruction strategy, the decomposition models corresponding to each decomposed model decomposition object, and perform matrix multiplication on the corresponding decomposition models to reconstruct the model decomposition object, thereby obtaining the current global model at each first member node. The second member node then uses the reconstructed current global models at the first member nodes for model integration to obtain the current target global model.
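As a hedged sketch of this reconstruction step, assume each decomposed model decomposition object arrives as a pair of decomposition matrices and every non-decomposed structure arrives as a single matrix; the pairing itself would be dictated by the model reconstruction strategy, and names such as W_in and W_out below are placeholders.

```python
import numpy as np

def reconstruct(parts):
    # parts: a list in which a decomposed object is a (W1, W2) pair of decomposition
    # matrices and a non-decomposed structure is a single weight matrix.
    return [p[0] @ p[1] if isinstance(p, tuple) else p for p in parts]

# e.g. reconstruct([(W_A1, W_A2), W_in, W_out]) multiplies the pair back into the
# original structure and passes W_in and W_out through unchanged.
```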
In the example shown above, the model decomposition policy and the model reconstruction policy may be bound in advance between the first member node and the second member node. For example, a model decomposition policy and a model reconstruction policy may be bound between a first member node and the second member node in a preconfigured manner. Alternatively, the second member node may learn the model decomposition policy of each first member node through pre-negotiation, so that the corresponding model reconstruction policy can be determined.
In another example, the model decomposition policy and the model reconstruction policy may not be pre-bound between the first member node and the second member node. Accordingly, at block 340, first member node A may send the model parameter data of the plurality of decomposition models together with the model decomposition policy to the second member node for model reconstruction by the second member node.
Further, in another example, each first member node may have a model weight. In this case, at block 340, first member node A may send the model parameter data of the plurality of decomposition models together with its model weight to the second member node. In one example, the model weight of each first member node may be determined based on the data sample quality of that first member node. For example, the better the data sample quality, the greater the model weight.
Correspondingly, at the second member node, the current global model of each first member node is reconstructed according to a model reconstruction strategy corresponding to the model decomposition strategy by using the model parameter data of the plurality of decomposition models of each first member node, and the reconstructed current global model at each first member node and the corresponding model weight are used for model integration to obtain the current target global model.
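A minimal sketch of this weighted model integration, assuming the integration rule is a weighted average of the reconstructed global models (the specification refers only to predetermined integration rules; weighted averaging is one common choice):

```python
import numpy as np

def integrate(reconstructed_models, model_weights):
    # Weighted model integration: average the reconstructed current global models,
    # weighting each first member node by its (quality-based) model weight.
    weights = np.asarray(model_weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * m for w, m in zip(weights, reconstructed_models))

# e.g. integrate([W_A, W_B, W_C], [0.5, 0.3, 0.2]) lets the node with the best
# data sample quality contribute most to the current target global model.
```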
It is noted that only one iterative process at the first member node is shown in fig. 3. When the global model training is completed using a multiple iteration process, the process shown in fig. 3 may be performed in a loop until a loop-ending condition is satisfied.
In the embodiments of the present specification, the global model W may include a single-layer model structure, and may also include a multi-layer model structure. For example, the global model W may include a neural network model, such as a Deep Neural Network (DNN) model, a Convolutional Neural Network (CNN) model, or the like. The neural network model may include an input layer, at least two hidden layers, and an output layer. Fig. 4A and 4B illustrate example schematic diagrams of a neural network model in accordance with embodiments of the present description.
In the neural network model shown in fig. 4A, the neural network model includes an input layer, a hidden layer 1, a hidden layer 2, and an output layer. The input layer comprises 4 input nodes, the hidden layer 1 comprises 4 hidden layer nodes, the hidden layer 2 comprises 3 hidden layer nodes, and the output layer comprises 1 output node. The neural network model structure shown in fig. 4A includes a first model structure from an input layer to a hidden layer 1, a second model structure from the hidden layer 1 to a hidden layer 2, and a third model structure from the hidden layer 2 to an output layer. The first model structure may be characterized using a 4 x 4 weight matrix, the second model structure may be characterized using a 4 x 3 weight matrix, and the third model structure may be characterized using a 3 x 1 weight matrix.
In the neural network model shown in fig. 4B, the neural network model includes an input layer, a hidden layer 1, a hidden layer 2, a hidden layer 3, a hidden layer 4, and an output layer. The input layer comprises 4 input nodes, the hidden layer 1 comprises 4 hidden layer nodes, the hidden layer 2 comprises 3 hidden layer nodes, the hidden layer 3 comprises 4 hidden layer nodes, the hidden layer 4 comprises 3 hidden layer nodes, and the output layer comprises 1 output node. The neural network model structure shown in fig. 4B includes a first model structure from an input layer to a hidden layer 1, a second model structure from the hidden layer 1 to the hidden layer 2, a third model structure from the hidden layer 2 to the hidden layer 3, a fourth model structure from the hidden layer 3 to the hidden layer 4, and a fifth model structure from the hidden layer 4 to an output layer. The first model structure may be characterized using a 4 x 4 weight matrix, the second model structure may be characterized using a 4 x 3 weight matrix, the third model structure may be characterized using a 3 x 4 weight matrix, the fourth model structure may be characterized using a 4 x 3 weight matrix, and the fifth model structure may be characterized using a 3 x 1 weight matrix.
Fig. 5 illustrates a flow diagram of another example of a league learning process performed at a first member node in accordance with an embodiment of the present description. In the example shown in fig. 5, the global model is a neural network model having a multi-layer model structure, and the neural network model includes at least three hidden layers. In other embodiments of the present description, the global model may also be other business models having a multi-layer model structure.
As shown in fig. 5, at block 510, first member node A obtains the current neural network model W from the second member node 220.
At block 520, first member node A trains the current neural network model W using its local data to obtain a trained current neural network model W_A.
At block 530, first member node A determines the model decomposition object based on the number of hidden layer nodes of each hidden layer, i.e., determines between which hidden layers of the neural network model the hidden inter-layer model structure is to be used as the model decomposition object. In one example, a hidden inter-layer model structure between a hidden layer i and a hidden layer i+1 that satisfies the following formulas may be determined as the model decomposition object:
[The two selection formulas are provided only as images (BDA0002837433380000141 and BDA0002837433380000142) in the original publication; they express the selection criterion in terms of the hidden layer node counts defined below.]
where m_i is the number of hidden layer nodes of hidden layer i, m_{i+1} is the number of hidden layer nodes of hidden layer i+1, m_j is the number of hidden layer nodes of hidden layer j, m_{j+1} is the number of hidden layer nodes of hidden layer j+1, and N is the total number of hidden layers of the neural network model.
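Since the selection formulas themselves are not reproduced above, the sketch below implements one plausible criterion that is consistent with the variables just defined: select the hidden inter-layer structure whose parameter count m_i × m_{i+1} is the largest among all m_j × m_{j+1}, so that decomposing it saves the most data. This criterion is an assumption for illustration, not the exact formula of the original publication.

```python
def select_decomposition_objects(hidden_sizes):
    # hidden_sizes[k] is the number of nodes in hidden layer k+1 (layers are 1-indexed in the text).
    # Assumed criterion: pick the inter-layer structure(s) with the largest m_i * m_{i+1}.
    products = [hidden_sizes[k] * hidden_sizes[k + 1] for k in range(len(hidden_sizes) - 1)]
    best = max(products)
    return [k for k, p in enumerate(products) if p == best]

# Hypothetical example: hidden layers with 128, 512, 512, and 64 nodes.
print(select_decomposition_objects([128, 512, 512, 64]))  # -> [1], the 512 x 512 structure
```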
After determining the model decomposition object as above, at block 540, first member node A performs model decomposition on the neural network model based on the model decomposition policy. Specifically, first member node A decomposes each model decomposition object into the number of decomposition models specified by the model decomposition number in the model decomposition policy, each decomposition model having dimensions determined based on the compression ratio. Meanwhile, the remaining model structures in the neural network model, other than the model decomposition objects, are kept unchanged. The obtained plurality of decomposition models includes the decomposition models of each model decomposition object and the remaining model structures on which model decomposition is not performed. The total data amount of the model parameters of the obtained decomposition models is smaller than the data amount of the model parameters of the current global model.
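A sketch of this step, reusing the assumptions of the earlier sketches: only the selected hidden inter-layer weight matrices are decomposed (here via truncated SVD, an illustrative choice), with the inner dimension derived from the compression ratio, and every other model structure is passed through unchanged.

```python
import numpy as np

def decompose_selected(weights, selected, rate):
    # weights: inter-layer weight matrices of the whole network, in order.
    # selected: indices chosen as model decomposition objects; rate: compression ratio.
    out = []
    for idx, w in enumerate(weights):
        if idx in selected:
            m, n = w.shape
            z = max(1, int(rate * m * n / (m + n)))        # dimension from the compression ratio
            u, s, vt = np.linalg.svd(w, full_matrices=False)
            out.append((u[:, :z] * s[:z], vt[:z, :]))       # pair of decomposition matrices
        else:
            out.append(w)                                   # remaining structure kept unchanged
    return out
```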
At block 550, first member node A sends the model parameter data of the plurality of decomposition models and the model decomposition policy to the second member node for model reconstruction by the second member node.
After the second member node receives the model parameter data of the decomposition models sent by each first member node, it reconstructs the neural network model of each first member node according to the model reconstruction policy corresponding to that node's model decomposition policy, using the model parameter data of the plurality of decomposition models of each first member node. For example, the second member node may determine, according to the model reconstruction policy, the decomposition models corresponding to each decomposed model decomposition object, and perform matrix multiplication on the corresponding decomposition models to reconstruct the model decomposition object, thereby obtaining the neural network model at each first member node. The second member node then uses the reconstructed current neural network models at the first member nodes for model integration to obtain the current target neural network model.
Also, only one iterative process at the first member node is shown in FIG. 5. When the global model training is completed using a multiple iteration process, the process shown in fig. 5 may be performed in a loop until a loop-ending condition is satisfied.
Fig. 6 illustrates a flow diagram of a league learning process performed at a second member node according to embodiments of the present description.
As shown in fig. 6, at block 610, the second member node provides the current global model to the respective first member nodes. Each first member node trains the current global model using its local data and decomposes the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model.
At block 620, the second member node obtains, from each first member node, the model parameter data of the plurality of decomposition models of the current global model trained at that first member node. In another example, the second member node may also obtain the corresponding model decomposition policies from the respective first member nodes, for example, in the case where the model decomposition policies and model reconstruction policies are not bound between the first member nodes and the second member node. In another example, each first member node has a model weight. Accordingly, the second member node may also obtain the respective model weights from the respective first member nodes.
At block 630, the second member node reconstructs the current global model of each first member node according to a model reconstruction policy corresponding to the model decomposition policy using the model parameter data of the plurality of decomposition models of each first member node.
At block 640, the second member node performs model integration using the reconstructed current global models at the respective first member nodes, resulting in a current target global model. When each first member node has a model weight, the second member node performs model integration using the reconstructed current global model at each first member node together with the respective model weight to obtain the current target global model.
At block 650, the second member node determines whether a loop end condition is satisfied. In this specification, the loop end condition may include the number of loops reaching a predetermined number of loops. In another example, the second member node may have local training sample data. Accordingly, the loop end condition may include the model prediction difference at the second member node being within a predetermined range.
The league learning method according to embodiments of the present specification has been described above with reference to fig. 1 to 6.
With the league learning method shown in fig. 3, after model training is completed at each local end, the trained global model is decomposed into a plurality of decomposition models with a smaller data volume, the decomposition models are sent to the model-owning end, and model reconstruction is performed at the model-owning end. This reduces the amount of data communicated among the model training participants during league learning, thereby improving communication efficiency and, in turn, league learning efficiency.
In addition, with the league learning method shown in fig. 3, by binding the model decomposition policy and the model reconstruction policy in advance between the first member node and the second member node, the first member node does not need to send model decomposition policy information to the second member node, which further reduces the amount of data transmitted between the first member node and the second member node.
In addition, with the league learning method shown in fig. 3, model integration is performed by giving each first member node a model weight and using the global models at the respective first member nodes weighted by those model weights, so that the integrated global model can be made more accurate.
In addition, with the league learning method shown in fig. 3, by assigning each first member node a model weight based on its data sample quality, a first member node with better data sample quality contributes more to model integration, making the integrated global model more accurate.
With the league learning method shown in fig. 5, in the case where the global model includes a multi-layer model structure, the model decomposition object may be determined based on the number of nodes included in each layer of the model structure, so that only part of the layered model structures in the global model are subjected to model decomposition, which reduces the model decomposition complexity at the first member node and the model reconstruction complexity at the second member node.
Fig. 7 illustrates a block diagram of one example of an apparatus for league learning (hereinafter referred to as a league learning apparatus) 700 at a first member node according to embodiments of the present description. As shown in fig. 7, the league learning apparatus 700 includes a model acquisition unit 710, a model training unit 720, a model decomposition unit 730, and a model data transmission unit 740.
The model obtaining unit 710 is configured to obtain a current global model from the second member node. The operation of the model acquisition unit 710 may refer to the operation of block 310 described above with reference to fig. 3.
The model training unit 720 is configured to train the current global model using the local data. The operation of the model training unit 720 may refer to the operation of block 320 described above with reference to FIG. 3.
The model decomposition unit 730 is configured to decompose the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model. The model decomposition strategy may comprise at least one of the following: a model decomposition mode; a compression ratio; a model decomposition object; and the number of model decompositions per model decomposition object. The operation of the model decomposition unit 730 may refer to the operation of block 330 described above with reference to fig. 3.
The model data transmitting unit 740 is configured to transmit the model parameter data of the plurality of decomposition models to the second member node. In one example, a model decomposition strategy and a model reconstruction strategy are pre-bound between each first member node and the second member node. In another example, the model decomposition strategy and the model reconstruction strategy may not be pre-bound between the respective first member nodes and the second member node. Accordingly, the model data transmitting unit 740 may transmit the model parameter data of the plurality of decomposition models together with the model decomposition strategy to the second member node.
Further, each first member node may have a model weight. Accordingly, the model data transmitting unit 740 may transmit the model parameter data of the plurality of decomposition models together with the model weight to the second member node.
In one example, the model acquisition unit 710, the model training unit 720, the model decomposition unit 730, and the model data transmission unit 740 operate cyclically until a cycle end condition is satisfied. The loop-ending condition may include the number of loops reaching a predetermined number of loops, or the model prediction difference at the second member node being within a predetermined range.
Fig. 8 shows a block diagram of another example of an apparatus for league learning (hereinafter referred to as a league learning apparatus) 800 at a first member node according to an embodiment of the present description. As shown in fig. 8, the league learning apparatus 800 includes a model acquisition unit 810, a model training unit 820, a decomposition policy determination unit 830, a model decomposition unit 840, and a model data transmission unit 850. The league learning device 800 shown in fig. 8 is suitable for use in a neural network model. The neural network model comprises an input layer, at least three hidden layers and an output layer. Correspondingly, the model decomposition mode comprises weight matrix decomposition, and the model decomposition object comprises a part of hidden interlayer model structure in the neural network model.
The model obtaining unit 810 is configured to obtain a neural network model from the second member nodes. The operation of the model acquisition unit 810 may refer to the operation of block 510 described above with reference to fig. 5.
The model training unit 820 is configured to train the current neural network model using local data. The operation of model training unit 820 may refer to the operation of block 520 described above with reference to FIG. 5.
The decomposition policy determination unit 830 is configured to determine a model decomposition object based on the number of hidden layer nodes of the respective hidden layers. The determined model decomposition object, the model decomposition mode, the compression ratio and the model decomposition number of each model decomposition object form a model decomposition strategy. The operations of the decomposition policy determining unit 830 may refer to the operations of block 530 described above with reference to fig. 5.
The model decomposition unit 840 is configured to decompose the trained current neural network model into a plurality of decomposition models according to a model decomposition strategy, wherein a total amount of data of model parameters of the plurality of decomposition models is smaller than a data amount of the model parameters of the current neural network model. The operation of the model decomposition unit 840 may refer to the operation of block 540 described above with reference to FIG. 5.
The model data transmission unit 850 is configured to transmit the model parameter data of the plurality of decomposition models and the model decomposition strategy to the second member node. The operation of the model data transmission unit 850 may refer to the operation of block 550 described above with reference to fig. 5.
Further, each first member node may have a model weight. In that case, the model data transmission unit 850 may also transmit the model weight to the second member node.
Also, in one example, the model acquisition unit 810, the model training unit 820, the decomposition strategy determination unit 830, the model decomposition unit 840, and the model data transmission unit 850 operate in a loop until a loop end condition is satisfied. The loop-ending condition may include the number of loops reaching a predetermined number of loops, or the model prediction difference at the second member node being within a predetermined range.
Fig. 9 illustrates a block diagram of one example of an apparatus for league learning (hereinafter referred to as a league learning apparatus) 900 at a second member node according to embodiments of the present description. As shown in fig. 9, the league learning apparatus 900 includes a model providing unit 910, a model data obtaining unit 920, a model reconstructing unit 930, and a model integrating unit 940.
The model providing unit 910 is configured to provide the current global model to the respective first member nodes. Each first member node trains the current global model by using local data, and decomposes the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data quantity of model parameters of the plurality of decomposition models is smaller than that of the model parameters of the current global model. The operation of the model providing unit 910 may refer to the operation of block 610 described above with reference to fig. 6.
The model data obtaining unit 920 is configured to obtain, from each first member node, model parameter data of the plurality of decomposition models of the current global model trained at that first member node. In another example, for instance where the model decomposition policy and the model reconstruction policy are not pre-bound between the first member nodes and the second member node, the model data obtaining unit 920 is further configured to obtain the corresponding model decomposition policy from each first member node. In yet another example, each first member node has a model weight, and the model data obtaining unit 920 is further configured to obtain the respective model weights from the first member nodes. The operation of the model data obtaining unit 920 may refer to the operation of block 620 described above with reference to fig. 6.
The model reconstruction unit 930 is configured to reconstruct the current global model of each first member node according to a model reconstruction policy corresponding to the model decomposition policy using the model parameter data of the plurality of decomposition models of each first member node. The operation of the model reconstruction unit 930 may refer to the operation of block 630 described above with reference to fig. 6.
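For the SVD-style split sketched above, the corresponding model reconstruction strategy reduces to multiplying the two received factors back together. The snippet below is a minimal illustration under that assumption; reconstruct_weight_matrix is a hypothetical helper, not an interface defined in the present specification.

```python
import numpy as np

def reconstruct_weight_matrix(A, B):
    """Rebuild the approximated weight matrix from the two low-rank factors
    received from a first member node (inverse of the SVD-based split)."""
    return A @ B

# Continuing the earlier sketch, the second member node recovers an
# approximation of the member's trained weight matrix:
# W_approx = reconstruct_weight_matrix(A, B)
```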
The model integration unit 940 is configured to perform model integration using the reconstructed current global models at the respective first member nodes, resulting in a current target global model.
In one example, the first member node may have a model weight. Accordingly, the model data obtaining unit 920 may also obtain respective model weights from the respective first member nodes. The model integration unit 940 uses the reconstructed current global model at each first member node and the corresponding model weight to perform model integration, so as to obtain a current target global model.
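A minimal sketch of such weighted model integration, assuming each first member node's reconstructed parameters are stored as a dictionary of NumPy arrays and the model weights are scalars, could look as follows (the function and key names are illustrative only):

```python
import numpy as np

def integrate_models(member_params, member_weights):
    """Weighted average of the reconstructed current global models.

    member_params: list of dicts {layer_name: ndarray}, one per first member node.
    member_weights: list of scalar model weights for the member nodes.
    """
    total = sum(member_weights)
    target = {}
    for name in member_params[0]:
        target[name] = sum(w * p[name] for w, p in zip(member_weights, member_params)) / total
    return target

# Example with two member nodes and a single layer:
m1 = {"hidden.W": np.ones((2, 2))}
m2 = {"hidden.W": 3 * np.ones((2, 2))}
print(integrate_models([m1, m2], [1.0, 1.0])["hidden.W"])  # element-wise average
```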
In one example, the model providing unit 910, the model data obtaining unit 920, the model reconstructing unit 930, and the model integrating unit 940 cyclically operate until a cycle end condition is satisfied. The loop-ending condition may include the number of loops reaching a predetermined number of loops, or the model prediction difference at the second member node being within a predetermined range.
The league learning methods and league learning apparatuses according to embodiments of the present specification have been described above with reference to figs. 1 to 9. The league learning apparatuses above may be implemented in hardware, in software, or in a combination of hardware and software.
Fig. 10 illustrates a schematic diagram of an electronic device for implementing the league learning process at a first member node according to embodiments of the present specification. As shown in fig. 10, the electronic device 1000 may include at least one processor 1010, a storage device (e.g., a non-volatile storage device) 1020, a memory 1030, and a communication interface 1040, and the at least one processor 1010, the storage device 1020, the memory 1030, and the communication interface 1040 are connected together via a bus 1060. The at least one processor 1010 executes at least one computer-readable instruction (i.e., an element described above as being implemented in software) stored or encoded in the memory.
In one embodiment, a computer program is stored in the memory that, when executed, causes the at least one processor 1010 to: obtain a current global model from the second member node; train the current global model using local data; decompose the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model; and send the model parameter data of the plurality of decomposition models to the second member node, wherein, at the second member node, the current global model of each first member node is reconstructed according to a model reconstruction strategy corresponding to the model decomposition strategy by using the model parameter data of the plurality of decomposition models of each first member node, and the reconstructed current global models at the respective first member nodes are used for model integration to obtain the current target global model.
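Purely as an illustrative sketch of the client-side round these instructions describe, the following Python function strings the four steps together. Every interface used here (get_current_global_model, fit, weight_matrices, upload, and the policy keys) is a hypothetical placeholder for whatever training framework a first member node actually runs; decompose_weight_matrix refers to the earlier SVD sketch.

```python
def first_member_training_round(second_member, local_data, policy):
    """One training round at a first member node: obtain, train, decompose, send.
    All interfaces on `second_member` and `model` are hypothetical placeholders."""
    model = second_member.get_current_global_model()   # obtain the current global model
    model.fit(local_data)                              # train with locally collected data
    # Decompose only the inter-layer weight matrices named as decomposition objects,
    # reusing the SVD-based decompose_weight_matrix sketch shown earlier.
    factors = {
        name: decompose_weight_matrix(W, policy["compression_ratio"])
        for name, W in model.weight_matrices().items()
        if name in policy["decomposition_objects"]
    }
    # Only the low-rank factors (a smaller total amount of parameter data) are sent.
    second_member.upload(factors, policy)
```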
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1010 to perform the various operations and functions described above in connection with fig. 1-9 in the various embodiments of the present description.
According to one embodiment, a program product, such as a computer-readable medium (e.g., a non-transitory computer-readable medium), is provided. The computer-readable medium may store a computer program (i.e., the elements described above as being implemented in software) that, when executed by a processor, causes the processor to perform the various operations and functions described above in connection with figs. 1 to 9 in the various embodiments of the present specification. Specifically, a system or apparatus equipped with a readable storage medium may be provided, software program code implementing the functions of any of the above embodiments is stored on the readable storage medium, and a computer or processor of the system or apparatus reads out and executes the instructions stored in the readable storage medium.
Fig. 11 illustrates a schematic diagram of an electronic device for implementing the league learning process at a second member node according to embodiments of the present specification. As shown in fig. 11, the electronic device 1100 may include at least one processor 1110, a storage device (e.g., a non-volatile storage device) 1120, a memory 1130, and a communication interface 1140, and the at least one processor 1110, the storage device 1120, the memory 1130, and the communication interface 1140 are connected together via a bus 1160. The at least one processor 1110 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in the memory.
In one embodiment, a computer program is stored in the memory that, when executed, causes the at least one processor 1110 to: provide a current global model to each first member node, wherein each first member node trains the current global model using local data and decomposes the trained current global model into a plurality of decomposition models according to a model decomposition strategy, the total data amount of the model parameters of the plurality of decomposition models being smaller than the data amount of the model parameters of the current global model; obtain, from each first member node, model parameter data of the plurality of decomposition models of the current global model trained at that first member node; reconstruct the current global model of each first member node according to a model reconstruction strategy corresponding to the model decomposition strategy using the model parameter data of the plurality of decomposition models of each first member node; and perform model integration using the reconstructed current global models at the respective first member nodes to obtain a current target global model.
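Correspondingly, a hypothetical sketch of the server-side round could look as follows, reusing the earlier illustrative helpers reconstruct_weight_matrix and integrate_models; the member-node interfaces (receive_global_model, collect_update) are assumptions made only for this example.

```python
def second_member_training_round(first_members, current_global_model):
    """One aggregation round at the second member node: provide, collect,
    reconstruct, integrate. Member interfaces are hypothetical placeholders."""
    for member in first_members:
        member.receive_global_model(current_global_model)  # provide the current global model
    reconstructed, weights = [], []
    for member in first_members:
        factors, model_weight = member.collect_update()     # decomposition-model parameters
        # Apply the reconstruction strategy matching the SVD-based decomposition sketch.
        params = {name: reconstruct_weight_matrix(A, B) for name, (A, B) in factors.items()}
        reconstructed.append(params)
        weights.append(model_weight)
    # Weighted integration (see the integrate_models sketch above) yields the
    # current target global model for this loop.
    return integrate_models(reconstructed, weights)
```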
It should be appreciated that the computer programs stored in the memory, when executed, cause the at least one processor 1110 to perform the various operations and functions described above in connection with fig. 1-9 in the various embodiments of the present specification.
According to one embodiment, a program product, such as a computer-readable medium (e.g., a non-transitory computer-readable medium), is provided. The computer-readable medium may store a computer program (i.e., the elements described above as being implemented in software) that, when executed by a processor, causes the processor to perform the various operations and functions described above in connection with figs. 1 to 9 in the various embodiments of the present specification. Specifically, a system or apparatus equipped with a readable storage medium may be provided, software program code implementing the functions of any of the above embodiments is stored on the readable storage medium, and a computer or processor of the system or apparatus reads out and executes the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
According to one embodiment, a computer program product is provided that includes a computer program that, when executed by a processor, causes the processor to perform the various operations and functions described above in connection with fig. 1-9 in the various embodiments of the present specification.
It will be understood by those skilled in the art that various changes and modifications may be made in the above-disclosed embodiments without departing from the spirit of the invention. Accordingly, the scope of the invention should be determined from the following claims.
It should be noted that not all steps and units in the above flows and system structure diagrams are necessary, and some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module or processor may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware units or processors may also include programmable logic or circuitry (e.g., a general purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (33)

1. A method for league learning, the league comprising at least two first member devices, each first member device having local data collected locally for league learning, and a second member device, the second member device maintaining a global model, the method being performed by the first member devices, the method comprising:
obtaining a current global model from the second member device;
training the current global model using local data;
decomposing the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data quantity of model parameters of the plurality of decomposition models is smaller than that of the model parameters of the current global model; and
sending the model parameter data of the plurality of decomposition models to the second member device, wherein the model parameter data of the plurality of decomposition models are used by the second member device to reconstruct a current global model of each first member device according to a model reconstruction strategy corresponding to the model decomposition strategy and to perform model integration to obtain a current target global model.
2. The method of claim 1, wherein the model decomposition policy and the model reconstruction policy are pre-bound between the first and second member devices.
3. The method of claim 1, wherein transmitting model parameter data of the plurality of decomposition models to the second member device comprises:
sending the model parameter data of the plurality of decomposition models and the model decomposition strategy to the second member device.
4. The method of claim 1, wherein the model decomposition strategy comprises at least one of: a model decomposition mode; a compression ratio; a model decomposition object; and the number of model decompositions for each model decomposition object.
5. The method of claim 4, wherein the global model comprises a neural network model comprising an input layer, at least two hidden layers, and an output layer,
the model decomposition mode comprises weight matrix decomposition, and the model decomposition object comprises a hidden interlayer model structure in the neural network model.
6. The method of claim 5, wherein the neural network model includes at least three hidden layers, the model decomposition object includes a partial hidden interlayer model structure in the neural network model, and the method further comprises:
determining the model decomposition object based on the number of hidden layer nodes of each hidden layer.
7. The method of claim 6, wherein determining the model decomposition object based on the number of hidden layer nodes for each hidden layer comprises:
determining a hidden interlayer model structure between a hidden layer i and a hidden layer i +1 which satisfy the following formula as the model decomposition object:
$$m_i \cdot m_{i+1} \geq \frac{1}{N-1}\sum_{j=1}^{N-1} m_j \cdot m_{j+1}$$
wherein $m_i$ is the number of hidden layer nodes of hidden layer i, $m_{i+1}$ is the number of hidden layer nodes of hidden layer i+1, $m_j$ is the number of hidden layer nodes of hidden layer j, $m_{j+1}$ is the number of hidden layer nodes of hidden layer j+1, and N is the total number of hidden layers of the neural network model.
8. The method of claim 1, wherein the first member device has a model weight, and
Transmitting the model parameter data of the plurality of decomposition models to the second member device comprises:
transmitting model parameter data and the model weights for the plurality of decomposition models to the second member device,
wherein the model parameter data of the plurality of decomposition models are used by the second member device to reconstruct the current global model of each first member device according to the model reconstruction strategy corresponding to the model decomposition strategy, and the model weights are used by the second member device to perform model integration on the reconstructed current global models at the respective first member devices to obtain the current target global model.
9. The method of claim 8, wherein the model weight for each first member device is determined based on a data sample quality for each first member device.
10. A method for league learning, the league comprising at least two first member devices, each first member device having local data collected locally for league learning, and a second member device, the second member device maintaining a global model, the method being performed by the second member device, the method comprising:
providing the current global model to each first member device;
obtaining, from each first member device, model parameter data of a plurality of decomposition models of the current global model trained at the first member device, wherein the model parameter data of the plurality of decomposition models of each first member device are obtained at the first member device by decomposing, according to a model decomposition strategy, the current global model trained by the first member device using local data, and the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model;
reconstructing a current global model of each first member device according to a model reconstruction strategy corresponding to the model decomposition strategy using model parameter data of a plurality of decomposition models of each first member device; and
performing model integration by using the reconstructed current global model of each first member device to obtain a current target global model.
11. The method of claim 10, wherein the current target global model obtained by model integration is provided to each first member device as the current global model of the next training loop process, and loop training is performed until a loop end condition is met.
12. The method of claim 10, wherein obtaining model parameter data from each first member device for a plurality of decomposition models of a current global model trained at the first member device comprises:
model parameter data of a plurality of decomposition models of a current global model trained at a respective first member device and the model decomposition strategy are obtained from the first member device.
13. The method of claim 10, wherein the model decomposition strategy comprises at least one of: a model decomposition mode; a compression ratio; a model decomposition object; and the number of model decompositions for each model decomposition object.
14. The method of claim 10, wherein the first member device has a model weight, and
Obtaining model parameter data for a plurality of decomposition models of a current global model trained at a respective first member device from the first member device includes:
obtaining model parameter data and the model weights of a plurality of decomposition models of a current global model trained at a respective first member device from the first member device,
performing model integration by using the reconstructed current global model at each first member device, and obtaining a current target global model includes:
performing model integration by using the reconstructed current global model at each first member device and the corresponding model weight to obtain the current target global model.
15. The method of claim 11, wherein the loop end condition comprises:
the number of loops reaching a predetermined number of loops; or
the model prediction difference at the second member device being within a predetermined difference range.
16. A league learning method, the league comprising at least two first member devices, each first member device having local data collected locally for league learning, and a second member device, the second member device maintaining a global model, the method comprising:
each first member device acquires a current global model from the second member device;
at each first member device, training the current global model using local data, and decomposing the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model;
the second member device obtains, from each first member device, model parameter data of a plurality of decomposition models of the current global model trained at the first member device; and
at the second member device, the current global model of each first member device is reconstructed according to a model reconstruction strategy corresponding to the model decomposition strategy by using the model parameter data of the plurality of decomposition models of each first member device, and model integration is performed by using the reconstructed current global models at the respective first member devices to obtain a current target global model.
17. The league learning method of claim 16, wherein the current target global model obtained by model integration is provided to each first member device as the current global model of the next training loop process, and loop training is performed until a loop end condition is met.
18. An apparatus for league learning, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the apparatus being applied to the first member devices, the apparatus comprising:
a model acquisition unit that acquires a current global model from the second member device;
a model training unit for training the current global model using local data;
a model decomposition unit for decomposing the trained current global model into a plurality of decomposition models according to a model decomposition strategy, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model; and
a model data sending unit for sending the model parameter data of the plurality of decomposition models to the second member device, the model parameter data of the plurality of decomposition models being used by the second member device to reconstruct the current global model of each first member device according to a model reconstruction strategy corresponding to the model decomposition strategy and to perform model integration to obtain a current target global model.
19. The apparatus of claim 18, wherein the model data transmitting unit transmits the model parameter data of the plurality of decomposition models and the model decomposition policy to the second member device.
20. The apparatus of claim 18, wherein the model decomposition strategy comprises at least one of: a model decomposition mode; a compression ratio; a model decomposition object; and the number of model decompositions for each model decomposition object.
21. The apparatus of claim 20, wherein the global model comprises a neural network model comprising an input layer, at least two hidden layers, and an output layer,
the model decomposition mode comprises weight matrix decomposition, and the model decomposition object comprises a hidden interlayer model structure in the neural network model.
22. The apparatus of claim 21, wherein the neural network model includes at least three hidden layers, the model decomposition object includes a partial hidden interlayer model structure in the neural network model, and the apparatus further comprises:
a decomposition strategy determination unit for determining the model decomposition object based on the number of hidden layer nodes of each hidden layer.
23. The apparatus of claim 18, wherein the first member device has model weights, and the model data transmitting unit transmits the model parameter data and the model weights of the plurality of decomposition models to the second member device.
24. An apparatus for league learning, the league comprising at least two first member devices, each first member device having locally collected local data for league learning, and a second member device, the second member device maintaining a global model, the apparatus being applied to the second member device, the apparatus comprising:
a model providing unit that provides the current global model to each of the first member devices;
a model data obtaining unit configured to obtain, from each first member device, model parameter data of a plurality of decomposition models of the current global model trained at the first member device, the model parameter data of the plurality of decomposition models of each first member device being obtained at the first member device by decomposing, according to a model decomposition strategy, the current global model trained by the first member device using local data, wherein the total data amount of the model parameters of the plurality of decomposition models is smaller than the data amount of the model parameters of the current global model;
a model reconstruction unit that reconstructs a current global model of each first member device according to a model reconstruction policy corresponding to the model decomposition policy, using model parameter data of a plurality of decomposition models of each first member device; and
a model integration unit that performs model integration by using the reconstructed current global model of each first member device to obtain a current target global model.
25. The apparatus of claim 24, wherein the model data acquisition unit acquires, from each first member device, model parameter data of a plurality of decomposition models of a current global model trained at the first member device and the model decomposition strategy.
26. The apparatus of claim 24, wherein the first member devices have model weights, and the model data acquisition unit acquires, from each first member device, model parameter data and the model weights of a plurality of decomposition models of a current global model trained at the first member device,
and the model integration unit performs model integration by using the reconstructed current global model of each first member device and the corresponding model weight to obtain a current target global model.
27. A league learning system comprising:
at least two first member devices, each having locally collected local data for league learning and comprising the apparatus of any one of claims 18 to 23; and
a second member device that maintains a global model and that includes the apparatus of any of claims 24 to 26.
28. An electronic device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing a computer program that, when executed by the at least one processor, causes the at least one processor to perform the method of any of claims 1 to 9.
29. A computer readable storage medium storing executable instructions that when executed cause a processor to perform the method of any of claims 1 to 9.
30. A computer program product comprising a computer program that, when executed, causes a processor to perform the method of any one of claims 1 to 9.
31. An electronic device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing a computer program that, when executed by the at least one processor, causes the at least one processor to perform the method of any of claims 10 to 15.
32. A computer-readable storage medium storing a computer program which, when executed, causes a processor to perform the method of any one of claims 10 to 15.
33. A computer program product comprising a computer program that, when executed, causes a processor to perform the method of any of claims 10 to 15.
CN202011480599.5A 2020-05-15 2020-05-27 Method and device for league learning and league learning system Active CN112418444B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN2020104119246 2020-05-15
CN202010411924 2020-05-15
CN202010463531.XA CN111368984B (en) 2020-05-15 2020-05-27 Method and device for league learning and league learning system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010463531.XA Division CN111368984B (en) 2020-05-15 2020-05-27 Method and device for league learning and league learning system

Publications (2)

Publication Number Publication Date
CN112418444A true CN112418444A (en) 2021-02-26
CN112418444B CN112418444B (en) 2022-03-29

Family

ID=71212303

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202011480599.5A Active CN112418444B (en) 2020-05-15 2020-05-27 Method and device for league learning and league learning system
CN202010463531.XA Active CN111368984B (en) 2020-05-15 2020-05-27 Method and device for league learning and league learning system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202010463531.XA Active CN111368984B (en) 2020-05-15 2020-05-27 Method and device for league learning and league learning system

Country Status (1)

Country Link
CN (2) CN112418444B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021732A (en) * 2021-09-30 2022-02-08 医渡云(北京)技术有限公司 Proportional risk regression model training method, device and system and storage medium
CN114707662A (en) * 2022-04-15 2022-07-05 支付宝(杭州)信息技术有限公司 Federal learning method and device and federal learning system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022236469A1 (en) * 2021-05-08 2022-11-17 Asiainfo Technologies (China), Inc. Customer experience perception based on federated learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754256A (en) * 2017-11-08 2019-05-14 徐蔚 Model, device, system, methods and applications based on code chain
US20190259041A1 (en) * 2018-02-20 2019-08-22 James R Jackson Systems and methods for generating a relationship among a plurality of datasets to generate a desired attribute value
CN110874650A (en) * 2020-01-16 2020-03-10 支付宝(杭州)信息技术有限公司 Alliance learning method, device and system fusing public domain data and private data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100553229C (en) * 2007-01-24 2009-10-21 中国科学院计算机网络信息中心 A kind of dynamic multicast routing method of half-covered self-organizing
CN103442038B (en) * 2013-08-12 2019-08-06 北京理工大学 A kind of HLA emulation control of master-salve distributed cooperating operation
US10475165B2 (en) * 2017-04-06 2019-11-12 Disney Enterprises, Inc. Kernel-predicting convolutional neural networks for denoising

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754256A (en) * 2017-11-08 2019-05-14 徐蔚 Model, device, system, methods and applications based on code chain
US20190259041A1 (en) * 2018-02-20 2019-08-22 James R Jackson Systems and methods for generating a relationship among a plurality of datasets to generate a desired attribute value
CN110874650A (en) * 2020-01-16 2020-03-10 支付宝(杭州)信息技术有限公司 Alliance learning method, device and system fusing public domain data and private data

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021732A (en) * 2021-09-30 2022-02-08 医渡云(北京)技术有限公司 Proportional risk regression model training method, device and system and storage medium
CN114021732B (en) * 2021-09-30 2022-07-29 医渡云(北京)技术有限公司 Proportional risk regression model training method, device and system and storage medium
CN114707662A (en) * 2022-04-15 2022-07-05 支付宝(杭州)信息技术有限公司 Federal learning method and device and federal learning system
CN114707662B (en) * 2022-04-15 2024-06-18 支付宝(杭州)信息技术有限公司 Federal learning method, federal learning device and federal learning system

Also Published As

Publication number Publication date
CN112418444B (en) 2022-03-29
CN111368984B (en) 2021-01-05
CN111368984A (en) 2020-07-03


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40047345

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant