CN116226779A - Method, device and system for aggregating joint learning parameters - Google Patents

Method, device and system for aggregating joint learning parameters

Info

Publication number
CN116226779A
Authority
CN
China
Prior art keywords
parameters
aggregation
batch
layer
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111440144.5A
Other languages
Chinese (zh)
Inventor
杜炎
王瑞杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinzhi I Lai Network Technology Co ltd
Original Assignee
Xinzhi I Lai Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinzhi I Lai Network Technology Co ltd filed Critical Xinzhi I Lai Network Technology Co ltd
Priority to CN202111440144.5A priority Critical patent/CN116226779A/en
Priority to PCT/CN2022/119138 priority patent/WO2023093229A1/en
Publication of CN116226779A publication Critical patent/CN116226779A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to the technical field of machine learning, and provides a method, a device and a system for aggregating joint learning parameters. The method comprises the following steps: acquiring hidden layer parameters and batch normalization layer parameters uploaded by N participants; aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter; aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter; and returning the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter. The method and the device can comprehensively consider the characteristics of the different network layers in the network structure of each participant's algorithm model and aggregate the parameters of the different network layers separately and in a targeted manner. The aggregated parameters are then returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of each participant's algorithm model.

Description

Method, device and system for aggregating joint learning parameters
Technical Field
The disclosure relates to the technical field of machine learning, and in particular relates to a method, a device and a system for aggregating joint learning parameters.
Background
As the number of layers of a deep learning network model increases, the number of hidden layers grows, and during training the parameters of each hidden layer keep changing, so that the input distribution of the hidden layers shifts constantly. This slows down the convergence of model learning and can even affect the generalization capability of the model. Related research shows that normalizing the input of each network layer, namely Batch Normalization (hereinafter referred to as "BN"), can to a certain extent reduce the internal shift within the network that causes the input distribution to change, accelerate the convergence of the model, and give the model better generalization capability.
Horizontal joint learning based on deep learning network models (provided with BN layers) generally involves several participants. Each participant uploads the parameters it has trained to a server (central node); the server then aggregates the parameters of the participants and returns the aggregated parameters to each participant, so that each participant can adjust its parameters according to the returned aggregated parameters and thereby optimize its model.
However, in the prior art, the server aggregates the parameters of the participants by directly averaging or weighted-averaging the parameters uploaded by each participant, and then returns the aggregated parameters to each participant. Obviously, this aggregation approach does not take into account the characteristics of the different network layers of each participant's network model, yet every participant adjusts its algorithm model according to the aggregation parameters returned by the server. As a result, the expected effect of accelerating the convergence of the algorithm model and improving its generalization capability may not be achieved.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a method, an apparatus, and a system for aggregating joint learning parameters, so as to solve the problem that the existing joint learning parameter aggregation method cannot effectively help each participant accelerate the convergence of its algorithm model and improve its generalization capability.
In a first aspect of an embodiment of the present disclosure, a method for aggregating joint learning parameters is provided, including:
acquiring hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2;
aggregating hidden layer parameters uploaded by each participant to obtain a first aggregation parameter;
aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter;
returning the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter.
In a second aspect of the embodiments of the present disclosure, there is provided a joint learning parameter aggregation apparatus, including:
the parameter acquisition module is configured to acquire hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2;
The first aggregation module is configured to aggregate hidden layer parameters uploaded by each participant to obtain first aggregation parameters;
the second aggregation module is configured to aggregate the batch normalization layer parameters uploaded by each participant to obtain second aggregation parameters;
and the parameter returning module is configured to return the first aggregation parameter and the second aggregation parameter to each participant so that each participant adjusts and optimizes an algorithm model of the participant according to the first aggregation parameter and the second aggregation parameter.
In a third aspect of embodiments of the present disclosure, there is provided a joint learning parameter aggregation system, including:
a server comprising the above joint learning parameter aggregation apparatus; and N participants communicatively connected to the server.
Compared with the prior art, the beneficial effects of the embodiments of the present disclosure at least include the following. Hidden layer parameters and batch normalization layer parameters uploaded by N participants are acquired, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2; the hidden layer parameters uploaded by each participant are aggregated to obtain a first aggregation parameter; the batch normalization layer parameters uploaded by each participant are aggregated to obtain a second aggregation parameter; and the first aggregation parameter and the second aggregation parameter are returned to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter. In this way, the characteristics of the different network layers in the network structure of each participant's algorithm model can be comprehensively considered, the parameters of the different network layers can be aggregated separately and in a targeted manner, and the aggregated parameters can be returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of each participant's algorithm model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required for the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for aggregating joint learning parameters according to an embodiment of the disclosure;
fig. 3 is a schematic diagram of a network structure of an algorithm model of a participant in a joint learning parameter aggregation method according to an embodiment of the disclosure;
fig. 4 is a schematic structural diagram of a joint learning parameter aggregation apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a joint learning parameter aggregation system according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
Joint learning refers to comprehensively utilizing multiple AI (Artificial Intelligence) technologies on the premise of ensuring data security and user privacy, jointly mining data value through multiparty cooperation, and promoting new intelligent business forms and models based on joint modeling. Joint learning has at least the following characteristics:
(1) Participating nodes control a weakly centralized joint training mode over their own data, ensuring data privacy and security in the process of co-creating intelligence.
(2) Under different application scenarios, multiple model aggregation optimization strategies are established by screening and/or combining AI algorithms and privacy-preserving computation, so as to obtain high-level, high-quality models.
(3) On the premise of ensuring data security and user privacy, methods for improving the efficiency of the joint learning engine are derived from the multiple model aggregation optimization strategies; these methods can improve the overall efficiency of the joint learning engine by addressing problems such as information interaction, intelligent perception and exception handling mechanisms under a large-scale cross-domain network with a parallel computing architecture.
(4) The requirements of multiparty users in various scenarios are acquired, the real contribution of each joint participant is determined and reasonably evaluated through a mutual-trust mechanism, and distribution incentives are provided.
Based on this mode, an AI technology ecosystem based on joint learning can be established, the value of industry data can be fully exploited, and the implementation of applications in vertical fields can be promoted.
A method, apparatus and system for aggregating joint learning parameters according to embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure. As shown in fig. 1, the architecture of joint learning may include a server (central node) 101, as well as participants 102, 103, and 104.
In the joint learning process, a basic model may be established by the server 101, and the server 101 transmits the model to the participants 102, 103, and 104 with which a communication connection is established. The basic model may also be uploaded to the server 101 after being established by any participant, and the server 101 then sends the model to the other participants with which it has established a communication connection. The participants 102, 103 and 104 construct a model according to the downloaded basic structure and model parameters, perform model training using local data, obtain updated model parameters, and encrypt and upload the updated model parameters to the server 101. The server 101 aggregates the model parameters sent by participants 102, 103, and 104 to obtain global model parameters, and transmits the global model parameters back to participants 102, 103, and 104. Participant 102, participant 103 and participant 104 iterate their respective models according to the received global model parameters until the models eventually converge, thereby completing the training of the models. In the joint learning process, the data uploaded by participants 102, 103 and 104 are model parameters, local data is not uploaded to the server 101, and all participants can share the final model parameters, so that common modeling can be realized on the basis of ensuring data privacy. It should be noted that the number of participants is not limited to the above three, but may be set as needed, and the embodiments of the present disclosure are not limited thereto.
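To make this exchange concrete, the following is a minimal sketch of one joint learning round using in-memory objects; the class and method names (Server, Participant, local_train, aggregate, apply_global) are illustrative assumptions and are not taken from the disclosure.

```python
# Minimal sketch of one joint learning round as described above.
# Parameters are plain dicts of NumPy arrays; names are illustrative.
import numpy as np

class Participant:
    def __init__(self, params):
        self.params = {k: v.copy() for k, v in params.items()}  # local model copy

    def local_train(self):
        # Placeholder for training on private local data: perturb the parameters
        # to simulate an updated model; only parameters are ever uploaded.
        return {k: v + 0.01 * np.random.randn(*v.shape) for k, v in self.params.items()}

    def apply_global(self, global_params):
        self.params = {k: v.copy() for k, v in global_params.items()}

class Server:
    def aggregate(self, uploads):
        # Plain averaging of the uploaded parameters (the baseline scheme that
        # the present disclosure refines for BN layers).
        return {k: np.mean([u[k] for u in uploads], axis=0) for k in uploads[0]}

if __name__ == "__main__":
    base = {"W": np.zeros((4, 4)), "b": np.zeros(4)}
    participants = [Participant(base) for _ in range(3)]   # e.g. participants 102-104
    server = Server()
    uploads = [p.local_train() for p in participants]      # upload updated parameters
    global_params = server.aggregate(uploads)              # server-side aggregation
    for p in participants:
        p.apply_global(global_params)                      # return global parameters
```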
Fig. 2 is a flow chart of a method for aggregating joint learning parameters according to an embodiment of the disclosure. The joint learning parameter aggregation method of fig. 2 may be performed by the server 101 of fig. 1. As shown in fig. 2, the joint learning parameter aggregation method includes:
step S201, acquiring hidden layer parameters and batch standardization layer parameters uploaded by N participants, wherein the batch standardization layer parameters comprise a mean value, a variance, a minimum batch number, a first expansion change parameter and a second expansion change parameter, and N is a positive integer more than or equal to 2.
Wherein, the hidden layer parameters refer to parameters of hidden layers in a network structure (for example, a neural network structure) of a basic model adopted by each participant, and include a weight W and a bias b of each hidden layer in the network structure.
Batch normalization layer parameters, which refer to parameters of batch normalization layers (i.e., BN layers) in a network structure (e.g., neural network structure) of a basic model employed by each party, include the mean value E of each BN layer in the network structure x Variance Var x A minimum lot number m, a first telescoping parameter y and a second telescoping parameter β.
As an example, each participant may establish a communication connection with a server through a terminal device (e.g., a smart phone, a personal computer, etc.), and upload its hidden layer parameters, which may be standardized layer parameters in batch.
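For illustration, the two kinds of parameters described in step S201 could be organized in a participant's upload as nested dictionaries, as in the following minimal sketch; the field names and shapes are assumptions made here for readability and are not taken from the disclosure.

```python
# Hypothetical structure of one participant's upload in step S201.
# "hidden" carries the weight W and bias b of each hidden layer; "bn" carries
# the BN-layer statistics: mean E_x, variance Var_x, minimum batch number m,
# scale parameter gamma and shift parameter beta.
import numpy as np

upload_from_participant = {
    "hidden": {
        "hidden_1": {"W": np.random.randn(16, 8), "b": np.zeros(8)},
        "hidden_2": {"W": np.random.randn(8, 4),  "b": np.zeros(4)},
    },
    "bn": {
        "bn_1": {"mean": np.zeros(16), "var": np.ones(16), "m": 32,
                 "gamma": np.ones(16), "beta": np.zeros(16)},
        "bn_2": {"mean": np.zeros(8),  "var": np.ones(8),  "m": 32,
                 "gamma": np.ones(8),  "beta": np.zeros(8)},
    },
}
```

The later sketches in this description reuse this layout when illustrating the aggregation formulas.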
Step S202, aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter.
As an example, let N=2, i.e., there are two participants, denoted participant A and participant B, and the network structure of the basic model adopted by participants A and B is a three-layer network consisting, in sequence, of an input layer, a hidden layer (preceded by a batch normalization layer, i.e., a BN layer), and an output layer.
First, the hidden layer parameters uploaded by participant A (comprising the weight $W_a$ and the bias $b_a$) and its batch normalization layer parameters (comprising the mean $E_x^a$, the variance $\mathrm{Var}_x^a$, the minimum batch number $m_a$, the scale parameter $\gamma_a$ and the shift parameter $\beta_a$ of its BN layer) are obtained, together with the hidden layer parameters uploaded by participant B (comprising the weight $W_b$ and the bias $b_b$) and its batch normalization layer parameters (comprising the mean $E_x^b$, the variance $\mathrm{Var}_x^b$, the minimum batch number $m_b$, the scale parameter $\gamma_b$ and the shift parameter $\beta_b$ of its BN layer).
Next, the hidden layer parameters of participant A and participant B are aggregated. Specifically, the mean of participant A's hidden layer weight $W_a$ and bias $b_a$ and participant B's hidden layer weight $W_b$ and bias $b_b$ is calculated, which completes the aggregation of the hidden layer parameters of the two participants and yields the first aggregation parameter (i.e., the mean, or a weighted mean, of $W_a$, $b_a$ and $W_b$, $b_b$).
Step S203, aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter.
In combination with the above example, the batch normalization layer parameters of participant A and participant B are aggregated. Specifically, participant A's batch normalization layer parameters (the mean $E_x^a$, the variance $\mathrm{Var}_x^a$, the minimum batch number $m_a$, the scale parameter $\gamma_a$ and the shift parameter $\beta_a$) are aggregated with participant B's batch normalization layer parameters (the mean $E_x^b$, the variance $\mathrm{Var}_x^b$, the minimum batch number $m_b$, the scale parameter $\gamma_b$ and the shift parameter $\beta_b$) to obtain the second aggregation parameter.
Step S204, the first aggregation parameter and the second aggregation parameter are returned to each participant, so that each participant adjusts and optimizes the algorithm model according to the first aggregation parameter and the second aggregation parameter.
In combination with the above example, after aggregating the hidden layer parameters and the batch normalization layer parameters uploaded by participant A and participant B according to the above steps, the server 101 returns the first aggregation parameter and the second aggregation parameter to participant A and participant B, respectively. After receiving the first aggregation parameter and the second aggregation parameter returned by the server 101, participant A and participant B update the hidden layer parameters in their network models using the first aggregation parameter and update the batch normalization layer (BN layer) parameters using the second aggregation parameter. Each participant then continues training with the network model whose parameters have been updated, and repeats the above parameter aggregation and update steps after training on the next batch of training data, until the algorithm model reaches the preset number of iterations and the trained algorithm model is obtained.
According to the technical solution provided by the embodiments of the present disclosure, hidden layer parameters and batch normalization layer parameters uploaded by N participants are acquired, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2; the hidden layer parameters uploaded by each participant are aggregated to obtain a first aggregation parameter; the batch normalization layer parameters uploaded by each participant are aggregated to obtain a second aggregation parameter; and the first aggregation parameter and the second aggregation parameter are returned to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter. In this way, the characteristics of the different network layers in the network structure of each participant's algorithm model can be comprehensively considered, the parameters of the different network layers can be aggregated separately and in a targeted manner, and the aggregated parameters can be returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of its algorithm model.
In some embodiments, the network structure of the algorithm models of the N participants is the same, the network structure including an input layer, a batch normalization layer, a hidden layer, and an output layer.
The step S202 includes:
and aggregating hidden layer parameters of the same hidden layer of each participant to obtain a first aggregation parameter, wherein the first aggregation parameter comprises at least one hidden layer aggregation parameter.
As an example, let N=2, i.e., there are two participants, participant A and participant B, where the network structures of the algorithm models of participant A and participant B are each a 4-layer neural network structure whose schematic diagram is shown in fig. 3. Referring to fig. 3, the neural network structure of participant A includes an input layer A, a first BN layer A, a first hidden layer A, a second BN layer A, a second hidden layer A, and an output layer A; the neural network structure of participant B includes an input layer B, a first BN layer B, a first hidden layer B, a second BN layer B, a second hidden layer B, and an output layer B. The first hidden layer A and the first hidden layer B are the first hidden layers of participant A and participant B (they belong to the same hidden layer), and the second hidden layer A and the second hidden layer B are the second hidden layers of participant A and participant B (they belong to the same hidden layer).
The hidden layer parameters of the same hidden layer of each participant are aggregated. Specifically, the hidden layer parameters of the first hidden layer A of participant A and the first hidden layer B of participant B are aggregated to obtain hidden layer aggregation parameter 01, and the hidden layer parameters of the second hidden layer A of participant A and the second hidden layer B of participant B are aggregated to obtain hidden layer aggregation parameter 02. The first aggregation parameter here includes hidden layer aggregation parameter 01 and hidden layer aggregation parameter 02.
As an example, assume that the hidden layer parameters of the first hidden layer A of participant A are the weight $W_{a1}$ and the bias $b_{a1}$, and the hidden layer parameters of the second hidden layer A are the weight $W_{a2}$ and the bias $b_{a2}$; the hidden layer parameters of the first hidden layer B of participant B are the weight $W_{b1}$ and the bias $b_{b1}$, and the hidden layer parameters of the second hidden layer B are the weight $W_{b2}$ and the bias $b_{b2}$.
Specifically, the aggregation process of the hidden layer parameters of participant A and participant B is as follows:
First, the weight mean of the first hidden layers of participant A and participant B is calculated as $\bar{W}_1 = \frac{W_{a1} + W_{b1}}{2}$; at the same time, the weight mean of the second hidden layers of participant A and participant B is calculated as $\bar{W}_2 = \frac{W_{a2} + W_{b2}}{2}$.
Second, the bias mean of the first hidden layers of participant A and participant B is calculated as $\bar{b}_1 = \frac{b_{a1} + b_{b1}}{2}$; at the same time, the bias mean of the second hidden layers of participant A and participant B is calculated as $\bar{b}_2 = \frac{b_{a2} + b_{b2}}{2}$.
From the above, the first hidden layer aggregation parameters of participant A and participant B are $\bar{W}_1$ and $\bar{b}_1$, and the second hidden layer aggregation parameters are $\bar{W}_2$ and $\bar{b}_2$. The first aggregation parameter includes the first hidden layer aggregation parameters and the second hidden layer aggregation parameters.
It will be appreciated that, assuming there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has K hidden layers (K is a positive integer greater than or equal to 1), the weight mean of the k-th hidden layer over all participants can be calculated according to the formula $\bar{W}_k = \frac{1}{N}\sum_{i=1}^{N} W_{ik}$, and the bias mean of the k-th hidden layer over all participants can be calculated according to the formula $\bar{b}_k = \frac{1}{N}\sum_{i=1}^{N} b_{ik}$, thereby obtaining the first aggregation parameters of the hidden layers of all participants.
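As an illustration of this per-layer averaging, a minimal sketch in Python follows; the function name aggregate_hidden_layers and the dictionary layout (one dict of layers per participant) are assumptions made here, not part of the disclosure.

```python
# Average the weight W and bias b of the same hidden layer across all
# participants, yielding the first aggregation parameter per layer.
import numpy as np

def aggregate_hidden_layers(uploads):
    """uploads: list with one dict per participant, e.g.
    {"hidden_1": {"W": ..., "b": ...}, "hidden_2": {...}}, same layer names."""
    layer_names = uploads[0].keys()
    aggregated = {}
    for name in layer_names:
        aggregated[name] = {
            "W": np.mean([u[name]["W"] for u in uploads], axis=0),
            "b": np.mean([u[name]["b"] for u in uploads], axis=0),
        }
    return aggregated
```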
In some embodiments, the step S203 includes:
and aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain a second aggregation parameter, wherein the second aggregation parameter comprises at least one batch normalization layer aggregation parameter, and the batch normalization layer aggregation parameters comprise a first batch normalization layer aggregation parameter, a second batch normalization layer aggregation parameter and a third batch normalization layer aggregation parameter.
With reference to the above example and fig. 3, the batch normalization layer parameters of the same batch normalization layer of each participant are aggregated. Specifically, the batch normalization layer parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are aggregated to obtain batch normalization layer aggregation parameter 01; and the batch normalization layer parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are aggregated to obtain batch normalization layer aggregation parameter 02. The second aggregation parameter here includes batch normalization layer aggregation parameter 01 and batch normalization layer aggregation parameter 02.
In some embodiments, the step of aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain the second aggregation parameter specifically includes:
aggregating the scale parameters and the shift parameters of the same batch normalization layer of each participant to obtain a first batch normalization layer aggregation parameter;
aggregating the means of the same batch normalization layer of each participant to obtain a second batch normalization layer aggregation parameter;
and aggregating the variances of the same batch normalization layer of each participant to obtain a third batch normalization layer aggregation parameter.
As an example, the first batch normalization layer aggregation parameter may be calculated as follows:
calculating the mean of the scale parameters of the same batch normalization layer of each participant and the mean of the shift parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter.
As an example, in combination with the above example, assume that the scale parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are $\gamma_{a1}$ and $\gamma_{b1}$, respectively, and their shift parameters are $\beta_{a1}$ and $\beta_{b1}$, respectively; the scale parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are $\gamma_{a2}$ and $\gamma_{b2}$, respectively, and their shift parameters are $\beta_{a2}$ and $\beta_{b2}$, respectively.
Then, the mean of the scale parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B can be calculated according to the formula $\bar{\gamma}_1 = \frac{\gamma_{a1} + \gamma_{b1}}{2}$; the mean of the scale parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B can be calculated according to the formula $\bar{\gamma}_2 = \frac{\gamma_{a2} + \gamma_{b2}}{2}$; the mean of the shift parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B can be calculated according to the formula $\bar{\beta}_1 = \frac{\beta_{a1} + \beta_{b1}}{2}$; and the mean of the shift parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B can be calculated according to the formula $\bar{\beta}_2 = \frac{\beta_{a2} + \beta_{b2}}{2}$.
From the above, the first batch normalization layer aggregation parameters of the first batch normalization layers of participant A and participant B are $\bar{\gamma}_1$ and $\bar{\beta}_1$, and the first batch normalization layer aggregation parameters of the second batch normalization layers are $\bar{\gamma}_2$ and $\bar{\beta}_2$.
It can be appreciated that, assuming there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has P batch normalization layers (P is a positive integer greater than or equal to 1), the mean of the scale parameters of the p-th batch normalization layer over all participants can be calculated according to the formula $\bar{\gamma}_p = \frac{1}{N}\sum_{i=1}^{N} \gamma_{ip}$, and the mean of the shift parameters of the p-th batch normalization layer over all participants can be calculated according to the formula $\bar{\beta}_p = \frac{1}{N}\sum_{i=1}^{N} \beta_{ip}$, thereby obtaining the first batch normalization layer aggregation parameter of each batch normalization layer of all participants.
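The same style of per-layer averaging can be sketched in code for the scale and shift parameters, again under the dictionary layout assumed earlier; the function name is illustrative only.

```python
# Average gamma (scale) and beta (shift) of the same BN layer across all
# participants, yielding the first batch normalization layer aggregation parameter.
import numpy as np

def aggregate_bn_scale_shift(uploads):
    """uploads: list with one dict per participant, e.g.
    {"bn_1": {"gamma": ..., "beta": ...}, "bn_2": {...}}."""
    layer_names = uploads[0].keys()
    return {
        name: {
            "gamma": np.mean([u[name]["gamma"] for u in uploads], axis=0),
            "beta": np.mean([u[name]["beta"] for u in uploads], axis=0),
        }
        for name in layer_names
    }
```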
As an example, the second batch normalization layer aggregation parameter may be calculated as follows:
calculating, for each participant, a first product of the mean of the same batch normalization layer and the minimum batch number of that batch normalization layer, and calculating the sum of the first products over the participants;
and calculating the sum of the minimum batch numbers of the N participants, and calculating the second batch normalization layer aggregation parameter according to the sum of the first products and the sum of the minimum batch numbers.
As an example, when N=2, i.e., there are two participants, participant A and participant B, in combination with the above example, the network structure adopted by participants A and B is as shown in fig. 3. Assume that the means of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are $E_x^{a1}$ and $E_x^{b1}$, respectively, and the minimum batch numbers are $m_a$ and $m_b$, respectively; the means of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are $E_x^{a2}$ and $E_x^{b2}$, respectively, and the minimum batch numbers are $m_a$ and $m_b$, respectively.
Then, the second batch normalization layer aggregation parameter 01 for the first batch normalization layers of participant A and participant B can be calculated according to the formula $E_x^{1} = \frac{m_a E_x^{a1} + m_b E_x^{b1}}{m_a + m_b}$, and the second batch normalization layer aggregation parameter 02 for the second batch normalization layers of participant A and participant B can be calculated according to the formula $E_x^{2} = \frac{m_a E_x^{a2} + m_b E_x^{b2}}{m_a + m_b}$.
It can be appreciated that, if there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has P batch normalization layers (P is a positive integer greater than or equal to 1), the second batch normalization layer aggregation parameter of each batch normalization layer of all participants (i.e., the aggregated value of the means of that batch normalization layer over all participants) can be calculated according to the formula $E_x^{p} = \frac{\sum_{i=1}^{N} m_i E_x^{ip}}{\sum_{i=1}^{N} m_i}$, where $E_x^{ip}$ and $m_i$ are the mean of the p-th batch normalization layer and the minimum batch number of the i-th participant, respectively.
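A sketch of this batch-count-weighted averaging of the BN means follows, reusing the assumed layout; the function name is illustrative.

```python
# Aggregate the BN means weighted by each participant's minimum batch number m,
# yielding the second batch normalization layer aggregation parameter.
import numpy as np

def aggregate_bn_means(uploads):
    """uploads: list with one dict per participant, e.g.
    {"bn_1": {"mean": ..., "m": ...}, "bn_2": {...}}."""
    layer_names = uploads[0].keys()
    aggregated = {}
    for name in layer_names:
        total_m = sum(u[name]["m"] for u in uploads)
        weighted_sum = sum(u[name]["m"] * u[name]["mean"] for u in uploads)
        aggregated[name] = weighted_sum / total_m
    return aggregated
```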
As an example, the third batch normalization layer aggregation parameter may be calculated as follows:
calculating, for each participant, the sum of the variance of the same batch normalization layer and the square of its mean, calculating a second product of this sum and the participant's minimum batch number, and calculating the sum of the second products over the participants;
and calculating the sum of the minimum batch numbers of the N participants, and calculating the third batch normalization layer aggregation parameter according to the sum of the second products, the sum of the minimum batch numbers and the second batch normalization layer aggregation parameter.
As an example, when N=2, i.e., there are two participants, participant A and participant B, in combination with the above example, the network structure adopted by participants A and B is as shown in fig. 3. Assume that the variances of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are $\mathrm{Var}_x^{a1}$ and $\mathrm{Var}_x^{b1}$, respectively, and the variances of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are $\mathrm{Var}_x^{a2}$ and $\mathrm{Var}_x^{b2}$, respectively.
Since, for a single participant, the variance of each BN layer equals the expectation of the square minus the square of the expectation, i.e., $\mathrm{Var}_x = E(x^2) - E^2(x)$, the third batch normalization layer aggregation parameter 01 (i.e., the aggregated variance) of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B can be calculated according to the formula

$$\mathrm{Var}_x^{1} = \frac{m_a\left(\mathrm{Var}_x^{a1} + (E_x^{a1})^2\right) + m_b\left(\mathrm{Var}_x^{b1} + (E_x^{b1})^2\right)}{m_a + m_b} - \left(E_x^{1}\right)^2,$$

and the third batch normalization layer aggregation parameter 02 (i.e., the aggregated variance) of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B can be calculated according to the formula

$$\mathrm{Var}_x^{2} = \frac{m_a\left(\mathrm{Var}_x^{a2} + (E_x^{a2})^2\right) + m_b\left(\mathrm{Var}_x^{b2} + (E_x^{b2})^2\right)}{m_a + m_b} - \left(E_x^{2}\right)^2.$$

It can be understood that, assuming there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has P batch normalization layers (P is a positive integer greater than or equal to 1), the third batch normalization layer aggregation parameter of each batch normalization layer of all participants (i.e., the aggregated value of the variances of that batch normalization layer over all participants) can be calculated according to the formula

$$\mathrm{Var}_x^{p} = \frac{\sum_{i=1}^{N} m_i\left(\mathrm{Var}_x^{ip} + (E_x^{ip})^2\right)}{\sum_{i=1}^{N} m_i} - \left(E_x^{p}\right)^2,$$

where $E_x^{p}$ is the second batch normalization layer aggregation parameter of the p-th batch normalization layer.
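A sketch of this pooled-variance computation follows; it relies on the identity $\mathrm{Var}_x = E(x^2) - E^2(x)$ and on the aggregated means from the previous step, and the function name is illustrative.

```python
# Aggregate the BN variances: pool E(x^2) = Var + mean^2 weighted by m,
# then subtract the square of the aggregated mean (third aggregation parameter).
import numpy as np

def aggregate_bn_variances(uploads, aggregated_means):
    """uploads: list with one dict per participant, e.g.
    {"bn_1": {"mean": ..., "var": ..., "m": ...}, ...};
    aggregated_means: output of aggregate_bn_means()."""
    layer_names = uploads[0].keys()
    aggregated = {}
    for name in layer_names:
        total_m = sum(u[name]["m"] for u in uploads)
        second_moment = sum(
            u[name]["m"] * (u[name]["var"] + u[name]["mean"] ** 2) for u in uploads
        ) / total_m
        aggregated[name] = second_moment - aggregated_means[name] ** 2
    return aggregated
```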
In the technical solution provided by the embodiments of the present disclosure, the batch normalization layers of the participants, whose parameters the server aggregates, operate according to the following principle:

$$E_x = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\mathrm{Var}_x = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - E_x\right)^2$$

$$\hat{x}_i = \frac{x_i - E_x}{\sqrt{\mathrm{Var}_x + \epsilon}}$$

$$y_i = \gamma \hat{x}_i + \beta$$

wherein $x_i$ represents the output of the $i$-th sample at the layer preceding the BN layer, $m$ represents the number of samples in one min-batch during training (i.e., the minimum batch number), $E_x$ represents the mean of the min-batch, $\mathrm{Var}_x$ represents the variance of the min-batch, $\hat{x}_i$ represents the normalized result (i.e., the result of normalizing the output of the $i$-th sample at the layer preceding the BN layer), $y_i$ represents the final output of the BN layer, which is obtained from $\hat{x}_i$ by a scale-and-shift transformation, where $\gamma$ and $\beta$ are the scale and shift parameters applied to $\hat{x}_i$, respectively, and $\epsilon$ is a small constant (eps) that prevents the denominator from being zero.
Specifically, the mean of the batch data x is calculated first, then the variance of the batch data is calculated, then the batch data x is normalized, and finally the scale parameter and the shift parameter are introduced to transform the normalized result, so that $y_i$ can, when needed, be restored to the distribution of x before normalization. This ensures that each normalization preserves the originally learned features while still completing the normalization operation, thereby accelerating model convergence and helping to improve the generalization capability of the model.
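For reference, the four BN formulas above translate directly into code; the following is a minimal training-mode batch normalization over a mini-batch, with eps standing for the small constant $\epsilon$.

```python
# Training-mode batch normalization of a mini-batch x of shape (m, features),
# following the four formulas above.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # E_x over the mini-batch
    var = x.var(axis=0)                      # Var_x over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalization
    y = gamma * x_hat + beta                 # scale-and-shift output
    return y, mean, var
```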
In some embodiments, in the step S204, each party adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter, which specifically includes:
each participant adjusts hidden layer parameters of hidden layers in the network structure according to the first aggregation parameters;
each participant adjusts the batch normalization layer parameters of the batch normalization layers in its network structure according to the second aggregation parameters.
As an example, in combination with the above example, participant A and participant B each receive the first aggregation parameters returned by the server 101 (including the first hidden layer aggregation parameters $\bar{W}_1$ and $\bar{b}_1$ and the second hidden layer aggregation parameters $\bar{W}_2$ and $\bar{b}_2$) and the second aggregation parameters (including the aggregation parameters of the first batch normalization layer and the aggregation parameters of the second batch normalization layer). Participant A may update and adjust the original parameters of its first hidden layer using the first hidden layer aggregation parameters, adjust the original parameters of its second hidden layer using the second hidden layer aggregation parameters, adjust the original parameters of its first batch normalization layer using the aggregation parameters of the first batch normalization layer, and adjust the original parameters of its second batch normalization layer using the aggregation parameters of the second batch normalization layer, thereby completing the update and adjustment of all the parameters of the network structure of its algorithm model. Then, participant A performs model training on the next batch of data using the algorithm model with the updated parameters, and repeats the updating and adjusting of the parameters of each layer of the network structure after each batch of data is trained, until a preset model training number threshold is reached, at which point the trained algorithm model is obtained.
Similarly, participant B's update and adjustment of the parameters of each layer of the network structure of its algorithm model can refer to the update steps of participant A, which are not repeated here.
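As an illustration of this participant-side update, the following minimal sketch applies the returned aggregation parameters to a local parameter dictionary; the function name and the assumption that the second aggregation parameters are delivered per BN layer with combined gamma/beta/mean/var fields are made here for readability and are not taken from the disclosure.

```python
# Illustrative participant-side update after receiving the aggregation parameters.
def apply_aggregation(model_params, first_agg, second_agg):
    """model_params: {"hidden": {...}, "bn": {...}} as in the earlier sketches;
    first_agg: per-hidden-layer {"W": ..., "b": ...};
    second_agg: per-BN-layer {"gamma": ..., "beta": ..., "mean": ..., "var": ...}."""
    for layer, agg in first_agg.items():
        model_params["hidden"][layer]["W"] = agg["W"]   # adjust hidden weights
        model_params["hidden"][layer]["b"] = agg["b"]   # adjust hidden biases
    for layer, agg in second_agg.items():
        model_params["bn"][layer].update(agg)           # adjust BN statistics and gamma/beta
    return model_params
```

The participant would then continue training on the next batch with the updated parameters and repeat the exchange until the preset number of iterations is reached.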
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 4 is a schematic diagram of a joint learning parameter aggregation apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the joint learning parameter aggregation apparatus includes:
the parameter acquisition module 401 is configured to acquire the hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2;
a first aggregation module 402, configured to aggregate the hidden layer parameters uploaded by each participant to obtain first aggregation parameters;
a second aggregation module 403, configured to aggregate the batch normalization layer parameters uploaded by each participant to obtain second aggregation parameters;
A parameter return module 404 configured to return the first and second aggregation parameters to the respective participants such that each participant adjusts and optimizes its algorithm model based on the first and second aggregation parameters.
According to the technical solution provided by the embodiments of the present disclosure, the parameter acquisition module 401 acquires the hidden layer parameters and batch normalization layer parameters uploaded by N participants; the first aggregation module 402 aggregates the hidden layer parameters uploaded by each participant to obtain the first aggregation parameters; the second aggregation module 403 aggregates the batch normalization layer parameters uploaded by each participant to obtain the second aggregation parameters; and the parameter return module 404 returns the first aggregation parameters and the second aggregation parameters to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameters and the second aggregation parameters. In this way, the characteristics of the different network layers in the network structure of each participant's algorithm model can be comprehensively considered, the parameters of the different network layers can be aggregated separately and in a targeted manner, and the aggregated parameters can be returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of its algorithm model.
In some embodiments, the network structure of the algorithm models of the N participants is the same, the network structure including an input layer, a batch normalization layer, a hidden layer, and an output layer. The first aggregation module 402 includes:
and the hidden layer parameter aggregation unit is configured to aggregate hidden layer parameters of the same hidden layer of each participant to obtain a first aggregation parameter, wherein the first aggregation parameter comprises at least one hidden layer aggregation parameter.
In some embodiments, the second aggregation module 403 may be specifically configured to:
aggregate the batch normalization layer parameters of the same batch normalization layer of each participant to obtain a second aggregation parameter, wherein the second aggregation parameter comprises at least one batch normalization layer aggregation parameter, and the batch normalization layer aggregation parameters comprise a first batch normalization layer aggregation parameter, a second batch normalization layer aggregation parameter and a third batch normalization layer aggregation parameter.
In some embodiments, the second aggregation module 403 includes:
the first aggregation unit is configured to aggregate the scale parameters and the shift parameters of the same batch normalization layer of each participant to obtain a first batch normalization layer aggregation parameter;
the second aggregation unit is configured to aggregate the means of the same batch normalization layer of each participant to obtain a second batch normalization layer aggregation parameter;
and the third aggregation unit is configured to aggregate the variances of the same batch normalization layer of each participant to obtain a third batch normalization layer aggregation parameter.
In some embodiments, the first aggregation unit may be specifically configured to:
calculate the mean of the scale parameters of the same batch normalization layer of each participant and the mean of the shift parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter.
In some embodiments, the second aggregation unit may be specifically configured to:
calculate, for each participant, a first product of the mean of the same batch normalization layer and the minimum batch number of that batch normalization layer, and calculate the sum of the first products over the participants;
and calculate the sum of the minimum batch numbers of the N participants, and calculate the second batch normalization layer aggregation parameter according to the sum of the first products and the sum of the minimum batch numbers.
In some embodiments, the third aggregation unit may be specifically configured to:
calculate, for each participant, the sum of the variance of the same batch normalization layer and the square of its mean, calculate a second product of this sum and the participant's minimum batch number, and calculate the sum of the second products over the participants;
and calculate the sum of the minimum batch numbers of the N participants, and calculate the third batch normalization layer aggregation parameter according to the sum of the second products, the sum of the minimum batch numbers and the second batch normalization layer aggregation parameter.
In some embodiments, each participant may be configured to:
after receiving the first aggregation parameter and the second aggregation parameter returned by the server, adjusting the hidden layer parameters of the hidden layers in its network structure according to the first aggregation parameter; and adjusting the batch normalization layer parameters of the batch normalization layers in its network structure according to the second aggregation parameter.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 5 is a schematic structural diagram of a joint learning parameter aggregation system according to an embodiment of the present disclosure. As shown in fig. 5, the joint learning parameter aggregation system includes a server 101 including the joint learning parameter aggregation apparatus described above; and N participants communicatively connected to the server 101.
Specifically, the server 101 and each participant can communicate via a network, Bluetooth or other means. Each participant joins the joint learning in order to optimize or construct a certain algorithm model, and trains its basic model, or a basic model issued by the server, using its local data. After training on a batch of data is finished, the participant uploads the hidden layer parameters and batch normalization layer parameters obtained from training to the server 101. After receiving the hidden layer parameters and batch normalization layer parameters uploaded by each participant, the server 101 aggregates the hidden layer parameters uploaded by each participant to obtain the first aggregation parameter, and aggregates the batch normalization layer parameters uploaded by each participant to obtain the second aggregation parameter. The server then returns the first aggregation parameter and the second aggregation parameter to each participant. After receiving them, each participant correspondingly adjusts the original parameters of the corresponding network structure layers in its algorithm model according to the first aggregation parameter and the second aggregation parameter, and continues training on the next batch of data with the updated algorithm model network structure until the model converges.
Fig. 6 is a schematic structural diagram of an electronic device 600 provided in an embodiment of the disclosure. As shown in fig. 6, the electronic device 600 of this embodiment includes: a processor 601, a memory 602 and a computer program 603 stored in the memory 602 and executable on the processor 601. The steps of the various method embodiments described above are implemented by the processor 601 when executing the computer program 603. Alternatively, the processor 601, when executing the computer program 603, performs the functions of the modules/units of the apparatus embodiments described above.
Illustratively, the computer program 603 may be partitioned into one or more modules/units that are stored in the memory 602 and executed by the processor 601 to complete the present disclosure. One or more of the modules/units may be a series of computer program instruction segments capable of performing particular functions for describing the execution of the computer program 603 in the electronic device 600.
The electronic device 600 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 600 may include, but is not limited to, a processor 601 and a memory 602. It will be appreciated by those skilled in the art that fig. 6 is merely an example of an electronic device 600 and is not intended to limit the electronic device 600, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., an electronic device may also include an input-output device, a network access device, a bus, etc.
The processor 601 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 602 may be an internal storage unit of the electronic device 600, for example, a hard disk or a memory of the electronic device 600. The memory 602 may also be an external storage device of the electronic device 600, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 600. Further, the memory 602 may also include both internal and external storage units of the electronic device 600. The memory 602 is used to store computer programs and other programs and data required by the electronic device. The memory 602 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Each of the foregoing embodiments is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative: the division into modules or units is merely a division by logical function, and there may be other ways of division in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments of the present disclosure may be implemented by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for aggregating joint learning parameters, comprising:
acquiring hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean value, a variance, a mini-batch number, a first scaling transformation parameter and a second scaling transformation parameter, and N is a positive integer greater than or equal to 2;
aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter;
aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter;
and returning the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter.
2. The method for aggregating joint learning parameters according to claim 1, wherein the network structures of the algorithm models of the N participants are the same, and the network structure comprises an input layer, a batch normalization layer, a hidden layer and an output layer;
wherein aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter comprises:
aggregating the hidden layer parameters of the same hidden layer of each participant to obtain the first aggregation parameter, wherein the first aggregation parameter comprises at least one hidden layer aggregation parameter.
3. The method for aggregating joint learning parameters according to claim 1, wherein aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter comprises:
aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain the second aggregation parameter, wherein the second aggregation parameter comprises at least one batch normalization layer aggregation parameter.
4. The method for aggregating joint learning parameters according to claim 3, wherein the batch normalization layer aggregation parameters comprise a first batch normalization layer aggregation parameter, a second batch normalization layer aggregation parameter and a third batch normalization layer aggregation parameter;
wherein aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain the second aggregation parameter comprises:
aggregating the first scaling transformation parameters and the second scaling transformation parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter;
aggregating the mean values of the same batch normalization layer of each participant to obtain the second batch normalization layer aggregation parameter;
and aggregating the variances of the same batch normalization layer of each participant to obtain the third batch normalization layer aggregation parameter.
5. The method for aggregating joint learning parameters according to claim 4, wherein aggregating the first scaling transformation parameters and the second scaling transformation parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter comprises:
calculating the average value of the first scaling transformation parameters of the same batch normalization layer of each participant and the average value of the second scaling transformation parameters of the same batch normalization layer of each participant, to obtain the first batch normalization layer aggregation parameter.
6. The method for aggregating joint learning parameters according to claim 4, wherein aggregating the mean values of the same batch normalization layer of each participant to obtain the second batch normalization layer aggregation parameter comprises:
calculating a first product of the mean value of the same batch normalization layer of each participant and the mini-batch number of that batch normalization layer, and calculating the sum of the first products of the participants;
and calculating the sum of the mini-batch numbers of the N participants, and calculating the second batch normalization layer aggregation parameter according to the sum of the first products and the sum of the mini-batch numbers.
7. The method for aggregating joint learning parameters according to claim 4, wherein aggregating the variances of the same batch normalization layer of each participant to obtain the third batch normalization layer aggregation parameter comprises:
calculating, for each participant, the sum of the square of the mean value and the variance of the same batch normalization layer, calculating a second product of that sum and the mini-batch number of the participant, and calculating the sum of the second products of the participants;
and calculating the sum of the mini-batch numbers of the N participants, and calculating the third batch normalization layer aggregation parameter according to the sum of the second products, the sum of the mini-batch numbers and the second batch normalization layer aggregation parameter.
8. The method for aggregating joint learning parameters according to claim 2, wherein each participant adjusting and optimizing its algorithm model according to the first aggregation parameter and the second aggregation parameter comprises:
the participant adjusting the hidden layer parameters of the hidden layer in the network structure according to the first aggregation parameter;
and the participant adjusting the batch normalization layer parameters of the batch normalization layer in the network structure according to the second aggregation parameter.
9. A joint learning parameter aggregation apparatus, comprising:
a parameter acquisition module configured to acquire hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean value, a variance, a mini-batch number, a first scaling transformation parameter and a second scaling transformation parameter, and N is a positive integer greater than or equal to 2;
a first aggregation module configured to aggregate the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter;
a second aggregation module configured to aggregate the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter;
and a parameter returning module configured to return the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter.
10. A joint learning parameter aggregation system, comprising:
a server comprising the joint learning parameter aggregation apparatus according to claim 9; and
N participants in communication connection with the server.
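
For readability, the hidden layer aggregation recited in claims 1 to 3 can be illustrated with a short sketch. The following Python snippet is a hypothetical, non-authoritative illustration only (the publication itself contains no code); the function name aggregate_hidden_layers, the dictionary layout of the uploaded parameters, and the use of an unweighted average over participants are assumptions made for the example.

```python
import numpy as np

def aggregate_hidden_layers(participant_params):
    """Average the parameters of the same hidden layer across participants.

    participant_params: one dict per participant mapping a hidden-layer name to
    its parameter array; the participants share the same network structure, so
    the keys are identical across dicts (hypothetical layout).
    Returns the first aggregation parameter: one aggregated array per hidden layer.
    """
    layer_names = participant_params[0].keys()
    return {
        name: np.mean([params[name] for params in participant_params], axis=0)
        for name in layer_names
    }

# Example with N = 2 participants and two hidden layers
uploads = [
    {"hidden1": np.array([[0.2, 0.4]]), "hidden2": np.array([0.1])},
    {"hidden1": np.array([[0.6, 0.0]]), "hidden2": np.array([0.3])},
]
print(aggregate_hidden_layers(uploads))
# {'hidden1': array([[0.4, 0.2]]), 'hidden2': array([0.2])}
```

The aggregated arrays would then be returned to every participant, corresponding to the first aggregation parameter of claim 1 containing at least one hidden layer aggregation parameter as recited in claim 2.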
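Claims 4 to 7 describe how the batch normalization layer parameters are combined into the three batch normalization layer aggregation parameters. The sketch below is likewise a hypothetical illustration, not the patented implementation: it assumes that the first and second scaling transformation parameters are the usual gamma and beta of a batch normalization layer, and that the mini-batch number n counts the samples over which each participant's statistics were computed; all function and key names are assumptions.

```python
import numpy as np

def aggregate_batch_norm(uploads):
    """Combine batch normalization layer parameters uploaded by the participants.

    uploads: one dict per participant with keys
      "gamma", "beta" -- the two scaling transformation parameters,
      "mean", "var"   -- the layer's mean and variance,
      "n"             -- the participant's mini-batch number.
    Returns the three batch normalization layer aggregation parameters.
    """
    total = float(sum(u["n"] for u in uploads))

    # First aggregation parameter: plain averages of gamma and beta (claim 5)
    gamma = np.mean([u["gamma"] for u in uploads], axis=0)
    beta = np.mean([u["beta"] for u in uploads], axis=0)

    # Second aggregation parameter: mini-batch-weighted average of the means (claim 6)
    global_mean = sum(u["n"] * u["mean"] for u in uploads) / total

    # Third aggregation parameter: weighted second moments minus the squared
    # aggregated mean (claim 7)
    second_moment = sum(u["n"] * (u["var"] + u["mean"] ** 2) for u in uploads) / total
    global_var = second_moment - global_mean ** 2

    return {"gamma": gamma, "beta": beta, "mean": global_mean, "var": global_var}

# Example with N = 2 participants and a single-channel batch normalization layer
uploads = [
    {"gamma": np.array([1.0]), "beta": np.array([0.0]),
     "mean": np.array([2.0]), "var": np.array([1.0]), "n": 30},
    {"gamma": np.array([0.8]), "beta": np.array([0.2]),
     "mean": np.array([4.0]), "var": np.array([2.0]), "n": 10},
]
print(aggregate_batch_norm(uploads))
# mean = (30*2 + 10*4)/40 = 2.5; var = (30*(1+4) + 10*(2+16))/40 - 2.5**2 = 2.0
```

Weighting by the mini-batch numbers reproduces the statistics that would have been obtained over the pooled data, which is presumably why claims 6 and 7 use the mini-batch numbers rather than a plain average; the aggregated gamma, beta, mean and variance together form the second aggregation parameter that is returned to each participant.
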
CN202111440144.5A 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters Pending CN116226779A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111440144.5A CN116226779A (en) 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters
PCT/CN2022/119138 WO2023093229A1 (en) 2021-11-29 2022-09-15 Parameter aggregation method for federated learning, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111440144.5A CN116226779A (en) 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters

Publications (1)

Publication Number Publication Date
CN116226779A true CN116226779A (en) 2023-06-06

Family

ID=86538814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111440144.5A Pending CN116226779A (en) 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters

Country Status (2)

Country Link
CN (1) CN116226779A (en)
WO (1) WO2023093229A1 (en)

Also Published As

Publication number Publication date
WO2023093229A1 (en) 2023-06-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination