CN116226779A - Method, device and system for aggregating joint learning parameters - Google Patents

Method, device and system for aggregating joint learning parameters

Info

Publication number
CN116226779A
Authority
CN
China
Prior art keywords
parameters
aggregation
batch
layer
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111440144.5A
Other languages
Chinese (zh)
Inventor
杜炎
王瑞杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinzhi I Lai Network Technology Co ltd
Original Assignee
Xinzhi I Lai Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinzhi I Lai Network Technology Co ltd filed Critical Xinzhi I Lai Network Technology Co ltd
Priority to CN202111440144.5A priority Critical patent/CN116226779A/en
Priority to PCT/CN2022/119138 priority patent/WO2023093229A1/en
Publication of CN116226779A publication Critical patent/CN116226779A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to the technical field of machine learning, and provides a method, a device and a system for aggregating joint learning parameters. The method comprises the following steps: acquiring hidden layer parameters and batch normalization layer parameters uploaded by N participants; aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter; aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter; and returning the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter. The method and the device can comprehensively consider the characteristics of the different network layers in the network structure of each participant's algorithm model and aggregate the parameters of the different network layers separately and in a targeted manner. The aggregated parameters are then returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of each participant's algorithm model.

Description

Method, device and system for aggregating joint learning parameters
Technical Field
The disclosure relates to the technical field of machine learning, and in particular relates to a method, a device and a system for aggregating joint learning parameters.
Background
As the number of layers of a deep learning network model increases, the number of hidden layers grows, and during training the parameters of each hidden layer keep changing, so that the input distribution of the hidden layers shifts constantly. This slows down the convergence of model learning and can even affect the generalization capability of the model. Related research shows that normalizing the input of each network layer, namely Batch Normalization (hereinafter referred to as "BN"), can to a certain extent reduce the internal shift within the network that causes the input distribution to change, accelerate the convergence of the model, and give the model better generalization capability.
Horizontal joint learning based on deep learning network models (provided with BN layers) generally involves several participants. Each participant uploads the parameters it has trained to a server (central node); the server then aggregates the parameters of the participants and returns the aggregated parameters to each participant, so that each participant can adjust its parameters according to the returned aggregated parameters and thereby optimize its model.
However, in the prior art, the server aggregates the parameters of the participants by directly averaging or weighted-averaging the parameters uploaded by each participant, and then returns the aggregated parameters to each participant. Obviously, this aggregation approach does not take into account the characteristics of the different network layers of each participant's network model, yet every participant adjusts its algorithm model according to the aggregation parameters returned by the server. As a result, the expected effect of accelerating the convergence of the algorithm model and improving its generalization capability may not be achieved.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a method, an apparatus, and a system for aggregating joint learning parameters, so as to solve the problem that the existing joint learning parameter aggregation method cannot effectively help each participant accelerate the convergence of its algorithm model and improve its generalization capability.
In a first aspect of an embodiment of the present disclosure, a method for aggregating joint learning parameters is provided, including:
acquiring hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2;
aggregating hidden layer parameters uploaded by each participant to obtain a first aggregation parameter;
aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter;
returning the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter.
In a second aspect of the embodiments of the present disclosure, there is provided a joint learning parameter aggregation apparatus, including:
the parameter acquisition module is configured to acquire hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2;
The first aggregation module is configured to aggregate hidden layer parameters uploaded by each participant to obtain first aggregation parameters;
the second aggregation module is configured to aggregate the batch normalization layer parameters uploaded by each participant to obtain second aggregation parameters;
and the parameter returning module is configured to return the first aggregation parameter and the second aggregation parameter to each participant so that each participant adjusts and optimizes an algorithm model of the participant according to the first aggregation parameter and the second aggregation parameter.
In a third aspect of embodiments of the present disclosure, there is provided a joint learning parameter aggregation system, including:
a server comprising the above joint learning parameter aggregation apparatus; and N participants communicatively connected to the server.
Compared with the prior art, the beneficial effects of the embodiments of the present disclosure at least include the following. Hidden layer parameters and batch normalization layer parameters uploaded by N participants are acquired, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2; the hidden layer parameters uploaded by each participant are aggregated to obtain a first aggregation parameter; the batch normalization layer parameters uploaded by each participant are aggregated to obtain a second aggregation parameter; and the first aggregation parameter and the second aggregation parameter are returned to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter. In this way, the characteristics of the different network layers in the network structure of each participant's algorithm model can be comprehensively considered, the parameters of the different network layers can be aggregated separately and in a targeted manner, and the aggregated parameters can be returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of each participant's algorithm model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required for the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for aggregating joint learning parameters according to an embodiment of the disclosure;
fig. 3 is a schematic diagram of a network structure of an algorithm model of a participant in a joint learning parameter aggregation method according to an embodiment of the disclosure;
fig. 4 is a schematic structural diagram of a joint learning parameter aggregation apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a joint learning parameter aggregation system according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
Joint learning refers to comprehensively utilizing multiple AI (Artificial Intelligence) technologies on the premise of ensuring data security and user privacy, jointly mining data value through multiparty cooperation, and promoting new intelligent business forms and models based on joint modeling. Joint learning has at least the following characteristics:
(1) Participating nodes control a weakly centralized joint training mode over their own data, ensuring data privacy and security in the process of co-creating intelligence.
(2) Under different application scenarios, multiple model aggregation optimization strategies are established by screening and/or combining AI algorithms and privacy-preserving computation, so as to obtain high-level, high-quality models.
(3) On the premise of ensuring data security and user privacy, methods for improving the efficiency of the joint learning engine are derived from the multiple model aggregation optimization strategies; these methods can improve the overall efficiency of the joint learning engine by addressing problems such as information interaction, intelligent perception and exception handling mechanisms under a large-scale cross-domain network with a parallel computing architecture.
(4) The requirements of multiparty users in various scenarios are acquired, the real contribution of each joint participant is determined and reasonably evaluated through a mutual-trust mechanism, and distribution incentives are provided.
Based on this mode, an AI technology ecosystem based on joint learning can be established, the value of industry data can be fully exploited, and the implementation of applications in vertical fields can be promoted.
A method, apparatus and system for aggregating joint learning parameters according to embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure. As shown in fig. 1, the architecture of joint learning may include a server (central node) 101, as well as participants 102, 103, and 104.
In the joint learning process, a basic model may be established by the server 101, and the server 101 transmits the model to the participants 102, 103, and 104 with which a communication connection is established. The basic model may also be uploaded to the server 101 after being established by any participant, and the server 101 then sends the model to the other participants with which it has established a communication connection. The participants 102, 103 and 104 construct a model according to the downloaded basic structure and model parameters, perform model training using local data, obtain updated model parameters, and encrypt and upload the updated model parameters to the server 101. The server 101 aggregates the model parameters sent by participants 102, 103, and 104 to obtain global model parameters, and transmits the global model parameters back to participants 102, 103, and 104. Participant 102, participant 103 and participant 104 iterate their respective models according to the received global model parameters until the models eventually converge, thereby completing the training of the models. In the joint learning process, the data uploaded by participants 102, 103 and 104 are model parameters, local data is not uploaded to the server 101, and all participants can share the final model parameters, so that common modeling can be realized on the basis of ensuring data privacy. It should be noted that the number of participants is not limited to the above three, but may be set as needed, and the embodiments of the present disclosure are not limited thereto.
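To make this exchange concrete, the following is a minimal sketch of one joint learning round using in-memory objects; the class and method names (Server, Participant, local_train, aggregate, apply_global) are illustrative assumptions and are not taken from the disclosure.

```python
# Minimal sketch of one joint learning round as described above.
# Parameters are plain dicts of NumPy arrays; names are illustrative.
import numpy as np

class Participant:
    def __init__(self, params):
        self.params = {k: v.copy() for k, v in params.items()}  # local model copy

    def local_train(self):
        # Placeholder for training on private local data: perturb the parameters
        # to simulate an updated model; only parameters are ever uploaded.
        return {k: v + 0.01 * np.random.randn(*v.shape) for k, v in self.params.items()}

    def apply_global(self, global_params):
        self.params = {k: v.copy() for k, v in global_params.items()}

class Server:
    def aggregate(self, uploads):
        # Plain averaging of the uploaded parameters (the baseline scheme that
        # the present disclosure refines for BN layers).
        return {k: np.mean([u[k] for u in uploads], axis=0) for k in uploads[0]}

if __name__ == "__main__":
    base = {"W": np.zeros((4, 4)), "b": np.zeros(4)}
    participants = [Participant(base) for _ in range(3)]   # e.g. participants 102-104
    server = Server()
    uploads = [p.local_train() for p in participants]      # upload updated parameters
    global_params = server.aggregate(uploads)              # server-side aggregation
    for p in participants:
        p.apply_global(global_params)                      # return global parameters
```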
Fig. 2 is a flow chart of a method for aggregating joint learning parameters according to an embodiment of the disclosure. The joint learning parameter aggregation method of fig. 2 may be performed by the server 101 of fig. 1. As shown in fig. 2, the joint learning parameter aggregation method includes:
step S201, acquiring hidden layer parameters and batch standardization layer parameters uploaded by N participants, wherein the batch standardization layer parameters comprise a mean value, a variance, a minimum batch number, a first expansion change parameter and a second expansion change parameter, and N is a positive integer more than or equal to 2.
Wherein, the hidden layer parameters refer to parameters of hidden layers in a network structure (for example, a neural network structure) of a basic model adopted by each participant, and include a weight W and a bias b of each hidden layer in the network structure.
Batch normalization layer parameters, which refer to parameters of batch normalization layers (i.e., BN layers) in a network structure (e.g., neural network structure) of a basic model employed by each party, include the mean value E of each BN layer in the network structure x Variance Var x A minimum lot number m, a first telescoping parameter y and a second telescoping parameter β.
As an example, each participant may establish a communication connection with a server through a terminal device (e.g., a smart phone, a personal computer, etc.), and upload its hidden layer parameters, which may be standardized layer parameters in batch.
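For illustration, the two kinds of parameters described in step S201 could be organized in a participant's upload as nested dictionaries, as in the following minimal sketch; the field names and shapes are assumptions made here for readability and are not taken from the disclosure.

```python
# Hypothetical structure of one participant's upload in step S201.
# "hidden" carries the weight W and bias b of each hidden layer; "bn" carries
# the BN-layer statistics: mean E_x, variance Var_x, minimum batch number m,
# scale parameter gamma and shift parameter beta.
import numpy as np

upload_from_participant = {
    "hidden": {
        "hidden_1": {"W": np.random.randn(16, 8), "b": np.zeros(8)},
        "hidden_2": {"W": np.random.randn(8, 4),  "b": np.zeros(4)},
    },
    "bn": {
        "bn_1": {"mean": np.zeros(16), "var": np.ones(16), "m": 32,
                 "gamma": np.ones(16), "beta": np.zeros(16)},
        "bn_2": {"mean": np.zeros(8),  "var": np.ones(8),  "m": 32,
                 "gamma": np.ones(8),  "beta": np.zeros(8)},
    },
}
```

The later sketches in this description reuse this layout when illustrating the aggregation formulas.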
Step S202, aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter.
As an example, let N=2, i.e., there are two participants, denoted participant A and participant B, and the network structure of the basic model adopted by participants A and B is a three-layer network consisting, in sequence, of an input layer, a hidden layer (preceded by a batch normalization layer, i.e., a BN layer), and an output layer.
First, the hidden layer parameters uploaded by participant A (comprising the weight $W_a$ and the bias $b_a$) and its batch normalization layer parameters (comprising the mean $E_x^a$, the variance $\mathrm{Var}_x^a$, the minimum batch number $m_a$, the scale parameter $\gamma_a$ and the shift parameter $\beta_a$ of its BN layer) are obtained, together with the hidden layer parameters uploaded by participant B (comprising the weight $W_b$ and the bias $b_b$) and its batch normalization layer parameters (comprising the mean $E_x^b$, the variance $\mathrm{Var}_x^b$, the minimum batch number $m_b$, the scale parameter $\gamma_b$ and the shift parameter $\beta_b$ of its BN layer).
Next, the hidden layer parameters of participant A and participant B are aggregated. Specifically, the mean of participant A's hidden layer weight $W_a$ and bias $b_a$ and participant B's hidden layer weight $W_b$ and bias $b_b$ is calculated, which completes the aggregation of the hidden layer parameters of the two participants and yields the first aggregation parameter (i.e., the mean, or a weighted mean, of $W_a$, $b_a$ and $W_b$, $b_b$).
Step S203, aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter.
In combination with the above example, the batch normalization layer parameters of participant A and participant B are aggregated. Specifically, participant A's batch normalization layer parameters (the mean $E_x^a$, the variance $\mathrm{Var}_x^a$, the minimum batch number $m_a$, the scale parameter $\gamma_a$ and the shift parameter $\beta_a$) are aggregated with participant B's batch normalization layer parameters (the mean $E_x^b$, the variance $\mathrm{Var}_x^b$, the minimum batch number $m_b$, the scale parameter $\gamma_b$ and the shift parameter $\beta_b$) to obtain the second aggregation parameter.
Step S204, the first aggregation parameter and the second aggregation parameter are returned to each participant, so that each participant adjusts and optimizes the algorithm model according to the first aggregation parameter and the second aggregation parameter.
In combination with the above example, after aggregating the hidden layer parameters and the batch normalization layer parameters uploaded by participant A and participant B according to the above steps, the server 101 returns the first aggregation parameter and the second aggregation parameter to participant A and participant B, respectively. After receiving the first aggregation parameter and the second aggregation parameter returned by the server 101, participant A and participant B update the hidden layer parameters in their network models using the first aggregation parameter and update the batch normalization layer (BN layer) parameters using the second aggregation parameter. Each participant then continues training with the network model whose parameters have been updated, and repeats the above parameter aggregation and update steps after training on the next batch of training data, until the algorithm model reaches the preset number of iterations and the trained algorithm model is obtained.
According to the technical solution provided by the embodiments of the present disclosure, hidden layer parameters and batch normalization layer parameters uploaded by N participants are acquired, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2; the hidden layer parameters uploaded by each participant are aggregated to obtain a first aggregation parameter; the batch normalization layer parameters uploaded by each participant are aggregated to obtain a second aggregation parameter; and the first aggregation parameter and the second aggregation parameter are returned to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter. In this way, the characteristics of the different network layers in the network structure of each participant's algorithm model can be comprehensively considered, the parameters of the different network layers can be aggregated separately and in a targeted manner, and the aggregated parameters can be returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of its algorithm model.
In some embodiments, the network structure of the algorithm models of the N participants is the same, the network structure including an input layer, a batch normalization layer, a hidden layer, and an output layer.
The step S202 includes:
and aggregating hidden layer parameters of the same hidden layer of each participant to obtain a first aggregation parameter, wherein the first aggregation parameter comprises at least one hidden layer aggregation parameter.
As an example, let N=2, i.e., there are two participants, participant A and participant B, where the network structures of the algorithm models of participant A and participant B are each a 4-layer neural network structure whose schematic diagram is shown in fig. 3. Referring to fig. 3, the neural network structure of participant A includes an input layer A, a first BN layer A, a first hidden layer A, a second BN layer A, a second hidden layer A, and an output layer A; the neural network structure of participant B includes an input layer B, a first BN layer B, a first hidden layer B, a second BN layer B, a second hidden layer B, and an output layer B. The first hidden layer A and the first hidden layer B are the first hidden layers of participant A and participant B (they belong to the same hidden layer), and the second hidden layer A and the second hidden layer B are the second hidden layers of participant A and participant B (they belong to the same hidden layer).
The hidden layer parameters of the same hidden layer of each participant are aggregated. Specifically, the hidden layer parameters of the first hidden layer A of participant A and the first hidden layer B of participant B are aggregated to obtain hidden layer aggregation parameter 01, and the hidden layer parameters of the second hidden layer A of participant A and the second hidden layer B of participant B are aggregated to obtain hidden layer aggregation parameter 02. The first aggregation parameter here includes hidden layer aggregation parameter 01 and hidden layer aggregation parameter 02.
As an example, assume that the hidden layer parameters of the first hidden layer A of participant A are the weight $W_{a1}$ and the bias $b_{a1}$, and the hidden layer parameters of the second hidden layer A are the weight $W_{a2}$ and the bias $b_{a2}$; the hidden layer parameters of the first hidden layer B of participant B are the weight $W_{b1}$ and the bias $b_{b1}$, and the hidden layer parameters of the second hidden layer B are the weight $W_{b2}$ and the bias $b_{b2}$.
Specifically, the aggregation process of the hidden layer parameters of participant A and participant B is as follows:
First, the weight mean of the first hidden layers of participant A and participant B is calculated as $\bar{W}_1 = \frac{W_{a1} + W_{b1}}{2}$; at the same time, the weight mean of the second hidden layers of participant A and participant B is calculated as $\bar{W}_2 = \frac{W_{a2} + W_{b2}}{2}$.
Second, the bias mean of the first hidden layers of participant A and participant B is calculated as $\bar{b}_1 = \frac{b_{a1} + b_{b1}}{2}$; at the same time, the bias mean of the second hidden layers of participant A and participant B is calculated as $\bar{b}_2 = \frac{b_{a2} + b_{b2}}{2}$.
From the above, the first hidden layer aggregation parameters of participant A and participant B are $\bar{W}_1$ and $\bar{b}_1$, and the second hidden layer aggregation parameters are $\bar{W}_2$ and $\bar{b}_2$. The first aggregation parameter includes the first hidden layer aggregation parameters and the second hidden layer aggregation parameters.
It will be appreciated that, assuming there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has K hidden layers (K is a positive integer greater than or equal to 1), the weight mean of the k-th hidden layer over all participants can be calculated according to the formula $\bar{W}_k = \frac{1}{N}\sum_{i=1}^{N} W_{ik}$, and the bias mean of the k-th hidden layer over all participants can be calculated according to the formula $\bar{b}_k = \frac{1}{N}\sum_{i=1}^{N} b_{ik}$, thereby obtaining the first aggregation parameters of the hidden layers of all participants.
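As an illustration of this per-layer averaging, a minimal sketch in Python follows; the function name aggregate_hidden_layers and the dictionary layout (one dict of layers per participant) are assumptions made here, not part of the disclosure.

```python
# Average the weight W and bias b of the same hidden layer across all
# participants, yielding the first aggregation parameter per layer.
import numpy as np

def aggregate_hidden_layers(uploads):
    """uploads: list with one dict per participant, e.g.
    {"hidden_1": {"W": ..., "b": ...}, "hidden_2": {...}}, same layer names."""
    layer_names = uploads[0].keys()
    aggregated = {}
    for name in layer_names:
        aggregated[name] = {
            "W": np.mean([u[name]["W"] for u in uploads], axis=0),
            "b": np.mean([u[name]["b"] for u in uploads], axis=0),
        }
    return aggregated
```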
In some embodiments, the step S203 includes:
and aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain a second aggregation parameter, wherein the second aggregation parameter comprises at least one batch normalization layer aggregation parameter, and the batch normalization layer aggregation parameters comprise a first batch normalization layer aggregation parameter, a second batch normalization layer aggregation parameter and a third batch normalization layer aggregation parameter.
With reference to the above example and fig. 3, the batch normalization layer parameters of the same batch normalization layer of each participant are aggregated. Specifically, the batch normalization layer parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are aggregated to obtain batch normalization layer aggregation parameter 01; and the batch normalization layer parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are aggregated to obtain batch normalization layer aggregation parameter 02. The second aggregation parameter here includes batch normalization layer aggregation parameter 01 and batch normalization layer aggregation parameter 02.
In some embodiments, the step of aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain the second aggregation parameter specifically includes:
aggregating the scale parameters and the shift parameters of the same batch normalization layer of each participant to obtain a first batch normalization layer aggregation parameter;
aggregating the means of the same batch normalization layer of each participant to obtain a second batch normalization layer aggregation parameter;
and aggregating the variances of the same batch normalization layer of each participant to obtain a third batch normalization layer aggregation parameter.
As an example, the first batch normalization layer aggregation parameter may be calculated as follows:
calculating the mean of the scale parameters of the same batch normalization layer of each participant and the mean of the shift parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter.
As an example, in combination with the above example, assume that the scale parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are $\gamma_{a1}$ and $\gamma_{b1}$, respectively, and their shift parameters are $\beta_{a1}$ and $\beta_{b1}$, respectively; the scale parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are $\gamma_{a2}$ and $\gamma_{b2}$, respectively, and their shift parameters are $\beta_{a2}$ and $\beta_{b2}$, respectively.
Then, the mean of the scale parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B can be calculated according to the formula $\bar{\gamma}_1 = \frac{\gamma_{a1} + \gamma_{b1}}{2}$; the mean of the scale parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B can be calculated according to the formula $\bar{\gamma}_2 = \frac{\gamma_{a2} + \gamma_{b2}}{2}$; the mean of the shift parameters of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B can be calculated according to the formula $\bar{\beta}_1 = \frac{\beta_{a1} + \beta_{b1}}{2}$; and the mean of the shift parameters of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B can be calculated according to the formula $\bar{\beta}_2 = \frac{\beta_{a2} + \beta_{b2}}{2}$.
From the above, the first batch normalization layer aggregation parameters of the first batch normalization layers of participant A and participant B are $\bar{\gamma}_1$ and $\bar{\beta}_1$, and the first batch normalization layer aggregation parameters of the second batch normalization layers are $\bar{\gamma}_2$ and $\bar{\beta}_2$.
It can be appreciated that, assuming there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has P batch normalization layers (P is a positive integer greater than or equal to 1), the mean of the scale parameters of the p-th batch normalization layer over all participants can be calculated according to the formula $\bar{\gamma}_p = \frac{1}{N}\sum_{i=1}^{N} \gamma_{ip}$, and the mean of the shift parameters of the p-th batch normalization layer over all participants can be calculated according to the formula $\bar{\beta}_p = \frac{1}{N}\sum_{i=1}^{N} \beta_{ip}$, thereby obtaining the first batch normalization layer aggregation parameter of each batch normalization layer of all participants.
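The same style of per-layer averaging can be sketched in code for the scale and shift parameters, again under the dictionary layout assumed earlier; the function name is illustrative only.

```python
# Average gamma (scale) and beta (shift) of the same BN layer across all
# participants, yielding the first batch normalization layer aggregation parameter.
import numpy as np

def aggregate_bn_scale_shift(uploads):
    """uploads: list with one dict per participant, e.g.
    {"bn_1": {"gamma": ..., "beta": ...}, "bn_2": {...}}."""
    layer_names = uploads[0].keys()
    return {
        name: {
            "gamma": np.mean([u[name]["gamma"] for u in uploads], axis=0),
            "beta": np.mean([u[name]["beta"] for u in uploads], axis=0),
        }
        for name in layer_names
    }
```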
As an example, the second batch normalization layer aggregation parameter may be calculated as follows:
calculating, for each participant, a first product of the mean of the same batch normalization layer and the minimum batch number of that batch normalization layer, and calculating the sum of the first products over the participants;
and calculating the sum of the minimum batch numbers of the N participants, and calculating the second batch normalization layer aggregation parameter according to the sum of the first products and the sum of the minimum batch numbers.
As an example, when N=2, i.e., there are two participants, participant A and participant B, in combination with the above example, the network structure adopted by participants A and B is as shown in fig. 3. Assume that the means of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are $E_x^{a1}$ and $E_x^{b1}$, respectively, and the minimum batch numbers are $m_a$ and $m_b$, respectively; the means of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are $E_x^{a2}$ and $E_x^{b2}$, respectively, and the minimum batch numbers are $m_a$ and $m_b$, respectively.
Then, the second batch normalization layer aggregation parameter 01 for the first batch normalization layers of participant A and participant B can be calculated according to the formula $E_x^{1} = \frac{m_a E_x^{a1} + m_b E_x^{b1}}{m_a + m_b}$, and the second batch normalization layer aggregation parameter 02 for the second batch normalization layers of participant A and participant B can be calculated according to the formula $E_x^{2} = \frac{m_a E_x^{a2} + m_b E_x^{b2}}{m_a + m_b}$.
It can be appreciated that, if there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has P batch normalization layers (P is a positive integer greater than or equal to 1), the second batch normalization layer aggregation parameter of each batch normalization layer of all participants (i.e., the aggregated value of the means of that batch normalization layer over all participants) can be calculated according to the formula $E_x^{p} = \frac{\sum_{i=1}^{N} m_i E_x^{ip}}{\sum_{i=1}^{N} m_i}$, where $E_x^{ip}$ and $m_i$ are the mean of the p-th batch normalization layer and the minimum batch number of the i-th participant, respectively.
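A sketch of this batch-count-weighted averaging of the BN means follows, reusing the assumed layout; the function name is illustrative.

```python
# Aggregate the BN means weighted by each participant's minimum batch number m,
# yielding the second batch normalization layer aggregation parameter.
import numpy as np

def aggregate_bn_means(uploads):
    """uploads: list with one dict per participant, e.g.
    {"bn_1": {"mean": ..., "m": ...}, "bn_2": {...}}."""
    layer_names = uploads[0].keys()
    aggregated = {}
    for name in layer_names:
        total_m = sum(u[name]["m"] for u in uploads)
        weighted_sum = sum(u[name]["m"] * u[name]["mean"] for u in uploads)
        aggregated[name] = weighted_sum / total_m
    return aggregated
```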
As an example, the third batch normalization layer aggregation parameter may be calculated as follows:
calculating, for each participant, the sum of the variance of the same batch normalization layer and the square of its mean, calculating a second product of this sum and the participant's minimum batch number, and calculating the sum of the second products over the participants;
and calculating the sum of the minimum batch numbers of the N participants, and calculating the third batch normalization layer aggregation parameter according to the sum of the second products, the sum of the minimum batch numbers and the second batch normalization layer aggregation parameter.
As an example, when N=2, i.e., there are two participants, participant A and participant B, in combination with the above example, the network structure adopted by participants A and B is as shown in fig. 3. Assume that the variances of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B are $\mathrm{Var}_x^{a1}$ and $\mathrm{Var}_x^{b1}$, respectively, and the variances of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B are $\mathrm{Var}_x^{a2}$ and $\mathrm{Var}_x^{b2}$, respectively.
Since, for a single participant, the variance of each BN layer equals the expectation of the square minus the square of the expectation, i.e., $\mathrm{Var}_x = E(x^2) - E^2(x)$, the third batch normalization layer aggregation parameter 01 (i.e., the aggregated variance) of the first batch normalization layer A of participant A and the first batch normalization layer B of participant B can be calculated according to the formula

$$\mathrm{Var}_x^{1} = \frac{m_a\left(\mathrm{Var}_x^{a1} + (E_x^{a1})^2\right) + m_b\left(\mathrm{Var}_x^{b1} + (E_x^{b1})^2\right)}{m_a + m_b} - \left(E_x^{1}\right)^2,$$

and the third batch normalization layer aggregation parameter 02 (i.e., the aggregated variance) of the second batch normalization layer A of participant A and the second batch normalization layer B of participant B can be calculated according to the formula

$$\mathrm{Var}_x^{2} = \frac{m_a\left(\mathrm{Var}_x^{a2} + (E_x^{a2})^2\right) + m_b\left(\mathrm{Var}_x^{b2} + (E_x^{b2})^2\right)}{m_a + m_b} - \left(E_x^{2}\right)^2.$$

It can be understood that, assuming there are N participants (N is a positive integer greater than or equal to 2) and each participant's network structure has P batch normalization layers (P is a positive integer greater than or equal to 1), the third batch normalization layer aggregation parameter of each batch normalization layer of all participants (i.e., the aggregated value of the variances of that batch normalization layer over all participants) can be calculated according to the formula

$$\mathrm{Var}_x^{p} = \frac{\sum_{i=1}^{N} m_i\left(\mathrm{Var}_x^{ip} + (E_x^{ip})^2\right)}{\sum_{i=1}^{N} m_i} - \left(E_x^{p}\right)^2,$$

where $E_x^{p}$ is the second batch normalization layer aggregation parameter of the p-th batch normalization layer.
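A sketch of this pooled-variance computation follows; it relies on the identity $\mathrm{Var}_x = E(x^2) - E^2(x)$ and on the aggregated means from the previous step, and the function name is illustrative.

```python
# Aggregate the BN variances: pool E(x^2) = Var + mean^2 weighted by m,
# then subtract the square of the aggregated mean (third aggregation parameter).
import numpy as np

def aggregate_bn_variances(uploads, aggregated_means):
    """uploads: list with one dict per participant, e.g.
    {"bn_1": {"mean": ..., "var": ..., "m": ...}, ...};
    aggregated_means: output of aggregate_bn_means()."""
    layer_names = uploads[0].keys()
    aggregated = {}
    for name in layer_names:
        total_m = sum(u[name]["m"] for u in uploads)
        second_moment = sum(
            u[name]["m"] * (u[name]["var"] + u[name]["mean"] ** 2) for u in uploads
        ) / total_m
        aggregated[name] = second_moment - aggregated_means[name] ** 2
    return aggregated
```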
In the technical solution provided by the embodiments of the present disclosure, the batch normalization layers of the participants, whose parameters the server aggregates, operate according to the following principle:

$$E_x = \frac{1}{m}\sum_{i=1}^{m} x_i$$

$$\mathrm{Var}_x = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - E_x\right)^2$$

$$\hat{x}_i = \frac{x_i - E_x}{\sqrt{\mathrm{Var}_x + \epsilon}}$$

$$y_i = \gamma \hat{x}_i + \beta$$

wherein $x_i$ represents the output of the $i$-th sample at the layer preceding the BN layer, $m$ represents the number of samples in one min-batch during training (i.e., the minimum batch number), $E_x$ represents the mean of the min-batch, $\mathrm{Var}_x$ represents the variance of the min-batch, $\hat{x}_i$ represents the normalized result (i.e., the result of normalizing the output of the $i$-th sample at the layer preceding the BN layer), $y_i$ represents the final output of the BN layer, which is obtained from $\hat{x}_i$ by a scale-and-shift transformation, where $\gamma$ and $\beta$ are the scale and shift parameters applied to $\hat{x}_i$, respectively, and $\epsilon$ is a small constant (eps) that prevents the denominator from being zero.
Specifically, the mean of the batch data x is calculated first, then the variance of the batch data is calculated, then the batch data x is normalized, and finally the scale parameter and the shift parameter are introduced to transform the normalized result, so that $y_i$ can, when needed, be restored to the distribution of x before normalization. This ensures that each normalization preserves the originally learned features while still completing the normalization operation, thereby accelerating model convergence and helping to improve the generalization capability of the model.
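For reference, the four BN formulas above translate directly into code; the following is a minimal training-mode batch normalization over a mini-batch, with eps standing for the small constant $\epsilon$.

```python
# Training-mode batch normalization of a mini-batch x of shape (m, features),
# following the four formulas above.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # E_x over the mini-batch
    var = x.var(axis=0)                      # Var_x over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalization
    y = gamma * x_hat + beta                 # scale-and-shift output
    return y, mean, var
```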
In some embodiments, in the step S204, each party adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter, which specifically includes:
each participant adjusts hidden layer parameters of hidden layers in the network structure according to the first aggregation parameters;
each participant adjusts the batch normalization layer parameters of the batch normalization layers in its network structure according to the second aggregation parameters.
As an example, in combination with the above example, participant A and participant B each receive the first aggregation parameters returned by the server 101 (including the first hidden layer aggregation parameters $\bar{W}_1$ and $\bar{b}_1$ and the second hidden layer aggregation parameters $\bar{W}_2$ and $\bar{b}_2$) and the second aggregation parameters (including the aggregation parameters of the first batch normalization layer and the aggregation parameters of the second batch normalization layer). Participant A may update and adjust the original parameters of its first hidden layer using the first hidden layer aggregation parameters, adjust the original parameters of its second hidden layer using the second hidden layer aggregation parameters, adjust the original parameters of its first batch normalization layer using the aggregation parameters of the first batch normalization layer, and adjust the original parameters of its second batch normalization layer using the aggregation parameters of the second batch normalization layer, thereby completing the update and adjustment of all the parameters of the network structure of its algorithm model. Then, participant A performs model training on the next batch of data using the algorithm model with the updated parameters, and repeats the updating and adjusting of the parameters of each layer of the network structure after each batch of data is trained, until a preset model training number threshold is reached, at which point the trained algorithm model is obtained.
Similarly, participant B's update and adjustment of the parameters of each layer of the network structure of its algorithm model can refer to the update steps of participant A, which are not repeated here.
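As an illustration of this participant-side update, the following minimal sketch applies the returned aggregation parameters to a local parameter dictionary; the function name and the assumption that the second aggregation parameters are delivered per BN layer with combined gamma/beta/mean/var fields are made here for readability and are not taken from the disclosure.

```python
# Illustrative participant-side update after receiving the aggregation parameters.
def apply_aggregation(model_params, first_agg, second_agg):
    """model_params: {"hidden": {...}, "bn": {...}} as in the earlier sketches;
    first_agg: per-hidden-layer {"W": ..., "b": ...};
    second_agg: per-BN-layer {"gamma": ..., "beta": ..., "mean": ..., "var": ...}."""
    for layer, agg in first_agg.items():
        model_params["hidden"][layer]["W"] = agg["W"]   # adjust hidden weights
        model_params["hidden"][layer]["b"] = agg["b"]   # adjust hidden biases
    for layer, agg in second_agg.items():
        model_params["bn"][layer].update(agg)           # adjust BN statistics and gamma/beta
    return model_params
```

The participant would then continue training on the next batch with the updated parameters and repeat the exchange until the preset number of iterations is reached.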
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 4 is a schematic diagram of a joint learning parameter aggregation apparatus according to an embodiment of the present disclosure. As shown in fig. 4, the joint learning parameter aggregation apparatus includes:
the parameter acquisition module 401 is configured to acquire the hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean, a variance, a minimum batch number, a scale parameter and a shift parameter, and N is a positive integer greater than or equal to 2;
a first aggregation module 402, configured to aggregate the hidden layer parameters uploaded by each participant to obtain first aggregation parameters;
a second aggregation module 403, configured to aggregate the batch normalization layer parameters uploaded by each participant to obtain second aggregation parameters;
A parameter return module 404 configured to return the first and second aggregation parameters to the respective participants such that each participant adjusts and optimizes its algorithm model based on the first and second aggregation parameters.
According to the technical solution provided by the embodiments of the present disclosure, the parameter acquisition module 401 acquires the hidden layer parameters and batch normalization layer parameters uploaded by N participants; the first aggregation module 402 aggregates the hidden layer parameters uploaded by each participant to obtain the first aggregation parameters; the second aggregation module 403 aggregates the batch normalization layer parameters uploaded by each participant to obtain the second aggregation parameters; and the parameter return module 404 returns the first aggregation parameters and the second aggregation parameters to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameters and the second aggregation parameters. In this way, the characteristics of the different network layers in the network structure of each participant's algorithm model can be comprehensively considered, the parameters of the different network layers can be aggregated separately and in a targeted manner, and the aggregated parameters can be returned to each participant, so that each participant can adjust the parameters of its algorithm model according to the returned aggregation parameters, which improves the convergence speed and generalization capability of its algorithm model.
In some embodiments, the network structure of the algorithm models of the N participants is the same, the network structure including an input layer, a batch normalization layer, a hidden layer, and an output layer. The first aggregation module 402 includes:
and the hidden layer parameter aggregation unit is configured to aggregate hidden layer parameters of the same hidden layer of each participant to obtain a first aggregation parameter, wherein the first aggregation parameter comprises at least one hidden layer aggregation parameter.
In some embodiments, the second aggregation module 403 may be specifically configured to:
aggregate the batch normalization layer parameters of the same batch normalization layer of each participant to obtain a second aggregation parameter, wherein the second aggregation parameter comprises at least one batch normalization layer aggregation parameter, and the batch normalization layer aggregation parameters comprise a first batch normalization layer aggregation parameter, a second batch normalization layer aggregation parameter and a third batch normalization layer aggregation parameter.
In some embodiments, the second aggregation module 403 includes:
the first aggregation unit is configured to aggregate the scale parameters and the shift parameters of the same batch normalization layer of each participant to obtain a first batch normalization layer aggregation parameter;
the second aggregation unit is configured to aggregate the means of the same batch normalization layer of each participant to obtain a second batch normalization layer aggregation parameter;
and the third aggregation unit is configured to aggregate the variances of the same batch normalization layer of each participant to obtain a third batch normalization layer aggregation parameter.
In some embodiments, the first aggregation unit may be specifically configured to:
calculate the mean of the scale parameters of the same batch normalization layer of each participant and the mean of the shift parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter.
In some embodiments, the second aggregation unit may be specifically configured to:
calculate, for each participant, a first product of the mean of the same batch normalization layer and the minimum batch number of that batch normalization layer, and calculate the sum of the first products over the participants;
and calculate the sum of the minimum batch numbers of the N participants, and calculate the second batch normalization layer aggregation parameter according to the sum of the first products and the sum of the minimum batch numbers.
In some embodiments, the third aggregation unit may be specifically configured to:
calculate, for each participant, the sum of the variance of the same batch normalization layer and the square of its mean, calculate a second product of this sum and the participant's minimum batch number, and calculate the sum of the second products over the participants;
and calculate the sum of the minimum batch numbers of the N participants, and calculate the third batch normalization layer aggregation parameter according to the sum of the second products, the sum of the minimum batch numbers and the second batch normalization layer aggregation parameter.
In some embodiments, each participant may be configured to:
after receiving the first aggregation parameter and the second aggregation parameter returned by the server, adjusting the hidden layer parameters of the hidden layers in its network structure according to the first aggregation parameter; and adjusting the batch normalization layer parameters of the batch normalization layers in its network structure according to the second aggregation parameter.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 5 is a schematic structural diagram of a joint learning parameter aggregation system according to an embodiment of the present disclosure. As shown in fig. 5, the joint learning parameter aggregation system includes a server 101 including the joint learning parameter aggregation apparatus described above; and N participants communicatively connected to the server 101.
Specifically, the server 101 and each participant can communicate via a network, Bluetooth or other means. Each participant joins the joint learning in order to optimize or construct a certain algorithm model, and trains its basic model, or a basic model issued by the server, using its local data. After training on a batch of data is finished, the participant uploads the hidden layer parameters and batch normalization layer parameters obtained from training to the server 101. After receiving the hidden layer parameters and batch normalization layer parameters uploaded by each participant, the server 101 aggregates the hidden layer parameters uploaded by each participant to obtain the first aggregation parameter, and aggregates the batch normalization layer parameters uploaded by each participant to obtain the second aggregation parameter. The server then returns the first aggregation parameter and the second aggregation parameter to each participant. After receiving them, each participant correspondingly adjusts the original parameters of the corresponding network structure layers in its algorithm model according to the first aggregation parameter and the second aggregation parameter, and continues training on the next batch of data with the updated algorithm model network structure until the model converges.
Fig. 6 is a schematic structural diagram of an electronic device 600 provided in an embodiment of the disclosure. As shown in fig. 6, the electronic device 600 of this embodiment includes: a processor 601, a memory 602 and a computer program 603 stored in the memory 602 and executable on the processor 601. The steps of the various method embodiments described above are implemented by the processor 601 when executing the computer program 603. Alternatively, the processor 601, when executing the computer program 603, performs the functions of the modules/units of the apparatus embodiments described above.
Illustratively, the computer program 603 may be partitioned into one or more modules/units that are stored in the memory 602 and executed by the processor 601 to complete the present disclosure. One or more of the modules/units may be a series of computer program instruction segments capable of performing particular functions for describing the execution of the computer program 603 in the electronic device 600.
The electronic device 600 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 600 may include, but is not limited to, a processor 601 and a memory 602. It will be appreciated by those skilled in the art that fig. 6 is merely an example of an electronic device 600 and is not intended to limit the electronic device 600, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., an electronic device may also include an input-output device, a network access device, a bus, etc.
The processor 601 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 602 may be an internal storage unit of the electronic device 600, for example, a hard disk or a memory of the electronic device 600. The memory 602 may also be an external storage device of the electronic device 600, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 600. Further, the memory 602 may also include both internal and external storage units of the electronic device 600. The memory 602 is used to store computer programs and other programs and data required by the electronic device. The memory 602 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Each of the foregoing embodiments is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative: the division into modules or units is merely a division by logical function, and there may be other ways of division in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments of the present disclosure may be implemented by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and are intended to be included within the scope of the present disclosure.

Claims (10)

1. A method for aggregating joint learning parameters, comprising:
acquiring hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean value, a variance, a mini-batch number, a first scaling transformation parameter and a second scaling transformation parameter, and N is a positive integer greater than or equal to 2;
aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter;
aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter;
and returning the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter.
2. The method for aggregating joint learning parameters according to claim 1, wherein the network structures of the algorithm models of the N participants are the same, and the network structure comprises an input layer, a batch normalization layer, a hidden layer and an output layer;
wherein aggregating the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter comprises:
aggregating the hidden layer parameters of the same hidden layer of each participant to obtain the first aggregation parameter, wherein the first aggregation parameter comprises at least one hidden layer aggregation parameter.
3. The method for aggregating joint learning parameters according to claim 1, wherein aggregating the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter comprises:
aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain the second aggregation parameter, wherein the second aggregation parameter comprises at least one batch normalization layer aggregation parameter.
4. The method for aggregating joint learning parameters according to claim 3, wherein the batch normalization layer aggregation parameters comprise a first batch normalization layer aggregation parameter, a second batch normalization layer aggregation parameter and a third batch normalization layer aggregation parameter;
wherein aggregating the batch normalization layer parameters of the same batch normalization layer of each participant to obtain the second aggregation parameter comprises:
aggregating the first scaling transformation parameters and the second scaling transformation parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter;
aggregating the mean values of the same batch normalization layer of each participant to obtain the second batch normalization layer aggregation parameter;
and aggregating the variances of the same batch normalization layer of each participant to obtain the third batch normalization layer aggregation parameter.
5. The method for aggregating joint learning parameters according to claim 4, wherein aggregating the first scaling transformation parameters and the second scaling transformation parameters of the same batch normalization layer of each participant to obtain the first batch normalization layer aggregation parameter comprises:
calculating the average value of the first scaling transformation parameters of the same batch normalization layer of each participant and the average value of the second scaling transformation parameters of the same batch normalization layer of each participant, to obtain the first batch normalization layer aggregation parameter.
6. The method for aggregating joint learning parameters according to claim 4, wherein aggregating the mean values of the same batch normalization layer of each participant to obtain the second batch normalization layer aggregation parameter comprises:
calculating a first product of the mean value of the same batch normalization layer of each participant and the mini-batch number of that batch normalization layer, and calculating the sum of the first products of the participants;
and calculating the sum of the mini-batch numbers of the N participants, and calculating the second batch normalization layer aggregation parameter according to the sum of the first products and the sum of the mini-batch numbers.
7. The method for aggregating joint learning parameters according to claim 4, wherein aggregating the variances of the same batch normalization layer of each participant to obtain the third batch normalization layer aggregation parameter comprises:
calculating, for each participant, the sum of the square of the mean value and the variance of the same batch normalization layer, calculating a second product of that sum and the mini-batch number of the participant, and calculating the sum of the second products of the participants;
and calculating the sum of the mini-batch numbers of the N participants, and calculating the third batch normalization layer aggregation parameter according to the sum of the second products, the sum of the mini-batch numbers and the second batch normalization layer aggregation parameter.
8. The method for aggregating joint learning parameters according to claim 2, wherein each participant adjusting and optimizing its algorithm model according to the first aggregation parameter and the second aggregation parameter comprises:
the participant adjusting the hidden layer parameters of the hidden layer in the network structure according to the first aggregation parameter;
and the participant adjusting the batch normalization layer parameters of the batch normalization layer in the network structure according to the second aggregation parameter.
9. A joint learning parameter aggregation apparatus, comprising:
a parameter acquisition module configured to acquire hidden layer parameters and batch normalization layer parameters uploaded by N participants, wherein the batch normalization layer parameters comprise a mean value, a variance, a mini-batch number, a first scaling transformation parameter and a second scaling transformation parameter, and N is a positive integer greater than or equal to 2;
a first aggregation module configured to aggregate the hidden layer parameters uploaded by each participant to obtain a first aggregation parameter;
a second aggregation module configured to aggregate the batch normalization layer parameters uploaded by each participant to obtain a second aggregation parameter;
and a parameter returning module configured to return the first aggregation parameter and the second aggregation parameter to each participant, so that each participant adjusts and optimizes its algorithm model according to the first aggregation parameter and the second aggregation parameter.
10. A joint learning parameter aggregation system, comprising:
a server comprising the joint learning parameter aggregation apparatus according to claim 9; and
N participants in communication connection with the server.
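
For readability, the hidden layer aggregation recited in claims 1 to 3 can be illustrated with a short sketch. The following Python snippet is a hypothetical, non-authoritative illustration only (the publication itself contains no code); the function name aggregate_hidden_layers, the dictionary layout of the uploaded parameters, and the use of an unweighted average over participants are assumptions made for the example.

```python
import numpy as np

def aggregate_hidden_layers(participant_params):
    """Average the parameters of the same hidden layer across participants.

    participant_params: one dict per participant mapping a hidden-layer name to
    its parameter array; the participants share the same network structure, so
    the keys are identical across dicts (hypothetical layout).
    Returns the first aggregation parameter: one aggregated array per hidden layer.
    """
    layer_names = participant_params[0].keys()
    return {
        name: np.mean([params[name] for params in participant_params], axis=0)
        for name in layer_names
    }

# Example with N = 2 participants and two hidden layers
uploads = [
    {"hidden1": np.array([[0.2, 0.4]]), "hidden2": np.array([0.1])},
    {"hidden1": np.array([[0.6, 0.0]]), "hidden2": np.array([0.3])},
]
print(aggregate_hidden_layers(uploads))
# {'hidden1': array([[0.4, 0.2]]), 'hidden2': array([0.2])}
```

The aggregated arrays would then be returned to every participant, corresponding to the first aggregation parameter of claim 1 containing at least one hidden layer aggregation parameter as recited in claim 2.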
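Claims 4 to 7 describe how the batch normalization layer parameters are combined into the three batch normalization layer aggregation parameters. The sketch below is likewise a hypothetical illustration, not the patented implementation: it assumes that the first and second scaling transformation parameters are the usual gamma and beta of a batch normalization layer, and that the mini-batch number n counts the samples over which each participant's statistics were computed; all function and key names are assumptions.

```python
import numpy as np

def aggregate_batch_norm(uploads):
    """Combine batch normalization layer parameters uploaded by the participants.

    uploads: one dict per participant with keys
      "gamma", "beta" -- the two scaling transformation parameters,
      "mean", "var"   -- the layer's mean and variance,
      "n"             -- the participant's mini-batch number.
    Returns the three batch normalization layer aggregation parameters.
    """
    total = float(sum(u["n"] for u in uploads))

    # First aggregation parameter: plain averages of gamma and beta (claim 5)
    gamma = np.mean([u["gamma"] for u in uploads], axis=0)
    beta = np.mean([u["beta"] for u in uploads], axis=0)

    # Second aggregation parameter: mini-batch-weighted average of the means (claim 6)
    global_mean = sum(u["n"] * u["mean"] for u in uploads) / total

    # Third aggregation parameter: weighted second moments minus the squared
    # aggregated mean (claim 7)
    second_moment = sum(u["n"] * (u["var"] + u["mean"] ** 2) for u in uploads) / total
    global_var = second_moment - global_mean ** 2

    return {"gamma": gamma, "beta": beta, "mean": global_mean, "var": global_var}

# Example with N = 2 participants and a single-channel batch normalization layer
uploads = [
    {"gamma": np.array([1.0]), "beta": np.array([0.0]),
     "mean": np.array([2.0]), "var": np.array([1.0]), "n": 30},
    {"gamma": np.array([0.8]), "beta": np.array([0.2]),
     "mean": np.array([4.0]), "var": np.array([2.0]), "n": 10},
]
print(aggregate_batch_norm(uploads))
# mean = (30*2 + 10*4)/40 = 2.5; var = (30*(1+4) + 10*(2+16))/40 - 2.5**2 = 2.0
```

Weighting by the mini-batch numbers reproduces the statistics that would have been obtained over the pooled data, which is presumably why claims 6 and 7 use the mini-batch numbers rather than a plain average; the aggregated gamma, beta, mean and variance together form the second aggregation parameter that is returned to each participant.
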
CN202111440144.5A 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters Pending CN116226779A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111440144.5A CN116226779A (en) 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters
PCT/CN2022/119138 WO2023093229A1 (en) 2021-11-29 2022-09-15 Parameter aggregation method for federated learning, apparatus, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111440144.5A CN116226779A (en) 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters

Publications (1)

Publication Number Publication Date
CN116226779A true CN116226779A (en) 2023-06-06

Family

ID=86538814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111440144.5A Pending CN116226779A (en) 2021-11-29 2021-11-29 Method, device and system for aggregating joint learning parameters

Country Status (2)

Country Link
CN (1) CN116226779A (en)
WO (1) WO2023093229A1 (en)

Also Published As

Publication number Publication date
WO2023093229A1 (en) 2023-06-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination