CN113516151A

CN113516151A - Federal learning method

Info

Publication number: CN113516151A
Application number: CN202110369242.8A
Authority: CN
Inventors: 李斌; 刘宏福; 赵成林; 许方敏
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2021-10-19

Abstract

The disclosure provides a federated learning method, wherein the method is implemented by a central server and at least one edge node, and the method comprises the following steps: the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a dimension-reduced weight matrix parameter set, and sends the dimension-reduced weight matrix parameter set to the edge node; training the edge node based on the reduced-dimension weight matrix parameter set by using a local sample to obtain an edge characterization matrix corresponding to the edge node, and sending the edge characterization matrix to a central server; and the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes and sends the updated dimension-reduced weight matrix parameter set to the edge nodes. The method and the device have the advantages that the data privacy of the edge nodes is guaranteed, the requirements on the computing capacity and the storage capacity of the edge nodes are reduced, and the data transmission speed between the central server and the edge nodes is increased.

Description

Federal learning method

Technical Field

The disclosure relates to the technical field of machine learning, in particular to a federated learning method.

Background

With the continuous development of machine learning, particularly deep learning and deep neural network research, the related applications are popularized, and breakthroughs and achievements are made in the fields of social life, economic development, scientific research and the like. In addition to the traditional model-based driving method or the prior knowledge-based method, the deep learning can automatically extract useful features in the process of describing complex real world. For next-generation information communication systems, such as the emerging 6G communication and smart manufacturing industries, deep learning will hold an increasingly important position in terms of technical innovation.

The Internet of things is one of applications of rapid development, deep learning integrates an internal coordination structure by solving an intermediate state which is difficult to capture in the Internet of things, makes an accurate and timely decision, and can efficiently improve the efficiency and the product quality.

However, when the machine learning algorithm is applied to the field of the internet of things in the related art, due to the limitation of the edge device, the model training is usually performed only at the central server, which obviously cannot protect the data privacy of the edge device in the internet of things and has higher transmission cost.

Disclosure of Invention

In view of this, the present disclosure is directed to a federated learning method.

In view of the above, the present disclosure provides a federated learning method, wherein the method is implemented by a central server and at least one edge node, the method comprising:

the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes;

the edge node is trained by using a local sample based on the dimensionality reduction weight matrix parameter set to obtain an edge characterization matrix corresponding to the edge node, and the edge characterization matrix is sent to the central server;

and the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes and sends the updated dimension-reduced weight matrix parameter set to the edge nodes.

As can be seen from the above, the present disclosure provides a federated learning method, wherein the method is implemented by a central server and at least one edge node, the method comprising: the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a dimension-reduced weight matrix parameter set, and sends the dimension-reduced weight matrix parameter set to the edge node; training the edge node based on the reduced-dimension weight matrix parameter set by using a local sample to obtain an edge characterization matrix corresponding to the edge node, and sending the edge characterization matrix to a central server; and the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes and sends the updated dimension-reduced weight matrix parameter set to the edge nodes. The method and the device have the advantages that the data privacy of the edge nodes is guaranteed, the requirements on the computing capacity and the storage capacity of the edge nodes are reduced, and the data transmission speed between the central server and the edge nodes is increased.

Drawings

In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic view of an application scenario of the federal learning method provided in the embodiment of the present disclosure;

fig. 2 is a schematic flow chart of a federal learning method provided in an embodiment of the present disclosure;

fig. 3 is a schematic flowchart of a method for reducing the dimension of the weight matrix according to an embodiment of the disclosure;

fig. 4 is a schematic flowchart of a method for generating an edge characterization matrix according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a weight matrix structure provided in the embodiments of the present disclosure;

fig. 6 is a schematic diagram of a parallel training method for an edge characterization matrix according to an embodiment of the present disclosure.

Detailed Description

For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.

It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.

When the machine learning algorithm is applied to the field of the internet of things in the related art, due to the limitation of edge-end equipment, model training can be performed only at a central server, which obviously cannot protect the data privacy of the edge-end equipment in the internet of things and has higher transmission cost.

In particular, on one hand, the personal data in the edge device needs to be transmitted to the central server to train the machine learning model in the related art, which obviously cannot guarantee the privacy of the personal data of the user and the risk of leakage to third parties. On the other hand, when the trained machine learning model is transmitted from the central server to the edge device, the data magnitude of the machine learning model obtained by training of the central server is huge, the transmission delay is high, and the edge device is required to have high storage capacity.

The federal learning mechanism provides a part of ideas for solving the problems, and replaces the traditional method that a large amount of original data are transmitted to a central processor through a network, federal learning directly conducts network training on local data distributed in various places at an edge end, and then the trained network weight is sent to the central processor, so that privacy and safety of local data information can be well protected, meanwhile, the federal learning mechanism is helpful for reducing communication cost and improving communication reliability.

However, when the federal learning method is applied to the field of internet of things, there is a problem in how to perform local model training by local data at the edge. In the internet of things, the storage capacity and the computing capacity of the edge-end device are very weak, especially far from the central server, but the pre-training process of the network needs to train a complete network model containing a large number of weights, and the required memory and computing resources cannot be borne by the edge-end device with limited memory resources on a low-power chip.

In some related technologies, a training task is distributed to a plurality of edge end nodes, one node only trains one or a plurality of layers of network weights, the method forcedly splits the connection between the neural network layers, the model obtained by training is inaccurate, and meanwhile, the time and space complexity required by network training reasoning is not changed, so that the communication and application cost is greatly increased.

Therefore, it is a difficult problem how to ensure the data privacy of the edge node, reduce the demands for computation and storage of the edge node, and improve the data transmission speed of the central server and the edge node.

Referring to fig. 1, it is a schematic view of an application scenario of the federal learning method provided in an embodiment of the present disclosure. The application scenario includes a central server and a plurality of edge nodes. The central server and the edge nodes are connected through a wired or wireless communication network. Edge nodes include, but are not limited to, desktop computers, mobile phones, mobile computers, tablets, media players, smart wearable devices, Personal Digital Assistants (PDAs), or other electronic devices capable of performing the above-described functions. The central server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platform.

The central server is used for providing the total model for the edge nodes, the model comprises a weight matrix, the edge nodes use the local sample to train the weight matrix to obtain a trained model, the trained model is provided for the central server, and the central server integrates the models to update the total model and provide the updated total model for the edge nodes. The edge nodes store local samples, and the central server does not acquire sample bodies.

The federal learning method according to an exemplary embodiment of the present application is described below in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are only presented to facilitate understanding of the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.

Referring to fig. 2, a schematic flow chart of a federal learning method provided in an embodiment of the present disclosure is shown; a federated learning method, wherein the method is implemented by a central server and at least one edge node, the method comprising:

s210, the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a dimension-reduced weight matrix parameter set, and sends the dimension-reduced weight matrix parameter set to the edge nodes.

The neural network comprises a plurality of layers, the method reduces the dimension of the weight matrix of each layer of neural network, reduces the dimension of the high-dimensional weight matrix into the low-dimensional weight matrix, and then sends the parameters of the low-dimensional weight matrices of all the layers of neural networks together as a dimension-reduced weight matrix parameter set to each edge node.

For each edge node, the set of reduced-dimension weight matrix parameters obtained from the central server is the same as the other edge nodes.

Referring to fig. 3, it is a schematic flow chart of a method for reducing the dimension of the weight matrix according to an embodiment of the present disclosure; in some embodiments, S210 specifically includes:

weight matrix for each layer of the neural network:

s310, the central server initializes the weight matrix to obtain an initialized weight matrix.

For example, the weight matrix of the l layer of the neural network is

Wherein M is_l,N_lWeight matrix W representing the layers of neural network l, respectively_l,gAn input dimension and an output dimension. Weight matrix W_l,gIs a high dimensional matrix.

Random initialization of weight matrix W by Xavier_l,gObtaining an initialization weight matrix

Wherein the weight matrix is initialized

Obeys the following uniform distribution:

wherein, i and j are all initialized weight matrix

Of (1).

S320, the central server carries out singular value decomposition on the initialized weight matrix to obtain an approximate weight matrix.

To the initialized weight matrix

Singular value decomposition is carried out to obtain an approximate weight matrix

Wherein, W_kIs an approximate weight matrix

The optimal solution of (a), having a rank of k,

in order to initialize the weight matrix, the weight matrix is initialized,

to approximate a weight matrix, U_kIs k columns of left singular vectors, S_kA matrix of singular values of k x k, V_kAre k columns of right singular vectors.

S330, the central server samples the approximate weight matrix to obtain a row sampling matrix and a column sampling matrix.

Determining the number of sampling lines s_l,r(s_l,r＜＜M_l) And the number of sampling columns s_l,c(s_l,c＜＜N_l) And for the approximate weight matrix

Adopting probability sampling to obtain a row sequence set and a column sequence set:

|S_l,r|＝s_l,r，|S_l,c|＝s_l,c；

further generating a row sampling matrix

Sum column sampling matrix

S340, the central server samples the approximate weight matrix by using the row sampling matrix and the column sampling matrix to obtain a global characterization matrix.

By obtaining a matrix S of row samples_l,rAnd column sampling matrix S_l,cTo approximate weight matrix

Sampling to obtain a global characterization matrix

And

global characterization matrix

And

is a low dimensional matrix.

In some embodiments, the central server integrates the global characterization matrices of all layers to obtain a reduced-dimension set of weight matrix parameters, including:

for the weight matrix of each layer of the neural network, the central server adds the row sampling matrix and the column sampling matrix into the reduced-dimension weight matrix parameter set as indexes of the global characterization matrix.

Wherein the reduced-dimension weight matrix parameter set

Wherein the content of the first and second substances,

set of weight matrix parameters for dimensionality reduction, S_l,rIs a line sampling matrix, S_l,cIn the form of a column-sampling matrix,

and

and L is the total number of layers of the neural network, and L is the L layer of the neural network.

The number of weights that the present disclosure needs to transmit is

The number of network weights that the fully-connected network in the related art needs to transmit

It can be seen that the number of the weights to be sent is far less than that of the weights to be sent in the full-connection network, so that the data transmission speed of the central server and the edge nodes is improved, the data transmission cost is reduced, and the reliability of data transmission can be improved. On the other hand, the storage pressure of the edge node is also reduced.

S220, training the edge nodes by using local samples based on the reduced-dimension weight matrix parameter set to obtain edge characterization matrixes corresponding to the edge nodes, and sending the edge characterization matrixes to a central server.

All the edge nodes are trained simultaneously and parallelly according to the local data sets, and the training of each edge node does not interfere with each other.

Referring to fig. 4, fig. 4 is a schematic flowchart of a method for generating an edge characterization matrix according to an embodiment of the present disclosure; in some embodiments, S220 specifically includes:

and S410, calculating by the edge node according to the reduced-dimension weight matrix parameter set to obtain a core representation matrix.

And calculating to obtain a core representation matrix according to the global representation matrix, the row sampling matrix and the column sampling matrix in the dimension-reduced weight matrix parameter set.

Take edge node b as an example:

wherein, the core characterization matrix is obtained by calculation, and the method comprises the following steps:

wherein the content of the first and second substances,

characterizing the matrix for the core, S_l,rIs a line sampling matrix, S_l,cIn the form of a column-sampling matrix,

and

is a global characterization matrix, b is an edge node, n is a local update period,

is a pseudo-inverse function of the matrix.

And S420, performing forward reasoning calculation on the edge node based on the local sample, the global characterization matrix and the core characterization matrix to obtain local output, performing error back propagation calculation to obtain the gradient of the edge characterization matrix, and obtaining the edge characterization matrix by using the gradient of the edge characterization matrix.

Wherein performing forward inference calculations to obtain a local output comprises: the edge node multiplies the local sample by the global characterization matrix and the core characterization matrix, and calculates to obtain a local output:

wherein the content of the first and second substances,

for local output, f is the activation function,

in the form of a local sample, the local sample,

and

in order to globally characterize the matrix(s),

the matrix is characterized for the core and,

is a bias vector.

Local sample

In turn with the core characterization matrix

Global characterization matrix

And

multiplication, plus offset vector

Then obtaining local output after the action of an activation function f

For computational complexity convenience, assume s_l,c＝s_l,r＝s_lBy controlling the order of the matrix multiplication, the complexity of the forward inference computation performed by the present disclosure is

The complexity of the forward reasoning computation in the related art neural network structure is

It can be seen that the complexity of the forward inference calculation performed by the present disclosure is significantly reduced compared to the related art, and thus, the operation pressure of the edge node is reduced.

Referring to fig. 5, which is a schematic diagram of a weight matrix structure provided in an embodiment of the present disclosure, it can be seen that, according to the present disclosure, a fully-connected layer weight matrix with a complex structure is reduced to a weight matrix with a relatively simple structure provided in the present disclosure, the reduced weight matrix provided in the present disclosure has a much lower complexity compared to a conventional fully-connected matrix, and the required storage and computation requirements are much lower, so that the hardware environment of an edge node can be adapted.

Wherein the performing an error back propagation calculation to obtain a gradient of the edge characterization matrix comprises:

let l +1 layer error matrix be xi_l+1 ^n-1The error matrix propagating backwards to the l-layer activation function is xi'_l+1 ^n-1Then the gradient of the edge characterization matrix is:

wherein the content of the first and second substances,

and

characterizing the gradient of the matrix for the edge, X_l,bIs a local sample, xi'_l+1 ^n-1For the error matrix to propagate back to the l-layer activation function,

and

the matrix is characterized for the edge of n-1 period, n is the local update period,

characterizing the matrix for the core, S_l,rIs a line sampling matrix, S_l,cIs a column sample matrix and T is the transpose of the matrix.

Without loss of generality i.e. s_l,c≠s_l,rBecause a complex pseudo-inverse derivation formula is involved, the derivation can be directly carried out by utilizing python to calculate the edge characterization matrix

Of the gradient of (c).

For computational convenience, assume s_l,c＝s_l,r＝s_lThe complexity of the update calculation is trained in this disclosure

The complexity of the training update calculation in the related art neural network structure is

It can be seen that the complexity of the training update calculation of the present disclosure is less than that of the related art, and thus, the operation pressure of the edge node is reduced.

The method for obtaining the edge characterization matrix by using the gradient of the edge characterization matrix comprises the following steps:

wherein the content of the first and second substances,

and

in order to characterize the matrix for the edges,

and

characterizing the matrix for the edge of n-1 period, n is the local update period, eta is the update step length of the gradient,

and

the gradient of the matrix is characterized for the edges.

The updating of the edge characterization matrix is carried out according to a random gradient descent method.

In some embodiments, all edge nodes may simultaneously calculate and train the characterization matrix through two sensor interactions, as shown in fig. 6, which may further accelerate the training speed of the neural network weight matrix at the edge nodes.

After all local networks complete the set local training period, the updated characterization matrix of the local network will be used

Is sent to a central server, where t_lFor a set local training period, B is the number of all edge nodes.

S230, the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes, and sends the updated dimension-reduced weight matrix parameter set to the edge nodes.

The central server integrates all the edge characterization matrices, and obtains an updated global characterization matrix for each layer, including:

wherein D is the total training data set sample number, D_bThe number of training data set samples of the edge node b, and t is the current execution period of the central server.

After the central server finishes the global representation matrix updating of all the layers of neural networks, the updated dimension-reduced weight matrix parameter set is obtained by integration

And the updated dimension-reduced weight matrix parameter set is broadcast

Sending to each edge node:

in some embodiments, further comprising: and the edge node performs next training based on the updated dimension-reduced weight matrix parameter set, or determines a final model based on the next training.

For each edge node, obtaining an updated edge characterization matrix according to the updated dimension-reduced weight matrix parameter set, including:

wherein the content of the first and second substances,

a set of updated reduced-dimension weight matrix parameters;

the matrix is characterized for the updated edges.

The method comprises at least one round of iterative updating of the neural network weight matrix, and steps S220 and S230 are carried out in a cyclic reciprocating mode until a set training termination condition is reached.

It can be seen that, no matter the present disclosure is a local forward reasoning or a training update of the network weight, the computation complexity of the neural network under the framework of the present disclosure is much smaller than that of the neural network under the traditional structure. Meanwhile, the network layer structure provided by the method has the advantage of larger storage space, and the space complexity of the method is only

Far superior to that in the conventional network structure

The network structure characteristics provided by the disclosure also enable transmission between all edge terminals and the central server terminal

Full connection matrix for replacing transmission

Amount of data transferred of the present disclosure

Far lower volume of transmitted data than in fully connected networks

The burden of network transmission data is greatly reduced and the communication reliability is improved.

The method directly generates a new federated learning network structure in a random matrix sampling mode without network pre-training. The method disclosed by the invention has the advantages that under the condition of not reducing the generalization performance of the neural network, the method comprises the following steps: the communication traffic of the edge terminal and the central server terminal is reduced, the communication reliability is improved, and the requirement on the whole communication system network is reduced; from the edge node level: the privacy of local data is protected, the training speed of a local model and the model convergence time are accelerated, and the power consumption and the internal storage space required by local hardware are reduced. The method disclosed by the invention overcomes the application technology bottlenecks of overhigh network computing load, long training time, high requirements on local hardware and network throughput and safety of the whole system and the like of the traditional federal learning, so that the equipment with low time delay, low power consumption and low computing load can be applied under the federal learning framework, and the federal learning is further promoted to be better applied under a plurality of emerging edge computing scenes such as the internet of things, 6G and the like.

It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.

It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

It should be noted that the embodiments of the present disclosure can be further described in the following ways:

a method of federated learning, wherein the method is implemented by a central server and at least one edge node, the method comprising:

Optionally, the reducing the dimension of the weight matrix of each layer of the neural network by the central server to obtain a reduced-dimension weight matrix parameter set includes:

for the weight matrix of each layer of the neural network,

the central server initializes the weight matrix to obtain an initialized weight matrix,

the central server carries out singular value decomposition on the initialized weight matrix to obtain an approximate weight matrix,

the central server samples the approximate weight matrix to obtain a row sampling matrix and a column sampling matrix,

the central server samples the approximate weight matrix by using the row sampling matrix and the column sampling matrix to obtain a global characterization matrix;

and the central server integrates the global characterization matrixes of all the layers to obtain the reduced-dimension weight matrix parameter set.

Optionally, the integrating, by the central server, the global characterization matrices of all layers to obtain the reduced-dimension weight matrix parameter set includes:

and for the weight matrix of each layer of the neural network, the central server adds the row sampling matrix and the column sampling matrix into the reduced-dimension weight matrix parameter set as indexes of the global characterization matrix.

Optionally, the training, by the edge node, based on the dimensionality-reduced weight matrix parameter set, using a local sample to obtain an edge characterization matrix corresponding to the edge node includes:

the edge node calculates to obtain a core representation matrix according to the reduced-dimension weight matrix parameter set;

and the edge node performs forward reasoning calculation to obtain local output based on the local sample, the global characterization matrix and the core characterization matrix, performs error back propagation calculation to obtain the gradient of the edge characterization matrix, and obtains the edge characterization matrix by using the gradient of the edge characterization matrix.

Optionally, the calculating, by the edge node, a core characterization matrix according to the dimensionality-reduced weight matrix parameter set includes:

wherein the content of the first and second substances,

and

in order to globally characterize the matrix(s),

is a pseudo-inverse function of the matrix.

Optionally, wherein the performing forward inference computation to obtain a local output includes:

the edge node multiplies the local sample by a global characterization matrix and a core characterization matrix, and calculates to obtain the local output:

wherein the content of the first and second substances,

for local output, f is the activation function,

in the form of a local sample, the local sample,

and

in order to globally characterize the matrix(s),

the matrix is characterized for the core and,

is a bias vector.

Optionally, wherein the performing an error back propagation calculation to obtain a gradient of the edge characterization matrix includes:

wherein the content of the first and second substances,

and

and

Optionally, the obtaining the edge characterization matrix by using the gradient of the edge characterization matrix includes:

wherein the content of the first and second substances,

and

in order to characterize the matrix for the edges,

and

and

the gradient of the matrix is characterized for the edges.

Optionally, the obtaining, by the central server, an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrices includes:

wherein the content of the first and second substances,

and

for the updated reduced-dimension set of weight matrix parameters,

and

an edge characterization matrix for edge node b, D being the total training data set sample number, D_bThe number of training data set samples of the edge node B, the total number of the edge nodes B, and the current execution period of the central server t.

Optionally, the method further includes: and the edge node performs next training based on the updated dimension-reduced weight matrix parameter set, or determines a final model based on the next training.

Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.

In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.

While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic ram (dram)) may use the discussed embodiments.

The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims

1. A method of federated learning, wherein the method is implemented by a central server and at least one edge node, the method comprising:

2. The method of claim 1, wherein the central server dimensionality-reduces the weight matrix for each layer of the neural network to obtain a dimensionality-reduced set of weight matrix parameters, comprising:

for the weight matrix of each layer of the neural network,

3. The method of claim 2, wherein the central server integrates the global characterization matrices of all layers to obtain the reduced-dimension set of weight matrix parameters, comprising:

4. The method of claim 3, wherein the training of the edge node by using a local sample based on the reduced-dimension weight matrix parameter set to obtain an edge characterization matrix corresponding to the edge node comprises:

5. The method of claim 4, wherein the calculating, by the edge node, a core characterization matrix according to the reduced-dimension weight matrix parameter set includes:

wherein the content of the first and second substances,

and

in order to globally characterize the matrix(s),

is a pseudo-inverse function of the matrix.

6. The method of claim 4, wherein said performing forward inference calculations to derive local outputs comprises:

wherein the content of the first and second substances,

for local output, f is the activation function,

in the form of a local sample, the local sample,

and

in order to globally characterize the matrix(s),

the matrix is characterized for the core and,

is a bias vector.

7. The method of claim 4, wherein the performing an error back propagation calculation to derive a gradient of the edge characterization matrix comprises:

wherein the content of the first and second substances,

and

and

8. The method of claim 4, wherein the deriving the edge characterization matrix using the gradient of the edge characterization matrix comprises:

wherein the content of the first and second substances,

and

in order to characterize the matrix for the edges,

and

and

the gradient of the matrix is characterized for the edges.

9. The method of claim 1, wherein the central server derives an updated reduced-dimension set of weight matrix parameters based on all of the edge characterization matrices, comprising:

wherein the content of the first and second substances,

and

for the updated reduced-dimension set of weight matrix parameters,

and

10. The method of claim 1, further comprising: and the edge node performs next training based on the updated dimension-reduced weight matrix parameter set, or determines a final model based on the next training.