CN113516151A - Federated learning method - Google Patents

Federated learning method

Info

Publication number
CN113516151A
Authority
CN
China
Prior art keywords
matrix
edge
weight matrix
characterization
central server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110369242.8A
Other languages
Chinese (zh)
Inventor
李斌
刘宏福
赵成林
许方敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110369242.8A priority Critical patent/CN113516151A/en
Publication of CN113516151A publication Critical patent/CN113516151A/en
Pending legal-status Critical Current


Classifications

    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N20/00 Machine learning
    • G06N3/045 Combinations of networks
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Complex Calculations (AREA)

Abstract

The disclosure provides a federated learning method, implemented by a central server and at least one edge node, comprising the following steps: the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes; each edge node trains, based on the reduced-dimension weight matrix parameter set and using its local samples, an edge characterization matrix corresponding to that edge node, and sends the edge characterization matrix to the central server; and the central server obtains an updated reduced-dimension weight matrix parameter set based on all the edge characterization matrices and sends it to the edge nodes. The method guarantees the data privacy of the edge nodes, reduces the computing and storage requirements on the edge nodes, and increases the data transmission speed between the central server and the edge nodes.

Description

Federated learning method
Technical Field
The disclosure relates to the technical field of machine learning, in particular to a federated learning method.
Background
With the continued development of machine learning, and of deep learning and deep neural network research in particular, related applications have become widespread and have produced breakthroughs in social life, economic development, scientific research and other fields. Unlike traditional model-driven methods or methods based on prior knowledge, deep learning can automatically extract useful features when describing the complex real world. For next-generation information and communication systems, such as emerging 6G communication and smart manufacturing, deep learning will occupy an increasingly important position in technical innovation.
The Internet of Things is one such rapidly developing application. By resolving intermediate states that are difficult to capture in the Internet of Things, deep learning integrates the internal coordination structure, makes accurate and timely decisions, and can efficiently improve productivity and product quality.
However, when machine learning algorithms are applied to the Internet of Things in the related art, the limitations of edge devices mean that model training is usually performed only at the central server, which clearly cannot protect the data privacy of the edge devices and incurs high transmission costs.
Disclosure of Invention
In view of this, the present disclosure is directed to a federated learning method.
In view of the above, the present disclosure provides a federated learning method, wherein the method is implemented by a central server and at least one edge node, the method comprising:
the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes;
the edge node is trained by using a local sample based on the dimensionality reduction weight matrix parameter set to obtain an edge characterization matrix corresponding to the edge node, and the edge characterization matrix is sent to the central server;
and the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes and sends the updated dimension-reduced weight matrix parameter set to the edge nodes.
As can be seen from the above, the present disclosure provides a federated learning method, implemented by a central server and at least one edge node, comprising: the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes; each edge node trains, based on the reduced-dimension weight matrix parameter set and using its local samples, an edge characterization matrix corresponding to that edge node, and sends the edge characterization matrix to the central server; and the central server obtains an updated reduced-dimension weight matrix parameter set based on all the edge characterization matrices and sends it to the edge nodes. The method guarantees the data privacy of the edge nodes, reduces the computing and storage requirements on the edge nodes, and increases the data transmission speed between the central server and the edge nodes.
Drawings
In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic view of an application scenario of the federal learning method provided in the embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a federal learning method provided in an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method for reducing the dimension of the weight matrix according to an embodiment of the disclosure;
fig. 4 is a schematic flowchart of a method for generating an edge characterization matrix according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a weight matrix structure provided in an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a parallel training method for an edge characterization matrix according to an embodiment of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
When machine learning algorithms are applied to the Internet of Things in the related art, the limitations of edge devices mean that model training can only be performed at the central server, which clearly cannot protect the data privacy of the edge devices and incurs high transmission costs.
Specifically, on the one hand, in the related art the personal data on the edge device must be transmitted to the central server in order to train the machine learning model, which clearly cannot guarantee the privacy of the user's personal data and creates a risk of leakage to third parties. On the other hand, when the trained machine learning model is transmitted from the central server to the edge device, the model obtained by the central server is very large, the transmission delay is high, and the edge device is required to have a large storage capacity.
The federated learning mechanism offers a partial solution to these problems. Instead of transmitting large amounts of raw data over the network to a central processor, federated learning trains the network directly at the edge on locally distributed data and then sends the trained network weights to the central processor. This protects the privacy and security of local data while helping to reduce communication costs and improve communication reliability.
However, when the federated learning method is applied to the Internet of Things, performing local model training on local data at the edge remains a problem. In the Internet of Things, the storage and computing capacities of edge devices are very weak, far below those of the central server, yet the network pre-training process requires training a complete network model containing a large number of weights; the memory and computing resources this demands cannot be borne by edge devices built on low-power chips with limited memory.
In some related technologies, the training task is distributed across multiple edge nodes, with each node training only one or a few layers of network weights. This forcibly severs the connections between the layers of the neural network, so the trained model is inaccurate; meanwhile, the time and space complexity required for network training and inference is unchanged, which greatly increases the communication and application cost.
How to guarantee the data privacy of the edge nodes, reduce the computing and storage demands on the edge nodes, and improve the data transmission speed between the central server and the edge nodes therefore remains a difficult problem.
Referring to fig. 1, it is a schematic view of an application scenario of the federal learning method provided in an embodiment of the present disclosure. The application scenario includes a central server and a plurality of edge nodes. The central server and the edge nodes are connected through a wired or wireless communication network. Edge nodes include, but are not limited to, desktop computers, mobile phones, mobile computers, tablets, media players, smart wearable devices, Personal Digital Assistants (PDAs), or other electronic devices capable of performing the above-described functions. The central server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, big data and artificial intelligence platform.
The central server provides the overall model, which contains a weight matrix, to the edge nodes; the edge nodes train the weight matrix with their local samples to obtain trained models and return them to the central server; the central server integrates these models to update the overall model and provides the updated overall model to the edge nodes. The local samples are stored on the edge nodes, and the central server never obtains the samples themselves.
The federal learning method according to an exemplary embodiment of the present application is described below in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are only presented to facilitate understanding of the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
Referring to fig. 2, a schematic flow chart of a federal learning method provided in an embodiment of the present disclosure is shown; a federated learning method, wherein the method is implemented by a central server and at least one edge node, the method comprising:
s210, the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a dimension-reduced weight matrix parameter set, and sends the dimension-reduced weight matrix parameter set to the edge nodes.
The neural network comprises a plurality of layers. The method reduces the dimension of the weight matrix of each layer, turning each high-dimensional weight matrix into low-dimensional matrices, and then sends the parameters of the low-dimensional matrices of all layers together, as a reduced-dimension weight matrix parameter set, to each edge node.
Every edge node receives the same reduced-dimension weight matrix parameter set from the central server.
Referring to fig. 3, it is a schematic flow chart of a method for reducing the dimension of the weight matrix according to an embodiment of the present disclosure; in some embodiments, S210 specifically includes:
weight matrix for each layer of the neural network:
s310, the central server initializes the weight matrix to obtain an initialized weight matrix.
For example, the weight matrix of layer $l$ of the neural network is

$W_{l,g} \in \mathbb{R}^{M_l \times N_l},$

where $M_l$ and $N_l$ denote the input dimension and the output dimension of the layer-$l$ weight matrix $W_{l,g}$, respectively. The weight matrix $W_{l,g}$ is a high-dimensional matrix.

The weight matrix $W_{l,g}$ is randomly initialized with Xavier initialization to obtain an initialized weight matrix $W_{l,g}^{0}$, whose entries obey the following uniform distribution:

$W_{l,g}^{0}(i,j) \sim U\!\left(-\sqrt{\frac{6}{M_l + N_l}},\ \sqrt{\frac{6}{M_l + N_l}}\right),$

where $i$ and $j$ index the entries of the initialized weight matrix $W_{l,g}^{0}$.
S320, the central server carries out singular value decomposition on the initialized weight matrix to obtain an approximate weight matrix.
Singular value decomposition is performed on the initialized weight matrix $W_{l,g}^{0}$ to obtain the approximate weight matrix $\widetilde{W}_{l,g}$:

$\widetilde{W}_{l,g} = W_k = U_k S_k V_k^{T},$

where $W_k$ is the optimal rank-$k$ approximation of the initialized weight matrix $W_{l,g}^{0}$, $U_k$ consists of $k$ columns of left singular vectors, $S_k$ is the $k \times k$ matrix of singular values, and $V_k$ consists of $k$ columns of right singular vectors.
S330, the central server samples the approximate weight matrix to obtain a row sampling matrix and a column sampling matrix.
The number of sampled rows $s_{l,r}$ (with $s_{l,r} \ll M_l$) and the number of sampled columns $s_{l,c}$ (with $s_{l,c} \ll N_l$) are determined, and probability sampling is applied to the approximate weight matrix $\widetilde{W}_{l,g}$ to obtain a row index set $\mathcal{S}_{l,r}$ and a column index set $\mathcal{S}_{l,c}$ with

$|\mathcal{S}_{l,r}| = s_{l,r}, \qquad |\mathcal{S}_{l,c}| = s_{l,c},$

from which the row sampling matrix $S_{l,r} \in \mathbb{R}^{s_{l,r} \times M_l}$ and the column sampling matrix $S_{l,c} \in \mathbb{R}^{N_l \times s_{l,c}}$ are generated.
S340, the central server samples the approximate weight matrix by using the row sampling matrix and the column sampling matrix to obtain a global characterization matrix.
The row sampling matrix $S_{l,r}$ and the column sampling matrix $S_{l,c}$ are applied to the approximate weight matrix $\widetilde{W}_{l,g}$ to obtain the global characterization matrices, denoted $R_l$ and $C_l$:

$R_l = S_{l,r}\, \widetilde{W}_{l,g} \in \mathbb{R}^{s_{l,r} \times N_l}, \qquad C_l = \widetilde{W}_{l,g}\, S_{l,c} \in \mathbb{R}^{M_l \times s_{l,c}}.$

The global characterization matrices $R_l$ and $C_l$ are low-dimensional matrices.
In some embodiments, the central server integrates the global characterization matrices of all layers to obtain a reduced-dimension set of weight matrix parameters, including:
for the weight matrix of each layer of the neural network, the central server adds the row sampling matrix and the column sampling matrix into the reduced-dimension weight matrix parameter set as indexes of the global characterization matrix.
The reduced-dimension weight matrix parameter set, denoted $\Theta^{0}$, is

$\Theta^{0} = \left\{\, C_l,\ R_l,\ S_{l,r},\ S_{l,c} \,\right\}_{l=1}^{L},$

where $\Theta^{0}$ is the reduced-dimension weight matrix parameter set, $S_{l,r}$ is the row sampling matrix, $S_{l,c}$ is the column sampling matrix, $C_l$ and $R_l$ are the global characterization matrices, $L$ is the total number of layers of the neural network, and $l$ denotes the $l$-th layer.
The number of weights that the present disclosure needs to transmit is

$\sum_{l=1}^{L}\left( M_l\, s_{l,c} + s_{l,r}\, N_l \right),$

whereas the number of network weights that a fully-connected network in the related art needs to transmit is

$\sum_{l=1}^{L} M_l\, N_l.$
It can be seen that the number of the weights to be sent is far less than that of the weights to be sent in the full-connection network, so that the data transmission speed of the central server and the edge nodes is improved, the data transmission cost is reduced, and the reliability of data transmission can be improved. On the other hand, the storage pressure of the edge node is also reduced.
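The server-side reduction of S310 to S340 can be illustrated with a short NumPy sketch. This is only a sketch under stated assumptions: uniform sampling probabilities are used where the disclosure only says "probability sampling", and the names C, R, S_r and S_c for the characterization and sampling matrices are the ones introduced above for readability, not symbols taken from the original filing. As a rough sense of scale, a 1000 x 1000 layer with s_l = 20 would ship 2 x 1000 x 20 = 40,000 characterization weights instead of 1,000,000 full-connection weights.

```python
import numpy as np

def reduce_layer(M_l, N_l, k, s_r, s_c, rng):
    # S310: Xavier (uniform) initialization of the M_l x N_l weight matrix
    limit = np.sqrt(6.0 / (M_l + N_l))
    W0 = rng.uniform(-limit, limit, size=(M_l, N_l))

    # S320: rank-k truncated SVD gives the approximate weight matrix
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    W_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # S330: draw s_r row indices and s_c column indices (uniform sampling here)
    rows = rng.choice(M_l, size=s_r, replace=False)
    cols = rng.choice(N_l, size=s_c, replace=False)
    S_r = np.eye(M_l)[rows, :]   # s_r x M_l row-sampling matrix
    S_c = np.eye(N_l)[:, cols]   # N_l x s_c column-sampling matrix

    # S340: global characterization matrices (sampled rows and columns)
    R = S_r @ W_approx           # s_r x N_l
    C = W_approx @ S_c           # M_l x s_c
    return {"C": C, "R": R, "S_r": S_r, "S_c": S_c}

# Reduced-dimension parameter set for an L-layer network (toy sizes)
rng = np.random.default_rng(0)
layer_sizes = [(784, 256), (256, 128)]
param_set = [reduce_layer(M, N, k=20, s_r=20, s_c=20, rng=rng) for M, N in layer_sizes]
```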
S220, training the edge nodes by using local samples based on the reduced-dimension weight matrix parameter set to obtain edge characterization matrixes corresponding to the edge nodes, and sending the edge characterization matrixes to a central server.
All the edge nodes are trained simultaneously and parallelly according to the local data sets, and the training of each edge node does not interfere with each other.
Referring to fig. 4, fig. 4 is a schematic flowchart of a method for generating an edge characterization matrix according to an embodiment of the present disclosure; in some embodiments, S220 specifically includes:
and S410, calculating by the edge node according to the reduced-dimension weight matrix parameter set to obtain a core representation matrix.
And calculating to obtain a core representation matrix according to the global representation matrix, the row sampling matrix and the column sampling matrix in the dimension-reduced weight matrix parameter set.
Take edge node b as an example:
The core characterization matrix is obtained by calculation as follows: the core characterization matrix $U_{l,b}^{\,n}$ is computed from the row sampling matrix $S_{l,r}$, the column sampling matrix $S_{l,c}$ and the global characterization matrices $C_l$ and $R_l$ by means of the matrix pseudo-inverse $(\cdot)^{\dagger}$, where $b$ is the edge node and $n$ is the local update period.
And S420, performing forward reasoning calculation on the edge node based on the local sample, the global characterization matrix and the core characterization matrix to obtain local output, performing error back propagation calculation to obtain the gradient of the edge characterization matrix, and obtaining the edge characterization matrix by using the gradient of the edge characterization matrix.
Wherein performing forward inference calculations to obtain a local output comprises: the edge node multiplies the local sample by the global characterization matrix and the core characterization matrix, and calculates to obtain a local output:
$Y_{l,b} = f\!\left( X_{l,b}\, C_{l,b}\, U_{l,b}\, R_{l,b} + \beta_{l,b} \right),$

where $Y_{l,b}$ is the local output, $f$ is the activation function, $X_{l,b}$ is the local sample, $C_{l,b}$ and $R_{l,b}$ are the (local copies of the) global characterization matrices, $U_{l,b}$ is the core characterization matrix, and $\beta_{l,b}$ is the bias vector.

That is, the local sample $X_{l,b}$ is multiplied in turn by the characterization matrices $C_{l,b}$, $U_{l,b}$ and $R_{l,b}$, the bias vector $\beta_{l,b}$ is added, and the local output $Y_{l,b}$ is obtained after applying the activation function $f$.
For convenience of the complexity analysis, assume $s_{l,c} = s_{l,r} = s_l$. By controlling the order of the matrix multiplication, the complexity of the forward inference computation performed by the present disclosure is $\mathcal{O}\!\left( s_l (M_l + N_l) \right)$, whereas the complexity of the forward inference computation in the related-art neural network structure is $\mathcal{O}\!\left( M_l N_l \right)$.
It can be seen that the complexity of the forward inference calculation performed by the present disclosure is significantly reduced compared to the related art, and thus, the operation pressure of the edge node is reduced.
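A minimal sketch of the forward pass of S420, assuming the factored order X, C, U, R and a ReLU stand-in for the unspecified activation f; keeping the product in this order means no full M_l x N_l matrix is ever materialized, which is where the reduced forward cost comes from.

```python
import numpy as np

def forward(X, C, U, R, beta):
    # ((X C) U) R: each factor is low-rank, so the per-sample cost is roughly
    # s_l * (M_l + N_l) multiply-adds instead of M_l * N_l.
    Z = ((X @ C) @ U) @ R + beta
    return np.maximum(Z, 0.0)   # ReLU stands in for the activation f
```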
Referring to fig. 5, a schematic diagram of a weight matrix structure provided in an embodiment of the present disclosure, it can be seen that the present disclosure reduces the structurally complex fully-connected layer weight matrix to a weight matrix with a comparatively simple structure. The reduced weight matrix has a much lower complexity than a conventional fully-connected matrix, and its storage and computation requirements are much lower, so it can be adapted to the hardware environment of an edge node.
Wherein the performing an error back propagation calculation to obtain a gradient of the edge characterization matrix comprises:
let l +1 layer error matrix be xil+1 n-1The error matrix propagating backwards to the l-layer activation function is xi'l+1 n-1Then the gradient of the edge characterization matrix is:
Figure BDA00030086490500000718
Figure BDA0003008649050000081
wherein the content of the first and second substances,
Figure BDA0003008649050000082
and
Figure BDA0003008649050000083
characterizing the gradient of the matrix for the edge, Xl,bIs a local sample, xi'l+1 n-1For the error matrix to propagate back to the l-layer activation function,
Figure BDA0003008649050000084
and
Figure BDA0003008649050000085
the matrix is characterized for the edge of n-1 period, n is the local update period,
Figure BDA0003008649050000086
characterizing the matrix for the core, Sl,rIs a line sampling matrix, Sl,cIs a column sample matrix and T is the transpose of the matrix.
Without loss of generality i.e. sl,c≠sl,rBecause a complex pseudo-inverse derivation formula is involved, the derivation can be directly carried out by utilizing python to calculate the edge characterization matrix
Figure BDA0003008649050000087
Of the gradient of (c).
For computational convenience, assume $s_{l,c} = s_{l,r} = s_l$. The complexity of the training update calculation in the present disclosure is $\mathcal{O}\!\left( s_l (M_l + N_l) \right)$, whereas the complexity of the training update calculation in the related-art neural network structure is $\mathcal{O}\!\left( M_l N_l \right)$.
It can be seen that the complexity of the training update calculation of the present disclosure is less than that of the related art, and thus, the operation pressure of the edge node is reduced.
The method for obtaining the edge characterization matrix by using the gradient of the edge characterization matrix comprises the following steps:
$C_{l,b}^{\,n} = C_{l,b}^{\,n-1} - \eta\, \nabla C_{l,b}^{\,n-1}, \qquad R_{l,b}^{\,n} = R_{l,b}^{\,n-1} - \eta\, \nabla R_{l,b}^{\,n-1},$

where $C_{l,b}^{\,n}$ and $R_{l,b}^{\,n}$ are the edge characterization matrices, $C_{l,b}^{\,n-1}$ and $R_{l,b}^{\,n-1}$ are the edge characterization matrices of period $n-1$, $n$ is the local update period, $\eta$ is the update step size of the gradient, and $\nabla C_{l,b}^{\,n-1}$ and $\nabla R_{l,b}^{\,n-1}$ are the gradients of the edge characterization matrices.
The updating of the edge characterization matrices is carried out according to a stochastic gradient descent method.
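A hedged sketch of the local step: the gradients below follow from the forward pass sketched above when the core matrix U is held constant for the step (the disclosure notes that the full pseudo-inverse derivative is complex and can be delegated to automatic differentiation in Python); xi denotes the error already back-propagated through the activation.

```python
import numpy as np

def local_sgd_step(X, xi, C, U, R, eta):
    # Chain rule for Y = f(X C U R + beta) with U treated as a constant
    grad_C = X.T @ xi @ (U @ R).T    # gradient w.r.t. the column characterization matrix
    grad_R = (X @ C @ U).T @ xi      # gradient w.r.t. the row characterization matrix
    return C - eta * grad_C, R - eta * grad_R
```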
In some embodiments, all edge nodes may simultaneously calculate and train the characterization matrix through two sensor interactions, as shown in fig. 6, which may further accelerate the training speed of the neural network weight matrix at the edge nodes.
After all local networks complete the set local training period, the updated characterization matrices of the local networks, $\left\{ C_{l,b}^{\,t_l},\, R_{l,b}^{\,t_l} \right\}_{l=1}^{L}$ for $b = 1, \ldots, B$, are sent to the central server, where $t_l$ is the set local training period and $B$ is the number of edge nodes.
S230, the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes, and sends the updated dimension-reduced weight matrix parameter set to the edge nodes.
The central server integrates all the edge characterization matrices, and obtains an updated global characterization matrix for each layer, including:
$C_{l}^{\,t} = \sum_{b=1}^{B} \frac{D_b}{D}\, C_{l,b}^{\,t_l}, \qquad R_{l}^{\,t} = \sum_{b=1}^{B} \frac{D_b}{D}\, R_{l,b}^{\,t_l},$

where $D$ is the total number of training data set samples, $D_b$ is the number of training data set samples of edge node $b$, and $t$ is the current execution period of the central server.
After the central server finishes updating the global characterization matrices of all the layers of the neural network, it assembles the updated reduced-dimension weight matrix parameter set

$\Theta^{t} = \left\{\, C_{l}^{\,t},\ R_{l}^{\,t},\ S_{l,r},\ S_{l,c} \,\right\}_{l=1}^{L}$

and broadcasts the updated reduced-dimension weight matrix parameter set $\Theta^{t}$ to each edge node.
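The aggregation of S230 is a sample-count-weighted average in the style of federated averaging. The sketch below assumes each edge node b reports its updated characterization matrices together with its local sample count D_b; the dictionary layout is illustrative, not part of the disclosure.

```python
def aggregate(edge_updates):
    # edge_updates: {b: (C_b, R_b, D_b)} with D_b the local sample count of node b
    D = sum(D_b for _, _, D_b in edge_updates.values())
    C_t = sum((D_b / D) * C_b for C_b, _, D_b in edge_updates.values())
    R_t = sum((D_b / D) * R_b for _, R_b, D_b in edge_updates.values())
    # C_t, R_t are broadcast together with the unchanged S_r, S_c to every edge node
    return C_t, R_t
```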
in some embodiments, further comprising: and the edge node performs next training based on the updated dimension-reduced weight matrix parameter set, or determines a final model based on the next training.
For each edge node, the updated edge characterization matrices are obtained from the updated reduced-dimension weight matrix parameter set $\Theta^{t}$: each edge node takes the updated global characterization matrices $C_{l}^{\,t}$ and $R_{l}^{\,t}$ as its new edge characterization matrices for the next round of local training.
The method comprises at least one round of iterative updating of the neural network weight matrix: steps S220 and S230 are repeated cyclically until the set training termination condition is reached, as sketched below.
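Putting the pieces together, one round of the S220/S230 loop can be sketched as follows, reusing the helper functions sketched earlier (core_matrix, forward, local_sgd_step, aggregate). The communication itself is abstracted into plain function calls, the per-layer loop is omitted, and the parameter layout is illustrative only.

```python
def federated_round(params, edge_data, t_l, eta):
    # params: {"C", "R", "S_r", "S_c", "beta"} for one layer (illustrative layout)
    updates = {}
    for b, (X_b, backprop_error, D_b) in edge_data.items():
        C, R = params["C"].copy(), params["R"].copy()
        for _ in range(t_l):                                  # t_l local periods
            U = core_matrix(C, params["S_r"])                 # S410
            Y = forward(X_b, C, U, R, params["beta"])         # S420, forward
            xi = backprop_error(Y)                            # error through f
            C, R = local_sgd_step(X_b, xi, C, U, R, eta)      # S420, update
        updates[b] = (C, R, D_b)
    params["C"], params["R"] = aggregate(updates)             # S230
    return params
```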
As can be seen from the above, the present disclosure provides a federated learning method, implemented by a central server and at least one edge node, comprising: the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes; each edge node trains, based on the reduced-dimension weight matrix parameter set and using its local samples, an edge characterization matrix corresponding to that edge node, and sends the edge characterization matrix to the central server; and the central server obtains an updated reduced-dimension weight matrix parameter set based on all the edge characterization matrices and sends it to the edge nodes. The method guarantees the data privacy of the edge nodes, reduces the computing and storage requirements on the edge nodes, and increases the data transmission speed between the central server and the edge nodes.
It can be seen that, whether for local forward inference or for the training update of the network weights, the computation complexity of the neural network under the framework of the present disclosure is much smaller than that of a neural network with the traditional structure. At the same time, the network layer structure provided by the method requires far less storage space: its space complexity is only $\mathcal{O}\!\left( \sum_{l=1}^{L} s_l (M_l + N_l) \right)$, far below the $\mathcal{O}\!\left( \sum_{l=1}^{L} M_l N_l \right)$ of the conventional network structure. The network structure provided by the disclosure also allows the characterization matrices $\left\{ C_l, R_l \right\}$ to be transmitted between all edge terminals and the central server terminal instead of the fully-connected matrices $\left\{ W_{l,g} \right\}$; the amount of data transferred by the present disclosure, $\sum_{l=1}^{L} \left( M_l\, s_{l,c} + s_{l,r}\, N_l \right)$, is far lower than the $\sum_{l=1}^{L} M_l\, N_l$ transferred in a fully-connected network, which greatly reduces the burden of network data transmission and improves communication reliability.
The method directly generates a new federated learning network structure by random matrix sampling, without network pre-training. Without reducing the generalization performance of the neural network, the method, at the overall system level, reduces the traffic between the edge terminals and the central server terminal, improves communication reliability, and lowers the requirements on the whole communication network; at the edge-node level, it protects the privacy of local data, accelerates local model training and model convergence, and reduces the power consumption and memory required of local hardware. The method overcomes the application bottlenecks of traditional federated learning, such as excessive network computing load, long training time, and high requirements on local hardware, network throughput and overall system security, so that devices with low latency, low power consumption and low computing capacity can operate under the federated learning framework, further promoting the application of federated learning in emerging edge computing scenarios such as the Internet of Things and 6G.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
It should be noted that the embodiments of the present disclosure can be further described in the following ways:
a method of federated learning, wherein the method is implemented by a central server and at least one edge node, the method comprising:
the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes;
the edge node is trained by using a local sample based on the dimensionality reduction weight matrix parameter set to obtain an edge characterization matrix corresponding to the edge node, and the edge characterization matrix is sent to the central server;
and the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes and sends the updated dimension-reduced weight matrix parameter set to the edge nodes.
Optionally, the reducing the dimension of the weight matrix of each layer of the neural network by the central server to obtain a reduced-dimension weight matrix parameter set includes:
for the weight matrix of each layer of the neural network,
the central server initializes the weight matrix to obtain an initialized weight matrix,
the central server carries out singular value decomposition on the initialized weight matrix to obtain an approximate weight matrix,
the central server samples the approximate weight matrix to obtain a row sampling matrix and a column sampling matrix,
the central server samples the approximate weight matrix by using the row sampling matrix and the column sampling matrix to obtain a global characterization matrix;
and the central server integrates the global characterization matrixes of all the layers to obtain the reduced-dimension weight matrix parameter set.
Optionally, the integrating, by the central server, the global characterization matrices of all layers to obtain the reduced-dimension weight matrix parameter set includes:
and for the weight matrix of each layer of the neural network, the central server adds the row sampling matrix and the column sampling matrix into the reduced-dimension weight matrix parameter set as indexes of the global characterization matrix.
Optionally, the training, by the edge node, based on the dimensionality-reduced weight matrix parameter set, using a local sample to obtain an edge characterization matrix corresponding to the edge node includes:
the edge node calculates to obtain a core representation matrix according to the reduced-dimension weight matrix parameter set;
and the edge node performs forward reasoning calculation to obtain local output based on the local sample, the global characterization matrix and the core characterization matrix, performs error back propagation calculation to obtain the gradient of the edge characterization matrix, and obtains the edge characterization matrix by using the gradient of the edge characterization matrix.
Optionally, the calculating, by the edge node, a core characterization matrix according to the dimensionality-reduced weight matrix parameter set includes:
computing the core characterization matrix $U_l$ from the row sampling matrix $S_{l,r}$, the column sampling matrix $S_{l,c}$ and the global characterization matrices $C_l$ and $R_l$ by means of the matrix pseudo-inverse $(\cdot)^{\dagger}$.
Optionally, wherein the performing forward inference computation to obtain a local output includes:
the edge node multiplies the local sample by a global characterization matrix and a core characterization matrix, and calculates to obtain the local output:
$Y_{l,b} = f\!\left( X_{l,b}\, C_{l,b}\, U_{l,b}\, R_{l,b} + \beta_{l,b} \right),$

where $Y_{l,b}$ is the local output, $f$ is the activation function, $X_{l,b}$ is the local sample, $C_{l,b}$ and $R_{l,b}$ are the global characterization matrices, $U_{l,b}$ is the core characterization matrix, and $\beta_{l,b}$ is the bias vector.
Optionally, wherein the performing an error back propagation calculation to obtain a gradient of the edge characterization matrix includes:
obtaining the gradients $\nabla C_{l,b}^{\,n-1}$ and $\nabla R_{l,b}^{\,n-1}$ of the edge characterization matrices from the local sample $X_{l,b}$, the error matrix $\xi_{l+1}^{\prime\,n-1}$ back-propagated to the layer-$l$ activation function, the edge characterization matrices $C_{l,b}^{\,n-1}$ and $R_{l,b}^{\,n-1}$ of period $n-1$, the core characterization matrix $U_{l,b}^{\,n-1}$, the row sampling matrix $S_{l,r}$, the column sampling matrix $S_{l,c}$ and the matrix transpose $(\cdot)^{T}$, where $n$ is the local update period.
Optionally, the obtaining the edge characterization matrix by using the gradient of the edge characterization matrix includes:
$C_{l,b}^{\,n} = C_{l,b}^{\,n-1} - \eta\, \nabla C_{l,b}^{\,n-1}, \qquad R_{l,b}^{\,n} = R_{l,b}^{\,n-1} - \eta\, \nabla R_{l,b}^{\,n-1},$

where $C_{l,b}^{\,n}$ and $R_{l,b}^{\,n}$ are the edge characterization matrices, $C_{l,b}^{\,n-1}$ and $R_{l,b}^{\,n-1}$ are the edge characterization matrices of period $n-1$, $n$ is the local update period, $\eta$ is the update step size of the gradient, and $\nabla C_{l,b}^{\,n-1}$ and $\nabla R_{l,b}^{\,n-1}$ are the gradients of the edge characterization matrices.
Optionally, the obtaining, by the central server, an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrices includes:
$C_{l}^{\,t} = \sum_{b=1}^{B} \frac{D_b}{D}\, C_{l,b}^{\,t_l}, \qquad R_{l}^{\,t} = \sum_{b=1}^{B} \frac{D_b}{D}\, R_{l,b}^{\,t_l},$

where $C_{l}^{\,t}$ and $R_{l}^{\,t}$ form the updated reduced-dimension weight matrix parameter set, $C_{l,b}^{\,t_l}$ and $R_{l,b}^{\,t_l}$ are the edge characterization matrices of edge node $b$, $D$ is the total number of training data set samples, $D_b$ is the number of training data set samples of edge node $b$, $B$ is the total number of edge nodes, and $t$ is the current execution period of the central server.
Optionally, the method further includes: and the edge node performs next training based on the updated dimension-reduced weight matrix parameter set, or determines a final model based on the next training.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic ram (dram)) may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. A method of federated learning, wherein the method is implemented by a central server and at least one edge node, the method comprising:
the central server reduces the dimension of the weight matrix of each layer of the neural network to obtain a reduced-dimension weight matrix parameter set, and sends the reduced-dimension weight matrix parameter set to the edge nodes;
the edge node is trained by using a local sample based on the dimensionality reduction weight matrix parameter set to obtain an edge characterization matrix corresponding to the edge node, and the edge characterization matrix is sent to the central server;
and the central server obtains an updated dimension-reduced weight matrix parameter set based on all the edge characterization matrixes and sends the updated dimension-reduced weight matrix parameter set to the edge nodes.
2. The method of claim 1, wherein the central server dimensionality-reduces the weight matrix for each layer of the neural network to obtain a dimensionality-reduced set of weight matrix parameters, comprising:
for the weight matrix of each layer of the neural network,
the central server initializes the weight matrix to obtain an initialized weight matrix,
the central server carries out singular value decomposition on the initialized weight matrix to obtain an approximate weight matrix,
the central server samples the approximate weight matrix to obtain a row sampling matrix and a column sampling matrix,
the central server samples the approximate weight matrix by using the row sampling matrix and the column sampling matrix to obtain a global characterization matrix;
and the central server integrates the global characterization matrixes of all the layers to obtain the reduced-dimension weight matrix parameter set.
3. The method of claim 2, wherein the central server integrates the global characterization matrices of all layers to obtain the reduced-dimension set of weight matrix parameters, comprising:
and for the weight matrix of each layer of the neural network, the central server adds the row sampling matrix and the column sampling matrix into the reduced-dimension weight matrix parameter set as indexes of the global characterization matrix.
4. The method of claim 3, wherein the training of the edge node by using a local sample based on the reduced-dimension weight matrix parameter set to obtain an edge characterization matrix corresponding to the edge node comprises:
the edge node calculates to obtain a core representation matrix according to the reduced-dimension weight matrix parameter set;
and the edge node performs forward reasoning calculation to obtain local output based on the local sample, the global characterization matrix and the core characterization matrix, performs error back propagation calculation to obtain the gradient of the edge characterization matrix, and obtains the edge characterization matrix by using the gradient of the edge characterization matrix.
5. The method of claim 4, wherein the calculating, by the edge node, a core characterization matrix according to the reduced-dimension weight matrix parameter set includes:
computing the core characterization matrix $U_l$ from the row sampling matrix $S_{l,r}$, the column sampling matrix $S_{l,c}$ and the global characterization matrices $C_l$ and $R_l$ by means of the matrix pseudo-inverse $(\cdot)^{\dagger}$.
6. The method of claim 4, wherein said performing forward inference calculations to derive local outputs comprises:
the edge node multiplies the local sample by a global characterization matrix and a core characterization matrix, and calculates to obtain the local output:
$Y_{l,b} = f\!\left( X_{l,b}\, C_{l,b}\, U_{l,b}\, R_{l,b} + \beta_{l,b} \right),$

where $Y_{l,b}$ is the local output, $f$ is the activation function, $X_{l,b}$ is the local sample, $C_{l,b}$ and $R_{l,b}$ are the global characterization matrices, $U_{l,b}$ is the core characterization matrix, and $\beta_{l,b}$ is the bias vector.
7. The method of claim 4, wherein the performing an error back propagation calculation to derive a gradient of the edge characterization matrix comprises:
obtaining the gradients $\nabla C_{l,b}^{\,n-1}$ and $\nabla R_{l,b}^{\,n-1}$ of the edge characterization matrices from the local sample $X_{l,b}$, the error matrix $\xi_{l+1}^{\prime\,n-1}$ back-propagated to the layer-$l$ activation function, the edge characterization matrices $C_{l,b}^{\,n-1}$ and $R_{l,b}^{\,n-1}$ of period $n-1$, the core characterization matrix $U_{l,b}^{\,n-1}$, the row sampling matrix $S_{l,r}$, the column sampling matrix $S_{l,c}$ and the matrix transpose $(\cdot)^{T}$, where $n$ is the local update period.
8. The method of claim 4, wherein the deriving the edge characterization matrix using the gradient of the edge characterization matrix comprises:
$C_{l,b}^{\,n} = C_{l,b}^{\,n-1} - \eta\, \nabla C_{l,b}^{\,n-1}, \qquad R_{l,b}^{\,n} = R_{l,b}^{\,n-1} - \eta\, \nabla R_{l,b}^{\,n-1},$

where $C_{l,b}^{\,n}$ and $R_{l,b}^{\,n}$ are the edge characterization matrices, $C_{l,b}^{\,n-1}$ and $R_{l,b}^{\,n-1}$ are the edge characterization matrices of period $n-1$, $n$ is the local update period, $\eta$ is the update step size of the gradient, and $\nabla C_{l,b}^{\,n-1}$ and $\nabla R_{l,b}^{\,n-1}$ are the gradients of the edge characterization matrices.
9. The method of claim 1, wherein the central server derives an updated reduced-dimension set of weight matrix parameters based on all of the edge characterization matrices, comprising:
$C_{l}^{\,t} = \sum_{b=1}^{B} \frac{D_b}{D}\, C_{l,b}^{\,t_l}, \qquad R_{l}^{\,t} = \sum_{b=1}^{B} \frac{D_b}{D}\, R_{l,b}^{\,t_l},$

where $C_{l}^{\,t}$ and $R_{l}^{\,t}$ form the updated reduced-dimension weight matrix parameter set, $C_{l,b}^{\,t_l}$ and $R_{l,b}^{\,t_l}$ are the edge characterization matrices of edge node $b$, $D$ is the total number of training data set samples, $D_b$ is the number of training data set samples of edge node $b$, $B$ is the total number of edge nodes, and $t$ is the current execution period of the central server.
10. The method of claim 1, further comprising: and the edge node performs next training based on the updated dimension-reduced weight matrix parameter set, or determines a final model based on the next training.
CN202110369242.8A 2021-04-06 2021-04-06 Federal learning method Pending CN113516151A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110369242.8A CN113516151A (en) 2021-04-06 2021-04-06 Federal learning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110369242.8A CN113516151A (en) 2021-04-06 2021-04-06 Federal learning method

Publications (1)

Publication Number Publication Date
CN113516151A (en) 2021-10-19

Family

ID=78062259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110369242.8A Pending CN113516151A (en) 2021-04-06 2021-04-06 Federal learning method

Country Status (1)

Country Link
CN (1) CN113516151A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200233864A1 (en) * 2019-01-18 2020-07-23 Adobe Inc. Latent network summarization
CN111079977A (en) * 2019-11-18 2020-04-28 中国矿业大学 Heterogeneous federated learning mine electromagnetic radiation trend tracking method based on SVD algorithm
CN110955778A (en) * 2019-12-13 2020-04-03 中国科学院深圳先进技术研究院 Junk short message identification method and system based on differential privacy joint learning
CN111401513A (en) * 2020-02-11 2020-07-10 北京邮电大学 Lightweight deep learning method and device based on random matrix sampling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冶忠林; 赵海兴; 张科; 朱宇; 肖玉芝: "基于邻节点和关系模型优化的网络表示学习" (Network representation learning optimized by neighbor nodes and relation models), 计算机研究与发展 (Journal of Computer Research and Development), no. 12, 15 December 2019 (2019-12-15) *

Similar Documents

Publication Publication Date Title
CN112035743B (en) Data recommendation method and device, computer equipment and storage medium
WO2022257730A1 (en) Methods and apparatus for multiple parties to collaboratively update model while protecting privacy, and system
WO2022156561A1 (en) Method and device for natural language processing
WO2023087914A1 (en) Method and apparatus for selecting recommended content, and device, storage medium and program product
US11763204B2 (en) Method and apparatus for training item coding model
CN115238855A (en) Completion method of time sequence knowledge graph based on graph neural network and related equipment
CN117459575A (en) Service data pushing method, device, computer equipment and storage medium
CN114692745A (en) Data processing method and device, integrated chip, electronic equipment and storage medium
CN117894038A (en) Method and device for generating object gesture in image
Provalov et al. Synevarec: A framework for evaluating recommender systems on synthetic data classes
US20230196128A1 (en) Information processing method, apparatus, electronic device, storage medium and program product
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN113516151A (en) Federal learning method
CN114493674A (en) Advertisement click rate prediction model and method
CN111814368B (en) Tensor-based land utilization simulation method, system, equipment and storage medium
CN114298961A (en) Image processing method, device, equipment and storage medium
CN113282821A (en) Intelligent application prediction method, device and system based on high-dimensional session data fusion
CN114792388A (en) Image description character generation method and device and computer readable storage medium
CN113010772A (en) Data processing method, related equipment and computer readable storage medium
CN112036418A (en) Method and device for extracting user features
Kobayashi Bicomplex projection rule for complex-valued Hopfield neural networks
CN117392260B (en) Image generation method and device
Cheng et al. Travel Attractions Recommendation Based on Attentive Group Recommendation Algorithm
CN116910800A (en) Public opinion monitoring method, public opinion monitoring device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination