CN115983320A - Federated learning model parameter quantization method based on deep reinforcement learning - Google Patents


Info

Publication number
CN115983320A
CN115983320A (application CN202211657889.1A)
Authority
CN
China
Prior art keywords
quantization
model
agent
reinforcement learning
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211657889.1A
Other languages
Chinese (zh)
Inventor
董宇涵
郑斯辉
陈翔
李志德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute Tsinghua University
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen Research Institute Tsinghua University
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute Tsinghua University, Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen Research Institute Tsinghua University
Priority to CN202211657889.1A
Publication of CN115983320A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A federated learning model parameter quantization method based on deep reinforcement learning comprises the following steps: S1, obtaining the current global model parameters; S2, computing M percentiles of the current global model parameters as the environment state observed by the deep reinforcement learning agent; S3, according to the action output by the agent, constructing L quantization step points according to a given rule and using them as the mapping set of the quantization operation; S4, quantized transmission: performing X rounds of model quantization and transmission, each round using the quantization mapping set from the previous step, collecting the quantization errors and training errors of the X rounds, calculating their mean values, obtaining a reward value according to the reward function, and feeding the reward value back to the agent; S5, the agent continuously records the state, action and reward of each interaction and updates its network model when the number of records reaches a given threshold. The method achieves smaller quantization errors and higher test accuracy.

Description

Federated learning model parameter quantization method based on deep reinforcement learning
Technical Field
The invention relates to the field of distributed artificial intelligence, and in particular to a federated learning model parameter quantization method based on deep reinforcement learning.
Background
Machine Learning (ML) is one of the most representative artificial intelligence technologies today. It can learn optimization strategies from massive data samples and has shown great potential in many applications, obtaining extensive research and application particularly in fields such as computer vision and natural language processing. However, as privacy becomes increasingly important to individuals, institutions and even countries, it is becoming more and more difficult to collect large amounts of data and place them on a central server for training. Federated Learning (FL) emerged as a distributed ML method: it completes training through model exchange between users and the server, does not require uploading the users' raw data, and can avoid privacy disclosure to a certain extent. However, machine learning network models are large and require hundreds of iterations to converge, so the federated learning training process consumes a large amount of communication resources, which is particularly prominent in wireless communication systems.
To improve the communication efficiency of wireless federated learning and reduce the communication overhead required by distributed model training, it is necessary to compress the federated learning model. The main model compression methods at present include low-rank approximation, sparsification and quantization, where quantization refers to converting high-precision neural network model parameters into low-precision approximations that can be represented with a few bits (the quantization bit width) before transmission; this approach has been shown to greatly reduce communication overhead without unduly affecting neural network model performance. For example, it has been proposed to randomly round the model parameter vector onto a limited set of discrete values and to losslessly encode the model efficiently by exploiting the unequal occurrence probabilities of those discrete values, thereby effectively improving the communication efficiency of FL.
The prior art mainly concerns the quantization of model parameters when a user transmits a model to the server (uplink communication), and pays less attention to the quantization of model parameters when the server broadcasts the model to users (downlink communication). Some work considers the quantization compression problem of downlink communication; the proposed algorithm transmits only the difference between the broadcast global model and the previous model, and because this difference has a smaller dynamic range than the model itself, the scheme achieves a lower error. However, in this scheme the latest global model must be downloaded every round regardless of whether the user participates in training, which adds extra communication overhead from the user's perspective. In addition, current solutions only consider uniform quantization schemes; non-uniform quantization can further reduce the quantization error, but setting and optimizing its quantization step points is difficult. As a result, the performance of existing wireless federated learning model parameter quantization methods is seriously degraded at low bit widths.
It is noted that the information disclosed in the above background section is only for understanding of the background of the present application and therefore may include information that does not constitute prior art that is known to a person of ordinary skill in the art.
Disclosure of Invention
The main purpose of the invention is to provide a federated learning model parameter quantization method based on deep reinforcement learning, so as to reduce the performance loss when the model is transmitted with a low quantization bit width.
In order to achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, a federated learning model parameter quantization method based on deep reinforcement learning comprises the following steps:
S1, obtaining the current global model parameters;
S2, state processing: computing M percentiles of the current global model parameters as the environment state observed by the deep reinforcement learning agent; the deep reinforcement learning agent comprises an action network and an evaluation network;
S3, action processing: according to the action output by the deep reinforcement learning agent, constructing L quantization step points according to a given rule as the mapping set of the quantization operation;
S4, quantized transmission: performing X rounds of model quantization and transmission, each round using the quantization mapping set from the previous step; collecting the quantization errors and training errors of the X rounds, calculating their mean values, obtaining a reward value according to the reward function, and feeding the reward value back to the reinforcement learning agent;
S5, model updating: the reinforcement learning agent continuously records the state, action and reward of each interaction, and updates the network model of the deep reinforcement learning agent when the number of records reaches a given threshold.
In a second aspect, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the above federated learning model parameter quantization method based on deep reinforcement learning.
The invention has the following beneficial effects:
the invention provides a method for quantizing federal learning model parameters based on deep reinforcement learning, which solves the problem that the performance of the existing method for quantizing the parameters of the wireless federal learning model is seriously damaged when the bit width is low. The invention reduces the performance loss of low quantization bit width, therefore, in order to reach the same performance, the quantization bit width lower than that of the common method can be adopted, and the invention has the indirect benefit of reducing communication overhead. The method adopts non-uniform quantization, autonomously optimizes the selection of quantization step points through a reinforcement learning intelligent agent, is suitable for an uplink communication link and a downlink communication link, and can obtain smaller quantization error and higher test accuracy compared with the traditional uniform quantization method. Through test comparison and verification, the accuracy of the model on the test set is higher when the method is adopted under the same training round.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 illustrates the process of quantizing federated learning model parameters based on deep reinforcement learning according to an embodiment of the present invention.
FIG. 3 is a graph comparing the training and quantization errors of an embodiment of the present invention with those of a prior art method.
FIG. 4 is a graph comparing the test accuracy of an embodiment of the present invention with that of a prior art method.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
Referring to fig. 1, an embodiment of the present invention provides a federated learning model parameter quantization method based on deep reinforcement learning, comprising the following steps:
S1, obtaining the current global model parameters;
S2, state processing: computing M percentiles of the current global model parameters as the environment state observed by the deep reinforcement learning agent; the deep reinforcement learning agent comprises two deep neural networks, namely an action network and an evaluation network;
S3, action processing: according to the action output by the deep reinforcement learning agent, constructing L quantization step points according to a given rule as the mapping set of the quantization operation;
S4, quantized transmission: performing X rounds of model quantization and transmission, each round using the quantization mapping set from the previous step; collecting the quantization errors and training errors of the X rounds, calculating their mean values, obtaining a reward value according to the reward function, and feeding the reward value back to the reinforcement learning agent;
S5, model updating: the reinforcement learning agent continuously records the state, action and reward of each interaction, and updates the network model of the deep reinforcement learning agent when the number of records reaches a given threshold.
The embodiment of the invention thus provides a federated learning model parameter quantization method based on deep reinforcement learning that can formulate a reasonable quantization strategy through interaction with the environment and effectively reduce the performance loss caused by transmitting the model with a low quantization bit width. The method adopts non-uniform quantization, autonomously optimizes the selection of quantization step points through a reinforcement learning agent, is applicable to both uplink and downlink communication links, and obtains smaller quantization errors and higher test accuracy than the traditional uniform quantization method.
Specific embodiments of the present invention are further described below.
The federated learning model parameter quantization method based on deep reinforcement learning mainly comprises the following steps. First, a deep reinforcement learning agent is constructed; the agent comprises two deep neural networks, namely an action network and an evaluation network. In the state processing step, the system computes M percentiles of the current global model parameters as the environment state observed by the reinforcement learning agent. In the action processing step, the system constructs L quantization step points according to a given rule from the action output by the reinforcement learning agent and uses them as the mapping set of the quantization operation. In the quantized transmission step, the system performs X rounds of model quantization and transmission, each round using the quantization mapping set from the previous step; it collects the quantization errors and training errors of the X rounds, computes their mean values, obtains a reward value according to the reward function, and feeds the reward back to the reinforcement learning agent. Finally, throughout this process the reinforcement learning agent keeps recording the state, action and reward of each interaction, and when the number of records reaches a given threshold, the network model of the agent is updated. The specific flow is shown in fig. 2.
1. Building reinforcement learning agent
A Deep Reinforcement Learning (DRL) agent is the core decision-making unit of the method of the present invention and consists of two neural networks: the action network π(a, s; θ_a), where θ_a denotes the parameters of the neural network, s denotes the state vector observed by the agent at the current time, and π(a, s; θ_a) gives the probability that the agent chooses action a in this state; and the evaluation network V(s; θ_v), where θ_v denotes the parameters of the neural network, and V(s; θ_v) gives the value of the state s as currently estimated by the agent.
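For illustration only, a minimal PyTorch sketch of these two networks is shown below, using the sizes reported later in the performance analysis (state dimension M = 5, action dimension L/2 = 4, a hidden layer of 150 neurons); the choice of PyTorch and of the Tanh activation are assumptions of this sketch rather than details given in the original text.

```python
import torch
import torch.nn as nn

class ActionNetwork(nn.Module):
    """Action network pi(a, s; theta_a): maps the state to the mean of an (L/2)-dimensional Gaussian."""
    def __init__(self, state_dim=5, action_dim=4, hidden=150):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, action_dim))

    def forward(self, s):
        return self.net(s)  # mean vector mu, from which actions a_i ~ N(mu_i, sigma) are sampled


class EvaluationNetwork(nn.Module):
    """Evaluation network V(s; theta_v): maps the state to a scalar value estimate."""
    def __init__(self, state_dim=5, hidden=150):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)


# Example: evaluate both networks on a batch of two states
s = torch.randn(2, 5)
mu, value = ActionNetwork()(s), EvaluationNetwork()(s)
```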
2. State processing
Assume that the neural network model to be transmitted is a vector w of dimension d, where d is the number of parameters it contains. Let w_s be the vector obtained by sorting the parameters in w in ascending order of absolute value, and define the x-th percentile as

p_x = w_s^(⌈x·d/100⌉),

where w_s^(i) is the i-th element of w_s and ⌈·⌉ denotes rounding up.

Let the dimension of the input state of the reinforcement learning agent be M ≥ 1, and construct the vector

s = [p_1, p_2, …, p_M]^T

as the state observed at the current time, whose m-th element p_m is the m-th of the M selected percentiles of the current global model parameters.
Here, a balance can be struck between the representation accuracy and the state space dimension by adjusting the value of M.
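As a minimal illustrative sketch (not the exact formula of the original filing), the percentile-based state can be computed as follows with NumPy; the evenly spaced percentage points and the function name percentile_state are assumptions of this example.

```python
import numpy as np

def percentile_state(w, M=5):
    """Build the state s = [p_1, ..., p_M]^T from the current global model parameters w.

    The parameters are sorted by absolute value and M percentiles of the sorted
    vector are taken; the evenly spaced percentage points used here are an
    illustrative assumption.
    """
    w = np.asarray(w, dtype=float).ravel()
    d = w.size
    w_s = w[np.argsort(np.abs(w))]                 # sort parameters by absolute value, ascending
    xs = np.linspace(0.0, 100.0, M + 2)[1:-1]      # M interior percentage points (assumed spacing)
    idx = np.clip(np.ceil(xs * d / 100.0).astype(int) - 1, 0, d - 1)
    return w_s[idx]

# Example: 5-dimensional state for a random 1000-parameter "model"
s = percentile_state(np.random.randn(1000), M=5)
```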
3. Action processing
Assume that the action dimension of the DRL agent's output is L/2. After the state vector s is obtained in step S2, it is input into the action network π(a, s; θ_a) of the DRL agent to obtain the output vector of the network

μ = [μ_1, μ_2, …, μ_{L/2}]^T.

Given the sampling variance σ of all actions, action a_i is drawn from the normal distribution N(μ_i, σ), i = 1, 2, …, L/2, forming the action vector

a = [a_1, a_2, …, a_{L/2}]^T.

Based on this vector, the quantization step points and the complete mapping set can be computed. Assume that the quantization step points lie only in a given range [-B, B], B > 0. The search and optimization interval [-B, 0] is divided into L/2 sub-intervals, the i-th of which is denoted I_i = [l_i, u_i] with left endpoint l_i and right endpoint u_i, and a uniformly distributed reference vector c is defined whose i-th element is the center point of the i-th sub-interval, i.e.

c_i = (l_i + u_i)/2.

The quantization step point vector q_n for the negative part is then calculated by combining the action vector a with the reference vector c and sorting the result, where Sort(x) denotes sorting the elements of the vector x from small to large. On this basis, the quantization step point vector q_p for the non-negative part is obtained from q_n by reversing its order and negating its elements. Concatenating q_n and q_p yields the complete quantization step point vector of dimension L,

q = [q_n^T, q_p^T]^T.
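The construction of the step points can be sketched as follows in NumPy. Two explicit assumptions are made here: the interval [-B, 0] is divided uniformly into L/2 sub-intervals, and the sampled actions are treated as additive offsets to the sub-interval centers before sorting; the additive rule is one plausible reading of the construction, not the exact formula of the original filing.

```python
import numpy as np

def build_step_points(mu, L=8, B=0.15, sigma=0.1, rng=None):
    """Build the L quantization step points from the action-network output mu (length L/2)."""
    rng = np.random.default_rng() if rng is None else rng
    half = L // 2
    a = rng.normal(loc=mu, scale=sigma)            # sample actions a_i ~ N(mu_i, sigma)
    edges = np.linspace(-B, 0.0, half + 1)         # assumed uniform division of [-B, 0]
    c = (edges[:-1] + edges[1:]) / 2.0             # c_i: center of the i-th sub-interval
    q_n = np.sort(c + a)                           # negative-part step points (assumed additive rule)
    q_p = -q_n[::-1]                               # non-negative part: reverse order and negate
    return np.concatenate([q_n, q_p])              # complete step-point vector of dimension L

# Example: step points for a zero mean-action output with B = 0.15 (the range used in the tests)
q = build_step_points(np.zeros(4), L=8, B=0.15)
```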
4. Quantized transmission
According to the quantization step point vector q obtained in step S3, a mapping set Q = {q_1, q_2, …, q_L} can be constructed, where q_l is the l-th element of the vector q. The process in which the server broadcasts the global model, the user equipment receives and trains the model, the user equipment uploads the model, and the server aggregates the uploaded models is called one round. Assuming that the current global model is w, quantized transmission is performed in the following X rounds according to the following steps.

In the first step, assume the current round is the t-th round, t = 1, 2, …, X. Each parameter w in the global model of the current round is randomly quantized according to the following rule: letting q_i and q_j be the two adjacent step points in Q with q_i ≤ w ≤ q_j,

Q(w) = ξ(w, q_i, q_j),

where ξ(w, q_i, q_j) is a random variable that takes the value q_j with probability (w − q_i)/(q_j − q_i) and the value q_i otherwise.

Denoting the quantized model parameters as Q(w) = [Q(w_1), Q(w_2), …, Q(w_d)]^T, the quantization error of this round can then be calculated as e_t = ||Q(w) − w||_2.
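The random quantization of one round and its quantization error can be illustrated with the following NumPy sketch; clipping parameters that fall outside the step-point range to the nearest endpoint is an assumption made so that the example is self-contained.

```python
import numpy as np

def stochastic_quantize(w, q, rng=None):
    """Randomly map each parameter of w to one of its two neighbouring step points in q."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(w, dtype=float)
    q = np.sort(np.asarray(q, dtype=float))
    w_c = np.clip(w, q[0], q[-1])                     # assumption: clip values outside [q_1, q_L]
    hi = np.clip(np.searchsorted(q, w_c), 1, len(q) - 1)
    lo = hi - 1
    p_up = (w_c - q[lo]) / (q[hi] - q[lo])            # probability of rounding up to q[hi]
    return np.where(rng.random(w.shape) < p_up, q[hi], q[lo])

# Per-round quantization error e_t = ||Q(w) - w||_2 for example step points
q = 0.0375 * np.array([-4, -3, -2, -1, 1, 2, 3, 4])   # L = 8 step points, as in the uniform baseline
w = 0.05 * np.random.randn(1000)
e_t = float(np.linalg.norm(stochastic_quantize(w, q) - w))
```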
In the second step, the quantized global model Q(w) is broadcast to the users, who receive the model and train on their local data sets based on it. Let the set of participating users be S and denote the training error of the k-th user by l_k. After each user finishes training, it uploads its updated model w_k and its training error to the server; the server aggregates {w_k}_{k∈S} to obtain a new global model w and calculates the average training error of the round as

l̄_t = (1/|S|)·Σ_{k∈S} l_k.
After the X rounds are completed, the X quantization errors and average training errors have been collected, and their means are calculated as

ē = (1/X)·Σ_{t=1}^{X} e_t,   l̄ = (1/X)·Σ_{t=1}^{X} l̄_t.

The reward r of this training process is then calculated from ē and l̄ according to the reward function, which combines the amplified quantization error, the training error relative to that of the previous quantized transmission, and a penalty for divergent training. Here, l̄' denotes the average training error obtained in the previous quantized transmission; if steps S1 to S5 are referred to as one iteration, l̄' is the value obtained at this step in the previous iteration, and for the first quantized transmission it is initialized accordingly. α > 1 is a constant factor used to amplify the influence of the quantization error; β_1, β_2 > 0 are constant weight factors used to adjust the influence of the quantization error and the training error on the reward value; I(·) is an indicator function that takes the value 1 when the condition in parentheses holds and 0 otherwise; δ > 0 is the penalty given when gradient explosion occurs in the training process, i.e. when w contains values reaching INF, where INF denotes a very large value (typically 1.796e308).
5. Model updating
In step S2 the agent obtains the state s; in step S3 the agent outputs the action a; in step S4 the agent receives the reward r and can calculate the next state s' from the latest w using the method of step S2. The quadruple (s, a, r, s') of each iteration is stored in the agent's record buffer B. If the number of records in the buffer is less than the given threshold P, the next record continues to be generated according to steps S2-S4; otherwise, the agent model is updated according to the following steps.

In the first step, assume the i-th record in the buffer is p_i = (s_i, a_i, r_i, s'_i), where i = 1, 2, …, P. Let A_P = 0 and calculate the advantage function of each successive state-action pair backwards according to

A_{i-1} = r_{i-1} + γ·V(s'_i; θ_v) − V(s_i; θ_v) + γλ·A_i,

where γ and λ are both constants used to adjust the influence of future returns on the current advantage.
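For illustration, this backward recursion can be implemented literally as stated, with the value estimates supplied as plain arrays so that the evaluation network itself is abstracted away; the function name and array layout are choices of this sketch.

```python
import numpy as np

def compute_advantages(rewards, v_s, v_s_next, gamma=0.99, lam=0.95):
    """Backward recursion A_{i-1} = r_{i-1} + gamma*V(s'_i) - V(s_i) + gamma*lam*A_i, with A_P = 0.

    Arrays are 0-based: rewards[k], v_s[k] and v_s_next[k] hold r, V(s; theta_v)
    and V(s'; theta_v) of the (k+1)-th record.
    """
    P = len(rewards)
    A = np.zeros(P)                        # A[P-1] plays the role of A_P = 0
    for i in range(P - 1, 0, -1):          # work backwards from the last record
        A[i - 1] = rewards[i - 1] + gamma * v_s_next[i] - v_s[i] + gamma * lam * A[i]
    return A

# Toy example with P = 6 records
adv = compute_advantages(np.ones(6), np.zeros(6), np.zeros(6))
```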
In the second step, the records in the buffer are divided into groups of C records each. Let θ'_a = θ_a. Then, for each group of data, the action loss function corresponding to the c-th record (whose position in B is i_c) is calculated as the clipped loss

L^a_c = −min( ρ·A_{i_c}, clip(ρ, 1−ε, 1+ε)·A_{i_c} ),

where c = 1, 2, …, C, the probability ratio ρ is defined as

ρ = π(a_{i_c}, s_{i_c}; θ'_a) / π(a_{i_c}, s_{i_c}; θ_a),

and the clipping function clip(ρ, 1−ε, 1+ε) limits ρ to the interval [1−ε, 1+ε], with ε the clipping rate. The action network is then updated according to

θ'_a ← θ'_a − η_a·∇_{θ'_a} L^a_c,

where η_a is the learning rate and ∇_{θ'_a} L^a_c is the gradient of the loss function.

Similarly, let θ'_c = θ_c for the evaluation network. Then, for each group of data, the value loss function L^v_c corresponding to the c-th record is calculated as the squared error between the evaluation network's estimate V(s_{i_c}; θ'_c) and the target A_{i_c} + V(s_{i_c}; θ_c), and the model is updated according to

θ'_c ← θ'_c − η_c·∇_{θ'_c} L^v_c,

where η_c is the learning rate.

In the third step, after all the training is finished, the models are updated by letting θ_a = θ'_a and θ_c = θ'_c, and all records in the buffer B are emptied. The procedure then returns to step S2 and continues interacting to obtain training data.
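Purely as an illustration of this update step, the following NumPy sketch computes the clipped action loss and the squared value loss for one group of records from precomputed probabilities, value estimates and advantages; the gradient steps themselves would be taken by an automatic-differentiation framework and are omitted, and the exact loss forms are standard PPO-style choices assumed from the context rather than quoted from the original filing.

```python
import numpy as np

def ppo_group_losses(pi_new, pi_old, advantages, v_new, v_old, eps=0.2):
    """Mean clipped action loss and mean squared value loss for one group of C records.

    pi_new / pi_old : probabilities of the taken actions under theta'_a and theta_a.
    v_new / v_old   : value estimates V(s; theta'_c) and V(s; theta_c).
    eps             : clipping rate (0.2 in the test example); loss forms are assumed.
    """
    rho = pi_new / pi_old                                    # probability ratio
    clipped = np.clip(rho, 1.0 - eps, 1.0 + eps)             # clip(rho, 1 - eps, 1 + eps)
    action_loss = -np.minimum(rho * advantages, clipped * advantages)
    value_target = advantages + v_old                        # assumed return target A + V(s; theta_c)
    value_loss = (value_target - v_new) ** 2
    return float(action_loss.mean()), float(value_loss.mean())

# Toy example for a group of C = 4 records
al, vl = ppo_group_losses(np.array([0.30, 0.25, 0.40, 0.20]),
                          np.array([0.28, 0.30, 0.35, 0.22]),
                          np.array([0.5, -0.2, 0.1, 0.3]),
                          np.array([1.0, 0.8, 0.6, 0.4]),
                          np.array([0.9, 0.9, 0.5, 0.5]))
```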
Performance analysis
In order to verify the benefits brought by the method, a simulation experiment is carried out on the public data set CIFAR-10. This is an object recognition image data set whose training set contains 50000 samples; in the test example they are randomly divided into 100 non-overlapping small data sets to simulate the local data sets of 100 users. In the FL training process, 10 users are randomly selected in each round for local computation, the local batch size is set to 50, the number of local epochs is set to 5, the learning rate is initialized to 0.15, and it is decayed by a factor of 0.99 every 10 rounds.
For the DRL agent, two identical multilayer perceptron models are used as the action network and the evaluation network, each with a hidden layer of 150 neurons; in the DRL training stage, each user uses only 20% of its local data set for training in order to speed up the process. The state dimension is M = 5 and 3-bit quantization is considered, so the total number of quantization step points is L = 8 and the action dimension is 4; the action sampling variance is σ = 0.1, the optimization range of the quantization step points is B = 0.15, and each action is executed for X = 4 rounds. In the reward function, α = 10, β_1 and β_2 take the values 150000 and 0.3 respectively, and the penalty factor δ is 5. In the model training parameters, the clipping rate is ε = 0.2, the gain adjustment factors γ and λ take the values 0.99 and 0.95 respectively, the learning rates η_a and η_c of the action network and the evaluation network are both set to 0.0004, the buffer threshold is P = 2048, and the group size is C = 16. The number of DRL training iterations is 40000.
After the DRL agent training is finished, the trained agent is used in 1000 rounds of FL training, in which all users use their complete training sets; accuracy is then tested on a test set containing 10000 samples to verify the performance of the model, a higher accuracy indicating a better effect. The method is compared with conventional uniform quantization, whose quantization step point vector is fixed to 0.0375 × [-4, -3, -2, -1, 1, 2, 3, 4]^T.
Fig. 3 compares the errors obtained on the CIFAR-10 data set when the neural network model parameters of the downlink communication process are quantized with conventional uniform quantization and with the method proposed by the present invention. According to the reward function setting, the DRL agent is rewarded when its choice of quantization step points reduces the training error and the quantization error, and punished otherwise; the final scheme therefore obtains lower quantization and training errors than the general uniform quantization scheme.
Fig. 4 compares the accuracy obtained on the test set when the downlink communication process is handled with uniform quantization and with the method proposed by the present invention on the CIFAR-10 data set. The proposed method reduces both the quantization error and the training error, which means that during training the error between the broadcast model received by each user and the unquantized model is smaller and convergence on the local data sets is faster; consequently, the model converges faster with the proposed method and, for the same number of training rounds, its accuracy on the test set is higher.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The background section of the present invention may contain background information related to the problem or environment of the present invention and does not necessarily describe prior art. Accordingly, the inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a further detailed description of the invention in connection with specific/preferred embodiments and it is not intended to limit the invention to the specific embodiments described. It will be apparent to those skilled in the art that numerous alterations and modifications can be made to the described embodiments without departing from the inventive concepts herein, and such alterations and modifications are to be considered as within the scope of the invention. In the description herein, references to the description of the term "one embodiment," "some embodiments," "preferred embodiments," "an example," "a specific example," or "some examples" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction. Although embodiments of the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the application.

Claims (7)

1. A federated learning model parameter quantization method based on deep reinforcement learning, characterized by comprising the following steps:
S1, obtaining the current global model parameters;
S2, state processing: computing M percentiles of the current global model parameters as the environment state observed by the deep reinforcement learning agent; the deep reinforcement learning agent comprises an action network and an evaluation network;
S3, action processing: according to the action output by the deep reinforcement learning agent, constructing L quantization step points according to a given rule and using them as the mapping set of the quantization operation;
S4, quantized transmission: performing X rounds of model quantization and transmission, each round using the quantization mapping set of the previous step; collecting the quantization errors and training errors of the X rounds, calculating their mean values, obtaining a reward value according to the reward function, and feeding the reward value back to the reinforcement learning agent;
S5, model updating: the reinforcement learning agent continuously records the state, action and reward of each interaction, and updates the network model of the deep reinforcement learning agent when the number of records reaches a given threshold.
2. The method as claimed in claim 1, wherein the action network is denoted π(a, s; θ_a), where θ_a represents the parameters of the neural network, s represents the state vector observed by the agent at the current time, and π(a, s; θ_a) gives the probability that the agent chooses the action a in this state; and the evaluation network is denoted V(s; θ_v), where θ_v represents the parameters of the neural network, and V(s; θ_v) gives the value of the state s as currently estimated by the agent.
3. The federated learning model parameter quantization method based on deep reinforcement learning according to claim 1 or 2, wherein, in step S2, the neural network model to be transmitted is a vector w of dimension d, where d is the dimension of the network model; w_s is obtained by sorting the parameters in w in ascending order of absolute value, and the x-th percentile is defined as

p_x = w_s^(⌈x·d/100⌉),

where w_s^(i) is the i-th element of w_s and ⌈·⌉ represents rounding up;

the dimension of the input state of the deep reinforcement learning agent is set to M ≥ 1, and the following vector is constructed:

s = [p_1, p_2, …, p_M]^T,

as the state observed at the current time, whose m-th element p_m is the m-th of the M selected percentiles of the current global model parameters; by adjusting the value of M, a balance is achieved between the representation accuracy and the state space dimension.
4. The federated learning model parameter quantization method based on deep reinforcement learning according to claim 2, wherein, in step S3, after the state vector s is obtained in step S2, it is input into the action network π(a, s; θ_a) to obtain the output vector of the network:

μ = [μ_1, μ_2, …, μ_{L/2}]^T,

where L/2 is the action dimension of the deep reinforcement learning agent's output; given the sampling variance σ of all actions, the action a_i is drawn from the normal distribution N(μ_i, σ), i = 1, 2, …, L/2, forming the action vector:

a = [a_1, a_2, …, a_{L/2}]^T;

based on this vector, the quantization step points and the complete mapping set are calculated: the quantization step points lie in a given range [-B, B], B > 0, the search and optimization interval [-B, 0] is divided into L/2 sub-intervals, the i-th of which is denoted I_i = [l_i, u_i] with left endpoint l_i and right endpoint u_i, and a uniformly distributed reference vector c is defined whose i-th element is the center point of the i-th sub-interval, i.e.

c_i = (l_i + u_i)/2;

next, the quantization step point vector q_n for the negative part is calculated by combining the action vector a with the reference vector c and sorting the result, where Sort(x) denotes sorting the elements of the vector x from small to large; the quantization step point vector q_p for the non-negative part is obtained from q_n by reversing its order and negating its elements; concatenating q_n and q_p yields the complete quantization step point vector of dimension L:

q = [q_n^T, q_p^T]^T.
5. The federated learning model parameter quantization method based on deep reinforcement learning according to claim 1 or 2, wherein, in step S4, a mapping set Q = {q_1, q_2, …, q_L} is constructed according to the quantization step point vector q obtained in step S3, where q_l is the l-th element of the vector q; the process in which the server broadcasts the global model, the user equipment receives and trains the model, the user equipment uploads the model, and the server aggregates the uploaded models constitutes one round, and the current global model w is quantized and transmitted in the following X rounds according to the following steps:

in the first step, let the current round be the t-th round, t = 1, 2, …, X; each parameter w in the global model of the current round is randomly quantized according to the following rule: letting q_i and q_j be the two adjacent step points in Q with q_i ≤ w ≤ q_j,

Q(w) = ξ(w, q_i, q_j),

where ξ(w, q_i, q_j) is a random variable that takes the value q_j with probability (w − q_i)/(q_j − q_i) and the value q_i otherwise; the quantized model parameters are denoted Q(w) = [Q(w_1), Q(w_2), …, Q(w_d)]^T, and the quantization error of the round is calculated as e_t = ||Q(w) − w||_2;

in the second step, the quantized global model Q(w) is broadcast to the users, so that the users receiving the model train on their local data sets based on it; the set of users is S, and the training error of the k-th user is l_k; after each user finishes training, the server receives the updated model w_k and the training error, aggregates {w_k}_{k∈S} to obtain a new global model w, and calculates the average training error of the round as

l̄_t = (1/|S|)·Σ_{k∈S} l_k;

after the X rounds are completed, the X quantization errors and average training errors are collected and their means ē and l̄ are calculated;

the reward r of the training process is then calculated from these means according to the reward function, where l̄' represents the average training error obtained in the previous quantized transmission (initialized accordingly for the first quantized transmission); α > 1 is a constant factor for amplifying the influence of the quantization error; β_1, β_2 > 0 are constant weight factors used to adjust the influence of the quantization error and the training error on the reward value; I(·) is an indicator function that takes the value 1 when the condition in parentheses holds and 0 otherwise; and δ > 0 is the penalty given when gradient explosion occurs in the training process, i.e. when w contains a value reaching the set very large value INF.
6. The federated learning model parameter quantization method based on deep reinforcement learning according to claim 1 or 2, wherein, in step S5, the state s acquired by the agent in step S2, the action a output by the agent in step S3, the reward r received by the agent in step S4, and the next state s' calculated from the latest w by the method of step S2 are stored as a quadruple (s, a, r, s') in the record buffer of the agent; if the number of records in the buffer is less than the given threshold P, the next record continues to be generated according to steps S2-S4; otherwise, the agent model is updated according to the following steps:

in the first step, the i-th record in the buffer is p_i = (s_i, a_i, r_i, s'_i), where i = 1, 2, …, P; let A_P = 0 and calculate the advantage function of each successive state-action pair according to the following formula:

A_{i-1} = r_{i-1} + γ·V(s'_i; θ_v) − V(s_i; θ_v) + γλ·A_i,

where γ and λ are constants used to adjust the influence of future returns on the current advantage;

in the second step, the records in the buffer are divided into groups of C records each; let θ'_a = θ_a, and then for each group of data the action loss function corresponding to the c-th record is calculated as the clipped loss

L^a_c = −min( ρ·A_{i_c}, clip(ρ, 1−ε, 1+ε)·A_{i_c} ),

where c = 1, 2, …, C, the position of the c-th record in B is i_c, the probability ratio ρ is

ρ = π(a_{i_c}, s_{i_c}; θ'_a) / π(a_{i_c}, s_{i_c}; θ_a),

and the clipping function clip(ρ, 1−ε, 1+ε) limits ρ to the interval [1−ε, 1+ε]; the model is updated according to

θ'_a ← θ'_a − η_a·∇_{θ'_a} L^a_c,

where η_a is the learning rate and ∇_{θ'_a} L^a_c is the gradient of the loss function; let θ'_c = θ_c; then, for each group of data, the value loss function L^v_c corresponding to the c-th record is calculated as the squared error between V(s_{i_c}; θ'_c) and the target A_{i_c} + V(s_{i_c}; θ_c), and the model is updated according to

θ'_c ← θ'_c − η_c·∇_{θ'_c} L^v_c,

where η_c is the learning rate;

in the third step, after all the training is finished, the model is updated by letting θ_a = θ'_a and θ_c = θ'_c, and all records in the buffer are emptied; the procedure returns to step S2 and continues interacting to obtain training data.
7. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the federated learning model parameter quantization method based on deep reinforcement learning as claimed in any one of claims 1 to 6.
CN202211657889.1A 2022-12-22 2022-12-22 Federal learning model parameter quantification method based on deep reinforcement learning Pending CN115983320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211657889.1A CN115983320A (en) 2022-12-22 2022-12-22 Federal learning model parameter quantification method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211657889.1A CN115983320A (en) 2022-12-22 2022-12-22 Federal learning model parameter quantification method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115983320A true CN115983320A (en) 2023-04-18

Family

ID=85966039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211657889.1A Pending CN115983320A (en) 2022-12-22 2022-12-22 Federal learning model parameter quantification method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115983320A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648123A (en) * 2024-01-30 2024-03-05 中国人民解放军国防科技大学 Micro-service rapid integration method, system, equipment and storage medium
CN117648123B (en) * 2024-01-30 2024-06-11 中国人民解放军国防科技大学 Micro-service rapid integration method, system, equipment and storage medium


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination