CN116055489A - Asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm
- Publication number: CN116055489A
- Application number: CN202310038329.6A
- Authority: CN (China)
- Prior art keywords: vehicle, local, training, time slot, model
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
- H04W28/18—Negotiating wireless communication parameters
- H04W28/20—Negotiating bandwidth
- H04W28/22—Negotiating communication rate
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/44—Services specially adapted for vehicles for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
- Y02T10/40—Engine management systems
Abstract
The invention provides an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm. The method sets the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position; selects the vehicles participating in training according to the system action of time slot t, to obtain the selected vehicles; has each selected vehicle perform local training with its local data to obtain a corresponding local model; takes into account the staleness effect of the training delay on the vehicle's locally trained model and applies weight optimization to the local model, obtaining a weight-optimized local model; and has each trained vehicle asynchronously upload its weight-optimized local model to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through repeated rounds of training. The method is computationally simple, the system model is reasonable, and simulation experiments show that it achieves higher global model accuracy in the vehicular environment.
Description
Technical Field
The invention relates to the technical field of vehicular networks, and in particular to an asynchronous federated optimization method for selecting vehicles based on the DDPG (deep deterministic policy gradient) algorithm.
Background
With the development of science and technology, Internet of Vehicles technology has been emerging, making people's travel more convenient, and various intelligent in-vehicle services are appearing. A vehicle may therefore need to perform computation tasks while traveling on the road. In traditional cloud computing services, however, the cloud is far from the vehicle, so uploading incurs a large delay, which is unsuitable for vehicles moving at high speed; this motivated vehicular edge computing. A vehicle can upload its computation tasks to a roadside unit (RSU), which has a certain computing capability and is close to the vehicle, for processing, greatly reducing the task-processing delay. However, offloading a task requires the vehicle to upload its local data to the roadside unit, which raises privacy and security concerns: vehicle users worried about privacy disclosure may be unwilling to upload local data. Federated learning was created to address this. Specifically, federated learning performs a certain number of global aggregations at the roadside unit. In the first round of training, each vehicle first downloads the initialized global model from the roadside unit, then trains locally with its local data and uploads the resulting local model instead of the local data; after receiving the local models of all vehicles, the roadside unit aggregates them, and the next round is repeated until the specified number of rounds is reached. This largely protects the privacy of vehicle users.
In traditional federated learning, however, the roadside unit must wait for all vehicles to upload their local models before updating the global model. If some vehicle's training and uploading delay is too long, other vehicles may drive out of the roadside unit's coverage and thus be unable to participate in the global training. Asynchronous federated training was proposed to address this. Specifically, each vehicle trains on its local data and uploads its local model when it finishes, and the roadside unit performs a global model aggregation every time it receives a local model uploaded by a vehicle. The global model at the roadside unit can thus be updated faster, without waiting for other vehicles to upload.
Because vehicles are mobile, the channel conditions are time-varying, which makes the transmission rates time-varying and leads to different transmission delays for different vehicles. Meanwhile, different vehicles have different time-varying computing resources and different amounts of local data, resulting in different local training delays. During asynchronous federated training, since vehicles upload their local models asynchronously, the roadside unit may update the global model with another vehicle's uploaded model before a given vehicle has uploaded its own, so that vehicle's local model becomes somewhat stale. This staleness is related to the local training delay and the transmission delay. It is therefore crucial to account for the impact of these factors on the accuracy of the global model at the roadside unit.
Bad nodes may also exist among the vehicles, i.e., vehicles with very few available computing resources, small local data volumes, or local models that are disturbed to some extent after local training. The presence of bad nodes can greatly degrade the accuracy of the global model at the roadside unit, so the vehicle nodes that participate in the global aggregation must be selected.
Therefore, the invention provides an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm that comprehensively considers vehicle mobility, time-varying channel conditions, the vehicles' time-varying available computing resources, the different amounts of local data carried by the vehicles, and the possible presence of bad vehicle nodes.
Disclosure of Invention
To that end, embodiments of the invention provide an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm, which addresses the low accuracy of the global model in the prior art caused by vehicle mobility, time-varying channel conditions, the vehicles' time-varying available computing resources, the different amounts of local data carried by the vehicles, and the presence of bad vehicle nodes.
In order to solve the above problems, an embodiment of the present invention provides an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm, the method comprising:
S1: setting the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position;
S2: selecting the vehicles participating in training according to the system action of time slot t, to obtain the selected vehicles;
S3: each selected vehicle performing local training with its local data to obtain a corresponding local model;
S4: performing weight optimization on the local model, taking into account the staleness effect of the training delay and the transmission delay on the vehicle's locally trained model, to obtain a weight-optimized local model;
S5: each trained vehicle asynchronously uploading its weight-optimized local model to the roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through repeated rounds of training.
Preferably, in step S1, setting the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position comprises:
the system state of time slot t is set as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1;
the system action of time slot t is set as:
a(t) = (λ_1(t), λ_2(t), ..., λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], represents the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = ... = λ_K(0) = 1;
the system reward of time slot t is set as a weighted combination of the loss value achieved in the asynchronous federated training and the delays of the selected vehicles, where r(t) is the system reward of time slot t, ω_1 and ω_2 are non-negative weight factors, a_di(t) is the selection indicator for vehicle i derived from the system action of time slot t, λ_i(t), i ∈ [1, K], represents the probability of selecting vehicle i, Loss(t) is the loss value computed in the asynchronous federated training, T_i^{cmp} is the delay generated by the local training of vehicle i, and T_i^{tra}(t) is the transmission delay for vehicle i to upload its local model in time slot t.
Preferably, in step S2, the vehicles participating in training are selected according to the system action of time slot t, and obtaining the selected vehicles comprises the following steps:
S21: set a_d(t) = (a_d1(t), a_d2(t), ..., a_dK(t));
S22: normalize λ_i(t); if the normalized λ_i(t) is greater than or equal to 0.5, set the corresponding a_di(t) to 1, otherwise to 0. The resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not.
Preferably, the expected long-term discounted reward of the system based on time slot t can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
Preferably, in step S3, the selected vehicle performing local training with its local data to obtain a corresponding local model comprises the following steps:
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized to w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k feeds each local data sample a with label probability y_a into the local model w_{k,m} and obtains the prediction probability ŷ_a of the convolutional neural network for the label of each sample; the loss value of w_{k,m} is then computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using stochastic gradient descent:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
Preferably, the training delay is:
T_i^{cmp} = C_0 D_i / μ_i
where T_i^{cmp} is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one data sample, μ_i is the computing resource of vehicle i measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
Preferably, the transmission delay is:
T_i^{tra}(t) = W / tr_i(t), with tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ²) and d_i(t) = ||P_i(t) - P_r||
where T_i^{tra}(t) is the transmission delay for vehicle i to upload its local model in time slot t, W is the size of the local model obtained by each vehicle's local training, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the fixed transmit power of each vehicle, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ² is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the coordinates of vehicle i along the x-axis and the y-axis in time slot t, respectively, d_y is a fixed value, d_ix(t) = d_i0 + vt, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed, and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
Preferably, an autoregressive model is employed to describe the relationship between h_i(t) and h_i(t-1), namely:
h_i(t) = ρ_i h_i(t-1) + e(t) √(1 - ρ_i²)
where ρ_i is the normalized channel correlation coefficient between successive time slots and e(t) is an error vector obeying a complex Gaussian distribution and uncorrelated with h_i(t). Based on the Jakes fading spectrum, ρ_i = J_0(2π f_i^d T), where J_0(·) is the zeroth-order Bessel function of the first kind, T is the duration of a time slot, and f_i^d = (v/λ) cos θ is the Doppler frequency of vehicle i, with λ the wavelength and θ the angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r - P_i(t); thus cos θ = x_0 · (P_r - P_i(t)) / ||P_r - P_i(t)||.
Preferably, in step S4, the method for performing weight optimization on the local model is as follows:
weight optimization is performed on the local model, with the weights comprising a training weight and a transmission weight. The training weight β_{1,k} is a decreasing function of the local computation delay of vehicle V_k, controlled by a parameter m_1 ∈ (0, 1), so that β_{1,k} decreases as the local training delay increases, where T_k^{cmp} is the local computation delay of vehicle V_k;
the transmission weight β_{2,k}(t) is a decreasing function of the transmission delay of vehicle V_k, controlled by a parameter m_2 ∈ (0, 1), so that β_{2,k}(t) decreases as the transmission delay increases, where T_k^{tra}(t) is the transmission delay of vehicle V_k;
the weight-optimized local model is obtained according to the formula w_kw = w_k · β_{1,k} · β_{2,k}(t),
where w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight, and β_{2,k}(t) is the transmission weight.
Preferably, in step S5, the trained vehicles asynchronously upload their weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through repeated rounds of training; specifically:
when vehicle V_k has uploaded its weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation, with the formula:
w_new = β w_old + (1 - β) w_kw
where w_old is the current global model at the roadside unit, w_new is the updated global model, w_kw is the weight-optimized local model, and β ∈ (0, 1) is the aggregation proportion;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_old = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles and performed the corresponding updates, it obtains the global model w_t, and the global model update of this time slot is finished.
The above technical solution of the invention has the following advantages:
Embodiments of the invention provide an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm. A deep reinforcement learning algorithm selects the vehicles participating in training according to each vehicle's transmission rate, available computing resources, and position, removing possible bad nodes among the vehicles. The vehicles adopt asynchronous federated training: the roadside unit aggregates the global model once each time it receives a local model uploaded by a vehicle, so the global model at the roadside unit can be updated faster without waiting for other vehicles to upload. When a vehicle trains its local model, the staleness effect of the training delay and the transmission delay on the locally trained model is taken into account and weight optimization is applied to the local model, which improves the accuracy of the global model at the roadside unit.
Drawings
In order to describe the embodiments of the invention or the solutions in the prior art more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings illustrate the features and advantages of the invention by way of example and are not to be interpreted as limiting it in any way; from these drawings, a person skilled in the art can obtain other figures without inventive effort. In the drawings:
FIG. 1 is a flow chart of an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm, according to an embodiment;
FIG. 2 is a schematic view of the scenario framework of the method of the invention;
FIG. 3 is a schematic diagram of the two quantities in the reward during the test phase: the loss value in asynchronous federated learning and the vehicle delay, i.e., the sum of the local delay and the transmission delay;
FIG. 4 is a schematic diagram comparing the accuracy of the method of the invention with asynchronous federated learning in the presence of bad nodes during the test phase;
FIG. 5 is a schematic diagram comparing the loss of the method of the invention with asynchronous federated learning in the presence of bad nodes during the test phase;
FIG. 6 is a schematic diagram comparing the accuracy of the method of the invention with asynchronous federated learning without local weighting, with nodes selected;
FIG. 7 is a schematic diagram comparing the loss of the method of the invention with asynchronous federated learning without local weighting, with nodes selected;
FIG. 8 is a schematic diagram comparing the training delay of the method of the invention with federated learning as the number of global rounds increases;
FIG. 9 is a schematic diagram comparing the accuracy of the global model of the method of the invention with conventional asynchronous federated learning, with nodes selected, at different β values.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm, the method comprising:
S1: setting the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position;
S2: selecting the vehicles participating in training according to the system action of time slot t, to obtain the selected vehicles;
S3: each selected vehicle performing local training with its local data to obtain a corresponding local model;
S4: performing weight optimization on the local model, taking into account the staleness effect of the training delay and the transmission delay on the vehicle's locally trained model, to obtain a weight-optimized local model;
S5: each trained vehicle asynchronously uploading its weight-optimized local model to the roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through repeated rounds of training.
The invention provides an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm. The vehicles participating in training are selected in each time slot according to each vehicle's transmission rate, available computing resources, and position, removing possible bad nodes among the vehicles. Each selected vehicle performs local training with its local data to obtain a corresponding local model; the staleness effect of the training delay and the transmission delay on the locally trained model is taken into account, weight optimization is applied to the local model, and the accuracy of the global model at the roadside unit is thereby improved. The trained vehicles asynchronously upload their weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through repeated rounds of training. The method is computationally simple, the system model is reasonable, and simulation experiments show that it achieves higher global model accuracy in the vehicular environment.
FIG. 2 is a schematic view of the scenario framework of the method of the invention. A deep reinforcement learning algorithm selects the vehicles participating in training according to each vehicle's transmission rate, available computing resources, position, and so on; the selected vehicles then train local models with asynchronous federated learning and upload them to the roadside unit, finally obtaining a relatively accurate global model.
Further, step S1 includes:
setting the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position, specifically comprising:
Since the mobility of a vehicle can be represented by the change of its position, and the training time and uploading time of a vehicle's local model are related to the vehicle's time-varying available computing resources and to the current channel conditions, the system state s(t) of time slot t is defined as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1.
Since the invention aims to select the better vehicles for asynchronous federated learning training according to the current state, the system action a(t) of time slot t is defined as:
a(t) = (λ_1(t), λ_2(t), ..., λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], represents the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = ... = λ_K(0) = 1.
The invention aims to select the better-performing vehicles for asynchronous federated training so as to obtain a more accurate global model at the roadside unit, while also considering the delay and the accuracy of the global model. The system reward r(t) of time slot t is therefore defined as a weighted combination of the loss value achieved in the asynchronous federated training and the delays of the selected vehicles, where r(t) is the system reward of time slot t, ω_1 and ω_2 are non-negative weight factors, a_di(t) is the selection indicator for vehicle i derived from the system action of time slot t, λ_i(t), i ∈ [1, K], represents the probability of selecting vehicle i, Loss(t) is the loss value computed in the asynchronous federated training, T_i^{cmp} is the delay generated by the local training of vehicle i, and T_i^{tra}(t) is the transmission delay for vehicle i to upload its local model in time slot t.
The expected long-term discounted reward of the system can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
Further, step S2 includes:
in order to select specific vehicles, set a_d(t) = (a_d1(t), a_d2(t), ..., a_dK(t)); normalize λ_i(t), and if the normalized λ_i(t) is greater than or equal to 0.5, set the corresponding a_di(t) to 1, otherwise to 0. The resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not. A minimal sketch of this selection step is given below.
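The following is an illustrative sketch of how the thresholding in S21-S22 could be realized; it is not part of the patented method's formal definition, and the function name, the use of NumPy, and the min-max normalization are assumptions (the exact normalization is not specified in this text).

```python
import numpy as np

def select_vehicles(lambda_t, threshold=0.5):
    """Map the DDPG action (selection probabilities) to a 0/1 selection vector.

    lambda_t: array of K selection probabilities lambda_i(t) output by the actor.
    Returns a_d(t), where 1 means the vehicle is selected and 0 means it is not.
    """
    lam = np.asarray(lambda_t, dtype=float)
    # Normalization step (assumed min-max normalization for illustration).
    span = lam.max() - lam.min()
    lam_norm = (lam - lam.min()) / span if span > 0 else np.ones_like(lam)
    # Threshold at 0.5: values >= 0.5 become 1 (selected), otherwise 0.
    return (lam_norm >= threshold).astype(int)

# Example: with lambda(t) = [0.9, 0.2, 0.7, 0.1], vehicles 1 and 3 are selected.
print(select_vehicles([0.9, 0.2, 0.7, 0.1]))  # -> [1 0 1 0]
```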
Further, in step S3, it includes:
the selected vehicle is locally trained by using local data to obtain a corresponding local model, and the method comprises the following steps of:
s31: at time slot t, vehicle V k Downloading global model w from roadside units t-1 Wherein, at time slot 1, the global model at the roadside unit is initialized to w using a convolutional neural network 0 ;
S32: vehicle V k Training local data based on convolutional neural networkThe local training consists of a round of l, at m (mE [1, l)]) In the wheel local training, the vehicle Vk first sets the tag probability of each local data a, i.e., y a Input to local model w k,m And then obtaining the prediction probability of the convolutional neural network for the label of each dataComputing w using cross entropy loss function k,m The loss value of (2) is calculated as follows:
s33: the local model is updated using a random gradient descent algorithm, the formula is as follows:
s34: vehicle V k Performing m+1 local training by using the updated local model, stopping the local training when the local training round reaches l, and obtaining the updated local model w by the vehicle k 。
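Below is a minimal PyTorch-style sketch of the local training loop of S31-S34, offered for illustration only; the concrete CNN architecture, dataset, learning rate, and helper names are assumptions not specified in this document.

```python
import copy
import torch
import torch.nn as nn

def local_training(global_model: nn.Module, local_loader, rounds: int, lr: float = 0.01):
    """Train a copy of the downloaded global model for `rounds` local epochs
    with cross-entropy loss and stochastic gradient descent, returning w_k."""
    local_model = copy.deepcopy(global_model)        # w_{k,0} = w_{t-1}
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)

    for m in range(rounds):                          # l local rounds
        for x, y in local_loader:                    # vehicle V_k's local data
            optimizer.zero_grad()
            logits = local_model(x)                  # CNN output for the sample batch
            loss = criterion(logits, y)              # cross-entropy loss f_k(w_{k,m})
            loss.backward()
            optimizer.step()                         # w_{k,m+1} = w_{k,m} - lr * grad
    return local_model
```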
Further, in step S4, it includes:
when a vehicle performs local training, a training delay and a transmission delay are generated. The training delay is:
T_i^{cmp} = C_0 D_i / μ_i
where T_i^{cmp} is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one data sample, μ_i is the computing resource of vehicle i measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i;
the transmission delay is:
T_i^{tra}(t) = W / tr_i(t), with tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ²) and d_i(t) = ||P_i(t) - P_r||
where T_i^{tra}(t) is the transmission delay for vehicle i to upload its local model in time slot t, W is the size of the local model obtained by each vehicle's local training, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the fixed transmit power of each vehicle, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ² is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the coordinates of vehicle i along the x-axis and the y-axis in time slot t, respectively, d_y is a fixed value, d_ix(t) = d_i0 + vt, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed, and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
An autoregressive model is used to describe the relationship between h_i(t) and h_i(t-1), namely:
h_i(t) = ρ_i h_i(t-1) + e(t) √(1 - ρ_i²)
where ρ_i is the normalized channel correlation coefficient between successive time slots and e(t) is an error vector obeying a complex Gaussian distribution and uncorrelated with h_i(t). Based on the Jakes fading spectrum, ρ_i = J_0(2π f_i^d T), where J_0(·) is the zeroth-order Bessel function of the first kind, T is the duration of a time slot, and f_i^d = (v/λ) cos θ is the Doppler frequency of vehicle i, with λ the wavelength and θ the angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r - P_i(t); thus cos θ = x_0 · (P_r - P_i(t)) / ||P_r - P_i(t)||. A sketch of this delay model is given below.
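For illustration, the following sketch computes the per-slot transmission delay from the quantities defined above; the parameter values, the exact rate expression, and the mapping from the complex channel coefficient to the channel gain are assumptions consistent with the description, not an authoritative implementation.

```python
import numpy as np
from scipy.special import j0  # zeroth-order Bessel function of the first kind, J_0(.)

def channel_update(h_prev, v, wavelength, slot_T, cos_theta, rng):
    """One AR(1) update of the complex channel coefficient under the Jakes model
    (assumed form): h(t) = rho * h(t-1) + e(t) * sqrt(1 - rho^2)."""
    f_d = (v / wavelength) * cos_theta                 # Doppler frequency of vehicle i
    rho = j0(2 * np.pi * f_d * slot_T)                 # correlation between successive slots
    e = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    return rho * h_prev + e * np.sqrt(1.0 - rho ** 2)

def transmission_delay(model_size_W, h_gain, dist, B, p0, alpha, noise_power):
    """Upload delay of one local model: W / tr_i(t), with a Shannon-type rate."""
    rate = B * np.log2(1.0 + p0 * h_gain * dist ** (-alpha) / noise_power)
    return model_size_W / rate

# Usage sketch (all numeric values are placeholders):
rng = np.random.default_rng(0)
h = channel_update(h_prev=1 + 0j, v=20.0, wavelength=0.05, slot_T=0.1,
                   cos_theta=0.8, rng=rng)
h_gain = abs(h) ** 2   # assumed mapping from the coefficient to the gain h_i(t)
delay = transmission_delay(model_size_W=1e6, h_gain=h_gain, dist=80.0,
                           B=1e6, p0=0.5, alpha=2.0, noise_power=1e-9)
```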
Unlike traditional asynchronous federated learning, the invention considers the staleness effect of the training delay and the transmission delay on the vehicle's locally trained model. Specifically, since both the local training and the uploading of the local model to the roadside unit incur delays, by the time one vehicle finishes local training and uploads, the roadside unit may already have received local models uploaded by other vehicles and updated the global model; in this case the model trained by that vehicle has a certain staleness. The invention therefore applies a weighting to vehicle V_k's model, i.e., it sets a training weight and a transmission weight, computed as follows:
weight optimization is performed on the local model, with the weights comprising a training weight and a transmission weight. The training weight β_{1,k} is a decreasing function of the local computation delay of vehicle V_k, controlled by a parameter m_1 ∈ (0, 1), so that β_{1,k} decreases as the local training delay increases, where T_k^{cmp} is the local computation delay of vehicle V_k;
the transmission weight β_{2,k}(t) is a decreasing function of the transmission delay of vehicle V_k, controlled by a parameter m_2 ∈ (0, 1), so that β_{2,k}(t) decreases as the transmission delay increases, where T_k^{tra}(t) is the transmission delay of vehicle V_k;
the weight-optimized local model is obtained according to the formula w_kw = w_k · β_{1,k} · β_{2,k}(t),
where w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight, and β_{2,k}(t) is the transmission weight. A sketch of this weighting step follows.
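The sketch below applies the two staleness weights to a local model's parameters. Because the exact decreasing functions for β_{1,k} and β_{2,k}(t) are not reproduced in this text, the exponential forms m^delay used here are an assumption that merely satisfies the stated property (decreasing in the delay, with m ∈ (0, 1)).

```python
import torch

def weight_optimize(local_model, t_cmp, t_tra, m1=0.9, m2=0.9):
    """Scale the local model w_k by beta_{1,k} * beta_{2,k}(t) before upload.

    t_cmp: local training delay of vehicle V_k.
    t_tra: transmission delay of vehicle V_k in the current slot.
    The exponential weights below are assumed forms; both decrease as the
    corresponding delay increases, as required by the description.
    """
    beta1 = m1 ** t_cmp          # training weight beta_{1,k}
    beta2 = m2 ** t_tra          # transmission weight beta_{2,k}(t)
    scale = beta1 * beta2
    with torch.no_grad():
        for p in local_model.parameters():
            p.mul_(scale)        # w_kw = w_k * beta_{1,k} * beta_{2,k}(t)
    return local_model
```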
Further, in step S5, it includes:
the trained vehicle asynchronously uploads the local model after weight optimization to a roadside unit for asynchronous federal aggregation, and the roadside unit finally obtains a global model by repeated training of multiple rounds, wherein the global model comprises the following specific steps:
when the vehicle V k After uploading the local model with optimized weight to the roadside unit, the roadside unit performs global aggregation once, and the formula is as follows:
w new =β wold +(1-β)w kw
wherein ,wold Is the current global model at the roadside unit, w new W is the updated global model kw For the rightBeta epsilon (0, 1) is the aggregation ratio of the re-optimized local model;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w old =w t-1 When the road side unit receives the local models of all the selected vehicles and gets updated K 1 Post global model w t And the global model updating of the time slot is finished.
At the same time, the average Loss (t) of the vehicles involved in the training can be obtained, which can be expressed as:
wherein ,fk (w k ) Is a local model w k Is a loss value of (2).
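A minimal sketch of the asynchronous aggregation at the roadside unit is given below; the class and method names are illustrative assumptions.

```python
import torch

class RoadsideUnit:
    """Holds the current global model and aggregates uploads asynchronously."""

    def __init__(self, global_model, beta=0.5):
        self.global_model = global_model   # w_old, initially w_{t-1}
        self.beta = beta                   # aggregation proportion, beta in (0, 1)

    def aggregate(self, weighted_local_model):
        """One global aggregation: w_new = beta * w_old + (1 - beta) * w_kw,
        performed each time a vehicle's weight-optimized model arrives."""
        with torch.no_grad():
            for p_old, p_loc in zip(self.global_model.parameters(),
                                    weighted_local_model.parameters()):
                p_old.mul_(self.beta).add_((1.0 - self.beta) * p_loc)
        return self.global_model
```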
To further illustrate the principles and advantages of the present invention, reference is made to the following detailed description of the invention.
The present invention aims to find an optimal policy μ* that maximizes the expected long-term discounted reward of the system.
The overall algorithm adopted by the invention comprises two parts: a training-phase algorithm based on the DAFL (Data-Free Learning) framework and a test-phase algorithm based on the DAFL framework.
Wherein the training phase algorithm steps based on the DAFL framework are shown in table 1.
TABLE 1
The invention provides an asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm, where the DDPG algorithm is built on an actor-critic network architecture. The actor network performs policy improvement and the critic network performs policy evaluation. Specifically, the actor network is used to approximate a policy μ, and the approximated policy is denoted μ_δ. The actor network outputs an action based on the policy μ_δ and the observed state.
The invention improves and evaluates the policy iteratively so as to finally obtain the optimal policy. To ensure the stability of the DDPG algorithm, a target network consisting of a target actor network and a target critic network is also used, whose architectures are the same as those of the actor network and the critic network, respectively.
Let δ denote the actor network parameters and ξ the critic network parameters; δ* and ξ* are the optimized actor and critic network parameters, and δ_1 and ξ_1 are the target actor and target critic network parameters. τ is the soft-update parameter of the target networks, Δ_t is the exploration noise added to the action in time slot t, and I is the mini-batch size. The training-phase algorithm is described in detail next.
First, δ and ξ are randomly initialized, and δ_1 and ξ_1 in the target network are initialized to δ and ξ, respectively. The experience replay buffer R_b is also initialized.
Next, the algorithm executes E_max rounds. In the first round, the positions of all vehicles, the channel states, and the vehicles' available computing resources are reset, and λ_1(0) = λ_2(0) = ... = λ_K(0) = 1 is set; in the first time slot the system then obtains the initial state s(1) = (Tr(1), μ(1), d_x(1), a(0)). At the same time, the global model w_0 at the roadside unit is initialized using a CNN (convolutional neural network).
The algorithm then runs continuously from time slot 1 to the maximum number of time slots N. In the first time slot, the actor network produces the output μ_δ(s|δ) from the state, a random exploration noise Δ_t is added, and the system obtains the action a(1) = μ_δ(s(1)|δ) + Δ_t. Then a_d(1) is computed from the action, and the vehicles selected in this time slot are determined. The selected vehicles perform asynchronous federated training, i.e., each vehicle trains a local model on its local data and asynchronously uploads it to the roadside unit for the global model update, after which the loss value Loss(1) is computed. The vehicles' local training delays and transmission delays are also computed, so the system reward of time slot 1 can be obtained. The vehicle positions are then updated, the channel conditions, the vehicles' available computing resources, and the vehicles' transmission rates are recomputed, so that the system can observe the next state s(2). The tuple (s(1), a(1), r(1), s(2)) is then stored in R_b.
When the number of tuples in R_b is less than or equal to I, the system directly feeds the next state into the actor network and proceeds to the next iteration. A sketch of this interaction step and the replay buffer is given below.
When R is b When the number of the tuples in the target network is larger than I, parameters delta, zeta and delta in an actor network, a critic network and a target network are calculated 1 and ξ1 Update is started to maximize J (μ) δ ). The parameter delta of the actor network is towards J (mu) δ ) The gradient direction of (i.e.)And updating. Will obey strategy μ at s (t) and a (t) δ Is set to +.>The expression is as follows:
which represents a long-term expected discount prize for the time slot t system.
Solving forCan be solved by +.>Gradient of->Instead of it. Critic network uses the parameter ζ pair->Approximately Q ξ (s(t),a(t))。/>
Next, the parameters δ, ζ, δ for the time slot t will be described 1 and ξ1 Is updated by the update method of (a). When R is b When the number of the tuples in the system is greater than I, the system is controlled by R b The I tuples are randomly decimated into a small batch. Design(s) x ,a x ,r x ,s′ x ),x∈[1,2,…,I]Is the x-th tuple in the small lot. The system then first takes s' x Inputting target actor network to obtain output actionThen the s 'is added again' x and a′x Inputting the target critic network to obtain an output action value functionThe target value may then be calculated as:
then according to s x and ax The critic network has an output Q ξ (s x ,a x ) The loss of tuple x can then be calculated as:
L x =[y x -Q ξ (s x ,a x )] 2
when all tuples are input to the critic network and the target network, a loss function is obtained:
critic network is passed through pairThe gradient descent method is used to minimize the loss function L (ζ) and thereby update the parameter ζ.
Similarly, the actor network is connected with the server by the serverMaximizing J (μ) using gradient ascent δ ) Thereby updating the parameter delta. Wherein->The formula is calculated by the action value function approximated by the critic network as follows:
At the end of the time slot t, updating the parameters of the target network, wherein the updating formula is as follows:
ξ_1 ← τξ + (1 - τ)ξ_1
δ_1 ← τδ + (1 - τ)δ_1
wherein τ is a constant and satisfies τ < 1.
Finally, the system feeds s' into the actor network and starts the iterative computation of the next time slot. When the time slot t reaches the maximum value N, the round ends; the system then re-initializes the state value s(1) = (Tr(1), μ(1), d_x(1), a(0)) and performs the next round of training. When the number of rounds reaches the maximum E_max, training ends and the optimized parameters of the actor network, the critic network, the target actor network, and the target critic network are obtained, namely δ*, ξ*, and the corresponding optimized target network parameters. A sketch of one DDPG update step follows.
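The following is a compact PyTorch-style sketch of one DDPG update step (critic regression to the target value, deterministic policy gradient for the actor, and soft target updates), offered as an illustration of the procedure described above; the network definitions and hyperparameter values are assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG update from a mini-batch of I tuples (s, a, r, s')."""
    s, a, r, s_next = batch                                  # tensors of the mini-batch

    # Critic update: regress Q_xi(s, a) toward y = r + gamma * Q_xi1(s', mu_delta1(s')).
    with torch.no_grad():
        a_next = target_actor(s_next)
        y = r + gamma * target_critic(s_next, a_next)
    critic_loss = F.mse_loss(critic(s, a), y)                # L(xi) = mean of [y - Q]^2
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: gradient ascent on J(mu_delta) via the critic's action value.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks: theta_1 <- tau * theta + (1 - tau) * theta_1.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), target.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```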
The test phase omits the critic network, the target actor network, and the target critic network used in the training phase, and actions are generated with the optimized policy, i.e., the actor network with parameter δ*.
Wherein the test phase algorithm steps based on the DAFL framework are shown in table 2.
TABLE 2
1. For each round 1 ≤ epi ≤ E'_max do:
2.   Reset the simulation parameters of the system model and initialize the global model at the roadside unit
3.   Obtain the initial state s(1)
4.   For each time slot 1 ≤ t ≤ N do:
5.     Generate the action a = μ_δ(s|δ) according to the current policy
6.     Compute a_d and determine the selected vehicles
7.     Train the selected vehicles with the weight-based AFL (asynchronous federated learning) update
8.     Obtain the reward r and the next state s' from the current system
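A minimal sketch of this test-phase loop, tying together the pieces sketched earlier, is shown below; the environment and helper interfaces are illustrative assumptions.

```python
def test_phase(env, actor, rsu, num_rounds, num_slots):
    """Evaluate the optimized policy: select vehicles each slot and run
    weight-based asynchronous federated learning, without any DDPG updates."""
    for epi in range(num_rounds):                     # 1 <= epi <= E'_max
        env.reset()                                   # reset simulation parameters
        rsu.reset_global_model()                      # initialize the global model
        s = env.observe_state()                       # initial state s(1)
        for t in range(num_slots):                    # 1 <= t <= N
            a = actor.act(s)                          # a = mu_delta(s | delta*), no exploration noise
            selected = select_vehicles(a)             # a_d: 0/1 selection vector (see earlier sketch)
            r, s = env.federated_step(selected, rsu)  # weight-based AFL update, reward, next state
```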
From the above experiments, the following conclusions are drawn for the method of the invention:
1. In the test phase, as the number of steps increases, the loss value in asynchronous federated learning gradually decreases, and the vehicle delay, i.e., the sum of the local delay and the transmission delay, remains within a certain range, as shown in fig. 3.
2. In the test phase, in the presence of bad nodes, the method of the invention achieves higher accuracy and lower loss than asynchronous federated learning and federated learning, as shown in fig. 4 and fig. 5.
3. With the selected nodes, the method of the invention achieves higher accuracy and lower loss than asynchronous federated learning without local weighting and federated learning without local weighting, as shown in fig. 6 and fig. 7.
4. As the number of global rounds increases, the training delay of the method of the invention is smaller than that of federated learning, as shown in fig. 8.
5. At different β values, the accuracy of the method of the invention is higher than that of the global model of conventional asynchronous federated learning with the selected nodes, as shown in fig. 9.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.
Claims (10)
1. An asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, comprising:
S1: setting the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position;
S2: selecting the vehicles participating in training according to the system action of time slot t, to obtain the selected vehicles;
S3: each selected vehicle performing local training with its local data to obtain a corresponding local model;
S4: performing weight optimization on the local model, taking into account the staleness effect of the training delay and the transmission delay on the vehicle's locally trained model, to obtain a weight-optimized local model;
S5: each trained vehicle asynchronously uploading its weight-optimized local model to the roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through repeated rounds of training.
2. The asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein setting the system state, action, and reward of time slot t according to each vehicle's transmission rate, available computing resources, and position in step S1 comprises:
the system state of time slot t is set as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1;
the system action of time slot t is set as:
a(t) = (λ_1(t), λ_2(t), ..., λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], represents the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = ... = λ_K(0) = 1;
the system reward of time slot t is set as a weighted combination of the loss value achieved in the asynchronous federated training and the delays of the selected vehicles, where r(t) is the system reward of time slot t, ω_1 and ω_2 are non-negative weight factors, a_di(t) is the selection indicator for vehicle i derived from the system action of time slot t, λ_i(t), i ∈ [1, K], represents the probability of selecting vehicle i, Loss(t) is the loss value computed in the asynchronous federated training, T_i^{cmp} is the delay generated by the local training of vehicle i, and T_i^{tra}(t) is the transmission delay for vehicle i to upload its local model in time slot t.
3. The asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm according to claim 2, wherein in step S2 the vehicles participating in training are selected according to the system action of time slot t, and obtaining the selected vehicles comprises the following steps:
S21: set a_d(t) = (a_d1(t), a_d2(t), ..., a_dK(t));
S22: normalize λ_i(t); if the normalized λ_i(t) is greater than or equal to 0.5, set the corresponding a_di(t) to 1, otherwise to 0; the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not.
4. The asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm according to claim 2, wherein the expected long-term discounted reward of the system based on time slot t can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
5. The asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein the local training of the selected vehicle with its local data to obtain the corresponding local model in step S3 comprises the following steps:
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized to w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds; in the m-th (m ∈ [1, l]) round of local training, vehicle V_k feeds each local data sample a with label probability y_a into the local model w_{k,m} and obtains the prediction probability ŷ_a of the convolutional neural network for the label of each sample; the loss value of w_{k,m} is then computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using stochastic gradient descent:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
6. The asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein in step S4 the training delay is:
T_i^{cmp} = C_0 D_i / μ_i
where T_i^{cmp} is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one data sample, μ_i is the computing resource of vehicle i measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
7. The asynchronous federated optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein in step S4 the transmission delay is:
T_i^{tra}(t) = W / tr_i(t), with tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ²) and d_i(t) = ||P_i(t) - P_r||
where T_i^{tra}(t) is the transmission delay for vehicle i to upload its local model in time slot t, W is the size of the local model obtained by each vehicle's local training, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the fixed transmit power of each vehicle, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ² is the noise power; the position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the coordinates of vehicle i along the x-axis and the y-axis in time slot t, respectively, d_y is a fixed value, d_ix(t) = d_i0 + vt, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed, and t is the time slot; the antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
8. The asynchronous federal optimization method for vehicle selection based on DDPG algorithm according to claim 7, wherein the relationship between h_i(t) and h_i(t−1) is constructed by using an autoregressive model, namely:

h_i(t) = ρ_i · h_i(t−1) + √(1 − ρ_i²) · e(t)

wherein ρ_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector which obeys a complex Gaussian distribution and is uncorrelated with h_i(t−1). Based on the Jakes fading spectrum, ρ_i = J_0(2π · f_i^d · T), where J_0(·) is the zeroth-order Bessel function of the first kind, T is the duration of one time slot, and f_i^d is the Doppler frequency of vehicle i, f_i^d = (v / λ) · cos θ, where λ is the wavelength and θ is the included angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r − P_i(t); thus cos θ = x_0 · (P_r − P_i(t)) / ‖P_r − P_i(t)‖.
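A small sketch of the first-order autoregressive channel update of claim 8, with the correlation coefficient taken from the zeroth-order Bessel function of the Jakes model; the √(1 − ρ²) scaling of the error term (which keeps unit channel power) and the parameter values are assumptions of this sketch.

```python
import numpy as np
from scipy.special import j0  # zeroth-order Bessel function of the first kind, J0

def next_channel(h_prev: complex, doppler_hz: float, slot_s: float, rng) -> complex:
    """AR(1) update: h_i(t) = rho_i * h_i(t-1) + sqrt(1 - rho_i^2) * e(t)."""
    rho = j0(2 * np.pi * doppler_hz * slot_s)                               # Jakes correlation coefficient
    e = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)   # complex Gaussian error CN(0, 1)
    return rho * h_prev + np.sqrt(1 - rho ** 2) * e

rng = np.random.default_rng(0)
h = 1.0 + 0.0j
for _ in range(3):
    h = next_channel(h, doppler_hz=50.0, slot_s=1e-3, rng=rng)  # 50 Hz Doppler, 1 ms slots (illustrative)
    print(abs(h))
```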
9. The asynchronous federal optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein in step S4, the method for weight optimization of the local model is as follows:
performing weight optimization on the local model, wherein the weights comprise training weights and transmission weights, and the training weights are as follows:
wherein β_{1,k} is the training weight, m_1 ∈ (0, 1) is a parameter which causes β_{1,k} to decrease as the local training delay increases, and T_k^cmp is the local computation delay of vehicle V_k;
the transmission weights are:
wherein β_{2,k}(t) is the transmission weight, m_2 ∈ (0, 1) is a parameter which causes β_{2,k}(t) to decrease as the transmission delay increases, and T_k^com(t) is the transmission delay of vehicle V_k;
the weight-optimized local model is obtained according to the formula w_kw = w_k · β_{1,k} · β_{2,k}(t);
wherein w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight, and β_{2,k}(t) is the transmission weight.
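As a sketch of the weight optimization in claim 9, the code below assumes the weights take the exponential form m^delay with m ∈ (0, 1), which matches the stated property that each weight decreases as its delay grows; the exponential form itself and the dict-of-floats model representation are assumptions, not the patent's formulas.

```python
def weighted_local_model(w_k: dict, train_delay: float, tx_delay: float,
                         m1: float = 0.9, m2: float = 0.9) -> dict:
    """Scale the local model by the training and transmission weights (claim 9)."""
    beta1 = m1 ** train_delay   # assumed form: decreases as the local training delay increases
    beta2 = m2 ** tx_delay      # assumed form: decreases as the transmission delay increases
    return {name: p * beta1 * beta2 for name, p in w_k.items()}  # w_kw = w_k * beta_1,k * beta_2,k(t)

# Toy two-parameter "model" of vehicle V_k.
print(weighted_local_model({"conv.weight": 0.5, "fc.bias": -1.2}, train_delay=2.0, tx_delay=1.0))
```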
10. The asynchronous federation optimization method for selecting vehicles based on the DDPG algorithm according to claim 9, wherein in step S5, the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federation, and the roadside unit obtains the global model through repeated training over multiple rounds, which specifically comprises:
after vehicle V_k uploads the weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation, with the following formula:
w_new = β · w_old + (1 − β) · w_kw
wherein w_old is the current global model at the roadside unit, w_new is the updated global model, w_kw is the weight-optimized local model, and β ∈ (0, 1) is the aggregation proportion;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w old =w t-1 When the road side unit receives the local models of all the selected vehicles and gets updated K 1 Post global model w t And the global model updating of the time slot is finished.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310038329.6A CN116055489A (en) | 2023-01-10 | 2023-01-10 | Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116055489A true CN116055489A (en) | 2023-05-02 |
Family
ID=86127261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310038329.6A Pending CN116055489A (en) | 2023-01-10 | 2023-01-10 | Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116055489A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116542342A (en) * | 2023-05-16 | 2023-08-04 | 江南大学 | Asynchronous federal optimization method capable of defending Bayesian attack |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220363279A1 (en) * | 2021-04-21 | 2022-11-17 | Foundation Of Soongsil University-Industry Cooperation | Method for combating stop-and-go wave problem using deep reinforcement learning based autonomous vehicles, recording medium and device for performing the method |
CN113382066A (en) * | 2021-06-08 | 2021-09-10 | 江南大学 | Vehicle user selection method and system based on federal edge platform |
CN114051222A (en) * | 2021-11-08 | 2022-02-15 | 北京工业大学 | Wireless resource allocation and communication optimization method based on federal learning in Internet of vehicles environment |
CN114625504A (en) * | 2022-03-09 | 2022-06-14 | 天津理工大学 | Internet of vehicles edge computing service migration method based on deep reinforcement learning |
CN115297170A (en) * | 2022-06-16 | 2022-11-04 | 江南大学 | Cooperative edge caching method based on asynchronous federation and deep reinforcement learning |
CN115358412A (en) * | 2022-08-19 | 2022-11-18 | 江南大学 | Asynchronous federal optimization method based on edge auxiliary vehicle network |
Non-Patent Citations (1)
Title |
---|
王丙琛; 司怀伟; 谭国真: "Research on Control Algorithm of Autonomous Driving Vehicles Based on Deep Reinforcement Learning", Journal of Zhengzhou University (Engineering Science), no. 04 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110476172B (en) | Neural architecture search for convolutional neural networks | |
CN112668128B (en) | Method and device for selecting terminal equipment nodes in federal learning system | |
JP7301156B2 (en) | Quantum variational method, apparatus and storage medium for simulating quantum systems | |
JP6824382B2 (en) | Training machine learning models for multiple machine learning tasks | |
CN110276442B (en) | Searching method and device of neural network architecture | |
WO2021259090A1 (en) | Method and apparatus for federated learning, and chip | |
CN110832509A (en) | Black box optimization using neural networks | |
CN111406264A (en) | Neural architecture search | |
CN110992935A (en) | Computing system for training neural networks | |
WO2020259504A1 (en) | Efficient exploration method for reinforcement learning | |
CN113867843B (en) | Mobile edge computing task unloading method based on deep reinforcement learning | |
CN116055489A (en) | Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm | |
KR20200040185A (en) | Learniung method and device for neural network at adaptive learning rate, and testing method and device using the same | |
CN109919313A (en) | A kind of method and distribution training system of gradient transmission | |
CN111416774A (en) | Network congestion control method and device, computer equipment and storage medium | |
CN116451593B (en) | Reinforced federal learning dynamic sampling method and equipment based on data quality evaluation | |
CN113762527A (en) | Data processing method, system, storage medium and electronic equipment | |
CN112104563A (en) | Congestion control method and device | |
KR102120150B1 (en) | Learning method and learning device for variational interference using neural network and test method and test device for variational interference using the same | |
CN111510473B (en) | Access request processing method and device, electronic equipment and computer readable medium | |
CN116166406B (en) | Personalized edge unloading scheduling method, model training method and system | |
CN116702389B (en) | Nested flow calculation method for mixed traffic flow | |
CN110450164A (en) | Robot control method, device, robot and storage medium | |
CN116542342A (en) | Asynchronous federal optimization method capable of defending Bayesian attack | |
CN112949850A (en) | Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 20240702
Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province
Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.
Country or region after: China
Address before: No. 258, Qingfeng Road, Liangxi District, Wuxi City, Jiangsu Province, 214000
Applicant before: Jiangnan University
Country or region before: China