CN116055489A - Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm - Google Patents

Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Info

Publication number
CN116055489A
Authority
CN
China
Prior art keywords
vehicle
local
training
time slot
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310038329.6A
Other languages
Chinese (zh)
Inventor
吴琼
王思远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hongyue Information Technology Co ltd
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202310038329.6A priority Critical patent/CN116055489A/en
Publication of CN116055489A publication Critical patent/CN116055489A/en
Pending legal-status Critical Current


Classifications

    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
                    • H04L 67/01 - Protocols
                        • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
                        • H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
            • H04W - WIRELESS COMMUNICATION NETWORKS
                • H04W 28/00 - Network traffic management; Network resource management
                    • H04W 28/16 - Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
                        • H04W 28/18 - Negotiating wireless communication parameters
                            • H04W 28/20 - Negotiating bandwidth
                            • H04W 28/22 - Negotiating communication rate
                • H04W 4/00 - Services specially adapted for wireless communication networks; Facilities therefor
                    • H04W 4/30 - Services specially adapted for particular environments, situations or purposes
                        • H04W 4/40 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
                            • H04W 4/44 - Services for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T 10/00 - Road transport of goods or passengers
                    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
                        • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides an asynchronous federated optimization method that selects vehicles with a DDPG algorithm. The method comprises: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position; selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles; having each selected vehicle perform local training on its local data to obtain the corresponding local model; performing weight optimization on the local model, taking into account the staleness that the training delay imposes on the locally trained model, to obtain a weight-optimized local model; and having the trained vehicles asynchronously upload the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining a global model after multiple rounds of training. The method is computationally simple, the system model is reasonable, and simulation experiments show that a higher global-model accuracy can be obtained in a vehicular environment.

Description

Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm
Technical Field
The invention relates to the technical field of vehicular networks, and in particular to an asynchronous federated optimization method for selecting vehicles based on a DDPG (Deep Deterministic Policy Gradient) algorithm.
Background
With the development of science and technology, Internet of Vehicles technology is gradually emerging, making people's travel more convenient, and intelligent services on vehicles are also appearing. A vehicle may therefore need to perform some computation tasks while travelling on the road. In traditional cloud computing services, however, the cloud is far from the vehicle, so uploading generates a large delay, which is unsuitable for a high-speed vehicular scenario; vehicular edge computing has therefore emerged. A vehicle can upload its computation tasks to a roadside unit (RSU), which has a certain computing capability and is close to the vehicle, for processing, which greatly reduces the task-processing delay. However, task computation requires the vehicle to upload its local data to the roadside unit, which raises privacy and security concerns: a vehicle user may worry about privacy leakage and be unwilling to upload local data. Federated learning techniques were created for this reason. Specifically, federated learning performs a certain number of global aggregations at the roadside unit. In the first round of training, a vehicle first downloads the initialized global model from the roadside unit, then performs local training on its local data, and after training uploads the local model instead of the local data; once the roadside unit has received the local models of all vehicles, it aggregates them, and the second round then proceeds in the same way until the prescribed number of rounds is reached. This greatly protects the privacy of vehicle users.
In traditional federated learning, however, the roadside unit must wait for all vehicles to upload their local models before updating the global model. If one vehicle's training and uploading delay is too long, other vehicles may travel out of the coverage of the roadside unit and can no longer participate in the global training. Asynchronous federated training was therefore proposed: each vehicle uploads its local model as soon as its local training on local data is finished, and the roadside unit performs one global model aggregation every time it receives a local model uploaded by a vehicle. This allows the global model at the roadside unit to be updated faster, without waiting for the uploads of other vehicles.
Because a vehicle is mobile, the channel conditions are time-varying, which leads to time-varying transmission rates and therefore different transmission delays for different vehicles. Meanwhile, different vehicles have different time-varying computing resources and different amounts of local data, which leads to different local training delays. In the asynchronous federated training process, since vehicles upload their local models asynchronously, the roadside unit may update the global model with other uploaded local models while one vehicle has not yet uploaded its own; that vehicle's local model then has a certain staleness, which is related to its local training delay and transmission delay. It is therefore crucial to consider the impact of these factors on the accuracy of the global model at the roadside unit.
Bad nodes may also exist among the vehicles, i.e., vehicles with very few available computing resources, a small amount of local data, and locally trained models that are disturbed to some extent. The presence of bad nodes can greatly affect the accuracy of the global model at the roadside unit, so it is necessary to select the vehicle nodes that participate in the global aggregation.
Therefore, the invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm while comprehensively considering the mobility of the vehicles, the time-varying channel conditions, the time-varying available computing resources of the vehicles, the different amounts of local data carried by the vehicles, and the possible presence of bad vehicle nodes.
Disclosure of Invention
To this end, an embodiment of the invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm, which is used to solve the prior-art problem of low accuracy of the generated global model caused by vehicle mobility, time-varying channel conditions, time-varying available computing resources of the vehicles, different amounts of local data carried by the vehicles, and the presence of bad vehicle nodes.
In order to solve the above problems, an embodiment of the present invention provides an asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, the method comprising:
S1: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position;
S2: selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles;
S3: each selected vehicle performing local training on its local data to obtain the corresponding local model;
S4: considering the staleness effect of the training delay and the transmission delay on the locally trained model, performing weight optimization on the local model to obtain a weight-optimized local model;
S5: the trained vehicles asynchronously uploading the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through multiple rounds of training.
Preferably, in step S1, setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position comprises:
the system state of time slot t is set as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1;
the system action of time slot t is set as:
a(t) = (λ_1(t), λ_2(t), …, λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1;
the system reward r(t) of time slot t is set to jointly account for the accuracy of the global model and the delay: it combines, through non-negative weight factors ω_1 and ω_2, the loss value Loss(t) calculated in the asynchronous federated training with the delays of the selected vehicles, where a_di(t) is the selection component of the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, T_i^loc is the delay generated by the local training of vehicle i, and T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t.
Preferably, in step S2, selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles comprises the following steps:
S21: set a_d(t) = (a_d1(t), a_d2(t), …, a_dK(t));
S22: normalize λ_i(t); the a_di(t) corresponding to λ_i(t) ≥ 0.5 is set to 1 and otherwise to 0, so the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not selected.
Preferably, the expected long-term discounted reward of the system based on time slot t can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
Preferably, in step S3, the selected vehicle performing local training on its local data to obtain the corresponding local model comprises the following steps:
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized as w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k first inputs each local datum a, whose label probability is y_a, into the local model w_{k,m}, and then obtains the prediction probability ŷ_a of the convolutional neural network for the label of each datum; the loss value of w_{k,m} is computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using the stochastic gradient descent algorithm:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where ∇f_k(w_{k,m}) is the gradient of f_k(w_{k,m}) and η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
Preferably, the training delay is:
T_i^loc = C_0 D_i / μ_i
where T_i^loc is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one datum, μ_i is the computing resource of vehicle i, measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
Preferably, the transmission delay is:
T_i^tr(t) = W / tr_i(t)
tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ^2)
d_i(t) = ||P_i(t) - P_r||
where T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t, W is the size of the local model obtained by the local training of each vehicle, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the transmit power of each vehicle and is a fixed value, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ^2 is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the positions of vehicle i relative to the antenna of the roadside unit along the x-axis and the y-axis in time slot t, respectively; d_y is a fixed value, d_ix(t) = d_i0 + v t, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
Preferably, an autoregressive model is adopted to construct the relationship between h_i(t) and h_i(t-1), namely:
h_i(t) = ρ_i h_i(t-1) + e(t) sqrt(1 - ρ_i^2)
where ρ_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector that follows a complex Gaussian distribution and is uncorrelated with h_i(t). Based on the Jakes fading spectrum,
ρ_i = J_0(2π f_i^d T)
where J_0(·) is the zero-order Bessel function of the first kind, T is the duration of one time slot, and f_i^d is the Doppler frequency of vehicle i, f_i^d = (v/Λ) cos θ, where Λ is the wavelength and θ is the included angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r - P_i(t); therefore
cos θ = x_0 · (P_r - P_i(t)) / (||x_0|| · ||P_r - P_i(t)||).
Preferably, in step S4, the method of performing weight optimization on the local model is as follows:
weight optimization is performed on the local model, the weights comprising a training weight and a transmission weight;
the training weight β_{1,k} is a function of the local computation delay T_k^loc of vehicle V_k, parameterized by m_1 ∈ (0, 1), such that β_{1,k} decreases as the local training delay increases;
the transmission weight β_{2,k}(t) is a function of the transmission delay T_k^tr(t) of vehicle V_k, parameterized by m_2 ∈ (0, 1), such that β_{2,k}(t) decreases as the transmission delay increases;
the weight-optimized local model is obtained according to the formula w_kw = w_k β_{1,k} β_{2,k}(t);
where w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight and β_{2,k}(t) is the transmission weight.
Preferably, in step S5, the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through multiple rounds of training, specifically:
when vehicle V_k has uploaded the weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation according to the formula:
w_new = β w_old + (1 - β) w_kw
where w_old is the current global model at the roadside unit, w_new is the updated global model, w_kw is the weight-optimized local model, and β ∈ (0, 1) is the aggregation proportion;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_old = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles, it obtains the updated global model w_t and the global model update of this time slot is finished.
As can be seen from the above technical solution, the invention has the following advantages:
The embodiment of the invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm. A deep reinforcement learning algorithm is used to select the vehicles participating in training according to each vehicle's transmission rate, available computing resources and position, removing possible bad nodes among the vehicles. The vehicles adopt asynchronous federated training, and the roadside unit aggregates the global model once each time it receives a local model uploaded by a vehicle, so the global model at the roadside unit can be updated faster without waiting for the uploads of other vehicles. When a vehicle trains its local model, the staleness caused by the training delay and the transmission delay is taken into account and weight optimization is applied to the local model, which improves the accuracy of the global model at the roadside unit.
Drawings
To describe the embodiments of the invention or the prior-art solutions more clearly, the accompanying drawings needed for the embodiments are briefly introduced below. The drawings illustrate the features and advantages of the invention by way of example and are not to be construed as limiting it in any way; from these drawings, a person skilled in the art can obtain other drawings without inventive effort. In the drawings:
FIG. 1 is a flow chart of an asynchronous federal optimization method for selecting a vehicle based on a DDPG algorithm, according to an embodiment;
FIG. 2 is a schematic view of a scene framework of the method of the present invention;
FIG. 3 shows the two quantities in the reward during the test phase: the loss value in asynchronous federated learning and the vehicle delay, i.e., the sum of the local delay and the transmission delay;
FIG. 4 is a schematic diagram comparing the accuracy of the method of the present invention with asynchronous federated learning in the presence of bad nodes during the test phase;
FIG. 5 is a schematic diagram comparing the loss of the method of the present invention with asynchronous federated learning in the presence of bad nodes during the test phase;
FIG. 6 is a schematic diagram comparing the accuracy of the method of the present invention with asynchronous federated learning without local weight processing, under node selection;
FIG. 7 is a schematic diagram comparing the loss of the method of the present invention with asynchronous federated learning without local weight processing, under node selection;
FIG. 8 is a schematic diagram comparing the training delay of the method of the present invention with federated learning as the number of global rounds increases;
FIG. 9 is a schematic diagram comparing the accuracy of the method of the present invention with the global model of conventional asynchronous federated learning under node selection at different β values.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, an embodiment of the present invention provides an asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, the method comprising:
S1: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position;
S2: selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles;
S3: each selected vehicle performing local training on its local data to obtain the corresponding local model;
S4: considering the staleness effect of the training delay and the transmission delay on the locally trained model, performing weight optimization on the local model to obtain a weight-optimized local model;
S5: the trained vehicles asynchronously uploading the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through multiple rounds of training.
The invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm. The vehicles participating in training are selected in each time slot according to each vehicle's transmission rate, available computing resources and position, removing possible bad nodes among the vehicles; each selected vehicle performs local training on its local data to obtain the corresponding local model, and when a vehicle trains its local model, the staleness effect of the training delay and the transmission delay on the locally trained model is considered and weight optimization is applied to the local model, improving the accuracy of the global model at the roadside unit; the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through multiple rounds of training. The method is computationally simple, the system model is reasonable, and simulation experiments show that a higher global-model accuracy can be obtained in a vehicular environment.
FIG. 2 is a schematic view of the scene framework of the method of the present invention: a deep reinforcement learning algorithm selects the vehicles participating in training according to each vehicle's transmission rate, available computing resources, position and so on; the selected vehicles then train local models with the asynchronous federated technique and upload them to the roadside unit, finally obtaining a relatively accurate global model.
Further, step S1 comprises:
setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position, specifically:
since the mobility of a vehicle can be represented by the change of its position, and the training time and uploading time of a vehicle's local model are related to the vehicle's time-varying available computing resources and to the current channel conditions, the system state s(t) of time slot t is defined as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1.
Since the invention aims to select better vehicles for asynchronous federated learning training according to the current state, the system action a(t) of time slot t is defined as:
a(t) = (λ_1(t), λ_2(t), …, λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1.
The invention aims to select the vehicles with better performance for asynchronous federated training so as to obtain a more accurate global model at the roadside unit, while also considering the delay and the accuracy of the global model. The system reward r(t) of time slot t is therefore defined to combine, through non-negative weight factors ω_1 and ω_2, the loss value Loss(t) calculated in the asynchronous federated training with the delays of the selected vehicles, where a_di(t) is the selection component of the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, T_i^loc is the delay generated by the local training of vehicle i, and T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t.
The expected long-term discounted reward of the system can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
Further, step S2 comprises:
to select specific vehicles, set a_d(t) = (a_d1(t), a_d2(t), …, a_dK(t)); normalize λ_i(t), and set the a_di(t) corresponding to λ_i(t) ≥ 0.5 to 1 and otherwise to 0, so that the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not selected, as sketched below.
Further, step S3 comprises:
the selected vehicle performs local training on its local data to obtain the corresponding local model, comprising the following steps (a code sketch follows the list):
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized as w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k first inputs each local datum a, whose label probability is y_a, into the local model w_{k,m}, and then obtains the prediction probability ŷ_a of the convolutional neural network for the label of each datum; the loss value of w_{k,m} is computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using the stochastic gradient descent algorithm:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where ∇f_k(w_{k,m}) is the gradient of f_k(w_{k,m}) and η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
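For concreteness, one vehicle's local training (steps S31 to S34) can be sketched with PyTorch as below. This is only an illustrative sketch, not the patented implementation; the CNN architecture, the learning rate η and the number of local rounds l are placeholders:

import copy
import torch
import torch.nn as nn

def local_training(global_model, local_loader, eta=0.01, l=5):
    """Vehicle V_k: download w_{t-1}, run l rounds of SGD with cross-entropy, return w_k."""
    model = copy.deepcopy(global_model)          # start from the downloaded global model
    criterion = nn.CrossEntropyLoss()            # f_k(w_{k,m}) = -sum_a y_a log(y_hat_a)
    optimizer = torch.optim.SGD(model.parameters(), lr=eta)
    for m in range(l):                           # m-th round of local training
        for x, y in local_loader:                # local data of vehicle V_k
            optimizer.zero_grad()
            y_hat = model(x)                     # predicted label scores for each datum
            loss = criterion(y_hat, y)
            loss.backward()                      # gradient of f_k(w_{k,m})
            optimizer.step()                     # w_{k,m+1} = w_{k,m} - eta * gradient
    return model.state_dict()                    # updated local model w_k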
Further, step S4 comprises:
when a vehicle performs local training, a training delay and a transmission delay are generated; the training delay is:
T_i^loc = C_0 D_i / μ_i
where T_i^loc is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one datum, μ_i is the computing resource of vehicle i, measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
The transmission delay is:
T_i^tr(t) = W / tr_i(t)
tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ^2)
d_i(t) = ||P_i(t) - P_r||
where T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t, W is the size of the local model obtained by the local training of each vehicle, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the transmit power of each vehicle and is a fixed value, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ^2 is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the positions of vehicle i relative to the antenna of the roadside unit along the x-axis and the y-axis in time slot t, respectively; d_y is a fixed value, d_ix(t) = d_i0 + v t, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
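To make the delay model concrete, the two delays can be computed as in the following sketch; all numeric values are illustrative placeholders, not parameters taken from the patent:

import numpy as np

def local_training_delay(C0, D_i, mu_i):
    """T_i^loc = C0 * D_i / mu_i: cycles per datum times data amount, over CPU frequency."""
    return C0 * D_i / mu_i

def transmission_delay(W, B, p0, h_i, d_i, alpha, sigma2):
    """T_i^tr(t) = W / tr_i(t) with the Shannon-rate channel model described above."""
    tr_i = B * np.log2(1.0 + p0 * h_i * d_i ** (-alpha) / sigma2)   # bit/s
    return W / tr_i

# Illustrative numbers:
t_loc = local_training_delay(C0=1e6, D_i=500, mu_i=2e9)             # 0.25 s
t_tr = transmission_delay(W=5e6, B=1e6, p0=0.1, h_i=1.0,
                          d_i=100.0, alpha=2.0, sigma2=1e-9)         # about 0.38 s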
An autoregressive model is adopted to construct the relationship between h_i(t) and h_i(t-1), namely:
h_i(t) = ρ_i h_i(t-1) + e(t) sqrt(1 - ρ_i^2)
where ρ_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector that follows a complex Gaussian distribution and is uncorrelated with h_i(t). Based on the Jakes fading spectrum,
ρ_i = J_0(2π f_i^d T)
where J_0(·) is the zero-order Bessel function of the first kind, T is the duration of one time slot, and f_i^d is the Doppler frequency of vehicle i, f_i^d = (v/Λ) cos θ, where Λ is the wavelength and θ is the included angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r - P_i(t); therefore
cos θ = x_0 · (P_r - P_i(t)) / (||x_0|| · ||P_r - P_i(t)||).
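The time-varying channel can be simulated with the following sketch; SciPy's Bessel function is used for J_0, and the slot duration, carrier wavelength and vehicle speed are illustrative assumptions:

import numpy as np
from scipy.special import j0

def next_channel_gain(h_prev, v, wavelength, cos_theta, T_slot, rng):
    """AR(1) update h_i(t) = rho_i * h_i(t-1) + e(t) * sqrt(1 - rho_i^2)."""
    f_d = (v / wavelength) * cos_theta            # Doppler frequency of vehicle i
    rho = j0(2.0 * np.pi * f_d * T_slot)          # Jakes correlation between successive slots
    e = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)  # complex Gaussian error
    return rho * h_prev + e * np.sqrt(1.0 - rho ** 2)

rng = np.random.default_rng(0)
h = 1.0 + 0.0j
for _ in range(3):
    h = next_channel_gain(h, v=20.0, wavelength=0.05, cos_theta=0.8, T_slot=0.001, rng=rng)
    print(abs(h) ** 2)   # squared magnitude, one possible mapping to h_i(t) in the rate formula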
Unlike traditional asynchronous federated learning, the invention considers the staleness effect of the training delay and the transmission delay on the local model trained by a vehicle. Specifically, since both the local training of a vehicle and the uploading of its local model to the roadside unit take time, by the time one vehicle moves from local training to uploading, the roadside unit may already have received local models uploaded by other vehicles and updated the global model; in this case the local model trained by that vehicle has a certain staleness. The invention therefore applies a weighting to vehicle V_k, i.e., sets a training weight and a transmission weight. The specific calculation method is as follows:
Weight optimization is performed on the local model, the weights comprising a training weight and a transmission weight:
the training weight β_{1,k} is a function of the local computation delay T_k^loc of vehicle V_k, parameterized by m_1 ∈ (0, 1), such that β_{1,k} decreases as the local training delay increases;
the transmission weight β_{2,k}(t) is a function of the transmission delay T_k^tr(t) of vehicle V_k, parameterized by m_2 ∈ (0, 1), such that β_{2,k}(t) decreases as the transmission delay increases;
the weight-optimized local model is obtained according to the formula w_kw = w_k β_{1,k} β_{2,k}(t);
where w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight and β_{2,k}(t) is the transmission weight.
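A minimal sketch of the weight optimization follows. The exact functional form of β_{1,k} and β_{2,k}(t) is rendered as an image in the original publication, so an exponential decay β = m^delay, which does decrease as the delay grows, is assumed here purely for illustration:

import numpy as np

def weight_optimized_model(w_k, t_loc, t_tr, m1=0.9, m2=0.9):
    """w_kw = w_k * beta_1k * beta_2k(t); beta = m ** delay is an assumed decaying form."""
    beta1 = m1 ** t_loc    # training weight: smaller when the local training delay is larger
    beta2 = m2 ** t_tr     # transmission weight: smaller when the transmission delay is larger
    return {name: param * beta1 * beta2 for name, param in w_k.items()}

# Example with a toy two-parameter local model stored as a dict of arrays:
w_k = {"conv.weight": np.ones((3, 3)), "fc.bias": np.zeros(10)}
w_kw = weight_optimized_model(w_k, t_loc=0.25, t_tr=0.38)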
Further, step S5 comprises:
the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through multiple rounds of training, specifically as follows:
when vehicle V_k has uploaded the weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation according to the formula:
w_new = β w_old + (1 - β) w_kw
where w_old is the current global model at the roadside unit, w_new is the updated global model, w_kw is the weight-optimized local model, and β ∈ (0, 1) is the aggregation proportion;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_old = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles, it obtains the updated global model w_t and the global model update of this time slot is finished.
Meanwhile, the average loss Loss(t) of the vehicles participating in the training can be obtained, which can be expressed as the mean of the local loss values over the K_1 selected vehicles:
Loss(t) = (1/K_1) Σ_k f_k(w_k)
where f_k(w_k) is the loss value of the local model w_k.
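The asynchronous aggregation at the roadside unit can be sketched as follows; model parameters are treated as dicts of arrays and the value of β is a placeholder:

def async_aggregate(w_old, w_kw, beta=0.5):
    """One global aggregation: w_new = beta * w_old + (1 - beta) * w_kw."""
    return {name: beta * w_old[name] + (1.0 - beta) * w_kw[name] for name in w_old}

def average_loss(local_losses):
    """Loss(t): mean local loss over the K_1 vehicles that participated in this slot."""
    return sum(local_losses) / len(local_losses)

# Each time a selected vehicle's weight-optimized model w_kw arrives, aggregate immediately:
#     w_global = async_aggregate(w_global, w_kw)
# This is repeated K_1 times per time slot, yielding the slot's global model w_t.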
To further illustrate the principles and advantages of the present invention, the following detailed description is provided.
The present invention aims to find an optimal policy μ* that maximizes the expected long-term discounted reward of the system.
The overall algorithm adopted by the invention comprises two parts: a training-phase algorithm based on the DAFL (Data-Free Learning) framework and a test-phase algorithm based on the DAFL framework.
The training-phase algorithm steps based on the DAFL framework are shown in Table 1.
TABLE 1
1. Randomly initialize the actor network parameter δ and the critic network parameter ξ; initialize the target network parameters δ_1 ← δ and ξ_1 ← ξ; initialize the experience replay buffer R_b
2. For each round 1 ≤ epi ≤ E_max:
3. Reset the simulation parameters of the system model, set λ_1(0) = λ_2(0) = … = λ_K(0) = 1, initialize the global model w_0 at the roadside unit, and obtain the initial state s(1)
4. For each time slot 1 ≤ t ≤ N:
5. Generate the action a(t) = μ_δ(s(t)|δ) + Δ_t
6. Calculate a_d(t) and determine the selected vehicles
7. The selected vehicles perform the weight-based asynchronous federated training and the roadside unit updates the global model; calculate Loss(t), the delays and the reward r(t)
8. Observe the next state s(t+1) and store the tuple (s(t), a(t), r(t), s(t+1)) in R_b
9. If the number of tuples in R_b is greater than I, randomly sample a mini-batch of I tuples, update ξ by minimizing the critic loss L(ξ), update δ by gradient ascent on J(μ_δ), and softly update the target network parameters ξ_1 and δ_1
10. Output the optimized parameters δ*, ξ*, δ_1* and ξ_1*
The invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm, and the DDPG algorithm is based on an actor-critic network architecture. The actor network is used for policy improvement and the critic network is used for policy evaluation. Specifically, the actor network is used to approximate a policy μ, and the approximated policy is denoted μ_δ. The actor network outputs an action based on the policy μ_δ and the observed state.
The invention improves and evaluates the policy iteratively so as to finally obtain the optimal policy. To ensure the stability of the DDPG algorithm, a target network consisting of a target actor network and a target critic network is also adopted, whose architectures are the same as those of the actor network and the critic network, respectively.
Let δ be the actor network parameter, ξ the critic network parameter, δ* the optimized actor network parameter, ξ* the optimized critic network parameter, δ_1 the target actor network parameter and ξ_1 the target critic network parameter. τ is the update parameter of the target network, Δ_t is the exploration noise added to the action in time slot t, and I is the mini-batch size. The training-phase algorithm is described in detail next.
First, δ and ξ are randomly initialized, and δ_1 and ξ_1 in the target network are initialized to δ and ξ, respectively. Meanwhile, the experience replay buffer R_b is initialized.
Next, the algorithm executes E_max rounds. In the first round, the positions of all vehicles, the channel states and the available computing resources of the vehicles themselves are reset, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1 is set. Then, in the first time slot, the system obtains the initial state s(1) = (Tr(1), μ(1), d_x(1), a(0)). Meanwhile, the global model w_0 at the roadside unit is initialized using a CNN (Convolutional Neural Network).
The algorithm then runs continuously from time slot 1 to the maximum number of time slots N. In the first time slot, the actor network obtains the output μ_δ(s|δ) according to the state, to which a random noise Δ_t is added, so the system obtains the action a(1) = μ_δ(s(1)|δ) + Δ_t. Then a_d(1) is calculated from the action and the vehicles selected in this time slot are determined. The selected vehicles perform asynchronous federated training, i.e., each vehicle trains a local model on its local data and then asynchronously uploads it to the roadside unit for the global model update, after which the loss value Loss(1) is calculated. Meanwhile, the local training delays and the transmission delays of the vehicles are calculated, so the system reward in time slot 1 can be obtained. The vehicle positions are then updated, the channel conditions are recalculated, and the available computing resources and the transmission rates of the vehicles are updated so that the system can observe the next state s(2). The tuple (s(1), a(1), r(1), s(2)) is then stored in R_b.
When the number of tuples in R_b is less than or equal to I, the system directly inputs the next state into the actor network and carries out the next iteration.
When the number of tuples in R_b is greater than I, the parameters δ, ξ, δ_1 and ξ_1 of the actor network, the critic network and the target networks start to be updated so as to maximize J(μ_δ). The parameter δ of the actor network is updated along the gradient direction of J(μ_δ), i.e., ∇_δ J(μ_δ). The action-value function obtained by following the policy μ_δ from s(t) and a(t) is denoted Q^{μ_δ}(s(t), a(t)), which represents the long-term expected discounted reward of the system from time slot t. Since ∇_δ J(μ_δ) is difficult to obtain directly, it can be replaced by the gradient of Q^{μ_δ}(s(t), a(t)). The critic network uses the parameter ξ to approximate Q^{μ_δ}(s(t), a(t)) as Q_ξ(s(t), a(t)).
Next, the update method of the parameters δ, ξ, δ_1 and ξ_1 in time slot t is described. When the number of tuples in R_b is greater than I, the system randomly draws a mini-batch of I tuples from R_b. Let (s_x, a_x, r_x, s'_x), x ∈ [1, 2, …, I], be the x-th tuple in the mini-batch. The system first inputs s'_x into the target actor network to obtain the output action a'_x = μ_{δ_1}(s'_x|δ_1), and then inputs s'_x and a'_x into the target critic network to obtain the output action-value function Q_{ξ_1}(s'_x, a'_x). The target value can then be calculated as:
y_x = r_x + γ Q_{ξ_1}(s'_x, a'_x)
Then, according to s_x and a_x, the critic network outputs Q_ξ(s_x, a_x), and the loss of tuple x can be calculated as:
L_x = [y_x - Q_ξ(s_x, a_x)]^2
When all tuples have been input into the critic network and the target networks, the loss function is obtained:
L(ξ) = (1/I) Σ_{x=1}^{I} L_x
The critic network minimizes the loss function L(ξ) by gradient descent on ∇_ξ L(ξ), thereby updating the parameter ξ. Similarly, the actor network maximizes J(μ_δ) by gradient ascent on ∇_δ J(μ_δ), thereby updating the parameter δ, where ∇_δ J(μ_δ) is calculated from the action-value function approximated by the critic network as follows:
∇_δ J(μ_δ) ≈ (1/I) Σ_{x=1}^{I} ∇_a Q_ξ(s_x, a)|_{a=μ_δ(s_x|δ)} ∇_δ μ_δ(s_x|δ)
where the input of Q_ξ is (s_x, μ_δ(s_x|δ)).
At the end of time slot t, the parameters of the target network are updated, and the update formulas are as follows:
ξ_1 ← τ ξ + (1 - τ) ξ_1
δ_1 ← τ δ + (1 - τ) δ_1
where τ is a constant and satisfies τ < 1.
Finally, the system inputs s' into the actor network and starts the iterative calculation of the next time slot. When the time slot t reaches the maximum value N, the round ends. The system then reinitializes the state value s(1) = (Tr(1), μ(1), d_x(1), a(0)) and performs the next round of training. When the number of rounds reaches the maximum E_max, the training ends, and the optimized parameters of the actor network, the critic network, the target actor network and the target critic network, namely δ*, ξ*, δ_1* and ξ_1*, are obtained.
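For readers less familiar with DDPG, the per-mini-batch parameter update described above can be sketched in PyTorch as below. This is a generic DDPG update written for illustration, not the patented code; the network classes and hyper-parameter values are placeholders:

import torch

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch                               # mini-batch of I tuples from R_b
    with torch.no_grad():
        a_next = target_actor(s_next)                     # a'_x = mu_{delta_1}(s'_x)
        y = r + gamma * target_critic(s_next, a_next)     # target value y_x
    critic_loss = ((y - critic(s, a)) ** 2).mean()        # L(xi) = (1/I) sum [y_x - Q_xi]^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()              # gradient ascent on J(mu_delta)
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update: xi_1 <- tau*xi + (1-tau)*xi_1, and the same for delta_1
    for tgt, src in ((target_critic, critic), (target_actor, actor)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)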
In the test phase, the critic network, the target actor network and the target critic network of the training phase are no longer updated, and the optimized policy with the optimal parameter δ* is used.
Wherein the test phase algorithm steps based on the DAFL framework are shown in table 2.
TABLE 2
1. For each round 1 ≤ epi ≤ E'_max:
2. Reset the simulation parameters of the system model and initialize the global model at the roadside unit
3. Obtain the initial state s(1)
4. For each time slot 1 ≤ t ≤ N:
5. Generate the action a = μ_δ(s|δ) according to the current policy
6. Calculate a_d and determine the selected vehicles
7. The selected vehicles perform the weight-based AFL update training
8. Obtain the reward r and the next state s' from the current system
From the above experiments, the following conclusions can be drawn for the method of the invention:
1. In the test phase, as the number of steps increases, the loss value in asynchronous federated learning gradually decreases, and the vehicle delay, i.e., the sum of the local delay and the transmission delay, remains within a certain range, as shown in FIG. 3.
2. In the test phase, in the presence of bad nodes, the method of the invention achieves higher accuracy and lower loss than asynchronous federated learning and federated learning, as shown in FIG. 4 and FIG. 5.
3. Under node selection, the method of the invention achieves higher accuracy and lower loss than asynchronous federated learning without local weight processing and federated learning without local weight processing, as shown in FIG. 6 and FIG. 7.
4. As the number of global rounds increases, the training delay of the method of the invention is smaller than that of federated learning, as shown in FIG. 8.
5. Under different β values, the accuracy of the method of the invention is higher than that of the global model of conventional asynchronous federated learning under node selection, as shown in FIG. 9.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.

Claims (10)

1. An asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, comprising:
S1: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position;
S2: selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles;
S3: each selected vehicle performing local training on its local data to obtain the corresponding local model;
S4: considering the staleness effect of the training delay and the transmission delay on the locally trained model, performing weight optimization on the local model to obtain a weight-optimized local model;
S5: the trained vehicles asynchronously uploading the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through multiple rounds of training.
2. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position in step S1 comprises:
the system state of time slot t is set as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1;
the system action of time slot t is set as:
a(t) = (λ_1(t), λ_2(t), …, λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1;
the system reward r(t) of time slot t is set to jointly account for the accuracy of the global model and the delay: it combines, through non-negative weight factors ω_1 and ω_2, the loss value Loss(t) calculated in the asynchronous federated training with the delays of the selected vehicles, where a_di(t) is the selection component of the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, T_i^loc is the delay generated by the local training of vehicle i, and T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t.
3. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 2, wherein in step S2, selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles comprises the following steps:
S21: set a_d(t) = (a_d1(t), a_d2(t), …, a_dK(t));
S22: normalize λ_i(t); the a_di(t) corresponding to λ_i(t) ≥ 0.5 is set to 1 and otherwise to 0, so the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not selected.
4. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 2, wherein the expected long-term discounted reward of the system based on time slot t can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
5. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein the selected vehicle performing local training on its local data to obtain the corresponding local model in step S3 comprises the following steps:
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized as w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k first inputs each local datum a, whose label probability is y_a, into the local model w_{k,m}, and then obtains the prediction probability ŷ_a of the convolutional neural network for the label of each datum; the loss value of w_{k,m} is computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using the stochastic gradient descent algorithm:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where ∇f_k(w_{k,m}) is the gradient of f_k(w_{k,m}) and η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
6. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein in step S4 the training delay is:
T_i^loc = C_0 D_i / μ_i
where T_i^loc is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one datum, μ_i is the computing resource of vehicle i, measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
7. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein in step S4 the transmission delay is:
T_i^tr(t) = W / tr_i(t)
tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ^2)
d_i(t) = ||P_i(t) - P_r||
where T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t, W is the size of the local model obtained by the local training of each vehicle, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the transmit power of each vehicle and is a fixed value, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ^2 is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the positions of vehicle i relative to the antenna of the roadside unit along the x-axis and the y-axis in time slot t, respectively; d_y is a fixed value, d_ix(t) = d_i0 + v t, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
8. The asynchronous federal optimization method for vehicle selection based on DDPG algorithm according to claim 7, wherein an autoregressive model is used to construct the relationship between h_i(t) and h_i(t-1), namely:

h_i(t) = \rho_i\, h_i(t-1) + \sqrt{1-\rho_i^2}\; e(t)
wherein \rho_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector obeying a complex Gaussian distribution and uncorrelated with h_i(t); based on the Jakes fading spectrum,

\rho_i = J_0\!\left(2\pi f_i^{d}\, T\right)

wherein J_0(\cdot) is the zero-order Bessel function of the first kind, T is the duration of one time slot, and f_i^{d} is the Doppler frequency of vehicle i,

f_i^{d} = \frac{v \cos\theta}{\lambda}

wherein \lambda is the wavelength and \theta is the included angle between the direction of movement, i.e. x_0 = (1, 0, 0), and the uplink communication direction, i.e. P_r - P_i(t); thus

\cos\theta = \frac{x_0 \cdot \left(P_r - P_i(t)\right)}{\left\|P_r - P_i(t)\right\|}.
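A minimal sketch of the autoregressive channel of claim 8, assuming the common Gauss-Markov form with the error term scaled by sqrt(1 - rho_i^2) and a 1 ms slot duration for the Jakes correlation; both the scaling and the slot length are assumptions, not taken from the claim:

```python
# First-order AR channel: h_i(t) = rho_i * h_i(t-1) + sqrt(1 - rho_i^2) * e(t),
# with rho_i = J0(2*pi*f_d*T_slot) from the Jakes fading spectrum.
import numpy as np
from scipy.special import j0   # zero-order Bessel function of the first kind

def doppler(v_mps, wavelength_m, cos_theta):
    """f_i^d = v * cos(theta) / lambda."""
    return v_mps * cos_theta / wavelength_m

def next_channel(h_prev, rho, rng):
    e = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # complex Gaussian error
    return rho * h_prev + np.sqrt(1 - rho ** 2) * e

rng = np.random.default_rng(0)
f_d = doppler(v_mps=20.0, wavelength_m=0.05, cos_theta=-0.9)   # illustrative values
rho = j0(2 * np.pi * abs(f_d) * 1e-3)                          # assumed 1 ms slot duration
h = 1.0 + 0.0j
for _ in range(5):
    h = next_channel(h, rho, rng)
    print(abs(h))
```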
9. The asynchronous federal optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein in step S4, the method for weight optimization of the local model is as follows:
performing weight optimization on the local model, wherein the weights comprise training weights and transmission weights, and the training weights are as follows:
\beta_{1,k} = m_1^{\,T_k^{cmp}}

wherein \beta_{1,k} is the training weight, m_1 \in (0,1) is a parameter which causes \beta_{1,k} to decrease as the local training delay increases, and T_k^{cmp} is the local computation delay of vehicle V_k;
the transmission weights are:
\beta_{2,k}(t) = m_2^{\,T_k^{com}(t)}

wherein \beta_{2,k}(t) is the transmission weight, m_2 \in (0,1) is a parameter which causes \beta_{2,k}(t) to decrease as the transmission delay increases, and T_k^{com}(t) is the transmission delay of vehicle V_k;
the weight-optimized local model is obtained according to the formula w_{kw} = w_k\, \beta_{1,k}\, \beta_{2,k}(t);

wherein w_k is the local model, w_{kw} is the weight-optimized local model, \beta_{1,k} is the training weight, and \beta_{2,k}(t) is the transmission weight.
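A minimal sketch of the weighting in claim 9, assuming the weights decay as m**delay with m in (0,1), so that a longer training or transmission delay yields a smaller weight; the exact functional forms appear only as figures in the claim, and the flat dictionary stands in for a real model state:

```python
# Training weight, transmission weight, and the weight-optimized local model w_kw.
def training_weight(m1, local_delay_s):
    return m1 ** local_delay_s                      # beta_{1,k}, decreasing in the delay

def transmission_weight(m2, tx_delay_s):
    return m2 ** tx_delay_s                         # beta_{2,k}(t), decreasing in the delay

def weighted_local_model(w_k, beta1, beta2):
    """w_kw = w_k * beta_{1,k} * beta_{2,k}(t), applied parameter-wise."""
    return {name: value * beta1 * beta2 for name, value in w_k.items()}

w_k = {"layer.weight": 0.8, "layer.bias": -0.1}     # toy stand-in for a model state dict
w_kw = weighted_local_model(w_k, training_weight(0.9, 2.0), transmission_weight(0.9, 0.5))
print(w_kw)
```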
10. The asynchronous federal optimization method for selecting vehicles based on the DDPG algorithm according to claim 9, wherein in step S5, the trained vehicles asynchronously upload their weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit obtains the global model through multiple rounds of repeated training, specifically comprising:
after vehicle V_k uploads its weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation, the formula being:

w_{new} = \beta\, w_{old} + (1-\beta)\, w_{kw}

wherein w_{old} is the current global model at the roadside unit, w_{new} is the updated global model, w_{kw} is the weight-optimized local model, and \beta \in (0,1) is the aggregation proportion;

when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_{old} = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles, it obtains the updated global model w_t, and the global model update of this time slot is completed.
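A minimal sketch of the asynchronous aggregation in claim 10: each arriving weight-optimized local model triggers one blend of the current global model with a fixed aggregation proportion beta; the toy dictionaries and the value of beta are illustrative only:

```python
# Asynchronous aggregation at the roadside unit: one update per uploaded local model.
def aggregate(w_old, w_kw, beta=0.5):
    """w_new = beta * w_old + (1 - beta) * w_kw, applied parameter-wise."""
    return {name: beta * w_old[name] + (1 - beta) * w_kw[name] for name in w_old}

w_global = {"layer.weight": 0.0, "layer.bias": 0.0}        # w_{t-1} at the start of slot t
uploads = [{"layer.weight": 0.8, "layer.bias": -0.1},      # K_1 selected vehicles,
           {"layer.weight": 0.6, "layer.bias": 0.2}]       # arriving asynchronously
for w_kw in uploads:                                        # one global aggregation per arrival
    w_global = aggregate(w_global, w_kw, beta=0.5)
print(w_global)                                             # global model w_t for this slot
```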
CN202310038329.6A 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm Pending CN116055489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310038329.6A CN116055489A (en) 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310038329.6A CN116055489A (en) 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Publications (1)

Publication Number Publication Date
CN116055489A true CN116055489A (en) 2023-05-02

Family

ID=86127261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310038329.6A Pending CN116055489A (en) 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Country Status (1)

Country Link
CN (1) CN116055489A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220363279A1 (en) * 2021-04-21 2022-11-17 Foundation Of Soongsil University-Industry Cooperation Method for combating stop-and-go wave problem using deep reinforcement learning based autonomous vehicles, recording medium and device for performing the method
CN113382066A (en) * 2021-06-08 2021-09-10 江南大学 Vehicle user selection method and system based on federal edge platform
CN114051222A (en) * 2021-11-08 2022-02-15 北京工业大学 Wireless resource allocation and communication optimization method based on federal learning in Internet of vehicles environment
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning
CN115297170A (en) * 2022-06-16 2022-11-04 江南大学 Cooperative edge caching method based on asynchronous federation and deep reinforcement learning
CN115358412A (en) * 2022-08-19 2022-11-18 江南大学 Asynchronous federal optimization method based on edge auxiliary vehicle network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王丙琛; 司怀伟; 谭国真: "Research on control algorithms for autonomous driving vehicles based on deep reinforcement learning", Journal of Zhengzhou University (Engineering Science), no. 04 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116542342A (en) * 2023-05-16 2023-08-04 江南大学 Asynchronous federal optimization method capable of defending Bayesian attack

Similar Documents

Publication Publication Date Title
CN110476172B (en) Neural architecture search for convolutional neural networks
CN112668128B (en) Method and device for selecting terminal equipment nodes in federal learning system
JP7301156B2 (en) Quantum variational method, apparatus and storage medium for simulating quantum systems
JP6824382B2 (en) Training machine learning models for multiple machine learning tasks
CN110276442B (en) Searching method and device of neural network architecture
WO2021259090A1 (en) Method and apparatus for federated learning, and chip
CN110832509A (en) Black box optimization using neural networks
CN111406264A (en) Neural architecture search
CN110992935A (en) Computing system for training neural networks
WO2020259504A1 (en) Efficient exploration method for reinforcement learning
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116055489A (en) Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm
KR20200040185A (en) Learniung method and device for neural network at adaptive learning rate, and testing method and device using the same
CN109919313A (en) A kind of method and distribution training system of gradient transmission
CN111416774A (en) Network congestion control method and device, computer equipment and storage medium
CN116451593B (en) Reinforced federal learning dynamic sampling method and equipment based on data quality evaluation
CN113762527A (en) Data processing method, system, storage medium and electronic equipment
CN112104563A (en) Congestion control method and device
KR102120150B1 (en) Learning method and learning device for variational interference using neural network and test method and test device for variational interference using the same
CN111510473B (en) Access request processing method and device, electronic equipment and computer readable medium
CN116166406B (en) Personalized edge unloading scheduling method, model training method and system
CN116702389B (en) Nested flow calculation method for mixed traffic flow
CN110450164A (en) Robot control method, device, robot and storage medium
CN116542342A (en) Asynchronous federal optimization method capable of defending Bayesian attack
CN112949850A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240702

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region after: China

Address before: No. 258, Qingfeng Road, Liangxi District, Wuxi City, Jiangsu Province, 214000

Applicant before: Jiangnan University

Country or region before: China