CN116055489A - Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm - Google Patents

Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Info

Publication number
CN116055489A
Authority
CN
China
Prior art keywords
vehicle
local
training
time slot
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310038329.6A
Other languages
Chinese (zh)
Inventor
吴琼
王思远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hongyue Information Technology Co ltd
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202310038329.6A priority Critical patent/CN116055489A/en
Publication of CN116055489A publication Critical patent/CN116055489A/en
Pending legal-status Critical Current


Classifications

    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
                    • H04L 67/01 - Protocols
                        • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
                        • H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
            • H04W - WIRELESS COMMUNICATION NETWORKS
                • H04W 28/00 - Network traffic management; Network resource management
                    • H04W 28/16 - Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
                        • H04W 28/18 - Negotiating wireless communication parameters
                            • H04W 28/20 - Negotiating bandwidth
                            • H04W 28/22 - Negotiating communication rate
                • H04W 4/00 - Services specially adapted for wireless communication networks; Facilities therefor
                    • H04W 4/30 - Services specially adapted for particular environments, situations or purposes
                        • H04W 4/40 - Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
                            • H04W 4/44 - Services for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T 10/00 - Road transport of goods or passengers
                    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
                        • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides an asynchronous federated optimization method that selects vehicles with a DDPG algorithm. The method comprises: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position; selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles; having each selected vehicle perform local training on its local data to obtain the corresponding local model; performing weight optimization on the local model, taking into account the staleness that the training delay imposes on the locally trained model, to obtain a weight-optimized local model; and having the trained vehicles asynchronously upload the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining a global model after multiple rounds of training. The method is computationally simple, the system model is reasonable, and simulation experiments show that a higher global-model accuracy can be obtained in a vehicular environment.

Description

Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm
Technical Field
The invention relates to the technical field of vehicular networks, and in particular to an asynchronous federated optimization method for selecting vehicles based on a DDPG (Deep Deterministic Policy Gradient) algorithm.
Background
With the development of science and technology, Internet of Vehicles technology is gradually emerging, making people's travel more convenient, and intelligent services on vehicles are also appearing. A vehicle may therefore need to perform some computation tasks while travelling on the road. In traditional cloud computing services, however, the cloud is far from the vehicle, so uploading generates a large delay, which is unsuitable for a high-speed vehicular scenario; vehicular edge computing has therefore emerged. A vehicle can upload its computation tasks to a roadside unit (RSU), which has a certain computing capability and is close to the vehicle, for processing, which greatly reduces the task-processing delay. However, task computation requires the vehicle to upload its local data to the roadside unit, which raises privacy and security concerns: a vehicle user may worry about privacy leakage and be unwilling to upload local data. Federated learning techniques were created for this reason. Specifically, federated learning performs a certain number of global aggregations at the roadside unit. In the first round of training, a vehicle first downloads the initialized global model from the roadside unit, then performs local training on its local data, and after training uploads the local model instead of the local data; once the roadside unit has received the local models of all vehicles, it aggregates them, and the second round then proceeds in the same way until the prescribed number of rounds is reached. This greatly protects the privacy of vehicle users.
In traditional federated learning, however, the roadside unit must wait for all vehicles to upload their local models before updating the global model. If one vehicle's training and uploading delay is too long, other vehicles may travel out of the coverage of the roadside unit and can no longer participate in the global training. Asynchronous federated training was therefore proposed: each vehicle uploads its local model as soon as its local training on local data is finished, and the roadside unit performs one global model aggregation every time it receives a local model uploaded by a vehicle. This allows the global model at the roadside unit to be updated faster, without waiting for the uploads of other vehicles.
Because a vehicle is mobile, the channel conditions are time-varying, which leads to time-varying transmission rates and therefore different transmission delays for different vehicles. Meanwhile, different vehicles have different time-varying computing resources and different amounts of local data, which leads to different local training delays. In the asynchronous federated training process, since vehicles upload their local models asynchronously, the roadside unit may update the global model with other uploaded local models while one vehicle has not yet uploaded its own; that vehicle's local model then has a certain staleness, which is related to its local training delay and transmission delay. It is therefore crucial to consider the impact of these factors on the accuracy of the global model at the roadside unit.
Bad nodes may also exist among the vehicles, i.e., vehicles with very few available computing resources, a small amount of local data, and locally trained models that are disturbed to some extent. The presence of bad nodes can greatly affect the accuracy of the global model at the roadside unit, so it is necessary to select the vehicle nodes that participate in the global aggregation.
Therefore, the invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm while comprehensively considering the mobility of the vehicles, the time-varying channel conditions, the time-varying available computing resources of the vehicles, the different amounts of local data carried by the vehicles, and the possible presence of bad vehicle nodes.
Disclosure of Invention
To this end, an embodiment of the invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm, which is used to solve the prior-art problem of low accuracy of the generated global model caused by vehicle mobility, time-varying channel conditions, time-varying available computing resources of the vehicles, different amounts of local data carried by the vehicles, and the presence of bad vehicle nodes.
In order to solve the above problems, an embodiment of the present invention provides an asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, the method comprising:
S1: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position;
S2: selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles;
S3: each selected vehicle performing local training on its local data to obtain the corresponding local model;
S4: considering the staleness effect of the training delay and the transmission delay on the locally trained model, performing weight optimization on the local model to obtain a weight-optimized local model;
S5: the trained vehicles asynchronously uploading the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through multiple rounds of training.
Preferably, in step S1, setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position comprises:
the system state of time slot t is set as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1;
the system action of time slot t is set as:
a(t) = (λ_1(t), λ_2(t), …, λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1;
the system reward r(t) of time slot t is set to jointly account for the accuracy of the global model and the delay: it combines, through non-negative weight factors ω_1 and ω_2, the loss value Loss(t) calculated in the asynchronous federated training with the delays of the selected vehicles, where a_di(t) is the selection component of the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, T_i^loc is the delay generated by the local training of vehicle i, and T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t.
Preferably, in step S2, selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles comprises the following steps:
S21: set a_d(t) = (a_d1(t), a_d2(t), …, a_dK(t));
S22: normalize λ_i(t); the a_di(t) corresponding to λ_i(t) ≥ 0.5 is set to 1 and otherwise to 0, so the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not selected.
Preferably, the expected long-term discounted reward of the system based on time slot t can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
Preferably, in step S3, the selected vehicle performing local training on its local data to obtain the corresponding local model comprises the following steps:
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized as w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k first inputs each local datum a, whose label probability is y_a, into the local model w_{k,m}, and then obtains the prediction probability ŷ_a of the convolutional neural network for the label of each datum; the loss value of w_{k,m} is computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using the stochastic gradient descent algorithm:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where ∇f_k(w_{k,m}) is the gradient of f_k(w_{k,m}) and η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
Preferably, the training delay is:
T_i^loc = C_0 D_i / μ_i
where T_i^loc is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one datum, μ_i is the computing resource of vehicle i, measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
Preferably, the transmission delay is:
T_i^tr(t) = W / tr_i(t)
tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ^2)
d_i(t) = ||P_i(t) - P_r||
where T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t, W is the size of the local model obtained by the local training of each vehicle, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the transmit power of each vehicle and is a fixed value, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ^2 is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the positions of vehicle i relative to the antenna of the roadside unit along the x-axis and the y-axis in time slot t, respectively; d_y is a fixed value, d_ix(t) = d_i0 + v t, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
Preferably, an autoregressive model is adopted to construct the relationship between h_i(t) and h_i(t-1), namely:
h_i(t) = ρ_i h_i(t-1) + e(t) sqrt(1 - ρ_i^2)
where ρ_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector that follows a complex Gaussian distribution and is uncorrelated with h_i(t). Based on the Jakes fading spectrum,
ρ_i = J_0(2π f_i^d T)
where J_0(·) is the zero-order Bessel function of the first kind, T is the duration of one time slot, and f_i^d is the Doppler frequency of vehicle i, f_i^d = (v/Λ) cos θ, where Λ is the wavelength and θ is the included angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r - P_i(t); therefore
cos θ = x_0 · (P_r - P_i(t)) / (||x_0|| · ||P_r - P_i(t)||).
Preferably, in step S4, the method of performing weight optimization on the local model is as follows:
weight optimization is performed on the local model, the weights comprising a training weight and a transmission weight;
the training weight β_{1,k} is a function of the local computation delay T_k^loc of vehicle V_k, parameterized by m_1 ∈ (0, 1), such that β_{1,k} decreases as the local training delay increases;
the transmission weight β_{2,k}(t) is a function of the transmission delay T_k^tr(t) of vehicle V_k, parameterized by m_2 ∈ (0, 1), such that β_{2,k}(t) decreases as the transmission delay increases;
the weight-optimized local model is obtained according to the formula w_kw = w_k β_{1,k} β_{2,k}(t);
where w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight and β_{2,k}(t) is the transmission weight.
Preferably, in step S5, the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through multiple rounds of training, specifically:
when vehicle V_k has uploaded the weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation according to the formula:
w_new = β w_old + (1 - β) w_kw
where w_old is the current global model at the roadside unit, w_new is the updated global model, w_kw is the weight-optimized local model, and β ∈ (0, 1) is the aggregation proportion;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_old = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles, it obtains the updated global model w_t and the global model update of this time slot is finished.
As can be seen from the above technical solution, the invention has the following advantages:
The embodiment of the invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm. A deep reinforcement learning algorithm is used to select the vehicles participating in training according to each vehicle's transmission rate, available computing resources and position, removing possible bad nodes among the vehicles. The vehicles adopt asynchronous federated training, and the roadside unit aggregates the global model once each time it receives a local model uploaded by a vehicle, so the global model at the roadside unit can be updated faster without waiting for the uploads of other vehicles. When a vehicle trains its local model, the staleness caused by the training delay and the transmission delay is taken into account and weight optimization is applied to the local model, which improves the accuracy of the global model at the roadside unit.
Drawings
To describe the embodiments of the invention or the prior-art solutions more clearly, the accompanying drawings needed for the embodiments are briefly introduced below. The drawings illustrate the features and advantages of the invention by way of example and are not to be construed as limiting it in any way; from these drawings, a person skilled in the art can obtain other drawings without inventive effort. In the drawings:
FIG. 1 is a flow chart of an asynchronous federal optimization method for selecting a vehicle based on a DDPG algorithm, according to an embodiment;
FIG. 2 is a schematic view of a scene framework of the method of the present invention;
FIG. 3 shows the two quantities in the reward during the test phase: the loss value in asynchronous federated learning and the vehicle delay, i.e., the sum of the local delay and the transmission delay;
FIG. 4 is a schematic diagram comparing the accuracy of the method of the present invention with asynchronous federated learning in the presence of bad nodes during the test phase;
FIG. 5 is a schematic diagram comparing the loss of the method of the present invention with asynchronous federated learning in the presence of bad nodes during the test phase;
FIG. 6 is a schematic diagram comparing the accuracy of the method of the present invention with asynchronous federated learning without local weight processing, under node selection;
FIG. 7 is a schematic diagram comparing the loss of the method of the present invention with asynchronous federated learning without local weight processing, under node selection;
FIG. 8 is a schematic diagram comparing the training delay of the method of the present invention with federated learning as the number of global rounds increases;
FIG. 9 is a schematic diagram comparing the accuracy of the method of the present invention with the global model of conventional asynchronous federated learning under node selection at different β values.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 1, an embodiment of the present invention provides an asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, the method comprising:
S1: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position;
S2: selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles;
S3: each selected vehicle performing local training on its local data to obtain the corresponding local model;
S4: considering the staleness effect of the training delay and the transmission delay on the locally trained model, performing weight optimization on the local model to obtain a weight-optimized local model;
S5: the trained vehicles asynchronously uploading the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through multiple rounds of training.
The invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm. The vehicles participating in training are selected in each time slot according to each vehicle's transmission rate, available computing resources and position, removing possible bad nodes among the vehicles; each selected vehicle performs local training on its local data to obtain the corresponding local model, and when a vehicle trains its local model, the staleness effect of the training delay and the transmission delay on the locally trained model is considered and weight optimization is applied to the local model, improving the accuracy of the global model at the roadside unit; the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through multiple rounds of training. The method is computationally simple, the system model is reasonable, and simulation experiments show that a higher global-model accuracy can be obtained in a vehicular environment.
FIG. 2 is a schematic view of the scene framework of the method of the present invention: a deep reinforcement learning algorithm selects the vehicles participating in training according to each vehicle's transmission rate, available computing resources, position and so on; the selected vehicles then train local models with the asynchronous federated technique and upload them to the roadside unit, finally obtaining a relatively accurate global model.
Further, step S1 comprises:
setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position, specifically:
since the mobility of a vehicle can be represented by the change of its position, and the training time and uploading time of a vehicle's local model are related to the vehicle's time-varying available computing resources and to the current channel conditions, the system state s(t) of time slot t is defined as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1.
Since the invention aims to select better vehicles for asynchronous federated learning training according to the current state, the system action a(t) of time slot t is defined as:
a(t) = (λ_1(t), λ_2(t), …, λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1.
The invention aims to select the vehicles with better performance for asynchronous federated training so as to obtain a more accurate global model at the roadside unit, while also considering the delay and the accuracy of the global model. The system reward r(t) of time slot t is therefore defined to combine, through non-negative weight factors ω_1 and ω_2, the loss value Loss(t) calculated in the asynchronous federated training with the delays of the selected vehicles, where a_di(t) is the selection component of the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, T_i^loc is the delay generated by the local training of vehicle i, and T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t.
The expected long-term discounted reward of the system can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
Further, step S2 comprises:
to select specific vehicles, set a_d(t) = (a_d1(t), a_d2(t), …, a_dK(t)); normalize λ_i(t), and set the a_di(t) corresponding to λ_i(t) ≥ 0.5 to 1 and otherwise to 0, so that the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not selected, as sketched below.
Further, step S3 comprises:
the selected vehicle performs local training on its local data to obtain the corresponding local model, comprising the following steps (a code sketch follows the list):
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized as w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k first inputs each local datum a, whose label probability is y_a, into the local model w_{k,m}, and then obtains the prediction probability ŷ_a of the convolutional neural network for the label of each datum; the loss value of w_{k,m} is computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using the stochastic gradient descent algorithm:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where ∇f_k(w_{k,m}) is the gradient of f_k(w_{k,m}) and η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
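For concreteness, one vehicle's local training (steps S31 to S34) can be sketched with PyTorch as below. This is only an illustrative sketch, not the patented implementation; the CNN architecture, the learning rate η and the number of local rounds l are placeholders:

import copy
import torch
import torch.nn as nn

def local_training(global_model, local_loader, eta=0.01, l=5):
    """Vehicle V_k: download w_{t-1}, run l rounds of SGD with cross-entropy, return w_k."""
    model = copy.deepcopy(global_model)          # start from the downloaded global model
    criterion = nn.CrossEntropyLoss()            # f_k(w_{k,m}) = -sum_a y_a log(y_hat_a)
    optimizer = torch.optim.SGD(model.parameters(), lr=eta)
    for m in range(l):                           # m-th round of local training
        for x, y in local_loader:                # local data of vehicle V_k
            optimizer.zero_grad()
            y_hat = model(x)                     # predicted label scores for each datum
            loss = criterion(y_hat, y)
            loss.backward()                      # gradient of f_k(w_{k,m})
            optimizer.step()                     # w_{k,m+1} = w_{k,m} - eta * gradient
    return model.state_dict()                    # updated local model w_k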
Further, step S4 comprises:
when a vehicle performs local training, a training delay and a transmission delay are generated; the training delay is:
T_i^loc = C_0 D_i / μ_i
where T_i^loc is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one datum, μ_i is the computing resource of vehicle i, measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
The transmission delay is:
T_i^tr(t) = W / tr_i(t)
tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ^2)
d_i(t) = ||P_i(t) - P_r||
where T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t, W is the size of the local model obtained by the local training of each vehicle, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the transmit power of each vehicle and is a fixed value, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ^2 is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the positions of vehicle i relative to the antenna of the roadside unit along the x-axis and the y-axis in time slot t, respectively; d_y is a fixed value, d_ix(t) = d_i0 + v t, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
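To make the delay model concrete, the two delays can be computed as in the following sketch; all numeric values are illustrative placeholders, not parameters taken from the patent:

import numpy as np

def local_training_delay(C0, D_i, mu_i):
    """T_i^loc = C0 * D_i / mu_i: cycles per datum times data amount, over CPU frequency."""
    return C0 * D_i / mu_i

def transmission_delay(W, B, p0, h_i, d_i, alpha, sigma2):
    """T_i^tr(t) = W / tr_i(t) with the Shannon-rate channel model described above."""
    tr_i = B * np.log2(1.0 + p0 * h_i * d_i ** (-alpha) / sigma2)   # bit/s
    return W / tr_i

# Illustrative numbers:
t_loc = local_training_delay(C0=1e6, D_i=500, mu_i=2e9)             # 0.25 s
t_tr = transmission_delay(W=5e6, B=1e6, p0=0.1, h_i=1.0,
                          d_i=100.0, alpha=2.0, sigma2=1e-9)         # about 0.38 s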
An autoregressive model is adopted to construct the relationship between h_i(t) and h_i(t-1), namely:
h_i(t) = ρ_i h_i(t-1) + e(t) sqrt(1 - ρ_i^2)
where ρ_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector that follows a complex Gaussian distribution and is uncorrelated with h_i(t). Based on the Jakes fading spectrum,
ρ_i = J_0(2π f_i^d T)
where J_0(·) is the zero-order Bessel function of the first kind, T is the duration of one time slot, and f_i^d is the Doppler frequency of vehicle i, f_i^d = (v/Λ) cos θ, where Λ is the wavelength and θ is the included angle between the direction of movement, i.e., x_0 = (1, 0, 0), and the uplink communication direction, i.e., P_r - P_i(t); therefore
cos θ = x_0 · (P_r - P_i(t)) / (||x_0|| · ||P_r - P_i(t)||).
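The time-varying channel can be simulated with the following sketch; SciPy's Bessel function is used for J_0, and the slot duration, carrier wavelength and vehicle speed are illustrative assumptions:

import numpy as np
from scipy.special import j0

def next_channel_gain(h_prev, v, wavelength, cos_theta, T_slot, rng):
    """AR(1) update h_i(t) = rho_i * h_i(t-1) + e(t) * sqrt(1 - rho_i^2)."""
    f_d = (v / wavelength) * cos_theta            # Doppler frequency of vehicle i
    rho = j0(2.0 * np.pi * f_d * T_slot)          # Jakes correlation between successive slots
    e = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)  # complex Gaussian error
    return rho * h_prev + e * np.sqrt(1.0 - rho ** 2)

rng = np.random.default_rng(0)
h = 1.0 + 0.0j
for _ in range(3):
    h = next_channel_gain(h, v=20.0, wavelength=0.05, cos_theta=0.8, T_slot=0.001, rng=rng)
    print(abs(h) ** 2)   # squared magnitude, one possible mapping to h_i(t) in the rate formula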
Unlike traditional asynchronous federated learning, the invention considers the staleness effect of the training delay and the transmission delay on the local model trained by a vehicle. Specifically, since both the local training of a vehicle and the uploading of its local model to the roadside unit take time, by the time one vehicle moves from local training to uploading, the roadside unit may already have received local models uploaded by other vehicles and updated the global model; in this case the local model trained by that vehicle has a certain staleness. The invention therefore applies a weighting to vehicle V_k, i.e., sets a training weight and a transmission weight. The specific calculation method is as follows:
Weight optimization is performed on the local model, the weights comprising a training weight and a transmission weight:
the training weight β_{1,k} is a function of the local computation delay T_k^loc of vehicle V_k, parameterized by m_1 ∈ (0, 1), such that β_{1,k} decreases as the local training delay increases;
the transmission weight β_{2,k}(t) is a function of the transmission delay T_k^tr(t) of vehicle V_k, parameterized by m_2 ∈ (0, 1), such that β_{2,k}(t) decreases as the transmission delay increases;
the weight-optimized local model is obtained according to the formula w_kw = w_k β_{1,k} β_{2,k}(t);
where w_k is the local model, w_kw is the weight-optimized local model, β_{1,k} is the training weight and β_{2,k}(t) is the transmission weight.
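A minimal sketch of the weight optimization follows. The exact functional form of β_{1,k} and β_{2,k}(t) is rendered as an image in the original publication, so an exponential decay β = m^delay, which does decrease as the delay grows, is assumed here purely for illustration:

import numpy as np

def weight_optimized_model(w_k, t_loc, t_tr, m1=0.9, m2=0.9):
    """w_kw = w_k * beta_1k * beta_2k(t); beta = m ** delay is an assumed decaying form."""
    beta1 = m1 ** t_loc    # training weight: smaller when the local training delay is larger
    beta2 = m2 ** t_tr     # transmission weight: smaller when the transmission delay is larger
    return {name: param * beta1 * beta2 for name, param in w_k.items()}

# Example with a toy two-parameter local model stored as a dict of arrays:
w_k = {"conv.weight": np.ones((3, 3)), "fc.bias": np.zeros(10)}
w_kw = weight_optimized_model(w_k, t_loc=0.25, t_tr=0.38)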
Further, step S5 comprises:
the trained vehicles asynchronously upload the weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit finally obtains the global model through multiple rounds of training, specifically as follows:
when vehicle V_k has uploaded the weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation according to the formula:
w_new = β w_old + (1 - β) w_kw
where w_old is the current global model at the roadside unit, w_new is the updated global model, w_kw is the weight-optimized local model, and β ∈ (0, 1) is the aggregation proportion;
when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_old = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles, it obtains the updated global model w_t and the global model update of this time slot is finished.
Meanwhile, the average loss Loss(t) of the vehicles participating in the training can be obtained, which can be expressed as the mean of the local loss values over the K_1 selected vehicles:
Loss(t) = (1/K_1) Σ_k f_k(w_k)
where f_k(w_k) is the loss value of the local model w_k.
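The asynchronous aggregation at the roadside unit can be sketched as follows; model parameters are treated as dicts of arrays and the value of β is a placeholder:

def async_aggregate(w_old, w_kw, beta=0.5):
    """One global aggregation: w_new = beta * w_old + (1 - beta) * w_kw."""
    return {name: beta * w_old[name] + (1.0 - beta) * w_kw[name] for name in w_old}

def average_loss(local_losses):
    """Loss(t): mean local loss over the K_1 vehicles that participated in this slot."""
    return sum(local_losses) / len(local_losses)

# Each time a selected vehicle's weight-optimized model w_kw arrives, aggregate immediately:
#     w_global = async_aggregate(w_global, w_kw)
# This is repeated K_1 times per time slot, yielding the slot's global model w_t.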
To further illustrate the principles and advantages of the present invention, the following detailed description is provided.
The present invention aims to find an optimal policy μ* that maximizes the expected long-term discounted reward of the system.
The overall algorithm adopted by the invention comprises two parts: a training-phase algorithm based on the DAFL (Data-Free Learning) framework and a test-phase algorithm based on the DAFL framework.
The training-phase algorithm steps based on the DAFL framework are shown in Table 1.
TABLE 1
1. Randomly initialize the actor network parameter δ and the critic network parameter ξ; initialize the target network parameters δ_1 ← δ and ξ_1 ← ξ; initialize the experience replay buffer R_b
2. For each round 1 ≤ epi ≤ E_max:
3. Reset the simulation parameters of the system model, set λ_1(0) = λ_2(0) = … = λ_K(0) = 1, initialize the global model w_0 at the roadside unit, and obtain the initial state s(1)
4. For each time slot 1 ≤ t ≤ N:
5. Generate the action a(t) = μ_δ(s(t)|δ) + Δ_t
6. Calculate a_d(t) and determine the selected vehicles
7. The selected vehicles perform the weight-based asynchronous federated training and the roadside unit updates the global model; calculate Loss(t), the delays and the reward r(t)
8. Observe the next state s(t+1) and store the tuple (s(t), a(t), r(t), s(t+1)) in R_b
9. If the number of tuples in R_b is greater than I, randomly sample a mini-batch of I tuples, update ξ by minimizing the critic loss L(ξ), update δ by gradient ascent on J(μ_δ), and softly update the target network parameters ξ_1 and δ_1
10. Output the optimized parameters δ*, ξ*, δ_1* and ξ_1*
The invention provides an asynchronous federated optimization method that selects vehicles based on a DDPG algorithm, and the DDPG algorithm is based on an actor-critic network architecture. The actor network is used for policy improvement and the critic network is used for policy evaluation. Specifically, the actor network is used to approximate a policy μ, and the approximated policy is denoted μ_δ. The actor network outputs an action based on the policy μ_δ and the observed state.
The invention improves and evaluates the policy iteratively so as to finally obtain the optimal policy. To ensure the stability of the DDPG algorithm, a target network consisting of a target actor network and a target critic network is also adopted, whose architectures are the same as those of the actor network and the critic network, respectively.
Let δ be the actor network parameter, ξ the critic network parameter, δ* the optimized actor network parameter, ξ* the optimized critic network parameter, δ_1 the target actor network parameter and ξ_1 the target critic network parameter. τ is the update parameter of the target network, Δ_t is the exploration noise added to the action in time slot t, and I is the mini-batch size. The training-phase algorithm is described in detail next.
First, δ and ξ are randomly initialized, and δ_1 and ξ_1 in the target network are initialized to δ and ξ, respectively. Meanwhile, the experience replay buffer R_b is initialized.
Next, the algorithm executes E_max rounds. In the first round, the positions of all vehicles, the channel states and the available computing resources of the vehicles themselves are reset, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1 is set. Then, in the first time slot, the system obtains the initial state s(1) = (Tr(1), μ(1), d_x(1), a(0)). Meanwhile, the global model w_0 at the roadside unit is initialized using a CNN (Convolutional Neural Network).
The algorithm then runs continuously from time slot 1 to the maximum number of time slots N. In the first time slot, the actor network obtains the output μ_δ(s|δ) according to the state, to which a random noise Δ_t is added, so the system obtains the action a(1) = μ_δ(s(1)|δ) + Δ_t. Then a_d(1) is calculated from the action and the vehicles selected in this time slot are determined. The selected vehicles perform asynchronous federated training, i.e., each vehicle trains a local model on its local data and then asynchronously uploads it to the roadside unit for the global model update, after which the loss value Loss(1) is calculated. Meanwhile, the local training delays and the transmission delays of the vehicles are calculated, so the system reward in time slot 1 can be obtained. The vehicle positions are then updated, the channel conditions are recalculated, and the available computing resources and the transmission rates of the vehicles are updated so that the system can observe the next state s(2). The tuple (s(1), a(1), r(1), s(2)) is then stored in R_b.
When the number of tuples in R_b is less than or equal to I, the system directly inputs the next state into the actor network and carries out the next iteration.
When the number of tuples in R_b is greater than I, the parameters δ, ξ, δ_1 and ξ_1 of the actor network, the critic network and the target networks start to be updated so as to maximize J(μ_δ). The parameter δ of the actor network is updated along the gradient direction of J(μ_δ), i.e., ∇_δ J(μ_δ). The action-value function obtained by following the policy μ_δ from s(t) and a(t) is denoted Q^{μ_δ}(s(t), a(t)), which represents the long-term expected discounted reward of the system from time slot t. Since ∇_δ J(μ_δ) is difficult to obtain directly, it can be replaced by the gradient of Q^{μ_δ}(s(t), a(t)). The critic network uses the parameter ξ to approximate Q^{μ_δ}(s(t), a(t)) as Q_ξ(s(t), a(t)).
Next, the update method of the parameters δ, ξ, δ_1 and ξ_1 in time slot t is described. When the number of tuples in R_b is greater than I, the system randomly draws a mini-batch of I tuples from R_b. Let (s_x, a_x, r_x, s'_x), x ∈ [1, 2, …, I], be the x-th tuple in the mini-batch. The system first inputs s'_x into the target actor network to obtain the output action a'_x = μ_{δ_1}(s'_x|δ_1), and then inputs s'_x and a'_x into the target critic network to obtain the output action-value function Q_{ξ_1}(s'_x, a'_x). The target value can then be calculated as:
y_x = r_x + γ Q_{ξ_1}(s'_x, a'_x)
Then, according to s_x and a_x, the critic network outputs Q_ξ(s_x, a_x), and the loss of tuple x can be calculated as:
L_x = [y_x - Q_ξ(s_x, a_x)]^2
When all tuples have been input into the critic network and the target networks, the loss function is obtained:
L(ξ) = (1/I) Σ_{x=1}^{I} L_x
The critic network minimizes the loss function L(ξ) by gradient descent on ∇_ξ L(ξ), thereby updating the parameter ξ. Similarly, the actor network maximizes J(μ_δ) by gradient ascent on ∇_δ J(μ_δ), thereby updating the parameter δ, where ∇_δ J(μ_δ) is calculated from the action-value function approximated by the critic network as follows:
∇_δ J(μ_δ) ≈ (1/I) Σ_{x=1}^{I} ∇_a Q_ξ(s_x, a)|_{a=μ_δ(s_x|δ)} ∇_δ μ_δ(s_x|δ)
where the input of Q_ξ is (s_x, μ_δ(s_x|δ)).
At the end of time slot t, the parameters of the target network are updated, and the update formulas are as follows:
ξ_1 ← τ ξ + (1 - τ) ξ_1
δ_1 ← τ δ + (1 - τ) δ_1
where τ is a constant and satisfies τ < 1.
Finally, the system inputs s' into the actor network and starts the iterative calculation of the next time slot. When the time slot t reaches the maximum value N, the round ends. The system then reinitializes the state value s(1) = (Tr(1), μ(1), d_x(1), a(0)) and performs the next round of training. When the number of rounds reaches the maximum E_max, the training ends, and the optimized parameters of the actor network, the critic network, the target actor network and the target critic network, namely δ*, ξ*, δ_1* and ξ_1*, are obtained.
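For readers less familiar with DDPG, the per-mini-batch parameter update described above can be sketched in PyTorch as below. This is a generic DDPG update written for illustration, not the patented code; the network classes and hyper-parameter values are placeholders:

import torch

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch                               # mini-batch of I tuples from R_b
    with torch.no_grad():
        a_next = target_actor(s_next)                     # a'_x = mu_{delta_1}(s'_x)
        y = r + gamma * target_critic(s_next, a_next)     # target value y_x
    critic_loss = ((y - critic(s, a)) ** 2).mean()        # L(xi) = (1/I) sum [y_x - Q_xi]^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()              # gradient ascent on J(mu_delta)
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update: xi_1 <- tau*xi + (1-tau)*xi_1, and the same for delta_1
    for tgt, src in ((target_critic, critic), (target_actor, actor)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)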
In the test phase, the critic network, the target actor network and the target critic network of the training phase are no longer updated, and the optimized policy with the optimal parameter δ* is used.
Wherein the test phase algorithm steps based on the DAFL framework are shown in table 2.
TABLE 2
1. For each round 1 ≤ epi ≤ E'_max:
2. Reset the simulation parameters of the system model and initialize the global model at the roadside unit
3. Obtain the initial state s(1)
4. For each time slot 1 ≤ t ≤ N:
5. Generate the action a = μ_δ(s|δ) according to the current policy
6. Calculate a_d and determine the selected vehicles
7. The selected vehicles perform the weight-based AFL update training
8. Obtain the reward r and the next state s' from the current system
From the above experiments, the following conclusions can be drawn for the method of the invention:
1. In the test phase, as the number of steps increases, the loss value in asynchronous federated learning gradually decreases, and the vehicle delay, i.e., the sum of the local delay and the transmission delay, remains within a certain range, as shown in FIG. 3.
2. In the test phase, in the presence of bad nodes, the method of the invention achieves higher accuracy and lower loss than asynchronous federated learning and federated learning, as shown in FIG. 4 and FIG. 5.
3. Under node selection, the method of the invention achieves higher accuracy and lower loss than asynchronous federated learning without local weight processing and federated learning without local weight processing, as shown in FIG. 6 and FIG. 7.
4. As the number of global rounds increases, the training delay of the method of the invention is smaller than that of federated learning, as shown in FIG. 8.
5. Under different β values, the accuracy of the method of the invention is higher than that of the global model of conventional asynchronous federated learning under node selection, as shown in FIG. 9.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.

Claims (10)

1. An asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm, comprising:
S1: setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position;
S2: selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles;
S3: each selected vehicle performing local training on its local data to obtain the corresponding local model;
S4: considering the staleness effect of the training delay and the transmission delay on the locally trained model, performing weight optimization on the local model to obtain a weight-optimized local model;
S5: the trained vehicles asynchronously uploading the weight-optimized local models to a roadside unit for asynchronous federated aggregation, the roadside unit finally obtaining the global model through multiple rounds of training.
2. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein setting the system state, action and reward of time slot t according to each vehicle's transmission rate, available computing resources and position in step S1 comprises:
the system state of time slot t is set as:
s(t) = (Tr(t), μ(t), d_x(t), a(t-1))
where s(t) is the system state of time slot t, Tr(t) is the set of transmission rates of all vehicles in time slot t, μ(t) is the set of available computing resources of all vehicles in time slot t, d_x(t) is the set of position coordinates of all vehicles along the x-axis in time slot t, and a(t-1) is the system action of time slot t-1;
the system action of time slot t is set as:
a(t) = (λ_1(t), λ_2(t), …, λ_K(t))
where a(t) is the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, and λ_1(0) = λ_2(0) = … = λ_K(0) = 1;
the system reward r(t) of time slot t is set to jointly account for the accuracy of the global model and the delay: it combines, through non-negative weight factors ω_1 and ω_2, the loss value Loss(t) calculated in the asynchronous federated training with the delays of the selected vehicles, where a_di(t) is the selection component of the system action of time slot t, λ_i(t), i ∈ [1, K], denotes the probability of selecting vehicle i, T_i^loc is the delay generated by the local training of vehicle i, and T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t.
3. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 2, wherein in step S2, selecting the vehicles participating in training according to the system action of time slot t to obtain the selected vehicles comprises the following steps:
S21: set a_d(t) = (a_d1(t), a_d2(t), …, a_dK(t));
S22: normalize λ_i(t); the a_di(t) corresponding to λ_i(t) ≥ 0.5 is set to 1 and otherwise to 0, so the resulting set a_d(t) consists of 0s and 1s, where 1 means the vehicle is selected and 0 means it is not selected.
4. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 2, wherein the expected long-term discounted reward of the system based on time slot t can be expressed as:
J(μ) = E_μ[ Σ_{t=1}^{N} γ^{t-1} r(t) ]
where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, μ is the policy of the system, and J(μ) is the expected long-term discounted reward of the system.
5. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein the selected vehicle performing local training on its local data to obtain the corresponding local model in step S3 comprises the following steps:
S31: in time slot t, vehicle V_k downloads the global model w_{t-1} from the roadside unit, where in time slot 1 the global model at the roadside unit is initialized as w_0 using a convolutional neural network;
S32: vehicle V_k trains on its local data with the convolutional neural network; the local training consists of l rounds. In the m-th (m ∈ [1, l]) round of local training, vehicle V_k first inputs each local datum a, whose label probability is y_a, into the local model w_{k,m}, and then obtains the prediction probability ŷ_a of the convolutional neural network for the label of each datum; the loss value of w_{k,m} is computed with the cross-entropy loss function:
f_k(w_{k,m}) = -Σ_a y_a log(ŷ_a)
S33: the local model is updated using the stochastic gradient descent algorithm:
w_{k,m+1} = w_{k,m} - η ∇f_k(w_{k,m})
where ∇f_k(w_{k,m}) is the gradient of f_k(w_{k,m}) and η is the learning rate;
S34: vehicle V_k performs the (m+1)-th round of local training with the updated local model; when the number of local training rounds reaches l, local training stops and the vehicle obtains the updated local model w_k.
6. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein in step S4 the training delay is:
T_i^loc = C_0 D_i / μ_i
where T_i^loc is the delay generated by the local training of vehicle i, C_0 is the number of CPU cycles required to train one datum, μ_i is the computing resource of vehicle i, measured by its CPU cycle frequency, and each vehicle i (1 ≤ i ≤ K) carries a different amount of data D_i.
7. The asynchronous federated optimization method for selecting vehicles based on a DDPG algorithm according to claim 1, wherein in step S4 the transmission delay is:
T_i^tr(t) = W / tr_i(t)
tr_i(t) = B log_2(1 + p_0 h_i(t) d_i(t)^{-α} / σ^2)
d_i(t) = ||P_i(t) - P_r||
where T_i^tr(t) is the transmission delay of vehicle i uploading its local model in time slot t, W is the size of the local model obtained by the local training of each vehicle, tr_i(t) is the transmission rate of vehicle i in time slot t, B is the transmission bandwidth, p_0 is the transmit power of each vehicle and is a fixed value, h_i(t) is the channel gain in time slot t, α is the path-loss exponent, and σ^2 is the noise power. The position of vehicle i in time slot t is set as P_i(t) = (d_ix(t), d_y, 0), where d_ix(t) and d_y are the positions of vehicle i relative to the antenna of the roadside unit along the x-axis and the y-axis in time slot t, respectively; d_y is a fixed value, d_ix(t) = d_i0 + v t, d_i0 is the x-axis coordinate of the initial position of vehicle i, v is the vehicle speed and t is the time slot. The antenna height of the roadside unit is set to H_r, and the antenna position of the roadside unit is denoted P_r = (0, 0, H_r).
8. The asynchronous federal optimization method for vehicle selection based on DDPG algorithm according to claim 7, wherein an autoregressive model is used to construct the relationship between h_i(t) and h_i(t-1), namely:

h_i(t) = \rho_i\, h_i(t-1) + \sqrt{1-\rho_i^2}\; e(t)
wherein \rho_i is the normalized channel correlation coefficient between successive time slots, and e(t) is an error vector obeying a complex Gaussian distribution and uncorrelated with h_i(t); based on the Jakes fading spectrum,

\rho_i = J_0\!\left(2\pi f_i^{d}\, T\right)

wherein J_0(\cdot) is the zero-order Bessel function of the first kind, T is the duration of one time slot, and f_i^{d} is the Doppler frequency of vehicle i,

f_i^{d} = \frac{v \cos\theta}{\lambda}

wherein \lambda is the wavelength and \theta is the included angle between the direction of movement, i.e. x_0 = (1, 0, 0), and the uplink communication direction, i.e. P_r - P_i(t); thus

\cos\theta = \frac{x_0 \cdot \left(P_r - P_i(t)\right)}{\left\|P_r - P_i(t)\right\|}.
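A minimal sketch of the autoregressive channel of claim 8, assuming the common Gauss-Markov form with the error term scaled by sqrt(1 - rho_i^2) and a 1 ms slot duration for the Jakes correlation; both the scaling and the slot length are assumptions, not taken from the claim:

```python
# First-order AR channel: h_i(t) = rho_i * h_i(t-1) + sqrt(1 - rho_i^2) * e(t),
# with rho_i = J0(2*pi*f_d*T_slot) from the Jakes fading spectrum.
import numpy as np
from scipy.special import j0   # zero-order Bessel function of the first kind

def doppler(v_mps, wavelength_m, cos_theta):
    """f_i^d = v * cos(theta) / lambda."""
    return v_mps * cos_theta / wavelength_m

def next_channel(h_prev, rho, rng):
    e = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)  # complex Gaussian error
    return rho * h_prev + np.sqrt(1 - rho ** 2) * e

rng = np.random.default_rng(0)
f_d = doppler(v_mps=20.0, wavelength_m=0.05, cos_theta=-0.9)   # illustrative values
rho = j0(2 * np.pi * abs(f_d) * 1e-3)                          # assumed 1 ms slot duration
h = 1.0 + 0.0j
for _ in range(5):
    h = next_channel(h, rho, rng)
    print(abs(h))
```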
9. The asynchronous federal optimization method for selecting vehicles based on the DDPG algorithm according to claim 1, wherein in step S4, the method for weight optimization of the local model is as follows:
performing weight optimization on the local model, wherein the weights comprise training weights and transmission weights, and the training weights are as follows:
\beta_{1,k} = m_1^{\,T_k^{cmp}}

wherein \beta_{1,k} is the training weight, m_1 \in (0,1) is a parameter which causes \beta_{1,k} to decrease as the local training delay increases, and T_k^{cmp} is the local computation delay of vehicle V_k;
the transmission weights are:
\beta_{2,k}(t) = m_2^{\,T_k^{com}(t)}

wherein \beta_{2,k}(t) is the transmission weight, m_2 \in (0,1) is a parameter which causes \beta_{2,k}(t) to decrease as the transmission delay increases, and T_k^{com}(t) is the transmission delay of vehicle V_k;
the weight-optimized local model is obtained according to the formula w_{kw} = w_k\, \beta_{1,k}\, \beta_{2,k}(t);

wherein w_k is the local model, w_{kw} is the weight-optimized local model, \beta_{1,k} is the training weight, and \beta_{2,k}(t) is the transmission weight.
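A minimal sketch of the weighting in claim 9, assuming the weights decay as m**delay with m in (0,1), so that a longer training or transmission delay yields a smaller weight; the exact functional forms appear only as figures in the claim, and the flat dictionary stands in for a real model state:

```python
# Training weight, transmission weight, and the weight-optimized local model w_kw.
def training_weight(m1, local_delay_s):
    return m1 ** local_delay_s                      # beta_{1,k}, decreasing in the delay

def transmission_weight(m2, tx_delay_s):
    return m2 ** tx_delay_s                         # beta_{2,k}(t), decreasing in the delay

def weighted_local_model(w_k, beta1, beta2):
    """w_kw = w_k * beta_{1,k} * beta_{2,k}(t), applied parameter-wise."""
    return {name: value * beta1 * beta2 for name, value in w_k.items()}

w_k = {"layer.weight": 0.8, "layer.bias": -0.1}     # toy stand-in for a model state dict
w_kw = weighted_local_model(w_k, training_weight(0.9, 2.0), transmission_weight(0.9, 0.5))
print(w_kw)
```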
10. The asynchronous federal optimization method for selecting vehicles based on the DDPG algorithm according to claim 9, wherein in step S5, the trained vehicles asynchronously upload their weight-optimized local models to the roadside unit for asynchronous federated aggregation, and the roadside unit obtains the global model through multiple rounds of repeated training, specifically comprising:
after vehicle V_k uploads its weight-optimized local model to the roadside unit, the roadside unit performs one global aggregation, the formula being:

w_{new} = \beta\, w_{old} + (1-\beta)\, w_{kw}

wherein w_{old} is the current global model at the roadside unit, w_{new} is the updated global model, w_{kw} is the weight-optimized local model, and \beta \in (0,1) is the aggregation proportion;

when the roadside unit receives the first uploaded local model at the beginning of each time slot, w_{old} = w_{t-1}; when the roadside unit has received the local models of all K_1 selected vehicles, it obtains the updated global model w_t, and the global model update of this time slot is completed.
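A minimal sketch of the asynchronous aggregation in claim 10: each arriving weight-optimized local model triggers one blend of the current global model with a fixed aggregation proportion beta; the toy dictionaries and the value of beta are illustrative only:

```python
# Asynchronous aggregation at the roadside unit: one update per uploaded local model.
def aggregate(w_old, w_kw, beta=0.5):
    """w_new = beta * w_old + (1 - beta) * w_kw, applied parameter-wise."""
    return {name: beta * w_old[name] + (1 - beta) * w_kw[name] for name in w_old}

w_global = {"layer.weight": 0.0, "layer.bias": 0.0}        # w_{t-1} at the start of slot t
uploads = [{"layer.weight": 0.8, "layer.bias": -0.1},      # K_1 selected vehicles,
           {"layer.weight": 0.6, "layer.bias": 0.2}]       # arriving asynchronously
for w_kw in uploads:                                        # one global aggregation per arrival
    w_global = aggregate(w_global, w_kw, beta=0.5)
print(w_global)                                             # global model w_t for this slot
```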
CN202310038329.6A 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm Pending CN116055489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310038329.6A CN116055489A (en) 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310038329.6A CN116055489A (en) 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Publications (1)

Publication Number Publication Date
CN116055489A true CN116055489A (en) 2023-05-02

Family

ID=86127261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310038329.6A Pending CN116055489A (en) 2023-01-10 2023-01-10 Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm

Country Status (1)

Country Link
CN (1) CN116055489A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220363279A1 (en) * 2021-04-21 2022-11-17 Foundation Of Soongsil University-Industry Cooperation Method for combating stop-and-go wave problem using deep reinforcement learning based autonomous vehicles, recording medium and device for performing the method
CN113382066A (en) * 2021-06-08 2021-09-10 江南大学 Vehicle user selection method and system based on federal edge platform
CN114051222A (en) * 2021-11-08 2022-02-15 北京工业大学 Wireless resource allocation and communication optimization method based on federal learning in Internet of vehicles environment
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning
CN115297170A (en) * 2022-06-16 2022-11-04 江南大学 Cooperative edge caching method based on asynchronous federation and deep reinforcement learning
CN115358412A (en) * 2022-08-19 2022-11-18 江南大学 Asynchronous federal optimization method based on edge auxiliary vehicle network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王丙琛; 司怀伟; 谭国真: "Research on control algorithms for autonomous driving vehicles based on deep reinforcement learning", Journal of Zhengzhou University (Engineering Science), no. 04 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116542342A (en) * 2023-05-16 2023-08-04 江南大学 Asynchronous federal optimization method capable of defending Bayesian attack

Similar Documents

Publication Publication Date Title
CN110476172B (en) Neural architecture search for convolutional neural networks
CN112668128B (en) Method and device for selecting terminal equipment nodes in federal learning system
JP7301156B2 (en) Quantum variational method, apparatus and storage medium for simulating quantum systems
JP6824382B2 (en) Training machine learning models for multiple machine learning tasks
CN110276442B (en) Searching method and device of neural network architecture
WO2021259090A1 (en) Method and apparatus for federated learning, and chip
CN110832509A (en) Black box optimization using neural networks
CN111406264A (en) Neural architecture search
CN110992935A (en) Computing system for training neural networks
WO2020259504A1 (en) Efficient exploration method for reinforcement learning
CN113867843B (en) Mobile edge computing task unloading method based on deep reinforcement learning
CN116055489A (en) Asynchronous federal optimization method for selecting vehicles based on DDPG algorithm
KR20200040185A (en) Learniung method and device for neural network at adaptive learning rate, and testing method and device using the same
CN109919313A (en) A kind of method and distribution training system of gradient transmission
CN111416774A (en) Network congestion control method and device, computer equipment and storage medium
CN116451593B (en) Reinforced federal learning dynamic sampling method and equipment based on data quality evaluation
CN113762527A (en) Data processing method, system, storage medium and electronic equipment
CN112104563A (en) Congestion control method and device
KR102120150B1 (en) Learning method and learning device for variational interference using neural network and test method and test device for variational interference using the same
CN111510473B (en) Access request processing method and device, electronic equipment and computer readable medium
CN116166406B (en) Personalized edge unloading scheduling method, model training method and system
CN116702389B (en) Nested flow calculation method for mixed traffic flow
CN110450164A (en) Robot control method, device, robot and storage medium
CN116542342A (en) Asynchronous federal optimization method capable of defending Bayesian attack
CN112949850A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240702

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region after: China

Address before: No. 258, Qingfeng Road, Liangxi District, Wuxi City, Jiangsu Province, 214000

Applicant before: Jiangnan University

Country or region before: China