CN116582840A - Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment - Google Patents

Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment

Info

Publication number
CN116582840A
CN116582840A (Application CN202310856689.7A)
Authority
CN
China
Prior art keywords
vehicle
network
learning
communication
federal learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310856689.7A
Other languages
Chinese (zh)
Inventor
吴琼
师帅
张翠
李正权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN202310856689.7A priority Critical patent/CN116582840A/en
Publication of CN116582840A publication Critical patent/CN116582840A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30Services specially adapted for particular environments, situations or purposes
    • H04W4/40Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W4/44Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30Services specially adapted for particular environments, situations or purposes
    • H04W4/40Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W4/46Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for vehicle-to-vehicle communication [V2V]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/535Allocation or scheduling criteria for wireless resources based on resource usage policies
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a level distribution method and device for Internet of Vehicles communication, a storage medium, and electronic equipment, which are used to optimize the communication overhead in the Internet of Vehicles. The method comprises the following steps: acquiring a state change of the vehicle; confirming state data, action data, and a reward function of the vehicle based on the state change; decoupling the federal learning network and the prediction network based on a dual-depth Q network according to the state data, the action data, and the reward function, so as to define a second loss function according to a time step and determine a level allocation value; and updating the level allocation value based on the dual Q learning network.

Description

Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for level allocation in internet of vehicles communications, a storage medium, and an electronic device.
Background
With the development of the Internet of Vehicles and artificial intelligence technology, more and more AI applications are deployed in vehicle-mounted operating systems, and the massive data generated by these applications pose serious challenges to computation and storage in the Internet of Vehicles. Vehicle-mounted edge computing can provide users with high-bandwidth, low-delay, and high-reliability services and support intelligent services, but it also carries the risk of exposing users' private data. As a promising privacy-preserving paradigm, federal learning (FL) synthesizes a global model using only the parameters of locally trained models, avoiding leakage of sensitive data. However, because federal learning involves frequent model exchanges, it consumes a significant amount of communication and computing resources, and introducing federal learning into the Internet of Vehicles brings additional overhead. In a vehicular network, how to reasonably allocate quantization levels according to the high mobility of vehicles and the dynamic changes of the channel, and how to jointly optimize the learning delay and the quantization error, are the current technical problems.
The conventional method adopts a centralized reinforcement learning framework to dynamically adjust the quantization level so as to accelerate distributed learning. Because federal learning transmissions are essentially parallel, this method can cause extremely high signaling consumption. Meanwhile, an individual decision maker is easily affected by its geographic position and surrounding obstacles, so its observation capability is partially limited, and the high mobility of vehicles makes it difficult for the base station to collect accurate instantaneous channel state information.
Therefore, a low complexity and efficient solution is needed to solve the above technical problems.
Disclosure of Invention
The application provides a level distribution method and device for Internet of Vehicles communication, a storage medium, and electronic equipment, which are used to address the lack of a low-complexity and effective solution for resource allocation in the Internet of Vehicles, thereby optimizing the cost of data transmission in the Internet of Vehicles.
In order to achieve the above purpose, the application adopts the following technical scheme:
in a first aspect, the present application provides a level allocation method for internet of vehicles communication, applied to a base station, where the base station includes an edge server, the base station and a plurality of vehicles within a coverage area of the base station form a federal learning network, and the federal learning network adopts a gradient quantization technology, and the method includes:
the federal learning model is used for communicating between the vehicles in the federal learning network; federal learning includes image recognition learning of the vehicle, a process of the federal learning including a plurality of iterations;
a stochastic quantization model for local gradient quantization of the level;
the communication model is used for carrying out wireless transmission of data between the edge server and the vehicle;
the calculation model is used for calculating the federal learning time and the minimum communication round number;
acquiring state changes of the vehicle, wherein the state changes are data changes of environments where the vehicle is located at adjacent moments after the vehicle executes a preset strategy;
confirming state data, action data, and a reward function of the vehicle based on the state change;
decoupling the federal learning network and the predictive network based on a dual-depth Q network according to the state data, the action data, the reward function to define a second loss function according to a time step to determine a level allocation value;
the level assignment value is updated based on the dual Q learning network.
In one possible implementation, the method further includes:
an empirical loss of the vehicle during the federal learning process is assessed based on a first loss function.
In one possible implementation, the federal learning process implements the plurality of iterations according to a distributed approach using a random gradient descent algorithm, the vehicle having a learning dataset, the federal learning process comprising:
calculating a local random gradient from the learning dataset based on the random quantization model;
and aggregating the local random gradients to generate a global federal learning model.
In one possible implementation manner, the communication model adopts an orthogonal frequency division multiple access technology to realize data transmission between the vehicle and the edge server, allocating an orthogonal subcarrier to each vehicle associated with the edge server, and specifically further comprises:
the transmission rate of the vehicle is calculated based on Shannon's theorem from the channel gain and the distance between the vehicle and the edge server.
In one possible implementation, the calculating the minimum number of communication rounds of federal learning using the calculation model includes:
the convergence value of the minimum communication round is calculated based on a hypothesis deduction of the first loss function.
In one possible implementation, the method further includes:
and testing the level distribution method based on a simulation experiment.
In a second aspect, a level allocation device for internet of vehicles communication is also provided, including:
the acquisition module is used for acquiring the state change of the vehicle;
a first calculation module for confirming state data, action data, and a reward function of the vehicle based on the state change;
a second calculation module, configured to decouple the federal learning network and the prediction network based on a dual-depth Q network according to the state data, the action data, and the reward function, so as to define a second loss function according to a time step to determine a level distribution value;
the level assignment value is updated based on the dual Q learning network.
In a third aspect, a storage medium is provided, the storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform a level allocation method for Internet of Vehicles communication as in the first aspect.
In a fourth aspect, an electronic device is provided, comprising at least one processor, at least one memory connected to the processor, and a bus; the processor and the memory communicate with each other through the bus; the processor is configured to invoke program instructions in the memory to perform a level allocation method for Internet of Vehicles communication as in the first aspect.
The application provides a level distribution method and device for Internet of Vehicles communication, a storage medium, and electronic equipment, which are applied to scenarios in which the Internet of Vehicles and a base station perform data transmission. When channel resources need to be reasonably allocated and the transmission cost of a vehicle needs to be optimized, a federal learning model, a random quantization model, a communication model, and a calculation model can be established; the state change of the vehicle is acquired; the state data, action data, and reward function of the vehicle are confirmed based on the state change; and the level assignment value is determined based on the dual-depth Q network and the dual Q learning network. This overcomes the limitation of centralized quantization level allocation, and the overhead cost in communication can be optimized.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only embodiments of the present application, and other drawings can be obtained according to the provided drawings without inventive effort to a person skilled in the art.
Fig. 1 is a schematic flow chart of a level allocation method for internet of vehicles communication according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a gradient quantized federal learning model of the Internet of vehicles according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a dual depth Q network framework according to an embodiment of the present application;
FIG. 4 is a training reward diagram for dual deep Q network learning according to an embodiment of the present application;
FIG. 5 is a diagram of federal learning loss based on dual depth Q network framework gradient quantization according to an embodiment of the present application;
FIG. 6 is a schematic diagram of federal learning test accuracy based on dual depth Q network frame gradient quantization according to an embodiment of the present application;
FIG. 7 is a graph showing average total learning time versus baseline method for federal learning based on dual depth Q network framework gradient quantification in accordance with an embodiment of the present application;
FIG. 8 is a graph showing average quantization error versus baseline method for Federal learning based on dual depth Q network framework gradient quantization according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a comparison of long-term discount rewards based on a dual depth Q network frame gradient quantification and baseline approach provided by an embodiment of the application;
FIG. 10 is a schematic diagram showing a comparison of learning time under different vehicle numbers based on a dual depth Q network frame gradient quantification and baseline method according to an embodiment of the present application;
FIG. 11 is a schematic diagram showing a comparison of quantization errors under different vehicle numbers based on a dual depth Q network frame gradient quantization and baseline method according to an embodiment of the present application;
FIG. 12 is a schematic diagram showing a comparison of long-term discount rewards under different vehicle numbers based on a dual depth Q network frame gradient quantification and baseline method provided by an embodiment of the present application;
fig. 13 is a schematic structural diagram of a level allocation device for internet of vehicles communication according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. In the description of the present application, unless otherwise indicated, "at least one" means one or more and "a plurality" means two or more. The terms "first," "second," and the like do not limit the number or the order of execution, and objects modified by "first" and "second" are not necessarily different.
Level allocation in the conventional Internet of Vehicles usually adopts a centralized reinforcement learning framework to dynamically adjust the quantization level so as to accelerate distributed learning. This method can cause extremely high signaling consumption and, owing to the high mobility of vehicles, cannot meet their requirements, because the base station has difficulty collecting accurate instantaneous channel state information.
In order to solve the above problems, the embodiments of the present application provide a level distribution method and device for Internet of Vehicles communication, a storage medium, and electronic equipment, which introduce gradient quantization on the basis of conventional level allocation and perform level allocation based on a dual-depth Q network framework, thereby optimizing the cost of communication transmission in the Internet of Vehicles.
Fig. 1 shows a flow diagram of a method for level allocation for internet of vehicles communication according to an embodiment of the present application.
It should be noted that the base station trains a learning model together with the vehicles within its coverage area. Considering automatic driving in urban scenarios, many edge servers, such as roadside units and cellular base stations, are distributed along the roadside. As shown in fig. 2, a group of vehicles on the road is covered by an edge server; there are N vehicles, represented by the set {1, 2, ..., N}. Each vehicle n in this set is equipped with an MEC processor that provides local and distributed processing capabilities, and communication with the wirelessly connected edge server is achieved through an orthogonal frequency division multiple access technology.
Federal learning model: in one possible implementation, the federal learning process occurs between the edge server and certain vehicles within its coverage. Specifically, the process of federal learning employing gradient quantization is as follows:
(1) Vehicle selection: before a round of iteration starts, the edge server firstly selects vehicles participating in learning and downloads the current global model;
(2) Downloading a local model: when the vehicles participating in the learning are selected, the edge server sends the initial global model to the selected vehicles so as to perform the distributed learning on the vehicles;
(3) Training in local learning: the participating vehicles train the model according to the local data set of the participating vehicles, and update the global model by adopting federal random gradient descent calculation;
(4) Gradient quantization: compressing the local gradient by adopting random gradient quantization;
(5) Uploading a global model: the vehicle sends the quantized local gradient to an edge server;
(6) Global aggregation: the edge server aggregates the local gradients and updates the global model after receiving the local gradients of the vehicle.
It should be noted that the above procedure is iterated a plurality of times until the convergence condition is satisfied.
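To make the six steps above concrete, the following is a minimal Python sketch of one gradient-quantized federal learning round. It is illustrative only: the function names (federated_round, stochastic_quantize) and the toy quadratic local losses are assumptions of this sketch, not the patented implementation; a possible stochastic quantizer is sketched under the random quantization model below.

import numpy as np

def stochastic_quantize(g, q):
    # Placeholder here; a possible element-wise quantizer is sketched further below.
    return g

def federated_round(w_global, local_gradient_fns, q_levels, lr):
    # One communication round: steps (3)-(6) of the procedure above.
    # w_global: current global model parameters (1-D array), already downloaded by the vehicles.
    # local_gradient_fns: one callable per participating vehicle, mapping w -> local stochastic gradient.
    # q_levels: quantization level assigned to each vehicle for this round.
    # lr: learning rate eta_r of this round.
    quantized = [stochastic_quantize(fn(w_global), q)          # local training + gradient quantization
                 for fn, q in zip(local_gradient_fns, q_levels)]
    return w_global - lr * np.mean(quantized, axis=0)          # global aggregation and model update

# Toy usage: two "vehicles" with quadratic local losses f_n(w) = 0.5 * ||w - c_n||^2.
centers = [np.array([1.0, 2.0]), np.array([3.0, 0.0])]
grad_fns = [lambda w, c=c: w - c for c in centers]
w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, grad_fns, q_levels=[4, 4], lr=0.1)
print(w)  # approaches the average of the centers, [2.0, 1.0]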
Random quantization model: a random quantization model is defined for local gradient quantization. For an arbitrary vector g with d elements, the random quantizer Q(·) is defined element-wise as
Q_i(g) = ||g|| · sgn(g_i) · ξ_i(g, q),
where the output of the quantizer consists of three parts: the vector norm ||g||, the sign sgn(g_i) of each element (g_i denotes the i-th element of g), and the quantized value ξ_i(g, q) of each element. Here ξ_i(g, q) is an independent random variable defined as ξ_i(g, q) = l/q with probability 1 − (q·|g_i|/||g|| − l) and (l+1)/q otherwise, where q represents the number of quantization levels and l is an integer such that |g_i|/||g|| lies in [l/q, (l+1)/q).
Let S represent the number of bits transmitted after random quantization. To quantize any element g_i of the gradient vector g (i denotes the index of the element), the sign sgn(g_i) and the normalized quantized value ξ_i(g, q) need to be encoded as bits. Specifically, one bit is required to encode each sgn(g_i). Since ξ_i(g, q) has support {0, 1/q, ..., 1}, at least ⌈log2(q+1)⌉ bits are needed to encode each ξ_i(g, q). Since each vector contains d elements, a total of d·(⌈log2(q+1)⌉ + 1) bits encode these two parts. In contrast, for a large model of size d, the overhead of the single scalar vector norm ||g|| is typically negligible. To facilitate the subsequent analysis, for large d the number of bits is approximated as S ≈ d·(⌈log2(q+1)⌉ + 1).
The random quantizer Q(·) is unbiased, i.e., for any given vector g, E[Q(g)] = g, and the variance of the quantizer is assumed to be bounded. For ease of analysis, an upper bound e_n is used to represent the quantization error of vehicle n. It will be appreciated that a higher quantization level results in a larger number of transmitted bits but a smaller quantization error.
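As a concrete illustration of the element-wise quantizer described above, the following Python sketch implements an unbiased stochastic quantizer with q levels. It is a sketch under the assumptions stated in the comments; the function name stochastic_quantize is mine, not the patent's.

import numpy as np

def stochastic_quantize(g, q):
    # Quantize vector g: keep ||g|| and sgn(g_i), and replace |g_i|/||g|| by a random
    # value xi_i in {0, 1/q, ..., 1} whose expectation equals |g_i|/||g|| (unbiasedness).
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    ratio = np.abs(g) / norm                  # normalized magnitude in [0, 1]
    level = np.floor(ratio * q)               # integer l with l/q <= ratio < (l+1)/q
    prob_up = ratio * q - level               # probability of rounding up to (l+1)/q
    xi = (level + (np.random.rand(*g.shape) < prob_up)) / q
    return norm * np.sign(g) * xi

# Averaging many quantizations of the same vector recovers it, illustrating E[Q(g)] = g.
g = np.array([0.3, -1.2, 0.7])
print(np.mean([stochastic_quantize(g, 4) for _ in range(20000)], axis=0))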
The communication model is applied between the edge server and the vehicle and adopts orthogonal frequency division multiple access for wireless transmission. The channel bandwidth of the server is B and can be divided into W orthogonal subcarriers; each vehicle associated with the server is allocated one orthogonal subcarrier, and the influence of vehicle mobility is captured by the distance d_n between vehicle n and the edge server.
Calculation model: the delay t_n of a vehicle n participating in one round of federal learning consists of the following four parts: the local model training computation time t_n^cmp; the local parameter upload delay t_n^up; the global aggregation delay; and the global parameter download delay. Since the aggregation speed of the edge server is very fast and the downlink transmission bandwidth is large enough, the aggregation delay and the download delay are assumed to be negligible, and the computation time and upload time of each round of federal learning are defined as follows. Computation time: let c be the number of processing cycles needed by vehicle n to execute a batch of samples, and f_n the CPU frequency of vehicle n; the computation time for running one round of SGD is therefore t_n^cmp = c / f_n.
Upload time: the delay of uploading the local parameters in one round is t_n^up = S / R_n,
where S is the number of transmission bits defined in the random quantization model and R_n is the transmission rate of vehicle n. The federal learning delay in one round is therefore defined as t_n = t_n^cmp + t_n^up. Define R_min as the smallest number of rounds needed to achieve the ε-optimal difference, i.e., E[F(w^R)] − F(w*) ≤ ε, where w* represents the optimal model parameters; when the ε-optimal difference is met, the training process is stopped. Finally, the total training time over the R_min rounds is obtained by accumulating the per-round federal learning delays, and this total training time is the quantity to be optimized.
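The per-round delay defined in the calculation model can be evaluated as in the short sketch below. The Shannon-rate form of the uplink rate follows the communication model's reference to Shannon's theorem; the numeric values in the usage line are illustrative assumptions only.

import math

def round_delay(c_cycles, f_cpu, num_bits, bandwidth, tx_power, channel_gain, noise_power):
    # t_cmp: local SGD computation time, with c_cycles CPU cycles per sample batch.
    t_cmp = c_cycles / f_cpu
    # R_n: achievable uplink rate on the vehicle's subcarrier (Shannon's theorem).
    rate = bandwidth * math.log2(1.0 + tx_power * channel_gain / noise_power)
    # t_up: time to upload the S quantized-gradient bits.
    t_up = num_bits / rate
    return t_cmp + t_up

# Illustrative numbers only (not taken from the patent):
print(round_delay(c_cycles=1e8, f_cpu=2e9, num_bits=3e6,
                  bandwidth=1e6, tx_power=0.2, channel_gain=1e-7, noise_power=1e-10))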
Specifically, as shown in fig. 1, the method includes:
s110, acquiring the state change of the vehicle, wherein the state change is the data change of the environment where the vehicle is located at the adjacent moment after the vehicle executes the strategy.
It should be noted that the quantization level allocation problem may be modeled as a Markov decision process. Specifically, at decision time node t, each vehicle n observes its current local state s_t^n and takes an action a_t^n according to the policy π. The environment state then changes from s_t^n to s_{t+1}^n, and the vehicle obtains a reward r_t^n. Next, the state s_t^n, the action a_t^n, and the reward r_t^n of the vehicle in period t are defined. It will be appreciated that each agent, i.e., each vehicle, may determine the allocation of quantization levels by observing its own local state. Since the delay is affected by vehicle mobility and uncertain channel conditions, the selected local state should reflect the convergence round, the uncertain channel conditions, and the vehicle mobility. In each time step, the agent under deep reinforcement learning observes the environment and gathers the following components to construct the system state. Let S be the system state space; the state at time slot t is defined as follows:
s_t^n = {γ_t^n, d_t^n, q_t^n}, where γ_t^n is the SINR of vehicle n at time step t, reflecting the uncertainty of the channel conditions at time step t; d_t^n is the distance between vehicle n and the BS at time step t, reflecting the mobility of vehicle n; and q_t^n is the quantization level of vehicle n at time step t.
And S120, confirming the state data, the action data and the rewarding function of the vehicle based on the state change.
In order to allocate quantization levels based on the local observation state, it is assumed that the quantization level space is divided into a finite discrete set A of candidate levels. Thus, at time step t, the action a_t^n of vehicle n is defined as the quantization level selected from A, i.e., a_t^n ∈ A.
In the present method, the performance of federal learning is to be improved in terms of both the total federal learning delay and the quantization error. Thus, the reward function of vehicle n at time step t is defined as a weighted combination of the per-round federal learning delay and the quantization error, so that a smaller delay and a smaller quantization error yield a larger reward r_t^n. The expected long-term discounted return is defined as G_t^n = E[ Σ_{t'=t}^{T} γ^{t'−t} r_{t'}^n ], where γ is the discount factor and T is the upper limit of the time step index. The goal is to obtain the optimal policy π* that maximizes the expected long-term discounted return.
In the iterative process of the federal learning model, the method adopts an ε-greedy action policy with a decreasing exploration rate ε to guide the agent in selecting actions. The policy allows the agent to explore, i.e., select a random action from A, with probability ε, and to exploit, i.e., select the greedy action with the largest Q value, with probability 1 − ε. Mathematically, a_t^n is a random action from A with probability ε and a_t^n = argmax_a Q(s_t^n, a) with probability 1 − ε.
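The following sketch puts the state, the reward, and the decaying ε-greedy selection into code. The weighting factors, the exact reward form, and the decay schedule are assumptions of this illustration rather than the patent's exact definitions.

import numpy as np

def build_state(sinr, distance_to_bs, quant_level):
    # Local observation s_t^n = (SINR, distance to the BS, quantization level).
    return np.array([sinr, distance_to_bs, quant_level], dtype=np.float32)

def reward(delay, quant_error, w_delay=1.0, w_err=1.0):
    # Smaller per-round delay and smaller quantization error both increase the reward.
    return -(w_delay * delay + w_err * quant_error)

def epsilon_greedy(q_values, step, eps_start=1.0, eps_end=0.05, decay=2000.0):
    # Exploration probability decays with the time-step index.
    eps = eps_end + (eps_start - eps_end) * np.exp(-step / decay)
    if np.random.rand() < eps:
        return np.random.randint(len(q_values))   # explore: random quantization-level index
    return int(np.argmax(q_values))               # exploit: greedy quantization-level index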
And S130, decoupling the federal learning network and the prediction network based on a dual-depth Q network according to the state data, the action data and the reward function so as to define a second loss function according to a time step to determine a level distribution value.
As shown in fig. 3, the framework of the whole dual-depth Q network is illustrated, together with the interaction between the agent and the environment under the dual-depth Q network. As an extension of the deep Q network, the dual-depth Q network framework includes a target network and a dual Q learning method. The dual-depth Q network framework approximates the Q function with a deep neural network, mapping each state-action pair (s, a) to a Q value Q(s, a). It will be appreciated that the dual-depth Q network framework can select the optimal action even when the state-action space is large. The Q function is approximated with a weight parameter θ as Q(s, a; θ).
The federal learning network and the prediction network are decoupled based on the dual-depth Q network. Specifically, the dual-depth Q network decouples the target network Q(s, a; θ⁻) and the prediction network Q(s, a; θ) to reduce the correlation between the predicted value and the target value. The predicted value Q(s_t, a_t; θ) is obtained from the prediction network, and the target value is obtained from the target network as y_t = r_t + γ·max_{a'} Q(s_{t+1}, a'; θ⁻), where the target network parameters θ⁻ are updated every C time steps. In this way, the correlation between the predicted value and the target value can be reduced. As in supervised learning, the predicted value can be updated by minimizing a loss function towards the target value.
A second loss function defined according to the time step is used to determine the level assignment value. If the target value depended on the same weight parameters θ and fluctuated with every update, the update process of the Q value (i.e., the predicted value) would be unstable. Therefore, the target network with parameters θ⁻ is used to compute the target value y_t. The temporal-difference (TD) error can then be defined as δ_t = y_t − Q(s_t, a_t; θ), and the second loss function is defined in terms of the TD error as L(θ) = E[(y_t − Q(s_t, a_t; θ))²].
And S140, updating the level distribution value based on the double Q learning network.
It should be noted that the dual-depth Q network framework adopts the dual Q learning method in addition to the target network. The max operation in the target-value computation may overestimate the next state-action value, and overestimation errors may cause the estimated value to be far from the true optimum. Since the dual Q learning technique decouples action selection from value-function evaluation, a large part of the overestimation error can be eliminated, yielding a value closer to the true optimum. Thus, the target value is rewritten as y_t = r_t + γ·Q(s_{t+1}, argmax_{a'} Q(s_{t+1}, a'; θ); θ⁻), where the prediction network Q(·; θ) is used for action selection and the target network Q(·; θ⁻) is used for value evaluation.
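The difference between the standard target and the dual Q learning target can be seen in a few lines. In the sketch below the two networks are passed in as callables that return a vector of per-action Q values; this interface is an assumption made for illustration.

import numpy as np

def dqn_target(r, s_next, q_target_net, gamma=0.95):
    # Standard deep Q network target: the max over the target network, prone to overestimation.
    return r + gamma * np.max(q_target_net(s_next))

def double_dqn_target(r, s_next, q_pred_net, q_target_net, gamma=0.95):
    # Dual Q learning: the prediction network selects the action, the target network evaluates it.
    a_star = int(np.argmax(q_pred_net(s_next)))
    return r + gamma * q_target_net(s_next)[a_star]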
It should be noted that, in order to more clearly show how to determine the optimal action policy using the dual-depth Q network framework, the following steps show details of the learning process based on the dual-depth Q network framework:
Step 1: initialize a replay experience buffer B;
Step 2: randomly initialize the prediction network parameters θ;
Step 3: initialize the target network parameters by θ⁻ ← θ;
Step 4: for each episode, reset the simulation parameters of the VEC system model and receive the initial observed state s_0;
Step 5: for each time step t, generate an action a_t;
Step 6: observe the environment and calculate the next state s_{t+1} and the reward r_t;
Step 7: store the experience tuple (s_t, a_t, r_t, s_{t+1}) in the replay buffer B;
Step 8: if the number of tuples in B is greater than or equal to the mini-batch size I, randomly extract I tuples from B, set the target value y_t according to the dual Q learning formula above, and update the prediction network by gradient descent that minimizes the loss function;
Step 9: every C time steps, update the target network parameters by θ⁻ ← θ, and return to step 4.
In steps 1-3, the prediction network parameters θ and the target network parameters θ⁻ are first initialized, wherein the prediction network parameters are randomly initialized and the target network parameters are initialized to the prediction network parameters. A replay buffer B with sufficient space is constructed to buffer the transition of each time step.
Regarding step 4, the algorithm iterates over episodes. At the start of each episode, the position of vehicle n is reset to the point where it enters the coverage area of the BS, i.e., the distance d_0^n is set to its initial value. The quantization level q_0^n and the channel gain are then randomly initialized, and the initial SINR γ_0^n is calculated from the channel gain, so that the state of time step 0 can be observed, i.e., s_0^n = {γ_0^n, d_0^n, q_0^n}.
Regarding step 5, the algorithm iterates over each time step. With exploration probability ε, one action is randomly selected from the action space A for exploration; otherwise, in the learning process, the greedy action a_t = argmax_a Q(s_t, a; θ) is selected.
Regarding steps 6 and 7, the environment is observed after taking the action a_t, the next state s_{t+1} and the reward r_t are calculated, and the resulting experience tuple (s_t, a_t, r_t, s_{t+1}) is stored in the replay buffer.
Regarding steps 8 and 9, if the number of samples in the experience buffer is greater than the mini-batch size, a mini-batch of samples is randomly selected from the experience buffer, the target values are set according to the dual Q learning formula above, and the prediction network is trained by updating its parameters step by step with back-propagation gradient descent that minimizes the loss function. The target network is updated every C time steps.
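Steps 1-9 above correspond to a fairly standard double deep Q network training loop. The PyTorch-style sketch below shows how they fit together; the environment interface (reset/step), the network sizes, and all hyper-parameters are illustrative assumptions and not the patented configuration.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

def make_q_net(state_dim, num_actions):
    # Small fully connected Q network mapping a state to one Q value per action.
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, num_actions))

def train_ddqn(env, state_dim, num_actions, episodes=200, steps_per_episode=100,
               gamma=0.95, lr=1e-3, batch_size=32, sync_every=50):
    q_pred = make_q_net(state_dim, num_actions)             # Step 2: prediction network
    q_tgt = make_q_net(state_dim, num_actions)
    q_tgt.load_state_dict(q_pred.state_dict())              # Step 3: target network <- prediction network
    optimizer = torch.optim.Adam(q_pred.parameters(), lr=lr)
    buffer = deque(maxlen=10000)                             # Step 1: replay experience buffer B
    step = 0
    for _ in range(episodes):
        s = env.reset()                                      # Step 4: reset and observe s_0
        for _ in range(steps_per_episode):
            eps = max(0.05, 1.0 - step / 5000.0)             # decaying epsilon-greedy exploration
            if random.random() < eps:                        # Step 5: generate an action
                a = random.randrange(num_actions)
            else:
                with torch.no_grad():
                    a = int(q_pred(torch.tensor(s, dtype=torch.float32)).argmax())
            s_next, r, done = env.step(a)                    # Step 6: next state and reward
            buffer.append((s, a, r, s_next))                 # Step 7: store the experience tuple
            if len(buffer) >= batch_size:                    # Step 8: sample a mini-batch and update
                batch = random.sample(buffer, batch_size)
                sb, ab, rb, nb = zip(*batch)
                sb = torch.tensor(np.array(sb), dtype=torch.float32)
                nb = torch.tensor(np.array(nb), dtype=torch.float32)
                ab = torch.tensor(ab, dtype=torch.int64)
                rb = torch.tensor(rb, dtype=torch.float32)
                with torch.no_grad():
                    a_star = q_pred(nb).argmax(dim=1)                                      # action selection
                    y = rb + gamma * q_tgt(nb).gather(1, a_star.unsqueeze(1)).squeeze(1)   # value evaluation
                q = q_pred(sb).gather(1, ab.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, y)          # TD-error (second) loss function
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            if step % sync_every == 0:                       # Step 9: sync the target network
                q_tgt.load_state_dict(q_pred.state_dict())
            step += 1
            s = s_next
            if done:
                break
    return q_pred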
S210, evaluating experience loss of the vehicle in local learning in the federal learning process based on the first loss function.
It should be noted that the vehicles participating in learning cooperate to learn a shared model under the coordination of the edge server; the shared model is represented by a parameter vector w, where d denotes the size of the model. The federal learning process aims at minimizing the first (empirical) loss function F(w) = (1/N)·Σ_{n=1}^{N} F_n(w), where F_n(w) is the local loss function at vehicle n, used to evaluate the local learning effect of vehicle n. Assume each vehicle n has a training data set D_n of uniform size D. The local loss function can then be expressed as F_n(w) = (1/D)·Σ_{x ∈ D_n} f(w, x), where f(w, x) represents the loss function of a data sample x of the learning task, i.e., the training loss of the model w evaluated on the training data sample x.
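As a small numerical illustration of the first (empirical) loss function, the sketch below averages a generic per-sample loss over each vehicle's local data set and then over the vehicles; the squared-error sample loss in the usage line is a hypothetical example, not the patent's learning task.

import numpy as np

def local_loss(w, dataset, sample_loss):
    # F_n(w): average training loss of model w over vehicle n's local data set.
    return float(np.mean([sample_loss(w, x) for x in dataset]))

def global_loss(w, datasets, sample_loss):
    # F(w): average of the local losses over the N participating vehicles.
    return float(np.mean([local_loss(w, d, sample_loss) for d in datasets]))

# Hypothetical usage with a squared-error per-sample loss on scalar data.
squared_error = lambda w, x: 0.5 * (w - x) ** 2
print(global_loss(1.0, [[0.5, 1.5], [2.0, 0.0]], squared_error))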
In one possible implementation, the federal learning process includes:
s211, calculating local random gradients according to the learning data set based on the random quantization model.
Consider an arbitrary iteration, or communication round, r, in which all vehicles first download the current model w^r from the server. Then, each vehicle n computes its local stochastic gradient g_n^r using a mini-batch of samples selected uniformly at random from its data set D_n. Denote the mini-batch used by vehicle n in round r as B_n^r and its size as D_b. It can be derived that g_n^r = (1/D_b)·Σ_{x ∈ B_n^r} ∇f(w^r, x).
s212, generating a global federal learning model by aggregating local random gradients.
Each vehicle sends a quantized version of its local gradient to the edge server, i.e., Q(g_n^r). After the edge server receives the quantized local gradients, it aggregates them and updates the global model as w^{r+1} = w^r − η_r·(1/N)·Σ_{n=1}^{N} Q(g_n^r), where η_r represents the learning rate of the r-th round. The updated global model is then broadcast back to all vehicles participating in the learning to initialize the next round of learning.
In one possible implementation, the method further includes:
s310, calculating the convergence value of the minimum communication round based on the hypothesis deduction of the first loss function.
The minimum number of convergence rounds is derived by analyzing the convergence of quantized federal learning. For this purpose, the following assumptions are made about the local loss functions:
Assumption 1 (smoothness): the local loss functions are all L-smooth for all participating vehicles n.
Assumption 2 (strong convexity): the local loss functions are all μ-strongly convex for all participating vehicles n.
Assumption 3 (bounded variance): the variance of the local stochastic gradient is upper-bounded for all communication rounds r and all participating vehicles n.
Assumption 4 (unbiasedness): the local stochastic gradient is an unbiased estimate of the gradient of the local loss, i.e., E[g_n^r] = ∇F_n(w^r), for all communication rounds r and all participating vehicles n.
Further, the minimum number of convergence rounds can be derived; Lemma 1 is given first.
Lemma 1: under Assumptions 1-4, after R communication rounds the expected optimality gap E[F(w^R)] − F* is upper-bounded by a quantity that depends on the smoothness and strong-convexity constants, the quantization error, the minimum global loss F*, the minimum local losses F_n* of the vehicles, and the initial point w^0 of the training process. By letting this upper bound meet the convergence constraint ε, the required number of rounds can be derived.
It will be appreciated that, since the minimum number of convergence rounds should be an integer, the following proposition can be derived.
Proposition 1: given any ε and a fixed quantization level q, the minimum convergence round achieving the ε-optimal difference can be calculated as the smallest integer R_min(q) for which the upper bound of Lemma 1 does not exceed ε.
Further, the optimization objective is to assign quantization levels such that federal learning converges while jointly optimizing the training time and the quantization error. This can be formulated as minimizing, subject to the convergence constraint of Proposition 1, the weighted sum λ_1·(total training time) + λ_2·(quantization error), where λ_1 and λ_2 are non-negative weighting factors.
In one possible implementation, the method further includes:
s410, testing a resource allocation strategy based on a simulation experiment.
The step carries out simulation experiments on the level distribution method and compares the level distribution method with other advanced gradient quantification baselines. The above-described baselines include adaptive quantization methods (labeled herein as "adaptive-gradient quantization") and fixed quantization methods. For the fixed bit quantization method, quantization bits of 2, 6, 10 bits are used, respectively, and are referred to as "fixed (2-bit)", "fixed (6-bit)", and "fixed (10-bit)".
The aim is to verify, through simulation results, the feasibility and effectiveness of the proposed level allocation scheme in optimizing learning time and quantization error compared with several conventional reference schemes. The results are as follows:
Specifically, as shown in fig. 4, the average reward per episode of training is shown as the number of training iterations increases, in order to study the convergence behavior of the proposed DDQN-based gradient quantization method ("DQN-gradient quantization"). As can be seen from fig. 4, as the number of episodes increases, the training reward gradually stabilizes, demonstrating the good convergence performance of the proposed algorithm.
As shown in fig. 5, the training loss for federal learning for five different quantization schemes is shown for a number of participating vehicles of 8. From the results, it can be seen that "DQN-gradient quantification" is superior to all baseline protocols in federal learning convergence rate. Specifically, the number of training rounds required to converge the "DQN-gradient quantization" to the same loss value is minimal compared to the baseline scheme. This is because, in the early stage of training, the channel condition is poor due to the long distance between the vehicle and the base station, and the transmission of the vehicle itself requires a large delay. As the training loss decreases and the channel conditions improve, the vehicle gradually increases the quantization level and converges to the target loss value with a smaller number of communication rounds. The "adaptive-gradient quantization" method can reduce the traffic at the beginning of training. However, the heuristic method predefined in "adaptive-gradient quantization" does not make a good decision to guarantee the accuracy of the gradient in the later training phase. In addition, it can be seen that the training curve "fixed (2-bit)" falls rapidly in early loss function, but as training progresses, gradient accuracy needs to be improved, and a smaller quantization level will generate a larger quantization error, so that training loss cannot be reduced to a lower value, and even cannot be converged. On the other hand, "fixed (6-bit)" and "fixed (10-bit)" use more bits to quantify the gradient during training, and their training curves steadily decrease over the training time. However, a large amount of transmission overhead is wasted in the training process, reducing the convergence speed. Thus, "fixed (6-bit)" and "fixed (10-bit)" require a greater number of convergence rounds.
As shown in fig. 6, a comparison of the test accuracy of federal learning under different quantization schemes is given. As can be seen from the figure, the accuracy of "DQN-gradient quantification" is higher than for all baseline protocols with the same number of training rounds. Experimental results demonstrate the effectiveness of gradient quantification using DDQN. Since federal learning under a "fixed (2-bit)" regimen is difficult to converge to the target loss value, subsequent experiments only conducted comparative analysis of the other three baseline regimens.
As shown in fig. 7, the average total training time per episode during federal learning under different gradient quantization schemes is given for 8 participating vehicles. As can be seen from the figure, the proposed scheme "DQN-gradient quantization" has the lowest average total training time compared to all baseline schemes: the average total training time of the proposed scheme is 13.8% lower than that of the "fixed (6-bit)" scheme, 11.4% lower than that of the "adaptive-gradient quantization" scheme, and 5.5% lower than that of the "fixed (10-bit)" scheme.
As shown in fig. 8, the average quantization error of the vehicle during federal learning under different gradient quantization schemes is given when the number of participating vehicles is 8. It can be seen that the average quantization error of the proposed scheme is lower than the "adaptive-gradient quantization" and the "fixed (6-bit)", wherein the average quantization error of the proposed scheme is reduced by 68.9% compared to the "adaptive-gradient quantization" scheme and by 57.2% compared to the "fixed (6-bit)" scheme. It can be seen that the average quantization error of the proposed scheme is higher than "fixed (10-bits)". However, fig. 5 shows that "fixed (10-bit)" does not converge to lower losses, and therefore, the proposed scheme can minimize the average total training time of federal learning while guaranteeing the federal learning convergence effect compared to all baseline schemes.
The long-term discount rewards under different gradient quantization schemes are compared as shown in fig. 9. It can be seen that the long-term discount reward of "DQN-gradient quantization" is always higher than that of the other schemes. This is because "DQN-gradient quantization" can adaptively adjust the quantization level allocation according to environmental changes, achieving long-term discount-reward maximization.
As shown in fig. 10, the total training time of the four schemes for different numbers of participating vehicles is given. It can be seen that the total federal learning training time of the four schemes gradually decreases as the number of participating vehicles increases, and the total training time of the proposed scheme "DQN-gradient quantization" is the smallest. This is because the more vehicles participate, the fewer rounds are required for federal learning to converge, and the proposed scheme dynamically adjusts the allocation of quantization levels according to changes in the environment, which further reduces the number of communication rounds required for federal learning to converge.
As shown in fig. 11, quantization errors for four schemes for different numbers of participating vehicles are given. It can be seen that the quantization error is not affected by the number of participating vehicles, the magnitude of the quantization error depends only on the magnitude of the assigned quantization level, the larger the quantization level, the smaller the quantization error and vice versa. The proposed scheme has a low quantization error.
As shown in fig. 12, a long-term discount rewards are presented for four scenarios for different numbers of participating vehicles. It can be seen that as the number of participating vehicles increases, the long-term discount rewards for the four schemes gradually increase, with the proposed scheme "DQN-gradient quantified" having the greatest long-term discount rewards, reflecting the effectiveness of the proposed scheme in minimizing federal learning training time.
According to a second aspect of the present application, a level distribution device for internet of vehicles communication is also presented. Fig. 13 shows a schematic block diagram of a level distribution device for internet of vehicles communication according to an embodiment of the present application. As shown in fig. 13, the apparatus may include: an acquisition module 210 for acquiring a state change of the vehicle;
a first calculation module 220 for validating the state data, the motion data, the reward function of the vehicle based on the state change;
a second calculation module 230, configured to decouple the federal learning network and the prediction network based on a dual-depth Q network according to the status data, the action data, and the reward function, so as to define a second loss function according to a time step to determine a level distribution value; the level assignment value is updated based on the dual Q learning network.
In a third aspect, a storage medium is also provided, on which program instructions are stored, which program instructions, when executed, are configured to perform a level allocation method for internet of vehicles communication as described above. The storage medium may include, for example, a storage component of a tablet computer, a hard disk of a computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the foregoing storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media.
According to a fourth aspect of the present application, an electronic device is also provided, and fig. 14 shows a schematic block diagram of an electronic device provided by an embodiment of the present application. As shown in fig. 14, the device includes at least one processor 310, and at least one memory 320, bus 330 connected to the processor 310; wherein, the processor 310 and the memory 320 complete the communication with each other through the bus 330; the processor 310 is configured to invoke the program instructions in the memory 320 to perform a level allocation method for internet of vehicles communication as described above.
The device herein may be a server, PC, PAD, cell phone, etc.
Those skilled in the art will understand the specific details and beneficial effects of the internet of vehicles resource allocation device, the electronic device and the storage medium from reading the above description about the level allocation of internet of vehicles communication, and will not be repeated herein for brevity.
In several embodiments provided by the present application, it should be understood that the disclosed apparatus and/or device may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A level distribution method for Internet of Vehicles communication, characterized in that the method is applied to a base station, the base station comprises an edge server, the base station and a plurality of vehicles within the coverage area of the base station form a federal learning network, and the federal learning network adopts a gradient quantization technology, the method comprising:
the federal learning model is used for communicating between the vehicles in the federal learning network; federal learning includes image recognition learning of the vehicle, a process of the federal learning including a plurality of iterations;
a stochastic quantization model for local gradient quantization of the level;
the communication model is used for carrying out wireless transmission of data between the edge server and the vehicle;
the calculation model is used for calculating the federal learning time and the minimum communication round number;
acquiring state changes of the vehicle, wherein the state changes are data changes of environments where the vehicle is located at adjacent moments after the vehicle executes a preset strategy;
confirming state data, action data, and a reward function of the vehicle based on the state change;
decoupling the federal learning network and the predictive network based on a dual-depth Q network according to the state data, the action data, the reward function to define a second loss function according to a time step to determine a level allocation value;
the level assignment value is updated based on the dual Q learning network.
2. The method according to claim 1, wherein the method further comprises:
an empirical loss of the vehicle during the federal learning process is assessed based on a first loss function.
3. The method of claim 1, wherein the federal learning process implements the plurality of iterations according to a distributed approach using a random gradient descent algorithm, the vehicle having a learning dataset, the federal learning process comprising:
calculating a local random gradient from the learning dataset based on the random quantization model;
and aggregating the local random gradients to generate a global federal learning model.
4. The method of claim 1, wherein the communication model employs an orthogonal frequency division multiple access technique to effect data transmission between the vehicle and the edge server, allocating an orthogonal subcarrier to each vehicle associated with the edge server, and further comprising:
the transmission rate of the vehicle is calculated based on Shannon's theorem from the channel gain and the distance to the edge server.
5. The method of claim 1, wherein the computing model for computing a minimum number of communication rounds of federal learning comprises:
the convergence value of the minimum communication round is calculated based on a hypothesis deduction of the first loss function.
6. The method according to claim 1, wherein the method further comprises:
and testing the level distribution method based on a simulation experiment.
7. A level distribution device for internet of vehicles communication, the device comprising:
the acquisition module is used for acquiring the state change of the vehicle;
a first calculation module for confirming state data, action data, and a reward function of the vehicle based on the state change;
a second calculation module, configured to decouple the federal learning network and the prediction network based on a dual-depth Q network according to the state data, the action data, and the reward function, so as to define a second loss function according to a time step to determine a level distribution value; the level assignment value is updated based on the dual Q learning network.
8. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform a level allocation method of internet of vehicles communication as claimed in any one of claims 1 to 6.
9. An electronic device comprising at least one processor, and at least one memory, bus, coupled to the processor; the processor and the memory complete communication with each other through the bus; the processor is configured to invoke program instructions in the memory to perform a method of level allocation for internet of vehicles communication as claimed in any one of claims 1-6.
CN202310856689.7A 2023-07-13 2023-07-13 Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment Pending CN116582840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310856689.7A CN116582840A (en) 2023-07-13 2023-07-13 Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310856689.7A CN116582840A (en) 2023-07-13 2023-07-13 Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116582840A true CN116582840A (en) 2023-08-11

Family

ID=87536385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310856689.7A Pending CN116582840A (en) 2023-07-13 2023-07-13 Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116582840A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435472A (en) * 2021-05-24 2021-09-24 西安电子科技大学 Vehicle-mounted computing power network user demand prediction method, system, device and medium
CN113610303A (en) * 2021-08-09 2021-11-05 北京邮电大学 Load prediction method and system
CN113947210A (en) * 2021-10-08 2022-01-18 东北大学 Cloud side end federal learning method in mobile edge computing
CN114051222A (en) * 2021-11-08 2022-02-15 北京工业大学 Wireless resource allocation and communication optimization method based on federal learning in Internet of vehicles environment
CN114980123A (en) * 2022-04-15 2022-08-30 南京理工大学 Vehicle networking edge resource allocation method based on federal multi-agent reinforcement learning
CN116032663A (en) * 2023-03-27 2023-04-28 湖南红普创新科技发展有限公司 Privacy data processing system, method, equipment and medium based on edge equipment
CN116156455A (en) * 2022-12-29 2023-05-23 南京理工大学 Internet of vehicles edge content caching decision method based on federal reinforcement learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435472A (en) * 2021-05-24 2021-09-24 西安电子科技大学 Vehicle-mounted computing power network user demand prediction method, system, device and medium
CN113610303A (en) * 2021-08-09 2021-11-05 北京邮电大学 Load prediction method and system
CN113947210A (en) * 2021-10-08 2022-01-18 东北大学 Cloud side end federal learning method in mobile edge computing
CN114051222A (en) * 2021-11-08 2022-02-15 北京工业大学 Wireless resource allocation and communication optimization method based on federal learning in Internet of vehicles environment
CN114980123A (en) * 2022-04-15 2022-08-30 南京理工大学 Vehicle networking edge resource allocation method based on federal multi-agent reinforcement learning
CN116156455A (en) * 2022-12-29 2023-05-23 南京理工大学 Internet of vehicles edge content caching decision method based on federal reinforcement learning
CN116032663A (en) * 2023-03-27 2023-04-28 湖南红普创新科技发展有限公司 Privacy data processing system, method, equipment and medium based on edge equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hado van Hasselt et al., "Deep Reinforcement Learning with Double Q-Learning", Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), pages 2094-2098 *
He Changle, "Client selection mechanism for edge federated learning", Journal of Computer Applications, pages 2-3 *
Cao Xiaowen, "Research on over-the-air computation for mobile edge networks", China Doctoral Dissertations Full-text Database, Information Science and Technology Series, page 1 *
Zhao Jiachen et al., "Intelligent radar decision-making generation algorithm based on deep reinforcement learning", Modern Radar, pages 2-3 *

Similar Documents

Publication Publication Date Title
Chen et al. Performance optimization of federated learning over wireless networks
CN112105062B (en) Mobile edge computing network energy consumption minimization strategy method under time-sensitive condition
CN113469325B (en) Hierarchical federation learning method for edge aggregation interval self-adaptive control, computer equipment and storage medium
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
CN112839382B (en) Video semantic driven communication and computing resource joint allocation method in Internet of vehicles
CN110890930B (en) Channel prediction method, related equipment and storage medium
CN111629380A (en) Dynamic resource allocation method for high-concurrency multi-service industrial 5G network
US20240135191A1 (en) Method, apparatus, and system for generating neural network model, device, medium, and program product
Alcaraz et al. Model-based reinforcement learning with kernels for resource allocation in RAN slices
CN115310360A (en) Digital twin auxiliary industrial Internet of things reliability optimization method based on federal learning
CN116187483A (en) Model training method, device, apparatus, medium and program product
CN112235062A (en) Federal learning method and system for resisting communication noise
Yu et al. Deep reinforcement learning for wireless networks
Wang et al. Reinforcement learning for minimizing age of information under realistic physical dynamics
Mafuta et al. Decentralized resource allocation-based multiagent deep learning in vehicular network
Jo et al. Deep reinforcement learning‐based joint optimization of computation offloading and resource allocation in F‐RAN
WO2023236609A1 (en) Automatic mixed-precision quantization method and apparatus
CN114051252A (en) Multi-user intelligent transmitting power control method in wireless access network
Liu et al. FedAGL: A communication-efficient federated vehicular network
CN116582840A (en) Level distribution method and device for Internet of vehicles communication, storage medium and electronic equipment
CN116542319A (en) Self-adaptive federation learning method and system based on digital twin in edge computing environment
Pan et al. Energy-efficient multiuser and multitask computation offloading optimization method
CN114281527A (en) Low-complexity mobile edge computing resource allocation method
US20230246887A1 (en) Training in Communication Systems
Si et al. UAV-assisted Semantic Communication with Hybrid Action Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230811