CN115550944A

CN115550944A - Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Info

Publication number: CN115550944A
Application number: CN202210992657.5A
Authority: CN
Inventors: 李秀华; 李辉; 孙川; 徐峥辉; 郝金隆; 蔡春茂; 范琪琳; 杨正益; 文俊浩
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2022-12-30
Anticipated expiration: 2042-08-18
Also published as: CN115550944B

Abstract

The invention discloses a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles, which comprises the following steps: 1) Establishing a network and service request model, and acquiring network and service request related information; 2) Establishing a network and service request calculation model; 3) Constructing a state space, an action space, a strategy function and a reward function; 4) Constructing an actor network and a criticizing family network, and training the actor network and the criticizing family network; 5) The actor network generates a service placement strategy and inputs the strategy into a critic network; 6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, returns to the step 5), and outputs the service placement strategy if the evaluation passes. The present invention minimizes the maximum edge resource usage and service delay while taking into account the mobility of the vehicle, changing demands, and dynamics of different types of service requests.

Description

Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Technical Field

The invention relates to the field of Internet of vehicles, in particular to a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles.

Background

The internet of vehicles is an interactive network formed by information such as vehicle position, speed and route. The rapid development of communication technology has brought many new possibilities to the current field of car networking. In addition, the emergence of the fifth generation mobile communication technology enables the internet of vehicles to become more intelligent, and the service coverage range is further expanded. However, as delay-sensitive applications such as intelligent voice assistance and automatic driving become the most popular applications in the field of car networking, the traditional cloud computing paradigm is gradually unable to meet the needs of users. The European telecommunication standards institute introduces mobile edge computing into the field of car networking, expands storage resources and computing resources of cloud computing, enables the cloud computing to be closer to users, and meets requirements of the users on high reliability, low delay, safety and the like of intelligent application.

In the internet of vehicles, vehicles communicate with infrastructure to obtain services such as media downloads, collaboration messages, decentralized environment notification messages, and so on, to coordinate among applications such as remote driving, parking space discovery, navigation, and so on. In the edge computing paradigm, multiple services can be deployed on an edge server, leveraging computing and storage resources. Service placement is one of the research hotspots in the field of car networking. In particular, service placement is the mapping of services to edge servers in the internet of vehicles to meet the demand of requested services while efficiently using edge resources. From a user perspective, it is important to minimize delays in vehicle awareness services. From the service provider's perspective, maximizing edge resource usage is desirable while maintaining as much resource load balancing between servers as possible.

Disclosure of Invention

The invention aims to provide a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles, which comprises the following steps:

1) Establishing a network and service request model, and acquiring network and service request related information;

the network and service request related information comprises edge server information, vehicle information and service information;

the edge server information comprises an edge server set E, an edge server E and residual resource capacity C of the edge server E _e ；

The vehicle information includes a set of vehicles V.

The service information comprises a service set S and a vehicle number lambda of the request service S _s One service instance at a time (e.g., media file download in an internet of vehicles environment, collaboration awareness messages, environment notification services, etc.) or the number of vehicles epsilon that can provide parallel connections, the specified time t and vehicle location loc in a service request message, the amount of resources R consumed by an edge server deployment service s _s Time delay requirement threshold D _s 。

2) Establishing a network and service request calculation model;

the network and service request calculation model comprises a total service delay calculation model and an edge resource utilization rate calculation model;

the total service delay calculation model is as follows:

in the formula (I), the compound is shown in the specification,

the total service delay;

propagation delay and queuing delay; dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; c is the propagation speed of the signal through the communication medium;

number of vehicles lambda when requesting service s _s When the number is less than or equal to epsilon, the queuing delay

Number of vehicles lambda when requesting service s _s When is greater than epsilon, queuing delay

Satisfies the following formula:

wherein the number is different by λ' _s ＝λ _s -ε；

Propagation delay

As follows:

where dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; and c is the propagation speed of the signal through the communication medium.

The edge resource utilization calculation model is as follows:

edge resource usage rate

Is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

in the formula, parameter

C _e Is the remaining resource capacity of the edge server e;

is edge resource usage; r _s The amount of resources consumed to deploy service s for the edge server.

3) Constructing a state space, an action space, a strategy function and a reward function;

the state space is characterized by a set of state spaces ω, namely:

ω＝{[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t (6)

wherein S belongs to S; v. of ₁ ,v ₂ ,...,v _n A set of vehicles; loc ₁ ,loc ₂ ,...,loc _n At time t, a set of vehicle positions for service s is requested.

The action space is used for describing actions taken when the service is placed on the edge server;

wherein the action a taken at a given time t is as follows:

in the formula, pi is a strategy function required by generating action on an observation set of omega in a time unit t;

the representation service s is deployed in an edge server e;

meaning that service s is not deployed at edge server e.

The strategy function pi is a function executed by an actor network and is used for mapping a state space to an action space, namely pi, omega → a;

the objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter beta. The policy function pi is expressed as

Wherein, beta is a weight coefficient;

the constraints of the policy function pi include mapping constraints

Time delay constraint

Resource constraints

The reward function is as follows:

in the formula (I), the compound is shown in the specification,

is an instant prize. Gamma is the reward factor.

Service delay at time t;

4) Constructing an actor network and a criticizing family network, and training the actor network and the criticizing family network;

loss function in the criticizing family network training process

As follows:

in the formula, theta is a criticizing family network parameter;

a target value for evaluating the quality of the strategy; q _i (ω, a; θ) placing the policy quality of the policy for the service;

the number of available resource units in the edge server;

5) The actor network generates a service placement strategy and inputs the strategy into the critic network;

6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, returns to the step 5), and outputs the service placement strategy if the evaluation passes.

The method for evaluating the policy quality of the service placement policy by the criticizing family network comprises the following steps: judging criticizing family network loss function

And whether convergence is achieved, if the convergence is achieved, the evaluation is passed, and if the convergence is not achieved, the evaluation is not passed.

It is worth noting that the present invention proposes a three-tier car networking architecture based on edge computing, and considers the dynamic service placement problem, with the optimization goal of minimizing maximum edge resource usage (from the service provider's perspective) and service delay (from the user's perspective).

In addition, the invention provides a service placement framework based on deep reinforcement learning, which consists of a strategy function (actor network) and a value function (critic network). The actor network makes a service placement strategy, while the critic network evaluates the performance of decisions made by the actor network based on delays observed by the vehicle.

The technical effect of the invention is undoubted. The invention provides a dynamic service placement method based on edge calculation and deep reinforcement learning in an internet of vehicles, which provides a dynamic service placement framework based on deep reinforcement learning in the internet of vehicles, and aims to minimize the maximum edge resource use and service delay while considering the mobility of vehicles, the changing requirements and the dynamics of different types of service requests.

Drawings

FIG. 1 is a three-tier vehicle networking rack based on edge calculations;

FIG. 2 is an agent structure;

FIG. 3 is a flow chart of the present invention.

Detailed Description

The present invention is further illustrated by the following examples, but it should not be construed that the scope of the above-described subject matter is limited to the following examples. Various substitutions and modifications can be made without departing from the technical idea of the invention and the scope of the invention according to the common technical knowledge and the conventional means in the field.

Example 1:

referring to fig. 1 to 3, a dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle networking includes the following steps:

The vehicle information includes a set of vehicles V.

The service information comprises a service set S and a vehicle number lambda of the request service S _s Number of vehicles epsilon that can handle one service instance at a time (e.g., media file download, collaboration awareness messaging and context notification services in an internet of vehicles environment, etc.) or can provide parallel connections, specified time t and vehicle location loc in a service request message, amount of resources R consumed by edge server deployment service s _s Time delay requirement threshold D _s 。

2) Establishing a network and service request calculation model;

the total service delay calculation model is as follows:

in the formula (I), the compound is shown in the specification,

the total service delay;

Number of vehicles lambda when requesting service s _s When more than epsilon, queuing delay

Satisfies the following formula:

wherein, number quantity difference λ' _s ＝λ _s -ε；

Propagation delay

As follows:

The edge resource utilization calculation model is as follows:

edge resource usage rate

in the formula, parameter

C _e Is the remaining resource capacity of the edge server e;

is edge resource usage; r _s The amount of resources consumed by deploying service s for the edge server.

the state space is characterized by a set of state spaces ω, namely:

ω＝{[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t (6)

wherein S belongs to S; v. of ₁ ,v ₂ ,...,v _n A set of vehicles; loc ₁ ,loc ₂ ,...,loc _n At t, a set of vehicle positions serving s is requested.

wherein the action a taken at a given time t is as follows:

the representation service s is deployed in an edge server e;

meaning that service s is not deployed at edge server e.

Wherein, beta is a weight coefficient;

the principle of the policy function pi is: and (4) iterating the service set and the edge server set through subscripts s and e, searching the maximum edge resource use and service delay, and minimizing the maximum edge resource use and service delay to obtain a corresponding strategy function pi.

The constraints of the policy function pi include mapping constraints

Time delay constraint

Resource constraints

The reward function is as follows:

in the formula (I), the compound is shown in the specification,

is an instant prize. Gamma is the reward factor.

Service delay at time t;

4) Constructing an actor network and a critic network, and training the actor network and the critic network;

loss function in the criticizing family network training process

As follows:

in the formula, theta is a criticizing family network parameter;

a target value for evaluating the quality of the policy; q _i (ω, a; θ) placing the policy quality of the policy for the service;

the number of available resource units in the edge server;

6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, and returns to the step 5), and outputs the service placement strategy if the evaluation passes.

Example 2:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles comprises the following steps:

1) And establishing a network and service request model, and acquiring edge server information, vehicle information and service information.

The server information, the vehicle information and the service information comprise an edge server set E, an edge server E and residual resource capacity C of the edge server E _e Vehicle set V and service set S, number of vehicles λ requesting service S _s One service instance at a time (e.g., media file download, collaboration awareness messaging and context notification services in an Internet of vehicles environment, etc.) or the number of vehicles ε, t, and loc of vehicle locations in a service request message, and the amount of resources R consumed by an edge server to deploy a service s _s Delay requirement threshold D _s 。

2) And establishing a calculation model.

2.1 ) total service delay modeling. The entire edge Internet of vehicles system is modeled as an M/D/1 queue. Wherein, when the service s is requested from the edge server e, the total service time delay of the vehicle

Refers to the total time from when the vehicle sends a service request to when the edge server receives a corresponding response. Total service delay

By propagation delay

And queuing delay

Consists of the following components:

if λ _s Less than or equal to epsilon, delay in queuing

Is 0. If λ _s > epsilon, a queue is created and the average queuing delay for service s on the edge server will be as follows:

wherein, λ' _s ＝λ _s ε, the average propagation delay is calculated as the ratio of the distance to the propagation velocity over the medium, as follows:

where dist (v, s) is the euclidean distance between the vehicle v and the edge server deployed by the service s, and c is the propagation speed of the signal through the communication medium. Thus, the total service latency is as follows:

2.2 Edge resource usage modeling. Edge resource usage rate

wherein, the first and the second end of the pipe are connected with each other,

3) And designing a state space. At a given time t, the state space set describes the network environment. The agent observes the environment to form a set of state spaces ω from the service request model, as follows:

ω＝{[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t 。 (6)

wherein S ∈ S, v ₁ ,v ₂ ,...,v _n As a set of vehicles, loc ₁ ,loc ₂ ,...,loc _n At time t, a set of vehicle positions for service s is requested.

4) Designed motion space. The action space describes the actions taken by the policy module when placing a service on an edge server, the actions taken at a given time t are as follows:

where π is the policy function required to generate an action on the observed set of ω in time unit tBinary variable

A matrix is given indicating the location of the service s on the edge server e,

the representation service s is deployed at the edge server e. On the contrary, the method can be used for carrying out the following steps,

meaning that service s is not deployed at edge server e.

5) And designing a strategy function. The policy function pi is a function performed by the actor network to map the state space to the action space pi:ω → a. The objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter beta. The policy function pi is expressed as

The policy function is also subject to the mapping constraints,

the time delay is constrained in a manner that,

and the constraints of the resources are also included,

6) A reward function is designed. At each time unit t, the system receives an immediate reward from the environment in response to an action taken by the agent's actor network

As follows:

7) And constructing a critics network, and evaluating the quality Q (omega, a) of the decision made by the actor network. Inputting the state, the action and the reward to train the criticizing network, and updating the parameter theta of the criticizing network to minimize the loss function

As follows:

wherein, y _t Is the target value. A replay memory M is further used for storing the experience of training the critic's network. The critic network acquires experience after a random period of time in the usage replay and optimizes network parameters for better performance.

8) After the training convergence of the actor network and the critic network in the steps, the actor network can find the optimal placement strategy of the service while considering the mobility and the dynamics of vehicles in different types of service requests. The criticizing family network can evaluate the strategy quality of the actor network through a value function.

Example 3:

1) And establishing the network and service request model and acquiring the related information of the network and service request.

2) Establishing a network and service request calculation model;

Example 4:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the network and service request related information comprises edge server information, vehicle information and service information;

The vehicle information includes a set of vehicles V.

The service information comprises a service set S and the number lambda of vehicles requesting the service S _s The number of vehicles epsilon that can handle one service instance at a time or can provide parallel connections, the specified time t and vehicle location loc in the service request message, the amount of resources R consumed by the edge server to deploy the service s _s Time delay requirement threshold D _s (ii) a The service instances include media file downloads, collaboration aware messaging, and environment notification services in an internet of vehicles environment.

Example 5:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the network and service request calculation model comprises a total service delay calculation model and an edge resource utilization calculation model;

the total service delay calculation model is as follows:

in the formula (I), the compound is shown in the specification,

the total service delay;

Satisfies the following formula:

wherein the number is different by λ' _s ＝λ _s -ε；

Propagation delay

As follows:

where dist (v, s) is the Euclidean distance between vehicle v and the edge server deployed by service s; and c is the propagation speed of the signal through the communication medium.

The edge resource utilization calculation model is as follows:

edge resource usage rate

in the formula, parameter

C _e Is the remaining resource capacity of the edge server e;

is edge resource usage; r is _s The amount of resources consumed to deploy service s for the edge server.

Example 6:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the state space is characterized by a state space set omega, namely:

ω＝{[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t (6)

Example 7:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the action space is used for describing actions taken when a service is placed on an edge server;

wherein the action a taken at a given time t is as follows:

where π is the policy function required to generate an action on the observed set of ω at time unit t;

the representation service s is deployed in an edge server e;

meaning that service s is not deployed at edge server e.

Example 8:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the strategy function pi is a function executed by an actor network and is used for mapping a state space to an action space, i.e. pi: omega → a;

the objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter β;

the policy function pi is expressed as follows:

in the formula, β is a weight coefficient.

The constraints of the policy function pi include mapping constraints

Time delay constraint

Resource constraints

Example 9:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the reward function is as follows:

in the formula (I), the compound is shown in the specification,

is an instant prize. Gamma is the reward factor.

Service time delay at the moment t;

example 10:

a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in an embodiment 3, wherein a loss function in the criticizing family network training process

As follows:

in the formula, theta is a criticizing family network parameter;

the number of available resource units in the edge server;

example 11:

a dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle networking system is disclosed in an embodiment 3, wherein the method for evaluating the policy quality of a service placement policy by a criticizing network comprises the following steps: judging criticizing family network loss function

Claims

1. A dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles is characterized by comprising the following steps:

2) Establishing a network and service request calculation model;

5) The actor network generates a service placement strategy and inputs the strategy into a critic network;

2. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the network and service request related information comprises edge server information, vehicle information and service information;

The vehicle information includes a set of vehicles V.

The service information comprises a service set S and a vehicle number lambda of the request service S _s The number of vehicles epsilon that can handle one service instance at a time or can provide parallel connections, the specified time t and vehicle location loc in the service request message, the amount of resources R consumed by the edge server deployment service s _s Delay requirement threshold D _s (ii) a The service instances include media file downloads, collaboration aware messaging, and environment notification services in an internet of vehicles environment.

3. The dynamic service placement method based on edge computing and deep reinforcement learning in the Internet of vehicles according to claim 1, wherein the network and service request computing model comprises a total service delay computing model and an edge resource utilization computing model;

the total service delay calculation model is as follows:

in the formula (I), the compound is shown in the specification,

the total service delay;

Satisfies the following formula:

wherein the number is different by λ' _s ＝λ _s -ε；

Propagation delay

As follows:

The edge resource utilization calculation model is as follows:

edge resource usage rate

in the formula, parameters

C _e Is the remaining resource capacity of the edge server e;

4. The dynamic service placement method based on edge computation and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the state space is characterized by a state space set ω, namely:

ω＝{[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t (6)

wherein S belongs to S; v. of ₁ ,v ₂ ,...,v _n A set of vehicles; loc C ₁ ,loc ₂ ,...,loc _n At time t, a set of vehicle positions for service s is requested.

5. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the action space is used for describing actions taken when placing services on an edge server;

wherein the action a taken at a given time t is as follows:

the representation service s is deployed in an edge server e;

meaning that service s is not deployed at edge server e.

6. The dynamic service placement method based on edge computation and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the policy function pi is a function executed by an actor network for mapping a state space to an action space, i.e. pi: ω → a;

the policy function pi is expressed as follows:

in the formula, β is a weight coefficient.

The constraints of the policy function pi include mapping constraints

Time delay constraint

Resource constraints

7. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the reward function is as follows:

in the formula (I), the compound is shown in the specification,

is an instant prize. Gamma is the reward factor.

Is the service delay at time t.

8. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the loss function in the critics network training process

As follows:

in the formula, theta is a criticizing family network parameter;

to useTo evaluate a target value of the quality of the strategy; q _i (ω, a; θ) placing the policy quality of the policy for the service;

is the number of available resource units in the edge server.

9. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 8, wherein the method for evaluating the policy quality of the service placement policy by the criticizing family network comprises the following steps: judging criticizing family network loss function