CN115550944A - Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles - Google Patents

Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles Download PDF

Info

Publication number
CN115550944A
CN115550944A CN202210992657.5A CN202210992657A CN115550944A CN 115550944 A CN115550944 A CN 115550944A CN 202210992657 A CN202210992657 A CN 202210992657A CN 115550944 A CN115550944 A CN 115550944A
Authority
CN
China
Prior art keywords
service
network
edge
vehicles
edge server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210992657.5A
Other languages
Chinese (zh)
Other versions
CN115550944B (en
Inventor
李秀华
李辉
孙川
徐峥辉
郝金隆
蔡春茂
范琪琳
杨正益
文俊浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210992657.5A priority Critical patent/CN115550944B/en
Publication of CN115550944A publication Critical patent/CN115550944A/en
Application granted granted Critical
Publication of CN115550944B publication Critical patent/CN115550944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18Network planning tools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00Information sensed or collected by the things
    • G16Y20/10Information sensed or collected by the things relating to the environment, e.g. temperature; relating to location
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00Information sensed or collected by the things
    • G16Y20/30Information sensed or collected by the things relating to resources, e.g. consumed power
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y30/00IoT infrastructure
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22Traffic simulation tools or models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/502Proximity

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Environmental & Geological Engineering (AREA)
  • Toxicology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles, which comprises the following steps: 1) Establishing a network and service request model, and acquiring network and service request related information; 2) Establishing a network and service request calculation model; 3) Constructing a state space, an action space, a strategy function and a reward function; 4) Constructing an actor network and a criticizing family network, and training the actor network and the criticizing family network; 5) The actor network generates a service placement strategy and inputs the strategy into a critic network; 6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, returns to the step 5), and outputs the service placement strategy if the evaluation passes. The present invention minimizes the maximum edge resource usage and service delay while taking into account the mobility of the vehicle, changing demands, and dynamics of different types of service requests.

Description

Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles
Technical Field
The invention relates to the field of Internet of vehicles, in particular to a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles.
Background
The internet of vehicles is an interactive network formed by information such as vehicle position, speed and route. The rapid development of communication technology has brought many new possibilities to the current field of car networking. In addition, the emergence of the fifth generation mobile communication technology enables the internet of vehicles to become more intelligent, and the service coverage range is further expanded. However, as delay-sensitive applications such as intelligent voice assistance and automatic driving become the most popular applications in the field of car networking, the traditional cloud computing paradigm is gradually unable to meet the needs of users. The European telecommunication standards institute introduces mobile edge computing into the field of car networking, expands storage resources and computing resources of cloud computing, enables the cloud computing to be closer to users, and meets requirements of the users on high reliability, low delay, safety and the like of intelligent application.
In the internet of vehicles, vehicles communicate with infrastructure to obtain services such as media downloads, collaboration messages, decentralized environment notification messages, and so on, to coordinate among applications such as remote driving, parking space discovery, navigation, and so on. In the edge computing paradigm, multiple services can be deployed on an edge server, leveraging computing and storage resources. Service placement is one of the research hotspots in the field of car networking. In particular, service placement is the mapping of services to edge servers in the internet of vehicles to meet the demand of requested services while efficiently using edge resources. From a user perspective, it is important to minimize delays in vehicle awareness services. From the service provider's perspective, maximizing edge resource usage is desirable while maintaining as much resource load balancing between servers as possible.
Disclosure of Invention
The invention aims to provide a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles, which comprises the following steps:
1) Establishing a network and service request model, and acquiring network and service request related information;
the network and service request related information comprises edge server information, vehicle information and service information;
the edge server information comprises an edge server set E, an edge server E and residual resource capacity C of the edge server E e
The vehicle information includes a set of vehicles V.
The service information comprises a service set S and a vehicle number lambda of the request service S s One service instance at a time (e.g., media file download in an internet of vehicles environment, collaboration awareness messages, environment notification services, etc.) or the number of vehicles epsilon that can provide parallel connections, the specified time t and vehicle location loc in a service request message, the amount of resources R consumed by an edge server deployment service s s Time delay requirement threshold D s
2) Establishing a network and service request calculation model;
the network and service request calculation model comprises a total service delay calculation model and an edge resource utilization rate calculation model;
the total service delay calculation model is as follows:
Figure BDA0003804378200000021
in the formula (I), the compound is shown in the specification,
Figure BDA0003804378200000022
the total service delay;
Figure BDA0003804378200000023
propagation delay and queuing delay; dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; c is the propagation speed of the signal through the communication medium;
number of vehicles lambda when requesting service s s When the number is less than or equal to epsilon, the queuing delay
Figure BDA0003804378200000024
Number of vehicles lambda when requesting service s s When is greater than epsilon, queuing delay
Figure BDA0003804378200000025
Satisfies the following formula:
Figure BDA0003804378200000026
wherein the number is different by λ' s =λ s -ε;
Propagation delay
Figure BDA0003804378200000027
As follows:
Figure BDA0003804378200000028
where dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; and c is the propagation speed of the signal through the communication medium.
The edge resource utilization calculation model is as follows:
edge resource usage rate
Figure BDA0003804378200000029
Is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:
Figure BDA00038043782000000210
in the formula, parameter
Figure BDA00038043782000000211
C e Is the remaining resource capacity of the edge server e;
Figure BDA00038043782000000212
is edge resource usage; r s The amount of resources consumed to deploy service s for the edge server.
3) Constructing a state space, an action space, a strategy function and a reward function;
the state space is characterized by a set of state spaces ω, namely:
ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)
wherein S belongs to S; v. of 1 ,v 2 ,...,v n A set of vehicles; loc 1 ,loc 2 ,...,loc n At time t, a set of vehicle positions for service s is requested.
The action space is used for describing actions taken when the service is placed on the edge server;
wherein the action a taken at a given time t is as follows:
Figure BDA0003804378200000031
in the formula, pi is a strategy function required by generating action on an observation set of omega in a time unit t;
Figure BDA0003804378200000032
the representation service s is deployed in an edge server e;
Figure BDA0003804378200000033
meaning that service s is not deployed at edge server e.
The strategy function pi is a function executed by an actor network and is used for mapping a state space to an action space, namely pi, omega → a;
the objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter beta. The policy function pi is expressed as
Figure BDA0003804378200000034
Wherein, beta is a weight coefficient;
the constraints of the policy function pi include mapping constraints
Figure BDA0003804378200000035
Time delay constraint
Figure BDA0003804378200000036
Resource constraints
Figure BDA0003804378200000037
The reward function is as follows:
Figure BDA0003804378200000038
in the formula (I), the compound is shown in the specification,
Figure BDA0003804378200000039
is an instant prize. Gamma is the reward factor.
Figure BDA00038043782000000310
Service delay at time t;
4) Constructing an actor network and a criticizing family network, and training the actor network and the criticizing family network;
loss function in the criticizing family network training process
Figure BDA00038043782000000311
As follows:
Figure BDA00038043782000000312
in the formula, theta is a criticizing family network parameter;
Figure BDA00038043782000000313
a target value for evaluating the quality of the strategy; q i (ω, a; θ) placing the policy quality of the policy for the service;
Figure BDA00038043782000000314
the number of available resource units in the edge server;
5) The actor network generates a service placement strategy and inputs the strategy into the critic network;
6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, returns to the step 5), and outputs the service placement strategy if the evaluation passes.
The method for evaluating the policy quality of the service placement policy by the criticizing family network comprises the following steps: judging criticizing family network loss function
Figure BDA0003804378200000041
And whether convergence is achieved, if the convergence is achieved, the evaluation is passed, and if the convergence is not achieved, the evaluation is not passed.
It is worth noting that the present invention proposes a three-tier car networking architecture based on edge computing, and considers the dynamic service placement problem, with the optimization goal of minimizing maximum edge resource usage (from the service provider's perspective) and service delay (from the user's perspective).
In addition, the invention provides a service placement framework based on deep reinforcement learning, which consists of a strategy function (actor network) and a value function (critic network). The actor network makes a service placement strategy, while the critic network evaluates the performance of decisions made by the actor network based on delays observed by the vehicle.
The technical effect of the invention is undoubted. The invention provides a dynamic service placement method based on edge calculation and deep reinforcement learning in an internet of vehicles, which provides a dynamic service placement framework based on deep reinforcement learning in the internet of vehicles, and aims to minimize the maximum edge resource use and service delay while considering the mobility of vehicles, the changing requirements and the dynamics of different types of service requests.
Drawings
FIG. 1 is a three-tier vehicle networking rack based on edge calculations;
FIG. 2 is an agent structure;
FIG. 3 is a flow chart of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, but it should not be construed that the scope of the above-described subject matter is limited to the following examples. Various substitutions and modifications can be made without departing from the technical idea of the invention and the scope of the invention according to the common technical knowledge and the conventional means in the field.
Example 1:
referring to fig. 1 to 3, a dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle networking includes the following steps:
1) Establishing a network and service request model, and acquiring network and service request related information;
the network and service request related information comprises edge server information, vehicle information and service information;
the edge server information comprises an edge server set E, an edge server E and residual resource capacity C of the edge server E e
The vehicle information includes a set of vehicles V.
The service information comprises a service set S and a vehicle number lambda of the request service S s Number of vehicles epsilon that can handle one service instance at a time (e.g., media file download, collaboration awareness messaging and context notification services in an internet of vehicles environment, etc.) or can provide parallel connections, specified time t and vehicle location loc in a service request message, amount of resources R consumed by edge server deployment service s s Time delay requirement threshold D s
2) Establishing a network and service request calculation model;
the network and service request calculation model comprises a total service delay calculation model and an edge resource utilization rate calculation model;
the total service delay calculation model is as follows:
Figure BDA0003804378200000051
in the formula (I), the compound is shown in the specification,
Figure BDA0003804378200000052
the total service delay;
Figure BDA0003804378200000053
propagation delay and queuing delay; dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; c is the propagation speed of the signal through the communication medium;
number of vehicles lambda when requesting service s s When the number is less than or equal to epsilon, the queuing delay
Figure BDA0003804378200000054
Number of vehicles lambda when requesting service s s When more than epsilon, queuing delay
Figure BDA0003804378200000055
Satisfies the following formula:
Figure BDA0003804378200000056
wherein, number quantity difference λ' s =λ s -ε;
Propagation delay
Figure BDA0003804378200000057
As follows:
Figure BDA0003804378200000058
where dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; and c is the propagation speed of the signal through the communication medium.
The edge resource utilization calculation model is as follows:
edge resource usage rate
Figure BDA0003804378200000059
Is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:
Figure BDA00038043782000000510
in the formula, parameter
Figure BDA00038043782000000511
C e Is the remaining resource capacity of the edge server e;
Figure BDA00038043782000000512
is edge resource usage; r s The amount of resources consumed by deploying service s for the edge server.
3) Constructing a state space, an action space, a strategy function and a reward function;
the state space is characterized by a set of state spaces ω, namely:
ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)
wherein S belongs to S; v. of 1 ,v 2 ,...,v n A set of vehicles; loc 1 ,loc 2 ,...,loc n At t, a set of vehicle positions serving s is requested.
The action space is used for describing actions taken when the service is placed on the edge server;
wherein the action a taken at a given time t is as follows:
Figure BDA0003804378200000061
in the formula, pi is a strategy function required by generating action on an observation set of omega in a time unit t;
Figure BDA0003804378200000062
the representation service s is deployed in an edge server e;
Figure BDA0003804378200000063
meaning that service s is not deployed at edge server e.
The strategy function pi is a function executed by an actor network and is used for mapping a state space to an action space, namely pi, omega → a;
the objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter beta. The policy function pi is expressed as
Figure BDA0003804378200000064
Wherein, beta is a weight coefficient;
the principle of the policy function pi is: and (4) iterating the service set and the edge server set through subscripts s and e, searching the maximum edge resource use and service delay, and minimizing the maximum edge resource use and service delay to obtain a corresponding strategy function pi.
The constraints of the policy function pi include mapping constraints
Figure BDA0003804378200000065
Time delay constraint
Figure BDA0003804378200000066
Resource constraints
Figure BDA0003804378200000067
The reward function is as follows:
Figure BDA0003804378200000068
in the formula (I), the compound is shown in the specification,
Figure BDA0003804378200000069
is an instant prize. Gamma is the reward factor.
Figure BDA00038043782000000610
Service delay at time t;
4) Constructing an actor network and a critic network, and training the actor network and the critic network;
loss function in the criticizing family network training process
Figure BDA00038043782000000611
As follows:
Figure BDA0003804378200000071
in the formula, theta is a criticizing family network parameter;
Figure BDA0003804378200000072
a target value for evaluating the quality of the policy; q i (ω, a; θ) placing the policy quality of the policy for the service;
Figure BDA0003804378200000073
the number of available resource units in the edge server;
5) The actor network generates a service placement strategy and inputs the strategy into the critic network;
6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, and returns to the step 5), and outputs the service placement strategy if the evaluation passes.
The method for evaluating the policy quality of the service placement policy by the criticizing family network comprises the following steps: judging criticizing family network loss function
Figure BDA0003804378200000074
And whether convergence is achieved, if the convergence is achieved, the evaluation is passed, and if the convergence is not achieved, the evaluation is not passed.
Example 2:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles comprises the following steps:
1) And establishing a network and service request model, and acquiring edge server information, vehicle information and service information.
The server information, the vehicle information and the service information comprise an edge server set E, an edge server E and residual resource capacity C of the edge server E e Vehicle set V and service set S, number of vehicles λ requesting service S s One service instance at a time (e.g., media file download, collaboration awareness messaging and context notification services in an Internet of vehicles environment, etc.) or the number of vehicles ε, t, and loc of vehicle locations in a service request message, and the amount of resources R consumed by an edge server to deploy a service s s Delay requirement threshold D s
2) And establishing a calculation model.
2.1 ) total service delay modeling. The entire edge Internet of vehicles system is modeled as an M/D/1 queue. Wherein, when the service s is requested from the edge server e, the total service time delay of the vehicle
Figure BDA00038043782000000710
Refers to the total time from when the vehicle sends a service request to when the edge server receives a corresponding response. Total service delay
Figure BDA0003804378200000075
By propagation delay
Figure BDA0003804378200000076
And queuing delay
Figure BDA0003804378200000077
Consists of the following components:
Figure BDA0003804378200000078
if λ s Less than or equal to epsilon, delay in queuing
Figure BDA0003804378200000079
Is 0. If λ s > epsilon, a queue is created and the average queuing delay for service s on the edge server will be as follows:
Figure BDA0003804378200000081
wherein, λ' s =λ s ε, the average propagation delay is calculated as the ratio of the distance to the propagation velocity over the medium, as follows:
Figure BDA0003804378200000082
where dist (v, s) is the euclidean distance between the vehicle v and the edge server deployed by the service s, and c is the propagation speed of the signal through the communication medium. Thus, the total service latency is as follows:
Figure BDA0003804378200000083
2.2 Edge resource usage modeling. Edge resource usage rate
Figure BDA0003804378200000084
Is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:
Figure BDA0003804378200000085
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003804378200000086
3) And designing a state space. At a given time t, the state space set describes the network environment. The agent observes the environment to form a set of state spaces ω from the service request model, as follows:
ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t 。 (6)
wherein S ∈ S, v 1 ,v 2 ,...,v n As a set of vehicles, loc 1 ,loc 2 ,...,loc n At time t, a set of vehicle positions for service s is requested.
4) Designed motion space. The action space describes the actions taken by the policy module when placing a service on an edge server, the actions taken at a given time t are as follows:
Figure BDA0003804378200000087
where π is the policy function required to generate an action on the observed set of ω in time unit tBinary variable
Figure BDA0003804378200000088
A matrix is given indicating the location of the service s on the edge server e,
Figure BDA0003804378200000089
the representation service s is deployed at the edge server e. On the contrary, the method can be used for carrying out the following steps,
Figure BDA00038043782000000810
meaning that service s is not deployed at edge server e.
5) And designing a strategy function. The policy function pi is a function performed by the actor network to map the state space to the action space pi:ω → a. The objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter beta. The policy function pi is expressed as
Figure BDA0003804378200000091
The policy function is also subject to the mapping constraints,
Figure BDA0003804378200000092
the time delay is constrained in a manner that,
Figure BDA0003804378200000093
and the constraints of the resources are also included,
Figure BDA0003804378200000094
6) A reward function is designed. At each time unit t, the system receives an immediate reward from the environment in response to an action taken by the agent's actor network
Figure BDA0003804378200000098
As follows:
Figure BDA0003804378200000095
7) And constructing a critics network, and evaluating the quality Q (omega, a) of the decision made by the actor network. Inputting the state, the action and the reward to train the criticizing network, and updating the parameter theta of the criticizing network to minimize the loss function
Figure BDA0003804378200000096
As follows:
Figure BDA0003804378200000097
wherein, y t Is the target value. A replay memory M is further used for storing the experience of training the critic's network. The critic network acquires experience after a random period of time in the usage replay and optimizes network parameters for better performance.
8) After the training convergence of the actor network and the critic network in the steps, the actor network can find the optimal placement strategy of the service while considering the mobility and the dynamics of vehicles in different types of service requests. The criticizing family network can evaluate the strategy quality of the actor network through a value function.
Example 3:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles comprises the following steps:
1) And establishing the network and service request model and acquiring the related information of the network and service request.
2) Establishing a network and service request calculation model;
3) Constructing a state space, an action space, a strategy function and a reward function;
4) Constructing an actor network and a criticizing family network, and training the actor network and the criticizing family network;
5) The actor network generates a service placement strategy and inputs the strategy into the critic network;
6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, returns to the step 5), and outputs the service placement strategy if the evaluation passes.
Example 4:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the network and service request related information comprises edge server information, vehicle information and service information;
the edge server information comprises an edge server set E, an edge server E and residual resource capacity C of the edge server E e
The vehicle information includes a set of vehicles V.
The service information comprises a service set S and the number lambda of vehicles requesting the service S s The number of vehicles epsilon that can handle one service instance at a time or can provide parallel connections, the specified time t and vehicle location loc in the service request message, the amount of resources R consumed by the edge server to deploy the service s s Time delay requirement threshold D s (ii) a The service instances include media file downloads, collaboration aware messaging, and environment notification services in an internet of vehicles environment.
Example 5:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the network and service request calculation model comprises a total service delay calculation model and an edge resource utilization calculation model;
the total service delay calculation model is as follows:
Figure BDA0003804378200000101
in the formula (I), the compound is shown in the specification,
Figure BDA0003804378200000102
the total service delay;
Figure BDA0003804378200000103
propagation delay and queuing delay; dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; c is the propagation speed of the signal through the communication medium;
number of vehicles lambda when requesting service s s When the number is less than or equal to epsilon, the queuing delay
Figure BDA0003804378200000104
Number of vehicles lambda when requesting service s s When is greater than epsilon, queuing delay
Figure BDA0003804378200000105
Satisfies the following formula:
Figure BDA0003804378200000106
wherein the number is different by λ' s =λ s -ε;
Propagation delay
Figure BDA0003804378200000107
As follows:
Figure BDA0003804378200000108
where dist (v, s) is the Euclidean distance between vehicle v and the edge server deployed by service s; and c is the propagation speed of the signal through the communication medium.
The edge resource utilization calculation model is as follows:
edge resource usage rate
Figure BDA0003804378200000111
Is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:
Figure BDA0003804378200000112
in the formula, parameter
Figure BDA0003804378200000113
C e Is the remaining resource capacity of the edge server e;
Figure BDA0003804378200000114
is edge resource usage; r is s The amount of resources consumed to deploy service s for the edge server.
Example 6:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the state space is characterized by a state space set omega, namely:
ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)
wherein S belongs to S; v. of 1 ,v 2 ,...,v n A set of vehicles; loc 1 ,loc 2 ,...,loc n At t, a set of vehicle positions serving s is requested.
Example 7:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the action space is used for describing actions taken when a service is placed on an edge server;
wherein the action a taken at a given time t is as follows:
Figure BDA0003804378200000115
where π is the policy function required to generate an action on the observed set of ω at time unit t;
Figure BDA0003804378200000116
the representation service s is deployed in an edge server e;
Figure BDA0003804378200000117
meaning that service s is not deployed at edge server e.
Example 8:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the strategy function pi is a function executed by an actor network and is used for mapping a state space to an action space, i.e. pi: omega → a;
the objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter β;
the policy function pi is expressed as follows:
Figure BDA0003804378200000118
in the formula, β is a weight coefficient.
The constraints of the policy function pi include mapping constraints
Figure BDA0003804378200000119
Time delay constraint
Figure BDA0003804378200000121
Resource constraints
Figure BDA0003804378200000122
Example 9:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in the embodiment 3, wherein the reward function is as follows:
Figure BDA0003804378200000123
in the formula (I), the compound is shown in the specification,
Figure BDA0003804378200000124
is an instant prize. Gamma is the reward factor.
Figure BDA0003804378200000125
Service time delay at the moment t;
example 10:
a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles is disclosed in an embodiment 3, wherein a loss function in the criticizing family network training process
Figure BDA0003804378200000126
As follows:
Figure BDA0003804378200000127
in the formula, theta is a criticizing family network parameter;
Figure BDA0003804378200000128
a target value for evaluating the quality of the strategy; q i (ω, a; θ) placing the policy quality of the policy for the service;
Figure BDA0003804378200000129
the number of available resource units in the edge server;
example 11:
a dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle networking system is disclosed in an embodiment 3, wherein the method for evaluating the policy quality of a service placement policy by a criticizing network comprises the following steps: judging criticizing family network loss function
Figure BDA00038043782000001210
And whether convergence is achieved, if the convergence is achieved, the evaluation is passed, and if the convergence is not achieved, the evaluation is not passed.

Claims (9)

1. A dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles is characterized by comprising the following steps:
1) And establishing the network and service request model and acquiring the related information of the network and service request.
2) Establishing a network and service request calculation model;
3) Constructing a state space, an action space, a strategy function and a reward function;
4) Constructing an actor network and a criticizing family network, and training the actor network and the criticizing family network;
5) The actor network generates a service placement strategy and inputs the strategy into a critic network;
6) And the criticizing family network evaluates the strategy quality of the service placement strategy, updates actor network parameters if the evaluation fails, returns to the step 5), and outputs the service placement strategy if the evaluation passes.
2. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the network and service request related information comprises edge server information, vehicle information and service information;
the edge server information comprises an edge server set E, an edge server E and residual resource capacity C of the edge server E e
The vehicle information includes a set of vehicles V.
The service information comprises a service set S and a vehicle number lambda of the request service S s The number of vehicles epsilon that can handle one service instance at a time or can provide parallel connections, the specified time t and vehicle location loc in the service request message, the amount of resources R consumed by the edge server deployment service s s Delay requirement threshold D s (ii) a The service instances include media file downloads, collaboration aware messaging, and environment notification services in an internet of vehicles environment.
3. The dynamic service placement method based on edge computing and deep reinforcement learning in the Internet of vehicles according to claim 1, wherein the network and service request computing model comprises a total service delay computing model and an edge resource utilization computing model;
the total service delay calculation model is as follows:
Figure FDA0003804378190000011
in the formula (I), the compound is shown in the specification,
Figure FDA0003804378190000012
the total service delay;
Figure FDA0003804378190000013
propagation delay and queuing delay; dist (v, s) is the Euclidean distance between the vehicle v and the edge server deployed by the service s; c is the propagation speed of the signal through the communication medium;
number of vehicles lambda when requesting service s s When the number is less than or equal to epsilon, the queuing delay
Figure FDA0003804378190000014
Number of vehicles lambda when requesting service s s When is greater than epsilon, queuing delay
Figure FDA0003804378190000021
Satisfies the following formula:
Figure FDA0003804378190000022
wherein the number is different by λ' s =λ s -ε;
Propagation delay
Figure FDA0003804378190000023
As follows:
Figure FDA0003804378190000024
where dist (v, s) is the Euclidean distance between vehicle v and the edge server deployed by service s; and c is the propagation speed of the signal through the communication medium.
The edge resource utilization calculation model is as follows:
edge resource usage rate
Figure FDA0003804378190000025
Is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:
Figure FDA0003804378190000026
in the formula, parameters
Figure FDA0003804378190000027
C e Is the remaining resource capacity of the edge server e;
Figure FDA0003804378190000028
is edge resource usage; r s The amount of resources consumed to deploy service s for the edge server.
4. The dynamic service placement method based on edge computation and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the state space is characterized by a state space set ω, namely:
ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)
wherein S belongs to S; v. of 1 ,v 2 ,...,v n A set of vehicles; loc C 1 ,loc 2 ,...,loc n At time t, a set of vehicle positions for service s is requested.
5. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the action space is used for describing actions taken when placing services on an edge server;
wherein the action a taken at a given time t is as follows:
Figure FDA0003804378190000029
where π is the policy function required to generate an action on the observed set of ω at time unit t;
Figure FDA00038043781900000210
the representation service s is deployed in an edge server e;
Figure FDA00038043781900000211
meaning that service s is not deployed at edge server e.
6. The dynamic service placement method based on edge computation and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the policy function pi is a function executed by an actor network for mapping a state space to an action space, i.e. pi: ω → a;
the objective of the policy function pi is to minimize the maximum edge resource usage and service latency and to control the relative importance of resource usage and service latency by using the parameter β;
the policy function pi is expressed as follows:
Figure FDA0003804378190000031
in the formula, β is a weight coefficient.
The constraints of the policy function pi include mapping constraints
Figure FDA0003804378190000032
Time delay constraint
Figure FDA0003804378190000033
Resource constraints
Figure FDA0003804378190000034
7. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the reward function is as follows:
Figure FDA0003804378190000035
in the formula (I), the compound is shown in the specification,
Figure FDA0003804378190000036
is an instant prize. Gamma is the reward factor.
Figure FDA0003804378190000037
Is the service delay at time t.
8. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 1, wherein the loss function in the critics network training process
Figure FDA0003804378190000038
As follows:
Figure FDA0003804378190000039
in the formula, theta is a criticizing family network parameter;
Figure FDA00038043781900000310
to useTo evaluate a target value of the quality of the strategy; q i (ω, a; θ) placing the policy quality of the policy for the service;
Figure FDA00038043781900000311
is the number of available resource units in the edge server.
9. The dynamic service placement method based on edge computing and deep reinforcement learning in the internet of vehicles according to claim 8, wherein the method for evaluating the policy quality of the service placement policy by the criticizing family network comprises the following steps: judging criticizing family network loss function
Figure FDA00038043781900000312
And whether convergence is achieved, if the convergence is achieved, the evaluation is passed, and if the convergence is not achieved, the evaluation is not passed.
CN202210992657.5A 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles Active CN115550944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210992657.5A CN115550944B (en) 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210992657.5A CN115550944B (en) 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Publications (2)

Publication Number Publication Date
CN115550944A true CN115550944A (en) 2022-12-30
CN115550944B CN115550944B (en) 2024-02-27

Family

ID=84725291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210992657.5A Active CN115550944B (en) 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Country Status (1)

Country Link
CN (1) CN115550944B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213796A (en) * 2019-05-28 2019-09-06 大连理工大学 A kind of intelligent resource allocation methods in car networking
US20210112441A1 (en) * 2020-12-23 2021-04-15 Dario Sabella Transportation operator collaboration system
CN113382383A (en) * 2021-06-11 2021-09-10 浙江工业大学 Method for unloading calculation tasks of public transport vehicle based on strategy gradient
WO2021237996A1 (en) * 2020-05-26 2021-12-02 多伦科技股份有限公司 Fuzzy c-means-based adaptive energy consumption optimization vehicle clustering method
CN114528042A (en) * 2022-01-30 2022-05-24 南京信息工程大学 Energy-saving automatic interconnected vehicle service unloading method based on deep reinforcement learning
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213796A (en) * 2019-05-28 2019-09-06 大连理工大学 A kind of intelligent resource allocation methods in car networking
WO2021237996A1 (en) * 2020-05-26 2021-12-02 多伦科技股份有限公司 Fuzzy c-means-based adaptive energy consumption optimization vehicle clustering method
US20210112441A1 (en) * 2020-12-23 2021-04-15 Dario Sabella Transportation operator collaboration system
CN113382383A (en) * 2021-06-11 2021-09-10 浙江工业大学 Method for unloading calculation tasks of public transport vehicle based on strategy gradient
CN114528042A (en) * 2022-01-30 2022-05-24 南京信息工程大学 Energy-saving automatic interconnected vehicle service unloading method based on deep reinforcement learning
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DASONG ZHUANG: "Offloading Strategy for Vehicles in the Architecture of Vehicle-MEC-Cloud", 《2022 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC WORKSHOPS)》, 11 August 2022 (2022-08-11) *
XIUHUA LI: "Task Offloading for End-Edge-Cloud Orchestrated Computing in Mobile Networks", 《2020 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC)》, 25 May 2020 (2020-05-25) *
张海波;荆昆仑;刘开健;贺晓帆;: "车联网中一种基于软件定义网络与移动边缘计算的卸载策略", 电子与信息学报, no. 03, 15 March 2020 (2020-03-15) *
彭军;王成龙;蒋富;顾欣;牟??;刘伟荣;: "一种车载服务的快速深度Q学习网络边云迁移策略", 电子与信息学报, no. 01, 15 January 2020 (2020-01-15) *

Also Published As

Publication number Publication date
CN115550944B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN109756378B (en) Intelligent computing unloading method under vehicle-mounted network
CN109391681B (en) MEC-based V2X mobility prediction and content caching offloading scheme
CN112995950B (en) Resource joint allocation method based on deep reinforcement learning in Internet of vehicles
Kazmi et al. Infotainment enabled smart cars: A joint communication, caching, and computation approach
Chen et al. Efficiency and fairness oriented dynamic task offloading in internet of vehicles
CN110312231A (en) Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN114143346B (en) Joint optimization method and system for task unloading and service caching of Internet of vehicles
CN112395090B (en) Intelligent hybrid optimization method for service placement in mobile edge calculation
CN113507503B (en) Internet of vehicles resource allocation method with load balancing function
CN111339554A (en) User data privacy protection method based on mobile edge calculation
CN115209426B (en) Dynamic deployment method for digital twin servers in edge car networking
CN114374741B (en) Dynamic grouping internet of vehicles caching method based on reinforcement learning under MEC environment
CN115297171B (en) Edge computing and unloading method and system for hierarchical decision of cellular Internet of vehicles
Wu et al. A profit-aware coalition game for cooperative content caching at the network edge
CN110489218A (en) Vehicle-mounted mist computing system task discharging method based on semi-Markovian decision process
CN114641041A (en) Edge-intelligent-oriented Internet of vehicles slicing method and device
CN109495565A (en) High concurrent service request processing method and equipment based on distributed ubiquitous computation
CN114979145A (en) Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN113709249B (en) Safe balanced unloading method and system for driving assisting service
CN115550944A (en) Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles
CN117221951A (en) Task unloading method based on deep reinforcement learning in vehicle-mounted edge environment
CN116916272A (en) Resource allocation and task unloading method and system based on automatic driving automobile network
CN116489668A (en) Edge computing task unloading method based on high-altitude communication platform assistance
CN114928826A (en) Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
CN114189522B (en) Priority-based blockchain consensus method and system in Internet of vehicles

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant