CN115550944B

CN115550944B - Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Info

Publication number: CN115550944B
Application number: CN202210992657.5A
Authority: CN
Inventors: 李秀华; 李辉; 孙川; 徐峥辉; 郝金隆; 蔡春茂; 范琪琳; 杨正益; 文俊浩
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2024-02-27
Anticipated expiration: 2042-08-18
Also published as: CN115550944A

Abstract

The invention discloses a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles, which comprises the following steps: 1) Establishing a network and service request model, and acquiring information related to the network and service request; 2) Establishing a network and service request calculation model; 3) Constructing a state space, an action space, a strategy function and a reward function; 4) Constructing an actor network and a criticism network, and training the actor network and the criticism network; 5) Generating a service placement strategy by the actor network and inputting the strategy into the criticizing home network; 6) And (3) the criticizing home network evaluates the strategy quality of the service placement strategy, if the evaluation is not passed, updating actor network parameters, returning to the step (5), and if the evaluation is passed, outputting the service placement strategy. The present invention minimizes maximum edge resource usage and service delays while taking into account vehicle mobility, changing demands, and dynamics of different types of service requests.

Description

A dynamic service placement method based on edge computing and deep reinforcement learning in Internet of Vehicles

技术领域Technical Field

本发明涉及车联网领域，具体是一种车联网中基于边缘计算和深度强化学习的动态服务放置方法。The present invention relates to the field of Internet of Vehicles, and specifically to a dynamic service placement method based on edge computing and deep reinforcement learning in the Internet of Vehicles.

背景技术Background Art

车联网是由车辆位置、速度和路线等信息构成的交互网络。通信技术的迅速发展为目前的车联网领域带来了许多新的可能性。其中，第五代移动通信技术的出现让车联网变得更加智能化，服务覆盖范围也进一步的扩大。但是，随着车联网领域中智能语音助手和自动驾驶等延迟敏感性应用成为了目前最流行的应用，传统云计算范式逐渐无法满足用户的需求。欧洲电信标准协会将移动边缘计算引入了车联网领域中，扩展了云计算的存储资源和计算资源，使其更接近用户，满足了用户对于智能应用的高可靠性、低延迟性、安全性等要求。The Internet of Vehicles is an interactive network consisting of information such as vehicle location, speed, and route. The rapid development of communication technology has brought many new possibilities to the current field of Internet of Vehicles. Among them, the emergence of the fifth generation of mobile communication technology has made the Internet of Vehicles more intelligent and the service coverage has been further expanded. However, as delay-sensitive applications such as intelligent voice assistants and autonomous driving have become the most popular applications in the field of Internet of Vehicles, the traditional cloud computing paradigm has gradually failed to meet user needs. The European Telecommunications Standards Institute introduced mobile edge computing into the field of Internet of Vehicles, expanding the storage and computing resources of cloud computing, bringing them closer to users, and meeting users' requirements for high reliability, low latency, and security of intelligent applications.

在车联网中，车辆与基础设施进行通信，以获得媒体下载、合作消息、去中心化环境通知消息等服务，从而在远程驾驶、停车位发现、导航等应用中实现协调。在边缘计算范式中可以在边缘服务器上部署多个服务，充分利用计算资源和存储资源。服务放置是车联网领域的研究热点之一。具体地说，服务放置是将服务映射至车联网中的边缘服务器，以满足所请求服务的需求，同时高效地使用边缘资源。从用户的角度来看，将车辆感知服务的延迟降至最低是十分重要的。从服务提供商的角度来看，要满足最大化边缘资源使用率，同时尽可能保持服务器之间的资源负载平衡。In the Internet of Vehicles, vehicles communicate with the infrastructure to obtain services such as media downloads, cooperative messages, and decentralized environmental notification messages, thereby achieving coordination in applications such as remote driving, parking space discovery, and navigation. In the edge computing paradigm, multiple services can be deployed on edge servers to fully utilize computing resources and storage resources. Service placement is one of the research hotspots in the field of Internet of Vehicles. Specifically, service placement is the mapping of services to edge servers in the Internet of Vehicles to meet the needs of the requested services while efficiently using edge resources. From the user's perspective, it is very important to minimize the latency of vehicle-aware services. From the service provider's perspective, it is necessary to maximize the utilization of edge resources while keeping the resource load balance between servers as much as possible.

发明内容Summary of the invention

本发明的目的是提供一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，包括以下步骤：The purpose of the present invention is to provide a dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, comprising the following steps:

1)建立网络与服务请求模型，获取网络与服务请求相关信息；1) Establish a network and service request model to obtain network and service request related information;

所述网络与服务请求相关信息包括边缘服务器信息、车辆信息、服务信息；The network and service request related information includes edge server information, vehicle information, and service information;

所述边缘服务器信息包括边缘服务器集合E，边缘服务器e，边缘服务器e的剩余资源容量C_e；The edge server information includes an edge server set E, an edge server e, and a remaining resource capacity C _e of the edge server e;

所述车辆信息包括车辆集合V。The vehicle information includes a vehicle set V.

所述服务信息包括服务集合S、请求服务s的车辆数量λ_s、一次可以处理一个服务实例(如车联网环境中的媒体文件下载、合作意识消息和环境通知服务等)或可以提供并行连接的车辆数量ε、服务请求消息中指定时间t和车辆位置loc、边缘服务器部署服务s所消耗的资源量R_s、时延需求阈值D_s。The service information includes the service set S, the number of vehicles λ _s requesting service s, the number of vehicles ε that can process one service instance at a time (such as media file downloading, cooperative awareness messages and environmental notification services in the Internet of Vehicles environment) or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R _s consumed by the edge server deploying service s, and the delay requirement threshold D _s .

2)建立网络与服务请求计算模型；2) Establish a network and service request calculation model;

所述网络与服务请求计算模型包括总服务时延计算模型、边缘资源使用率计算模型；The network and service request calculation model includes a total service delay calculation model and an edge resource utilization calculation model;

总服务时延计算模型如下所示：The total service delay calculation model is as follows:

式中，为总服务时延；为传播时延和排队时延；dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离；c为信号通过通信介质的传播速度；In the formula, is the total service delay; are the propagation delay and queuing delay; dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium;

当请求服务s的车辆数量λ_s≤ε时，排队时延当请求服务s的车辆数量λ_s＞ε时，排队时延满足下式：When the number of vehicles requesting service s is _λs ≤ε, the queuing delay is When the number of vehicles requesting service s λ _s ＞ ε, the queuing delay Satisfy the following formula:

式中，数量差λ′_s＝λ_s-ε；In the formula, the quantity difference λ′ _s =λ _s -ε;

传播时延如下所示：Propagation Delay As shown below:

式中，dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离；c为信号通过通信介质的传播速度。Where dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium.

边缘资源使用率计算模型如下所示：The edge resource utilization calculation model is as follows:

边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率，如下所示：Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

式中，参数C_e为边缘服务器e的剩余资源容量；为边缘资源使用率；R_s为边缘服务器部署服务s所消耗的资源量。In the formula, the parameters C _e is the remaining resource capacity of edge server e; is the edge resource utilization rate; _Rs is the amount of resources consumed by the edge server to deploy service s.

3)构建状态空间、动作空间、策略函数和奖励函数；3) Construct state space, action space, strategy function and reward function;

所述状态空间通过状态空间集ω表征，即：The state space is characterized by the state space set ω, namely:

ω＝{[v₁,loc₁,s],[v₂,loc₂,s],...,[v_n,loc_n,s]}_t (6)ω={[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t (6)

式中，s∈S；v₁,v₂,...,v_n为一组车辆集合；loc₁,loc₂,...,loc_n为在t时，请求服务s的车辆位置集合。Where s∈S; v ₁ ,v ₂ ,...,v _n is a set of vehicles; loc ₁ ,loc ₂ ,...,loc _n is the set of vehicle locations requesting service s at time t.

所述动作空间用于描述在边缘服务器上放置服务时所采取的动作；The action space is used to describe the actions taken when placing a service on an edge server;

其中，在给定的时刻t所采取的动作a如下所示：The action a taken at a given time t is as follows:

式中，π是在时间单位t对ω的观察集生成动作所需的策略函数；表示服务s部署于边缘服务器e；表示服务s没有部署于边缘服务器e。Where π is the policy function required to generate actions for the observation set ω in time unit t; Indicates that service s is deployed on edge server e; Indicates that service s is not deployed on edge server e.

所述策略函数π是演员网络执行的函数，用于将状态空间映射到动作空间，即π:ω→a；The policy function π is a function executed by the actor network to map the state space to the action space, i.e., π:ω→a;

策略函数π的目标是最小化最大边缘资源使用和服务时延，并通过使用参数β来控制资源使用与服务时延的相对重要性。策略函数π表示为The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β. The policy function π is expressed as

式中，β为权重系数；In the formula, β is the weight coefficient;

所述策略函数π的约束包括映射约束时延约束资源约束 The constraints of the policy function π include the mapping constraints Delay Constraint Resource Constraints

所述奖励函数如下所示：The reward function is as follows:

式中，为即时奖励。γ为奖励系数。为t时刻的服务时延；In the formula, is the immediate reward. γ is the reward coefficient. is the service delay at time t;

4)构建演员网络和批评家网络，并对演员网络和批评家网络进行训练；4) Constructing the actor network and the critic network, and training the actor network and the critic network;

所述批评家网络训练过程中的损失函数如下所示：The loss function during the training of the critic network As shown below:

式中，θ为批评家网络参数；为用来评估策略质量的目标值；Q_i(ω,a；θ)为服务放置策略的策略质量；为边缘服务器中可用资源单元的数量；Where θ is the critic network parameter; is the target value used to evaluate the quality of the strategy; _Qi (ω, a; θ) is the strategy quality of the service placement strategy; is the number of available resource units in the edge server;

5)演员网络生成服务放置策略，并输入到批评家网络中；5) The actor network generates service placement strategies and inputs them into the critic network;

6)所述批评家网络评估服务放置策略的策略质量，若评估不通过，则更新演员网络参数，并返回步骤5)，若评估通过，则输出服务放置策略。6) The critic network evaluates the policy quality of the service placement strategy. If the evaluation fails, the actor network parameters are updated and the process returns to step 5. If the evaluation passes, the service placement strategy is output.

所述批评家网络评估服务放置策略的策略质量的方法包括：判断批评家网络损失函数是否收敛，若收敛，则评估通过，否则，评估不通过。The method for evaluating the strategy quality of the service placement strategy by the critic network includes: determining the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.

值得说明的是，本发明提出了一种基于边缘计算的三层式车联网架构，并考虑了动态服务放置问题，优化目标是最小化最大边缘资源使用(从服务提供商的角度)和服务延迟(从用户的角度)。It is worth noting that the present invention proposes a three-layer Internet of Vehicles architecture based on edge computing and takes into account the problem of dynamic service placement. The optimization goal is to minimize the maximum edge resource usage (from the perspective of the service provider) and service delay (from the perspective of the user).

此外，本发明提出了一种基于深度强化学习的服务放置框架，由策略函数(演员网络)和价值函数(评论家网络)组成。演员网络做出服务放置策略，而评论家网络根据车辆观察到的延迟来评估演员网络做出的决策性能。In addition, this paper proposes a service placement framework based on deep reinforcement learning, which consists of a policy function (actor network) and a value function (critic network). The actor network makes a service placement strategy, while the critic network evaluates the performance of the decision made by the actor network based on the delay observed by the vehicle.

本发明的技术效果是毋庸置疑的。本发明提供一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，该方法提出了一个车联网中基于深度强化学习的动态服务放置框架，其目标是在考虑车辆的移动性、变化的需求和对不同类型服务请求的动态性的同时，最小化最大的边缘资源使用和服务延迟。The technical effect of the present invention is unquestionable. The present invention provides a dynamic service placement method based on edge computing and deep reinforcement learning in the Internet of Vehicles. The method proposes a dynamic service placement framework based on deep reinforcement learning in the Internet of Vehicles, which aims to minimize the maximum edge resource usage and service delay while considering the mobility of vehicles, changing needs and the dynamics of requests for different types of services.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为基于边缘计算的三层式车联网架；Figure 1 shows a three-layered Internet of Vehicles architecture based on edge computing;

图2为智能体结构；Figure 2 shows the structure of the intelligent agent;

图3为本发明流程图。FIG3 is a flow chart of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合实施例对本发明作进一步说明，但不应该理解为本发明上述主题范围仅限于下述实施例。在不脱离本发明上述技术思想的情况下，根据本领域普通技术知识和惯用手段，做出各种替换和变更，均应包括在本发明的保护范围内。The present invention is further described below in conjunction with the embodiments, but it should not be understood that the above subject matter of the present invention is limited to the following embodiments. Without departing from the above technical ideas of the present invention, various substitutions and changes are made according to the common technical knowledge and customary means in the art, which should all be included in the protection scope of the present invention.

实施例1：Embodiment 1:

参见图1至图3，一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，包括以下步骤：Referring to FIG. 1 to FIG. 3 , a dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles includes the following steps:

传播时延如下所示：Propagation Delay As shown below:

式中，β为权重系数；In the formula, β is the weight coefficient;

策略函数π的原理为：通过下标s,e对服务集合与边缘服务器集合迭代，寻找最大的边缘资源使用和服务时延，再使其最小化以获得对应的策略函数π。The principle of the policy function π is: iterate the service set and the edge server set through the subscripts s and e to find the maximum edge resource usage and service delay, and then minimize them to obtain the corresponding policy function π.

所述奖励函数如下所示：The reward function is as follows:

实施例2：Embodiment 2:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，包括以下步骤：A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles includes the following steps:

1)建立网络与服务请求模型，获取边缘服务器信息、车辆信息和服务信息。1) Establish a network and service request model to obtain edge server information, vehicle information, and service information.

所述的服务器信息、车辆信息和服务信息包括边缘服务器集合E，边缘服务器e，边缘服务器e的剩余资源容量C_e，车辆集合V和服务集合S，请求服务s的车辆数量λ_s，一次可以处理一个服务实例(如车联网环境中的媒体文件下载、合作意识消息和环境通知服务等)或可以提供并行连接的车辆数量ε、，服务请求消息中指定时间t和车辆位置loc，边缘服务器部署服务s所消耗的资源量R_s，时延需求阈值D_s。The server information, vehicle information and service information include the edge server set E, edge server e, the remaining resource capacity C _e of edge server e, the vehicle set V and the service set S, the number of vehicles λ _s requesting service s, the number of vehicles ε that can process one service instance at a time (such as media file downloading, cooperative awareness messages and environmental notification services in the Internet of Vehicles environment) or can provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R _s consumed by the edge server to deploy service s, and the delay requirement threshold D _s .

2)建立计算模型。2) Establish a computational model.

2.1)总服务时延建模。将整个边缘车联网系统建模为M/D/1队列。其中，从边缘服务器e请求服务s时，车辆的总服务时延是指从车辆发送服务请求到边缘服务器接收到相应响应的总时间。总服务时延由传播时延和排队时延组成：2.1) Total service delay modeling. The entire edge vehicle network system is modeled as an M/D/1 queue. When requesting service s from edge server e, the total service delay of the vehicle is It refers to the total time from when the vehicle sends a service request to when the edge server receives the corresponding response. Total service delay Propagation delay and queuing delay composition:

如果λ_s≤ε，排队时延为0。如果λ_s＞ε，则创建一个队列，并且边缘服务器上服务s的平均排队时延将如下所示：If λ _s ≤ ε, the queuing delay is 0. If λ _s ＞ε, a queue is created and the average queuing delay of service s on the edge server will be as follows:

其中，λ′_s＝λ_s-ε，平均传播时延计算为距离与介质上传播速度之比，如下所示：Where λ′ _s = λ _s - ε, and the average propagation delay is calculated as the ratio of distance to the propagation speed in the medium, as follows:

其中，dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离，c为信号通过通信介质的传播速度。因此，总服务时延如下所示：Where dist(v,s) is the Euclidean distance between vehicle v and the edge server where service s is deployed, and c is the propagation speed of the signal through the communication medium. Therefore, the total service delay is as follows:

2.2)边缘资源使用率建模。边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率，如下所示：2.2) Edge resource utilization modeling. Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

其中， in,

3)设计状态空间。在给定的时刻t，状态空间集描述网络环境。智能体观察环境以构成来自服务请求模型的状态空间集ω，如下所示：3) Design state space. At a given time t, the state space set describes the network environment. The agent observes the environment to form a state space set ω from the service request model, as shown below:

ω＝{[v₁,loc₁,s],[v₂,loc₂,s],...,[v_n,loc_n,s]}_t。 (6)ω={[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],...,[v _n ,loc _n ,s]} _t . (6)

其中，s∈S，v₁,v₂,...,v_n为一组车辆集合，loc₁,loc₂,...,loc_n为在t时，请求服务s的车辆位置集合。Where s∈S, v ₁ ,v ₂ ,...,v _n is a set of vehicles, and loc ₁ ,loc ₂ ,...,loc _n is the set of vehicle locations requesting service s at time t.

4)设计的动作空间。动作空间描述了策略模块在边缘服务器上放置服务时所采取的动作，在给定的时刻t所采取的动作如下所示：4) Designed action space. The action space describes the actions taken by the policy module when placing services on edge servers. The actions taken at a given time t are as follows:

其中π是在时间单位t对ω的观察集生成动作所需的策略函数，二进制变量给出了指示服务s在边缘服务器e上的位置的矩阵，表示服务s部署于边缘服务器e。反之，表示服务s没有部署于边缘服务器e。where π is the policy function required to generate actions for a set of observations ω at time unit t, and the binary variable Given a matrix indicating the location of service s on edge server e, Indicates that service s is deployed on edge server e. Otherwise, Indicates that service s is not deployed on edge server e.

5)设计策略函数。策略函数π是演员网络执行的一个函数，用于将状态空间映射到动作空间π:ω→a。策略函数π的目标是最小化最大边缘资源使用和服务时延，并通过使用参数β来控制资源使用与服务时延的相对重要性。策略函数π表示为5) Design the policy function. The policy function π is a function executed by the actor network to map the state space to the action space π:ω→a. The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β. The policy function π is expressed as

策略函数同时受到映射约束，时延约束，以及资源约束， The policy function is also subject to mapping constraints, Delay constraints, and resource constraints,

6)设计奖励函数。在每个时间单位t，作为对智能体的演员网络所采取的行动的响应，系统从环境中收到即时奖励如下所示：6) Design a reward function. At each time unit t, in response to the actions taken by the agent's actor network, the system receives an immediate reward from the environment As shown below:

7)构建批评家网络，负责评估演员网络所做出的的决策质量Q(ω,a)。输入上述状态、动作和奖励进行训练批评家网络，批评家网络更新其参数θ以最小化损失函数如下所示：7) Build a critic network, which is responsible for evaluating the decision quality Q(ω,a) made by the actor network. Input the above states, actions and rewards to train the critic network, which updates its parameters θ to minimize the loss function As shown below:

其中，y_t为目标值。进一步使用重播内存M，用于存储训练批评家网络的体验。批评家网络使用重播内存在一段随机时间后获取体验，并优化网络参数以获得更好的性能。Where y _t is the target value. The replay memory M is further used to store the experience of training the critic network. The critic network uses the replay memory to obtain experience after a random period of time and optimize the network parameters to obtain better performance.

8)经过上述步骤对演员网络和批评家网络的训练收敛后，演员网络能够考虑车辆在不同类型服务请求中的移动性和动态性的同时，找到服务的最佳放置策略。批评家网络可以通过值函数对演员网络的策略质量进行评估。8) After the training of the actor network and the critic network converges through the above steps, the actor network is able to find the best placement strategy for services while considering the mobility and dynamics of vehicles in different types of service requests. The critic network can evaluate the quality of the actor network's strategy through the value function.

实施例3：Embodiment 3:

1)建立所述网络与服务请求模型，获取网络与服务请求相关信息。1) Establish the network and service request model and obtain network and service request related information.

实施例4：Embodiment 4:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述网络与服务请求相关信息包括边缘服务器信息、车辆信息、服务信息；A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the network and service request related information include edge server information, vehicle information, and service information;

所述服务信息包括服务集合S、请求服务s的车辆数量λ_s、一次可以处理一个服务实例或可以提供并行连接的车辆数量ε、服务请求消息中指定时间t和车辆位置loc、边缘服务器部署服务s所消耗的资源量R_s、时延需求阈值D_s；所述服务实例包括车联网环境中的媒体文件下载、合作意识消息和环境通知服务。The service information includes the service set S, the number of vehicles λ _s requesting service s, the number of vehicles ε that can process one service instance at a time or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R _s consumed by the edge server deploying service s, and the delay requirement threshold D _s ; the service instances include media file downloading, cooperative awareness messaging, and environmental notification services in the Internet of Vehicles environment.

实施例5：Embodiment 5:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述网络与服务请求计算模型包括总服务时延计算模型、边缘资源使用率计算模型；A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the network and service request calculation model includes a total service delay calculation model and an edge resource utilization rate calculation model;

传播时延如下所示：Propagation Delay As shown below:

实施例6：Embodiment 6:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述状态空间通过状态空间集ω表征，即：A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the state space is represented by a state space set ω, that is:

实施例7：Embodiment 7:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述动作空间用于描述在边缘服务器上放置服务时所采取的动作；A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the action space is used to describe the actions taken when placing a service on an edge server;

实施例8：Embodiment 8:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述策略函数π是演员网络执行的函数，用于将状态空间映射到动作空间，即π:ω→a；A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the policy function π is a function executed by an actor network, which is used to map the state space to the action space, that is, π:ω→a;

策略函数π的目标是最小化最大边缘资源使用和服务时延，并通过使用参数β来控制资源使用与服务时延的相对重要性；The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β;

策略函数π表示如下：The policy function π is expressed as follows:

式中，β为权重系数。In the formula, β is the weight coefficient.

实施例9：Embodiment 9:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述奖励函数如下所示：A dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, the main content of which is shown in Example 3, wherein the reward function is as follows:

实施例10：Embodiment 10:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述批评家网络训练过程中的损失函数如下所示：A dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, the main content of which is shown in Example 3, wherein the loss function in the critic network training process is As shown below:

实施例11：Embodiment 11:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法，主要内容见实施例3，其中，所述批评家网络评估服务放置策略的策略质量的方法包括：判断批评家网络损失函数是否收敛，若收敛，则评估通过，否则，评估不通过。A dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, the main content of which is shown in Example 3, wherein the method for evaluating the policy quality of the service placement strategy by the critic network includes: determining the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.

Claims

1. A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, characterized in that it includes the following steps:

1) Establish a network and service request model to obtain network and service request related information;

2) Establish a network and service request calculation model;

3) Construct state space, action space, strategy function and reward function;

4) Constructing the actor network and the critic network, and training the actor network and the critic network;

5) The actor network generates service placement strategies and inputs them into the critic network;

6) The critic network evaluates the policy quality of the service placement strategy. If the evaluation fails, the actor network parameters are updated and the process returns to step 5. If the evaluation passes, the service placement strategy is output.

The network and service request calculation model includes a total service delay calculation model and an edge resource utilization calculation model;

The total service delay calculation model is as follows:

In the formula, is the total service delay; are the propagation delay and queuing delay; dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium;

When the number of vehicles requesting service s is _λs ≤ε, the queuing delay is When the number of vehicles requesting service s λ _s ＞ ε, the queuing delay Satisfy the following formula:

In the formula, the quantity difference λ′ _s =λ _s -ε;

Propagation Delay As shown below:

Where dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium;

The edge resource utilization calculation model is as follows:

Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

In the formula, the parameters C _e is the remaining resource capacity of edge server e; is the edge resource utilization rate; _Rs is the amount of resources consumed by deploying service s on the edge server;

The state space is characterized by the state space set ω, namely:

ω={[v ₁ ,loc ₁ ,s],[v ₂ ,loc ₂ ,s],…,[v _n ,loc _n ,s]} _t (6)

Where s∈S; v ₁ ,v ₂ ,...,v _n is a set of vehicles; loc ₁ ,loc ₂ ,...,loc _n is the set of vehicle locations requesting service s at time t;

The action space is used to describe the actions taken when placing a service on an edge server;

The action a taken at a given time t is as follows:

Where π is the policy function required to generate actions for the observation set ω in time unit t; Indicates that service s is deployed on edge server e; Indicates that service s is not deployed on edge server e;

The policy function π is a function executed by the actor network to map the state space to the action space, i.e., π:ω→a;

The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β;

The policy function π is expressed as follows:

In the formula, β is the weight coefficient;

The constraints of the policy function π include the mapping constraints Delay Constraint Resource Constraints

2. A dynamic service placement method based on edge computing and deep reinforcement learning in a connected vehicle network according to claim 1, characterized in that the network and service request related information includes edge server information, vehicle information, and service information;

The edge server information includes an edge server set E, an edge server e, and a remaining resource capacity C _e of the edge server e;

The vehicle information includes a vehicle set V;

The service information includes the service set S, the number of vehicles λ _s requesting service s, the number of vehicles ε that can process one service instance at a time or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R _s consumed by the edge server deploying service s, and the delay requirement threshold D _s ; the service instances include media file downloading, cooperative awareness messaging, and environmental notification services in the Internet of Vehicles environment.

3. According to a method for dynamic service placement based on edge computing and deep reinforcement learning in a connected vehicle network according to claim 1, it is characterized in that the reward function is as follows:

In the formula, is the immediate reward; γ is the reward coefficient; is the service delay at time t.

4. According to the method of dynamic service placement based on edge computing and deep reinforcement learning in the Internet of Vehicles in claim 1, it is characterized in that the loss function in the critic network training process is As shown below:

Where θ is the critic network parameter; is the target value used to evaluate the quality of the strategy; _Qi (ω, a; θ) is the strategy quality of the service placement strategy; is the number of available resource units in the edge server.

5. According to the method of dynamic service placement based on edge computing and deep reinforcement learning in the Internet of Vehicles in claim 4, it is characterized in that the method of evaluating the policy quality of the service placement strategy by the critic network includes: judging the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.