CN115550944B - Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles - Google Patents

Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles Download PDF

Info

Publication number
CN115550944B
CN115550944B CN202210992657.5A CN202210992657A CN115550944B CN 115550944 B CN115550944 B CN 115550944B CN 202210992657 A CN202210992657 A CN 202210992657A CN 115550944 B CN115550944 B CN 115550944B
Authority
CN
China
Prior art keywords
service
network
edge
vehicles
edge server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210992657.5A
Other languages
Chinese (zh)
Other versions
CN115550944A (en
Inventor
李秀华
李辉
孙川
徐峥辉
郝金隆
蔡春茂
范琪琳
杨正益
文俊浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210992657.5A priority Critical patent/CN115550944B/en
Publication of CN115550944A publication Critical patent/CN115550944A/en
Application granted granted Critical
Publication of CN115550944B publication Critical patent/CN115550944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18Network planning tools
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00Information sensed or collected by the things
    • G16Y20/10Information sensed or collected by the things relating to the environment, e.g. temperature; relating to location
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00Information sensed or collected by the things
    • G16Y20/30Information sensed or collected by the things relating to resources, e.g. consumed power
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16YINFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y30/00IoT infrastructure
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22Traffic simulation tools or models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/502Proximity

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Environmental & Geological Engineering (AREA)
  • Toxicology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a dynamic service placement method based on edge calculation and deep reinforcement learning in the Internet of vehicles, which comprises the following steps: 1) Establishing a network and service request model, and acquiring information related to the network and service request; 2) Establishing a network and service request calculation model; 3) Constructing a state space, an action space, a strategy function and a reward function; 4) Constructing an actor network and a criticism network, and training the actor network and the criticism network; 5) Generating a service placement strategy by the actor network and inputting the strategy into the criticizing home network; 6) And (3) the criticizing home network evaluates the strategy quality of the service placement strategy, if the evaluation is not passed, updating actor network parameters, returning to the step (5), and if the evaluation is passed, outputting the service placement strategy. The present invention minimizes maximum edge resource usage and service delays while taking into account vehicle mobility, changing demands, and dynamics of different types of service requests.

Description

一种车联网中基于边缘计算和深度强化学习的动态服务放置 方法A dynamic service placement method based on edge computing and deep reinforcement learning in Internet of Vehicles

技术领域Technical Field

本发明涉及车联网领域,具体是一种车联网中基于边缘计算和深度强化学习的动态服务放置方法。The present invention relates to the field of Internet of Vehicles, and specifically to a dynamic service placement method based on edge computing and deep reinforcement learning in the Internet of Vehicles.

背景技术Background Art

车联网是由车辆位置、速度和路线等信息构成的交互网络。通信技术的迅速发展为目前的车联网领域带来了许多新的可能性。其中,第五代移动通信技术的出现让车联网变得更加智能化,服务覆盖范围也进一步的扩大。但是,随着车联网领域中智能语音助手和自动驾驶等延迟敏感性应用成为了目前最流行的应用,传统云计算范式逐渐无法满足用户的需求。欧洲电信标准协会将移动边缘计算引入了车联网领域中,扩展了云计算的存储资源和计算资源,使其更接近用户,满足了用户对于智能应用的高可靠性、低延迟性、安全性等要求。The Internet of Vehicles is an interactive network consisting of information such as vehicle location, speed, and route. The rapid development of communication technology has brought many new possibilities to the current field of Internet of Vehicles. Among them, the emergence of the fifth generation of mobile communication technology has made the Internet of Vehicles more intelligent and the service coverage has been further expanded. However, as delay-sensitive applications such as intelligent voice assistants and autonomous driving have become the most popular applications in the field of Internet of Vehicles, the traditional cloud computing paradigm has gradually failed to meet user needs. The European Telecommunications Standards Institute introduced mobile edge computing into the field of Internet of Vehicles, expanding the storage and computing resources of cloud computing, bringing them closer to users, and meeting users' requirements for high reliability, low latency, and security of intelligent applications.

在车联网中,车辆与基础设施进行通信,以获得媒体下载、合作消息、去中心化环境通知消息等服务,从而在远程驾驶、停车位发现、导航等应用中实现协调。在边缘计算范式中可以在边缘服务器上部署多个服务,充分利用计算资源和存储资源。服务放置是车联网领域的研究热点之一。具体地说,服务放置是将服务映射至车联网中的边缘服务器,以满足所请求服务的需求,同时高效地使用边缘资源。从用户的角度来看,将车辆感知服务的延迟降至最低是十分重要的。从服务提供商的角度来看,要满足最大化边缘资源使用率,同时尽可能保持服务器之间的资源负载平衡。In the Internet of Vehicles, vehicles communicate with the infrastructure to obtain services such as media downloads, cooperative messages, and decentralized environmental notification messages, thereby achieving coordination in applications such as remote driving, parking space discovery, and navigation. In the edge computing paradigm, multiple services can be deployed on edge servers to fully utilize computing resources and storage resources. Service placement is one of the research hotspots in the field of Internet of Vehicles. Specifically, service placement is the mapping of services to edge servers in the Internet of Vehicles to meet the needs of the requested services while efficiently using edge resources. From the user's perspective, it is very important to minimize the latency of vehicle-aware services. From the service provider's perspective, it is necessary to maximize the utilization of edge resources while keeping the resource load balance between servers as much as possible.

发明内容Summary of the invention

本发明的目的是提供一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,包括以下步骤:The purpose of the present invention is to provide a dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, comprising the following steps:

1)建立网络与服务请求模型,获取网络与服务请求相关信息;1) Establish a network and service request model to obtain network and service request related information;

所述网络与服务请求相关信息包括边缘服务器信息、车辆信息、服务信息;The network and service request related information includes edge server information, vehicle information, and service information;

所述边缘服务器信息包括边缘服务器集合E,边缘服务器e,边缘服务器e的剩余资源容量CeThe edge server information includes an edge server set E, an edge server e, and a remaining resource capacity C e of the edge server e;

所述车辆信息包括车辆集合V。The vehicle information includes a vehicle set V.

所述服务信息包括服务集合S、请求服务s的车辆数量λs、一次可以处理一个服务实例(如车联网环境中的媒体文件下载、合作意识消息和环境通知服务等)或可以提供并行连接的车辆数量ε、服务请求消息中指定时间t和车辆位置loc、边缘服务器部署服务s所消耗的资源量Rs、时延需求阈值DsThe service information includes the service set S, the number of vehicles λ s requesting service s, the number of vehicles ε that can process one service instance at a time (such as media file downloading, cooperative awareness messages and environmental notification services in the Internet of Vehicles environment) or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R s consumed by the edge server deploying service s, and the delay requirement threshold D s .

2)建立网络与服务请求计算模型;2) Establish a network and service request calculation model;

所述网络与服务请求计算模型包括总服务时延计算模型、边缘资源使用率计算模型;The network and service request calculation model includes a total service delay calculation model and an edge resource utilization calculation model;

总服务时延计算模型如下所示:The total service delay calculation model is as follows:

式中,为总服务时延;为传播时延和排队时延;dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度;In the formula, is the total service delay; are the propagation delay and queuing delay; dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium;

当请求服务s的车辆数量λs≤ε时,排队时延当请求服务s的车辆数量λs>ε时,排队时延满足下式:When the number of vehicles requesting service s is λs ≤ε, the queuing delay is When the number of vehicles requesting service s λ s > ε, the queuing delay Satisfy the following formula:

式中,数量差λ′s=λs-ε;In the formula, the quantity difference λ′ ss -ε;

传播时延如下所示:Propagation Delay As shown below:

式中,dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度。Where dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium.

边缘资源使用率计算模型如下所示:The edge resource utilization calculation model is as follows:

边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率,如下所示:Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

式中,参数Ce为边缘服务器e的剩余资源容量;为边缘资源使用率;Rs为边缘服务器部署服务s所消耗的资源量。In the formula, the parameters C e is the remaining resource capacity of edge server e; is the edge resource utilization rate; Rs is the amount of resources consumed by the edge server to deploy service s.

3)构建状态空间、动作空间、策略函数和奖励函数;3) Construct state space, action space, strategy function and reward function;

所述状态空间通过状态空间集ω表征,即:The state space is characterized by the state space set ω, namely:

ω={[v1,loc1,s],[v2,loc2,s],...,[vn,locn,s]}t (6)ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)

式中,s∈S;v1,v2,...,vn为一组车辆集合;loc1,loc2,...,locn为在t时,请求服务s的车辆位置集合。Where s∈S; v 1 ,v 2 ,...,v n is a set of vehicles; loc 1 ,loc 2 ,...,loc n is the set of vehicle locations requesting service s at time t.

所述动作空间用于描述在边缘服务器上放置服务时所采取的动作;The action space is used to describe the actions taken when placing a service on an edge server;

其中,在给定的时刻t所采取的动作a如下所示:The action a taken at a given time t is as follows:

式中,π是在时间单位t对ω的观察集生成动作所需的策略函数;表示服务s部署于边缘服务器e;表示服务s没有部署于边缘服务器e。Where π is the policy function required to generate actions for the observation set ω in time unit t; Indicates that service s is deployed on edge server e; Indicates that service s is not deployed on edge server e.

所述策略函数π是演员网络执行的函数,用于将状态空间映射到动作空间,即π:ω→a;The policy function π is a function executed by the actor network to map the state space to the action space, i.e., π:ω→a;

策略函数π的目标是最小化最大边缘资源使用和服务时延,并通过使用参数β来控制资源使用与服务时延的相对重要性。策略函数π表示为The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β. The policy function π is expressed as

式中,β为权重系数;In the formula, β is the weight coefficient;

所述策略函数π的约束包括映射约束时延约束资源约束 The constraints of the policy function π include the mapping constraints Delay Constraint Resource Constraints

所述奖励函数如下所示:The reward function is as follows:

式中,为即时奖励。γ为奖励系数。为t时刻的服务时延;In the formula, is the immediate reward. γ is the reward coefficient. is the service delay at time t;

4)构建演员网络和批评家网络,并对演员网络和批评家网络进行训练;4) Constructing the actor network and the critic network, and training the actor network and the critic network;

所述批评家网络训练过程中的损失函数如下所示:The loss function during the training of the critic network As shown below:

式中,θ为批评家网络参数;为用来评估策略质量的目标值;Qi(ω,a;θ)为服务放置策略的策略质量;为边缘服务器中可用资源单元的数量;Where θ is the critic network parameter; is the target value used to evaluate the quality of the strategy; Qi (ω, a; θ) is the strategy quality of the service placement strategy; is the number of available resource units in the edge server;

5)演员网络生成服务放置策略,并输入到批评家网络中;5) The actor network generates service placement strategies and inputs them into the critic network;

6)所述批评家网络评估服务放置策略的策略质量,若评估不通过,则更新演员网络参数,并返回步骤5),若评估通过,则输出服务放置策略。6) The critic network evaluates the policy quality of the service placement strategy. If the evaluation fails, the actor network parameters are updated and the process returns to step 5. If the evaluation passes, the service placement strategy is output.

所述批评家网络评估服务放置策略的策略质量的方法包括:判断批评家网络损失函数是否收敛,若收敛,则评估通过,否则,评估不通过。The method for evaluating the strategy quality of the service placement strategy by the critic network includes: determining the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.

值得说明的是,本发明提出了一种基于边缘计算的三层式车联网架构,并考虑了动态服务放置问题,优化目标是最小化最大边缘资源使用(从服务提供商的角度)和服务延迟(从用户的角度)。It is worth noting that the present invention proposes a three-layer Internet of Vehicles architecture based on edge computing and takes into account the problem of dynamic service placement. The optimization goal is to minimize the maximum edge resource usage (from the perspective of the service provider) and service delay (from the perspective of the user).

此外,本发明提出了一种基于深度强化学习的服务放置框架,由策略函数(演员网络)和价值函数(评论家网络)组成。演员网络做出服务放置策略,而评论家网络根据车辆观察到的延迟来评估演员网络做出的决策性能。In addition, this paper proposes a service placement framework based on deep reinforcement learning, which consists of a policy function (actor network) and a value function (critic network). The actor network makes a service placement strategy, while the critic network evaluates the performance of the decision made by the actor network based on the delay observed by the vehicle.

本发明的技术效果是毋庸置疑的。本发明提供一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,该方法提出了一个车联网中基于深度强化学习的动态服务放置框架,其目标是在考虑车辆的移动性、变化的需求和对不同类型服务请求的动态性的同时,最小化最大的边缘资源使用和服务延迟。The technical effect of the present invention is unquestionable. The present invention provides a dynamic service placement method based on edge computing and deep reinforcement learning in the Internet of Vehicles. The method proposes a dynamic service placement framework based on deep reinforcement learning in the Internet of Vehicles, which aims to minimize the maximum edge resource usage and service delay while considering the mobility of vehicles, changing needs and the dynamics of requests for different types of services.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为基于边缘计算的三层式车联网架;Figure 1 shows a three-layered Internet of Vehicles architecture based on edge computing;

图2为智能体结构;Figure 2 shows the structure of the intelligent agent;

图3为本发明流程图。FIG3 is a flow chart of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合实施例对本发明作进一步说明,但不应该理解为本发明上述主题范围仅限于下述实施例。在不脱离本发明上述技术思想的情况下,根据本领域普通技术知识和惯用手段,做出各种替换和变更,均应包括在本发明的保护范围内。The present invention is further described below in conjunction with the embodiments, but it should not be understood that the above subject matter of the present invention is limited to the following embodiments. Without departing from the above technical ideas of the present invention, various substitutions and changes are made according to the common technical knowledge and customary means in the art, which should all be included in the protection scope of the present invention.

实施例1:Embodiment 1:

参见图1至图3,一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,包括以下步骤:Referring to FIG. 1 to FIG. 3 , a dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles includes the following steps:

1)建立网络与服务请求模型,获取网络与服务请求相关信息;1) Establish a network and service request model to obtain network and service request related information;

所述网络与服务请求相关信息包括边缘服务器信息、车辆信息、服务信息;The network and service request related information includes edge server information, vehicle information, and service information;

所述边缘服务器信息包括边缘服务器集合E,边缘服务器e,边缘服务器e的剩余资源容量CeThe edge server information includes an edge server set E, an edge server e, and a remaining resource capacity C e of the edge server e;

所述车辆信息包括车辆集合V。The vehicle information includes a vehicle set V.

所述服务信息包括服务集合S、请求服务s的车辆数量λs、一次可以处理一个服务实例(如车联网环境中的媒体文件下载、合作意识消息和环境通知服务等)或可以提供并行连接的车辆数量ε、服务请求消息中指定时间t和车辆位置loc、边缘服务器部署服务s所消耗的资源量Rs、时延需求阈值DsThe service information includes the service set S, the number of vehicles λ s requesting service s, the number of vehicles ε that can process one service instance at a time (such as media file downloading, cooperative awareness messages and environmental notification services in the Internet of Vehicles environment) or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R s consumed by the edge server deploying service s, and the delay requirement threshold D s .

2)建立网络与服务请求计算模型;2) Establish a network and service request calculation model;

所述网络与服务请求计算模型包括总服务时延计算模型、边缘资源使用率计算模型;The network and service request calculation model includes a total service delay calculation model and an edge resource utilization calculation model;

总服务时延计算模型如下所示:The total service delay calculation model is as follows:

式中,为总服务时延;为传播时延和排队时延;dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度;In the formula, is the total service delay; are the propagation delay and queuing delay; dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium;

当请求服务s的车辆数量λs≤ε时,排队时延当请求服务s的车辆数量λs>ε时,排队时延满足下式:When the number of vehicles requesting service s is λs ≤ε, the queuing delay is When the number of vehicles requesting service s λ s > ε, the queuing delay Satisfy the following formula:

式中,数量差λ′s=λs-ε;In the formula, the quantity difference λ′ ss -ε;

传播时延如下所示:Propagation Delay As shown below:

式中,dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度。Where dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium.

边缘资源使用率计算模型如下所示:The edge resource utilization calculation model is as follows:

边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率,如下所示:Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

式中,参数Ce为边缘服务器e的剩余资源容量;为边缘资源使用率;Rs为边缘服务器部署服务s所消耗的资源量。In the formula, the parameters C e is the remaining resource capacity of edge server e; is the edge resource utilization rate; Rs is the amount of resources consumed by the edge server to deploy service s.

3)构建状态空间、动作空间、策略函数和奖励函数;3) Construct state space, action space, strategy function and reward function;

所述状态空间通过状态空间集ω表征,即:The state space is characterized by the state space set ω, namely:

ω={[v1,loc1,s],[v2,loc2,s],...,[vn,locn,s]}t (6)ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)

式中,s∈S;v1,v2,...,vn为一组车辆集合;loc1,loc2,...,locn为在t时,请求服务s的车辆位置集合。Where s∈S; v 1 ,v 2 ,...,v n is a set of vehicles; loc 1 ,loc 2 ,...,loc n is the set of vehicle locations requesting service s at time t.

所述动作空间用于描述在边缘服务器上放置服务时所采取的动作;The action space is used to describe the actions taken when placing a service on an edge server;

其中,在给定的时刻t所采取的动作a如下所示:The action a taken at a given time t is as follows:

式中,π是在时间单位t对ω的观察集生成动作所需的策略函数;表示服务s部署于边缘服务器e;表示服务s没有部署于边缘服务器e。Where π is the policy function required to generate actions for the observation set ω in time unit t; Indicates that service s is deployed on edge server e; Indicates that service s is not deployed on edge server e.

所述策略函数π是演员网络执行的函数,用于将状态空间映射到动作空间,即π:ω→a;The policy function π is a function executed by the actor network to map the state space to the action space, i.e., π:ω→a;

策略函数π的目标是最小化最大边缘资源使用和服务时延,并通过使用参数β来控制资源使用与服务时延的相对重要性。策略函数π表示为The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β. The policy function π is expressed as

式中,β为权重系数;In the formula, β is the weight coefficient;

策略函数π的原理为:通过下标s,e对服务集合与边缘服务器集合迭代,寻找最大的边缘资源使用和服务时延,再使其最小化以获得对应的策略函数π。The principle of the policy function π is: iterate the service set and the edge server set through the subscripts s and e to find the maximum edge resource usage and service delay, and then minimize them to obtain the corresponding policy function π.

所述策略函数π的约束包括映射约束时延约束资源约束 The constraints of the policy function π include the mapping constraints Delay Constraint Resource Constraints

所述奖励函数如下所示:The reward function is as follows:

式中,为即时奖励。γ为奖励系数。为t时刻的服务时延;In the formula, is the immediate reward. γ is the reward coefficient. is the service delay at time t;

4)构建演员网络和批评家网络,并对演员网络和批评家网络进行训练;4) Constructing the actor network and the critic network, and training the actor network and the critic network;

所述批评家网络训练过程中的损失函数如下所示:The loss function during the training of the critic network As shown below:

式中,θ为批评家网络参数;为用来评估策略质量的目标值;Qi(ω,a;θ)为服务放置策略的策略质量;为边缘服务器中可用资源单元的数量;Where θ is the critic network parameter; is the target value used to evaluate the quality of the strategy; Qi (ω, a; θ) is the strategy quality of the service placement strategy; is the number of available resource units in the edge server;

5)演员网络生成服务放置策略,并输入到批评家网络中;5) The actor network generates service placement strategies and inputs them into the critic network;

6)所述批评家网络评估服务放置策略的策略质量,若评估不通过,则更新演员网络参数,并返回步骤5),若评估通过,则输出服务放置策略。6) The critic network evaluates the policy quality of the service placement strategy. If the evaluation fails, the actor network parameters are updated and the process returns to step 5. If the evaluation passes, the service placement strategy is output.

所述批评家网络评估服务放置策略的策略质量的方法包括:判断批评家网络损失函数是否收敛,若收敛,则评估通过,否则,评估不通过。The method for evaluating the strategy quality of the service placement strategy by the critic network includes: determining the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.

实施例2:Embodiment 2:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,包括以下步骤:A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles includes the following steps:

1)建立网络与服务请求模型,获取边缘服务器信息、车辆信息和服务信息。1) Establish a network and service request model to obtain edge server information, vehicle information, and service information.

所述的服务器信息、车辆信息和服务信息包括边缘服务器集合E,边缘服务器e,边缘服务器e的剩余资源容量Ce,车辆集合V和服务集合S,请求服务s的车辆数量λs,一次可以处理一个服务实例(如车联网环境中的媒体文件下载、合作意识消息和环境通知服务等)或可以提供并行连接的车辆数量ε、,服务请求消息中指定时间t和车辆位置loc,边缘服务器部署服务s所消耗的资源量Rs,时延需求阈值DsThe server information, vehicle information and service information include the edge server set E, edge server e, the remaining resource capacity C e of edge server e, the vehicle set V and the service set S, the number of vehicles λ s requesting service s, the number of vehicles ε that can process one service instance at a time (such as media file downloading, cooperative awareness messages and environmental notification services in the Internet of Vehicles environment) or can provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R s consumed by the edge server to deploy service s, and the delay requirement threshold D s .

2)建立计算模型。2) Establish a computational model.

2.1)总服务时延建模。将整个边缘车联网系统建模为M/D/1队列。其中,从边缘服务器e请求服务s时,车辆的总服务时延是指从车辆发送服务请求到边缘服务器接收到相应响应的总时间。总服务时延由传播时延和排队时延组成:2.1) Total service delay modeling. The entire edge vehicle network system is modeled as an M/D/1 queue. When requesting service s from edge server e, the total service delay of the vehicle is It refers to the total time from when the vehicle sends a service request to when the edge server receives the corresponding response. Total service delay Propagation delay and queuing delay composition:

如果λs≤ε,排队时延为0。如果λs>ε,则创建一个队列,并且边缘服务器上服务s的平均排队时延将如下所示:If λ s ≤ ε, the queuing delay is 0. If λ s >ε, a queue is created and the average queuing delay of service s on the edge server will be as follows:

其中,λ′s=λs-ε,平均传播时延计算为距离与介质上传播速度之比,如下所示:Where λ′ s = λ s - ε, and the average propagation delay is calculated as the ratio of distance to the propagation speed in the medium, as follows:

其中,dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离,c为信号通过通信介质的传播速度。因此,总服务时延如下所示:Where dist(v,s) is the Euclidean distance between vehicle v and the edge server where service s is deployed, and c is the propagation speed of the signal through the communication medium. Therefore, the total service delay is as follows:

2.2)边缘资源使用率建模。边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率,如下所示:2.2) Edge resource utilization modeling. Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

其中, in,

3)设计状态空间。在给定的时刻t,状态空间集描述网络环境。智能体观察环境以构成来自服务请求模型的状态空间集ω,如下所示:3) Design state space. At a given time t, the state space set describes the network environment. The agent observes the environment to form a state space set ω from the service request model, as shown below:

ω={[v1,loc1,s],[v2,loc2,s],...,[vn,locn,s]}t。 (6)ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t . (6)

其中,s∈S,v1,v2,...,vn为一组车辆集合,loc1,loc2,...,locn为在t时,请求服务s的车辆位置集合。Where s∈S, v 1 ,v 2 ,...,v n is a set of vehicles, and loc 1 ,loc 2 ,...,loc n is the set of vehicle locations requesting service s at time t.

4)设计的动作空间。动作空间描述了策略模块在边缘服务器上放置服务时所采取的动作,在给定的时刻t所采取的动作如下所示:4) Designed action space. The action space describes the actions taken by the policy module when placing services on edge servers. The actions taken at a given time t are as follows:

其中π是在时间单位t对ω的观察集生成动作所需的策略函数,二进制变量给出了指示服务s在边缘服务器e上的位置的矩阵,表示服务s部署于边缘服务器e。反之,表示服务s没有部署于边缘服务器e。where π is the policy function required to generate actions for a set of observations ω at time unit t, and the binary variable Given a matrix indicating the location of service s on edge server e, Indicates that service s is deployed on edge server e. Otherwise, Indicates that service s is not deployed on edge server e.

5)设计策略函数。策略函数π是演员网络执行的一个函数,用于将状态空间映射到动作空间π:ω→a。策略函数π的目标是最小化最大边缘资源使用和服务时延,并通过使用参数β来控制资源使用与服务时延的相对重要性。策略函数π表示为5) Design the policy function. The policy function π is a function executed by the actor network to map the state space to the action space π:ω→a. The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β. The policy function π is expressed as

策略函数同时受到映射约束,时延约束,以及资源约束, The policy function is also subject to mapping constraints, Delay constraints, and resource constraints,

6)设计奖励函数。在每个时间单位t,作为对智能体的演员网络所采取的行动的响应,系统从环境中收到即时奖励如下所示:6) Design a reward function. At each time unit t, in response to the actions taken by the agent's actor network, the system receives an immediate reward from the environment As shown below:

7)构建批评家网络,负责评估演员网络所做出的的决策质量Q(ω,a)。输入上述状态、动作和奖励进行训练批评家网络,批评家网络更新其参数θ以最小化损失函数如下所示:7) Build a critic network, which is responsible for evaluating the decision quality Q(ω,a) made by the actor network. Input the above states, actions and rewards to train the critic network, which updates its parameters θ to minimize the loss function As shown below:

其中,yt为目标值。进一步使用重播内存M,用于存储训练批评家网络的体验。批评家网络使用重播内存在一段随机时间后获取体验,并优化网络参数以获得更好的性能。Where y t is the target value. The replay memory M is further used to store the experience of training the critic network. The critic network uses the replay memory to obtain experience after a random period of time and optimize the network parameters to obtain better performance.

8)经过上述步骤对演员网络和批评家网络的训练收敛后,演员网络能够考虑车辆在不同类型服务请求中的移动性和动态性的同时,找到服务的最佳放置策略。批评家网络可以通过值函数对演员网络的策略质量进行评估。8) After the training of the actor network and the critic network converges through the above steps, the actor network is able to find the best placement strategy for services while considering the mobility and dynamics of vehicles in different types of service requests. The critic network can evaluate the quality of the actor network's strategy through the value function.

实施例3:Embodiment 3:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,包括以下步骤:A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles includes the following steps:

1)建立所述网络与服务请求模型,获取网络与服务请求相关信息。1) Establish the network and service request model and obtain network and service request related information.

2)建立网络与服务请求计算模型;2) Establish a network and service request calculation model;

3)构建状态空间、动作空间、策略函数和奖励函数;3) Construct state space, action space, strategy function and reward function;

4)构建演员网络和批评家网络,并对演员网络和批评家网络进行训练;4) Constructing the actor network and the critic network, and training the actor network and the critic network;

5)演员网络生成服务放置策略,并输入到批评家网络中;5) The actor network generates service placement strategies and inputs them into the critic network;

6)所述批评家网络评估服务放置策略的策略质量,若评估不通过,则更新演员网络参数,并返回步骤5),若评估通过,则输出服务放置策略。6) The critic network evaluates the policy quality of the service placement strategy. If the evaluation fails, the actor network parameters are updated and the process returns to step 5. If the evaluation passes, the service placement strategy is output.

实施例4:Embodiment 4:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述网络与服务请求相关信息包括边缘服务器信息、车辆信息、服务信息;A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the network and service request related information include edge server information, vehicle information, and service information;

所述边缘服务器信息包括边缘服务器集合E,边缘服务器e,边缘服务器e的剩余资源容量CeThe edge server information includes an edge server set E, an edge server e, and a remaining resource capacity C e of the edge server e;

所述车辆信息包括车辆集合V。The vehicle information includes a vehicle set V.

所述服务信息包括服务集合S、请求服务s的车辆数量λs、一次可以处理一个服务实例或可以提供并行连接的车辆数量ε、服务请求消息中指定时间t和车辆位置loc、边缘服务器部署服务s所消耗的资源量Rs、时延需求阈值Ds;所述服务实例包括车联网环境中的媒体文件下载、合作意识消息和环境通知服务。The service information includes the service set S, the number of vehicles λ s requesting service s, the number of vehicles ε that can process one service instance at a time or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R s consumed by the edge server deploying service s, and the delay requirement threshold D s ; the service instances include media file downloading, cooperative awareness messaging, and environmental notification services in the Internet of Vehicles environment.

实施例5:Embodiment 5:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述网络与服务请求计算模型包括总服务时延计算模型、边缘资源使用率计算模型;A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the network and service request calculation model includes a total service delay calculation model and an edge resource utilization rate calculation model;

总服务时延计算模型如下所示:The total service delay calculation model is as follows:

式中,为总服务时延;为传播时延和排队时延;dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度;In the formula, is the total service delay; are the propagation delay and queuing delay; dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium;

当请求服务s的车辆数量λs≤ε时,排队时延当请求服务s的车辆数量λs>ε时,排队时延满足下式:When the number of vehicles requesting service s is λs ≤ε, the queuing delay is When the number of vehicles requesting service s λ s > ε, the queuing delay Satisfy the following formula:

式中,数量差λ′s=λs-ε;In the formula, the quantity difference λ′ ss -ε;

传播时延如下所示:Propagation Delay As shown below:

式中,dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度。Where dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium.

边缘资源使用率计算模型如下所示:The edge resource utilization calculation model is as follows:

边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率,如下所示:Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows:

式中,参数Ce为边缘服务器e的剩余资源容量;为边缘资源使用率;Rs为边缘服务器部署服务s所消耗的资源量。In the formula, the parameters C e is the remaining resource capacity of edge server e; is the edge resource utilization rate; Rs is the amount of resources consumed by the edge server to deploy service s.

实施例6:Embodiment 6:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述状态空间通过状态空间集ω表征,即:A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the state space is represented by a state space set ω, that is:

ω={[v1,loc1,s],[v2,loc2,s],...,[vn,locn,s]}t (6)ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],...,[v n ,loc n ,s]} t (6)

式中,s∈S;v1,v2,...,vn为一组车辆集合;loc1,loc2,...,locn为在t时,请求服务s的车辆位置集合。Where s∈S; v 1 ,v 2 ,...,v n is a set of vehicles; loc 1 ,loc 2 ,...,loc n is the set of vehicle locations requesting service s at time t.

实施例7:Embodiment 7:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述动作空间用于描述在边缘服务器上放置服务时所采取的动作;A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the action space is used to describe the actions taken when placing a service on an edge server;

其中,在给定的时刻t所采取的动作a如下所示:The action a taken at a given time t is as follows:

式中,π是在时间单位t对ω的观察集生成动作所需的策略函数;表示服务s部署于边缘服务器e;表示服务s没有部署于边缘服务器e。Where π is the policy function required to generate actions for the observation set ω in time unit t; Indicates that service s is deployed on edge server e; Indicates that service s is not deployed on edge server e.

实施例8:Embodiment 8:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述策略函数π是演员网络执行的函数,用于将状态空间映射到动作空间,即π:ω→a;A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, the main content of which is shown in Example 3, wherein the policy function π is a function executed by an actor network, which is used to map the state space to the action space, that is, π:ω→a;

策略函数π的目标是最小化最大边缘资源使用和服务时延,并通过使用参数β来控制资源使用与服务时延的相对重要性;The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β;

策略函数π表示如下:The policy function π is expressed as follows:

式中,β为权重系数。In the formula, β is the weight coefficient.

所述策略函数π的约束包括映射约束时延约束资源约束 The constraints of the policy function π include the mapping constraints Delay Constraint Resource Constraints

实施例9:Embodiment 9:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述奖励函数如下所示:A dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, the main content of which is shown in Example 3, wherein the reward function is as follows:

式中,为即时奖励。γ为奖励系数。为t时刻的服务时延;In the formula, is the immediate reward. γ is the reward coefficient. is the service delay at time t;

实施例10:Embodiment 10:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述批评家网络训练过程中的损失函数如下所示:A dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, the main content of which is shown in Example 3, wherein the loss function in the critic network training process is As shown below:

式中,θ为批评家网络参数;为用来评估策略质量的目标值;Qi(ω,a;θ)为服务放置策略的策略质量;为边缘服务器中可用资源单元的数量;Where θ is the critic network parameter; is the target value used to evaluate the quality of the strategy; Qi (ω, a; θ) is the strategy quality of the service placement strategy; is the number of available resource units in the edge server;

实施例11:Embodiment 11:

一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,主要内容见实施例3,其中,所述批评家网络评估服务放置策略的策略质量的方法包括:判断批评家网络损失函数是否收敛,若收敛,则评估通过,否则,评估不通过。A dynamic service placement method based on edge computing and deep reinforcement learning in a vehicle network, the main content of which is shown in Example 3, wherein the method for evaluating the policy quality of the service placement strategy by the critic network includes: determining the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.

Claims (5)

1.一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,其特征在于,包括以下步骤:1. A dynamic service placement method based on edge computing and deep reinforcement learning in an Internet of Vehicles, characterized in that it includes the following steps: 1)建立网络与服务请求模型,获取网络与服务请求相关信息;1) Establish a network and service request model to obtain network and service request related information; 2)建立网络与服务请求计算模型;2) Establish a network and service request calculation model; 3)构建状态空间、动作空间、策略函数和奖励函数;3) Construct state space, action space, strategy function and reward function; 4)构建演员网络和批评家网络,并对演员网络和批评家网络进行训练;4) Constructing the actor network and the critic network, and training the actor network and the critic network; 5)演员网络生成服务放置策略,并输入到批评家网络中;5) The actor network generates service placement strategies and inputs them into the critic network; 6)所述批评家网络评估服务放置策略的策略质量,若评估不通过,则更新演员网络参数,并返回步骤5),若评估通过,则输出服务放置策略;6) The critic network evaluates the policy quality of the service placement strategy. If the evaluation fails, the actor network parameters are updated and the process returns to step 5. If the evaluation passes, the service placement strategy is output. 所述网络与服务请求计算模型包括总服务时延计算模型、边缘资源使用率计算模型;The network and service request calculation model includes a total service delay calculation model and an edge resource utilization calculation model; 总服务时延计算模型如下所示:The total service delay calculation model is as follows: 式中,为总服务时延;为传播时延和排队时延;dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度;In the formula, is the total service delay; are the propagation delay and queuing delay; dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium; 当请求服务s的车辆数量λs≤ε时,排队时延当请求服务s的车辆数量λs>ε时,排队时延满足下式:When the number of vehicles requesting service s is λs ≤ε, the queuing delay is When the number of vehicles requesting service s λ s > ε, the queuing delay Satisfy the following formula: 式中,数量差λ′s=λs-ε;In the formula, the quantity difference λ′ ss -ε; 传播时延如下所示:Propagation Delay As shown below: 式中,dist(v,s)为车辆v与服务s部署的边缘服务器之间的欧氏距离;c为信号通过通信介质的传播速度;Where dist(v,s) is the Euclidean distance between vehicle v and the edge server deployed by service s; c is the propagation speed of the signal through the communication medium; 边缘资源使用率计算模型如下所示:The edge resource utilization calculation model is as follows: 边缘资源使用率是服务实例消耗的资源与边缘服务器的可用资源之间的比率,如下所示:Edge resource utilization is the ratio between the resources consumed by the service instance and the available resources of the edge server, as follows: 式中,参数Ce为边缘服务器e的剩余资源容量;为边缘资源使用率;Rs为边缘服务器部署服务s所消耗的资源量;In the formula, the parameters C e is the remaining resource capacity of edge server e; is the edge resource utilization rate; Rs is the amount of resources consumed by deploying service s on the edge server; 所述状态空间通过状态空间集ω表征,即:The state space is characterized by the state space set ω, namely: ω={[v1,loc1,s],[v2,loc2,s],…,[vn,locn,s]}t (6)ω={[v 1 ,loc 1 ,s],[v 2 ,loc 2 ,s],…,[v n ,loc n ,s]} t (6) 式中,s∈S;v1,v2,...,vn为一组车辆集合;loc1,loc2,...,locn为在t时,请求服务s的车辆位置集合;Where s∈S; v 1 ,v 2 ,...,v n is a set of vehicles; loc 1 ,loc 2 ,...,loc n is the set of vehicle locations requesting service s at time t; 所述动作空间用于描述在边缘服务器上放置服务时所采取的动作;The action space is used to describe the actions taken when placing a service on an edge server; 其中,在给定的时刻t所采取的动作a如下所示:The action a taken at a given time t is as follows: 式中,π是在时间单位t对ω的观察集生成动作所需的策略函数;表示服务s部署于边缘服务器e;表示服务s没有部署于边缘服务器e;Where π is the policy function required to generate actions for the observation set ω in time unit t; Indicates that service s is deployed on edge server e; Indicates that service s is not deployed on edge server e; 所述策略函数π是演员网络执行的函数,用于将状态空间映射到动作空间,即π:ω→a;The policy function π is a function executed by the actor network to map the state space to the action space, i.e., π:ω→a; 策略函数π的目标是最小化最大边缘资源使用和服务时延,并通过使用参数β来控制资源使用与服务时延的相对重要性;The goal of the policy function π is to minimize the maximum edge resource usage and service delay, and to control the relative importance of resource usage and service delay by using the parameter β; 策略函数π表示如下:The policy function π is expressed as follows: 式中,β为权重系数;In the formula, β is the weight coefficient; 所述策略函数π的约束包括映射约束时延约束资源约束 The constraints of the policy function π include the mapping constraints Delay Constraint Resource Constraints 2.根据权利要求1所述的一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,其特征在于,所述网络与服务请求相关信息包括边缘服务器信息、车辆信息、服务信息;2. A dynamic service placement method based on edge computing and deep reinforcement learning in a connected vehicle network according to claim 1, characterized in that the network and service request related information includes edge server information, vehicle information, and service information; 所述边缘服务器信息包括边缘服务器集合E,边缘服务器e,边缘服务器e的剩余资源容量CeThe edge server information includes an edge server set E, an edge server e, and a remaining resource capacity C e of the edge server e; 所述车辆信息包括车辆集合V;The vehicle information includes a vehicle set V; 所述服务信息包括服务集合S、请求服务s的车辆数量λs、一次可以处理一个服务实例或可以提供并行连接的车辆数量ε、服务请求消息中指定时间t和车辆位置loc、边缘服务器部署服务s所消耗的资源量Rs、时延需求阈值Ds;所述服务实例包括车联网环境中的媒体文件下载、合作意识消息和环境通知服务。The service information includes the service set S, the number of vehicles λ s requesting service s, the number of vehicles ε that can process one service instance at a time or provide parallel connections, the time t and vehicle location loc specified in the service request message, the amount of resources R s consumed by the edge server deploying service s, and the delay requirement threshold D s ; the service instances include media file downloading, cooperative awareness messaging, and environmental notification services in the Internet of Vehicles environment. 3.根据权利要求1所述的一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,其特征在于,所述奖励函数如下所示:3. According to a method for dynamic service placement based on edge computing and deep reinforcement learning in a connected vehicle network according to claim 1, it is characterized in that the reward function is as follows: 式中,为即时奖励;γ为奖励系数;为t时刻的服务时延。In the formula, is the immediate reward; γ is the reward coefficient; is the service delay at time t. 4.根据权利要求1所述的一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,其特征在于,所述批评家网络训练过程中的损失函数如下所示:4. According to the method of dynamic service placement based on edge computing and deep reinforcement learning in the Internet of Vehicles in claim 1, it is characterized in that the loss function in the critic network training process is As shown below: 式中,θ为批评家网络参数;为用来评估策略质量的目标值;Qi(ω,a;θ)为服务放置策略的策略质量;为边缘服务器中可用资源单元的数量。Where θ is the critic network parameter; is the target value used to evaluate the quality of the strategy; Qi (ω, a; θ) is the strategy quality of the service placement strategy; is the number of available resource units in the edge server. 5.根据权利要求4所述的一种车联网中基于边缘计算和深度强化学习的动态服务放置方法,其特征在于,所述批评家网络评估服务放置策略的策略质量的方法包括:判断批评家网络损失函数是否收敛,若收敛,则评估通过,否则,评估不通过。5. According to the method of dynamic service placement based on edge computing and deep reinforcement learning in the Internet of Vehicles in claim 4, it is characterized in that the method of evaluating the policy quality of the service placement strategy by the critic network includes: judging the critic network loss function Whether it converges. If so, the evaluation passes. Otherwise, the evaluation fails.
CN202210992657.5A 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles Active CN115550944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210992657.5A CN115550944B (en) 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210992657.5A CN115550944B (en) 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Publications (2)

Publication Number Publication Date
CN115550944A CN115550944A (en) 2022-12-30
CN115550944B true CN115550944B (en) 2024-02-27

Family

ID=84725291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210992657.5A Active CN115550944B (en) 2022-08-18 2022-08-18 Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles

Country Status (1)

Country Link
CN (1) CN115550944B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118502967B (en) * 2024-07-17 2024-11-12 北京师范大学珠海校区 A delay-aware container scheduling method, system and terminal for cluster upgrade

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213796A (en) * 2019-05-28 2019-09-06 大连理工大学 A kind of intelligent resource allocation methods in car networking
CN113382383A (en) * 2021-06-11 2021-09-10 浙江工业大学 Method for unloading calculation tasks of public transport vehicle based on strategy gradient
WO2021237996A1 (en) * 2020-05-26 2021-12-02 多伦科技股份有限公司 Fuzzy c-means-based adaptive energy consumption optimization vehicle clustering method
CN114528042A (en) * 2022-01-30 2022-05-24 南京信息工程大学 Energy-saving automatic interconnected vehicle service unloading method based on deep reinforcement learning
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12045738B2 (en) * 2020-12-23 2024-07-23 Intel Corporation Transportation operator collaboration system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110213796A (en) * 2019-05-28 2019-09-06 大连理工大学 A kind of intelligent resource allocation methods in car networking
WO2021237996A1 (en) * 2020-05-26 2021-12-02 多伦科技股份有限公司 Fuzzy c-means-based adaptive energy consumption optimization vehicle clustering method
CN113382383A (en) * 2021-06-11 2021-09-10 浙江工业大学 Method for unloading calculation tasks of public transport vehicle based on strategy gradient
CN114528042A (en) * 2022-01-30 2022-05-24 南京信息工程大学 Energy-saving automatic interconnected vehicle service unloading method based on deep reinforcement learning
CN114625504A (en) * 2022-03-09 2022-06-14 天津理工大学 Internet of vehicles edge computing service migration method based on deep reinforcement learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Offloading Strategy for Vehicles in the Architecture of Vehicle-MEC-Cloud;Dasong Zhuang;《2022 IEEE/CIC International Conference on Communications in China (ICCC Workshops)》;20220811;全文 *
Task Offloading for End-Edge-Cloud Orchestrated Computing in Mobile Networks;Xiuhua Li;《2020 IEEE Wireless Communications and Networking Conference (WCNC)》;20200525;全文 *
一种车载服务的快速深度Q学习网络边云迁移策略;彭军;王成龙;蒋富;顾欣;牟玥玥,;刘伟荣;;电子与信息学报;20200115(01);全文 *
车联网中一种基于软件定义网络与移动边缘计算的卸载策略;张海波;荆昆仑;刘开健;贺晓帆;;电子与信息学报;20200315(03);全文 *

Also Published As

Publication number Publication date
CN115550944A (en) 2022-12-30

Similar Documents

Publication Publication Date Title
CN112995950B (en) A joint resource allocation method based on deep reinforcement learning in the Internet of Vehicles
Wang et al. Heterogeneous blockchain and AI-driven hierarchical trust evaluation for 5G-enabled intelligent transportation systems
CN113055489B (en) Implementation method of satellite-ground converged network resource allocation strategy based on Q learning
CN111132175A (en) A collaborative computing offloading and resource allocation method and application
CN113542376A (en) Task unloading method based on energy consumption and time delay weighting
CN115209426B (en) Dynamic deployment method for digital twin servers in edge car networking
CN114050961B (en) A large-scale network simulation system and resource dynamic scheduling and allocation method
CN115550944B (en) Dynamic service placement method based on edge calculation and deep reinforcement learning in Internet of vehicles
Guan et al. An intelligent wireless channel allocation in HAPS 5G communication system based on reinforcement learning
CN115243217A (en) Device-edge-cloud collaborative scheduling method and system based on DDQN in the edge environment of the Internet of Vehicles
Quan et al. Software‐Defined Collaborative Offloading for Heterogeneous Vehicular Networks
CN115905687A (en) Cold start-oriented recommendation system and method based on meta-learning graph neural network
CN113032149B (en) Edge computing service placement and request distribution method and system based on evolution game
Xiao et al. A novel task allocation for maximizing reliability considering fault-tolerant in VANET real time systems
CN113507503B (en) A Resource Allocation Method for Internet of Vehicles with Load Balancing
CN114928826A (en) Two-stage optimization method, controller and decision method for software-defined vehicle-mounted task unloading and resource allocation
Ma et al. Reinforcement learning based task offloading and take-back in vehicle platoon networks
CN116321307A (en) A two-way cache placement method based on deep reinforcement learning in cellular-free networks
CN116709378A (en) Task scheduling and resource allocation method based on federal reinforcement learning in Internet of vehicles
CN116489668A (en) Edge computing task unloading method based on high-altitude communication platform assistance
CN107197039A (en) A kind of PAAS platform service bag distribution methods and system based on CDN
Zhang et al. A Resource Allocation Scheme for Real‐Time Energy‐Aware Offloading in Vehicular Networks with MEC
Rehman et al. FoggyEdge: An information-centric computation offloading and management framework for edge-based vehicular fog computing
Li et al. Deep reinforcement learning for intelligent computing and content edge service in ICN-based IoV
CN114189522B (en) Priority-based blockchain consensus method and system in Internet of vehicles

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant