CN113012013A - Cooperative edge caching method based on deep reinforcement learning in Internet of vehicles - Google Patents

Cooperative edge caching method based on deep reinforcement learning in Internet of vehicles

Info

Publication number
CN113012013A
CN113012013A (application CN202110182149.6A; granted as CN113012013B)
Authority
CN
China
Prior art keywords
content
rsu
vehicle
popularity
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110182149.6A
Other languages
Chinese (zh)
Other versions
CN113012013B (en)
Inventor
孙艳华
邢玉萍
张延华
孙恩昌
杨睿哲
李萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110182149.6A priority Critical patent/CN113012013B/en
Publication of CN113012013A publication Critical patent/CN113012013A/en
Application granted granted Critical
Publication of CN113012013B publication Critical patent/CN113012013B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40Business processes related to the transportation industry
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a cooperative edge caching method based on deep reinforcement learning in the Internet of Vehicles, comprising the following steps: step 1, establish a system cache model according to the structure of the mobile vehicles, RSUs, and base station; step 2, establish a system throughput calculation model; step 3, solve the problem with an algorithm based on deep reinforcement learning. The invention uses a realistic simulation environment, so that the performance observed in simulation estimates and approximates the performance in a real scene. Content popularity is predicted from historical content request records so as to maximize the data throughput that vehicles obtain from the edge-device RSUs; the invention makes full use of the cache resources of the edge devices, thereby reducing the burden on the base station.

Description

Cooperative edge caching method based on deep reinforcement learning in Internet of vehicles
Technical Field
The invention belongs to the field of edge computing and the Internet of Vehicles, and particularly relates to a cooperative edge caching technique optimized by deep reinforcement learning.
Background
According to Cisco's forecast, by 2022 internet video will account for 82% of all commercial traffic on the internet, VR/AR traffic will increase 12-fold, and internet video surveillance traffic will increase 7-fold. Intelligent cars with access to popular content and shared traffic information are becoming an important component of next-generation network multimedia services. In vehicular networks, network traffic grows exponentially with the rapid increase in user demand.
Edge caching technology brings multimedia resources and data storage closer to users and devices than ever before. Caching data in advance at the network edge ensures timely and reliable transmission of content data, and effectively reduces the communication-link burden compared with fetching data from a remote base station. A set of roadside units (RSUs) deployed along the road provides additional bandwidth, computing, and storage resources that can supply multimedia content to vehicles on the road. Each vehicle carries an on-board unit, a communication device of limited capacity that can receive content meeting the multimedia infotainment demands of vehicle users. These are the basis for an edge cache implementation. Each RSU is provided with a cache for selectively storing content and serving content data upon request by a vehicle. The challenge of this design is how to optimize the caching policy of each edge device under user demand and cache-resource constraints, while also accounting for the high mobility of cars and the variation of content popularity.
Disclosure of Invention
The invention aims to provide a cooperative edge caching method optimized by deep reinforcement learning. Under a cache-capacity limit, content popularity is predicted from historical content request records, and a cooperative caching strategy among RSUs is proposed, maximizing the data throughput vehicles obtain from the edge-device RSUs. On this basis, an optimization scheme based on deep reinforcement learning is presented.
The scheme of the invention is as follows:
The technical problem to be solved by the invention is to provide a cooperative edge caching method that supplies requested content to mobile vehicles in the Internet of Vehicles, so that the cache resources of the edge devices are fully utilized and the burden on the base station is reduced.
In order to solve the problems, the invention adopts the following technical scheme:
a collaborative edge caching method based on deep reinforcement learning in the Internet of vehicles comprises the following steps: step 1, establishing a system cache model according to the structures of the mobile vehicle, the RSU and the base station.
Assume a vehicular network consisting of a base station, a set of RSUs, and a number of moving vehicles, as shown in Fig. 1. The communication range of the base station covers all RSUs and vehicles on the map. The set of RSUs is denoted s ∈ {1, 2, ..., S}. The RSUs are placed evenly along the road, and each is equipped with a cache of limited size for storing content files, denoted f ∈ {1, 2, ..., F}. A vehicle v ∈ {1, 2, ..., V} moving along the road is interested in a set of content files and issues content requests to the RSU it is connected to.
Step 1.1: RSU cache model:
Assume all requested files occupy the same storage space of L MB, each RSU uses a cache of the same size C_s MB, and the storage space of the base station is large enough to cache all content files. System time is divided into consecutive time frames of equal length, t = 1, 2, ..., G. At the beginning of each time frame, the RSUs must make a caching decision based on historical content request records. The content caching decisions within all RSUs are represented by a matrix X_{s,f}, i.e.
Figure BDA0002941734750000021
The caching decision of each RSU for a single content is then expressed as
Figure BDA0002941734750000031
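The binary caching-decision matrix and per-RSU capacity constraint above can be sketched as follows; the RSU count, file count, and storage sizes are illustrative assumptions, not values from the patent.

```python
# Sketch of the Step 1.1 cache model (names and constants are illustrative):
# x[s][f] = 1 if RSU s caches file f, else 0.
import numpy as np

S, F = 4, 20          # number of RSUs and content files (assumed values)
L_MB = 10             # size of every content file, in MB
C_s_MB = 50           # cache capacity of each RSU, in MB

rng = np.random.default_rng(0)
x = np.zeros((S, F), dtype=int)
for s in range(S):
    # each RSU can hold at most C_s / L files
    chosen = rng.choice(F, size=C_s_MB // L_MB, replace=False)
    x[s, chosen] = 1

# capacity constraint: sum over f of x[s, f] * L <= C_s for every RSU s
assert all(int(x[s].sum()) * L_MB <= C_s_MB for s in range(S))
```

A row of `x` is exactly the per-RSU single-content decision vector the text describes.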
Step 1.2: content popularity prediction model:
The edge-device RSUs cache a subset of the contents. The initial popularity of requested content is assumed known; that is, the request rate obeys a Zipf popularity distribution whose skewness γ is known. The content popularity within all RSUs is represented by a matrix P_{s,f}, i.e.
Figure BDA0002941734750000036
Then the initial probability of each content being requested is
Figure BDA0002941734750000032
As the number of requests grows, content popularity changes; to capture this change and better serve passing vehicles, the cached content must be replaced. The probability that a content is requested in the next time slot therefore needs to be updated, and it differs between RSUs. Content popularity prediction based on the Hawkes process is used; compared with traditional prediction from request frequency alone, this method considers both the frequency and the freshness of content requests.
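The two popularity models of Step 1.2 can be sketched together: the Zipf distribution gives the initial request probabilities, and a Hawkes-style sum of exponentially decaying excitations updates popularity from the request history. The exponential kernel and the constants `alpha` and `delta` are assumptions; the text only states that excitation is positive, cumulative, and fades over time.

```python
# Hedged sketch of Step 1.2: Zipf initial popularity plus a Hawkes-style update.
import math

def zipf_popularity(F, gamma):
    """Initial request probability of file f (1-indexed) under Zipf skewness gamma."""
    norm = sum(i ** (-gamma) for i in range(1, F + 1))
    return [f ** (-gamma) / norm for f in range(1, F + 1)]

def hawkes_popularity(request_times, t, alpha=1.0, delta=0.5):
    """Popularity of one content at time t: each past request at t_n adds a
    positive excitation alpha * exp(-delta * (t - t_n)) that fades with age,
    so both frequency and freshness of requests raise the score."""
    return sum(alpha * math.exp(-delta * (t - tn)) for tn in request_times if tn <= t)

p = zipf_popularity(F=5, gamma=1.0)
assert abs(sum(p) - 1.0) < 1e-9          # valid probability distribution
assert p[0] > p[4]                       # lower-ranked files are more popular

# a recent burst of requests outweighs an equally sized old burst
recent = hawkes_popularity([9.0, 9.5, 10.0], t=10.0)
old = hawkes_popularity([1.0, 1.5, 2.0], t=10.0)
assert recent > old
```

The final comparison shows the freshness effect the text emphasizes: equal request counts, but the recent requests score higher.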
Step 2: system throughput calculation model:
Vehicles are considered to keep a constant speed within an RSU's coverage, and the arrival of vehicles entering the coverage area of an RSU follows a Poisson distribution
Figure BDA0002941734750000033
The speed is divided into Y discrete levels v_y ∈ {v_1, v_2, ..., v_Y}. The probability of each speed level can then be obtained:
Figure BDA0002941734750000034
Figure BDA0002941734750000035
Assume the content request rate within each RSU obeys a Poisson distribution with rate λ_s. The time from a vehicle's request until it drives out of the RSU's coverage, which is the time the vehicle can receive content data from the RSU, is called the remaining time T_v. In practice the vehicle does not necessarily send a request the moment it enters RSU range, so T_v takes a value from the range
Figure BDA0002941734750000041
at random.
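A minimal sketch of these mobility assumptions: speed is discretized into Y levels with given probabilities, and the remaining time T_v is drawn at random from the residual dwell interval. The coverage length, speed levels, and level probabilities are assumed values for illustration.

```python
# Hedged sketch of the Step 2 mobility assumptions (all constants are assumed).
import random

random.seed(1)
COVERAGE_M = 500                    # RSU coverage length in metres (assumed)
SPEED_LEVELS = [10, 15, 20, 25]     # Y = 4 discrete speeds, m/s (assumed)
LEVEL_PROBS = [0.2, 0.3, 0.3, 0.2]  # probability of each speed level (assumed)

def remaining_time():
    """T_v: the vehicle may issue its request anywhere inside coverage, so the
    residual dwell time is drawn uniformly from (0, coverage / speed]."""
    v = random.choices(SPEED_LEVELS, weights=LEVEL_PROBS)[0]
    max_dwell = COVERAGE_M / v
    return random.uniform(0.0, max_dwell), v

t_v, v = remaining_time()
assert 0.0 <= t_v <= COVERAGE_M / min(SPEED_LEVELS)
```

Each sampled `t_v` bounds how much content data the vehicle can still receive from the RSU.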
When a vehicle sends a content request to its connected RSU, the RSU transmits the content directly to the vehicle if the content is already cached there. If the content is not cached in the connected RSU but is cached in a neighboring RSU, the connected RSU first retrieves the content from the neighbor and then transmits it to the requesting vehicle. If no RSU caches the requested content, the vehicle cannot obtain it from the RSUs and the request is sent to the base station.
Step 3: the problem-solving algorithm based on deep reinforcement learning comprises the following steps:
Step 3.1: in each time slot, take the updated content popularity within all RSUs as the network input, which defines the state and action of the deep reinforcement learning problem. The deep neural network outputs a set of continuous actions, which are quantized into K sets of discrete actions to explore caching decisions.
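The quantization of a relaxed action into K candidate binary actions can be sketched as below. The patent does not spell out its quantization rule; this sketch thresholds at 0.5 and then flips the entries closest to the decision boundary, an order-preserving scheme in the spirit of DROO-style methods, which should be read as an assumption.

```python
# Hedged sketch of Step 3.1's continuous-to-discrete action quantization.
import numpy as np

def quantize(relaxed, K):
    """Turn a relaxed action in [0,1]^F into K candidate binary actions."""
    base = (relaxed > 0.5).astype(int)
    actions = [base.copy()]
    # entries nearest the 0.5 boundary are the most ambiguous -> explore them
    order = np.argsort(np.abs(relaxed - 0.5))
    for idx in order[: K - 1]:
        flipped = base.copy()
        flipped[idx] ^= 1        # flip one ambiguous bit per extra candidate
        actions.append(flipped)
    return actions

relaxed = np.array([0.9, 0.48, 0.52, 0.1])
cands = quantize(relaxed, K=3)
assert len(cands) == 3
assert cands[0].tolist() == [1, 0, 1, 0]   # plain threshold at 0.5
```

Each candidate is later corrected for cache capacity and scored by throughput, as Step 3.2 describes.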
Step 3.2: the quantized actions may not meet the constraint of the buffer capacity, and at the same time, the buffer space needs to be fully utilized, and each group of actions needs to be corrected. Then, the throughput of each set of actions is calculated corresponding to the content request of the current time slot.
Step 3.3: the DNN is trained using an empirical playback technique.
The present invention models the data throughput a vehicle obtains from the edge-device RSUs. The change of content popularity is predicted at the end of each time slot; the content popularity is fed to the neural network, and the network output is converted into a caching decision. To improve system throughput, the caching strategy is optimized by learning from past experience, using a deep reinforcement learning method to solve the problem.
Key points of the invention
Step 1, establishing a system cache model according to the structures of the mobile vehicle, the RSU and the base station.
Step 1.1: RSU cache model:
For each content request, an action a_ca ∈ {0, 1, 2} describes the content cache status. When a_ca = 0, the requested content is cached in the connected RSU; when a_ca = 1, the requested content is not cached in the connected RSU but is cached in a neighboring RSU; when a_ca = 2, no RSU caches the content requested by the vehicle.
Figure BDA0002941734750000051
Step 1.2: content popularity prediction model:
A content popularity prediction method based on the Hawkes process is proposed. The Hawkes process holds that past events affect the probability of future events: each past event generates a positive, cumulative stimulus that decays over time. The popularity prediction formula for the content within each RSU is as follows:
Figure BDA0002941734750000052
each content request is considered an event occurrence.
Figure BDA0002941734750000053
represents the degree to which historical events excite future events; δ is the time-decay coefficient of the historical events' excitation; T is the content update period, i.e., the interval of each time frame; t_n is the time at which each event occurred. This approach considers both the frequency and the freshness of content requests: the more often content f is requested and the more recent the requests, the higher its popularity is considered to be.
Step 2: system throughput calculation model:
Using M to denote the throughput, within the remaining time T_v a throughput analysis is performed for each content request, including the throughput obtainable from the edge devices.
1) When a_ca = 0, the connected RSU can send the content directly to the requesting vehicle. The achieved throughput is determined by the remaining time and is at most the content file size:
Figure BDA0002941734750000054
where
Figure BDA0002941734750000055
is the data transmission rate between the RSU and the vehicle.
2) When a_ca = 1, the requesting vehicle obtains the content file via a neighboring RSU: the content data is first transferred between the RSUs and then transmitted by the connected RSU to the requesting vehicle.
Figure BDA0002941734750000061
where
Figure BDA0002941734750000062
is the data transmission rate between two RSUs.
3) When a_ca = 2, the requesting vehicle cannot obtain the content from the edge devices and sends the request to the base station:
M=0
The total throughput M per time frame is defined as the sum of all requested data throughputs within all RSUs:
Figure BDA0002941734750000063
Figure BDA0002941734750000064
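The three throughput cases can be sketched as one function. The a_ca = 1 case here assumes the two hops behave as transmission rates in series; the patent's exact expression lives in the equation images above, so treat this function, its parameter names, and the rate model as illustrative assumptions.

```python
# Hedged sketch of the Step 2 per-request throughput model (rates and the
# two-hop combination rule are assumptions, not the patent's exact formulas).
def request_throughput(a_ca, T_v, L_mb, r_rv, r_ss):
    """a_ca = 0: served by the connected RSU at RSU-to-vehicle rate r_rv.
    a_ca = 1: fetched from a neighbour first (inter-RSU rate r_ss), then
    relayed, modelled here as two rates in series.
    a_ca = 2: only the base station can serve, so edge throughput is 0."""
    if a_ca == 0:
        return min(r_rv * T_v, L_mb)         # capped at the content file size
    if a_ca == 1:
        effective = 1.0 / (1.0 / r_rv + 1.0 / r_ss)   # sequential hops
        return min(effective * T_v, L_mb)
    return 0.0

# the total per-frame throughput M is the sum over all requests in all RSUs
requests = [(0, 3.0), (1, 3.0), (2, 3.0)]            # (a_ca, T_v) pairs
M = sum(request_throughput(a, t, 50.0, 10.0, 20.0) for a, t in requests)
```

Summing `request_throughput` over every request in every RSU gives the frame total M that the caching policy tries to maximize.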
Step 3: problem-solving algorithm based on deep reinforcement learning:
Step 3.1: state space and action space
A caching policy function π is introduced, which can quickly generate the optimal caching action to adapt to different content popularity. The system state is the current content popularity P_{s,f}; the current neural network parameters are θ_t; the output action is a set of continuous action values
Figure BDA0002941734750000065
The continuous actions
Figure BDA0002941734750000066
are then quantized into K sets of binary discrete actions
Figure BDA0002941734750000067
Step 3.2: discrete action correction
After the continuous actions are quantized into K sets of binary discrete actions, the actions produced by the quantization may violate the cache-capacity constraint, while the cache space also needs to be fully utilized. Algorithm 1 is therefore designed to correct each set of actions. Let the parameter capa denote the number of contents that can be cached.
Figure BDA0002941734750000068
Figure BDA0002941734750000071
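A sketch of what Algorithm 1 must accomplish: force each quantized action to occupy exactly `capa` cache slots. The popularity-based rule used here to choose which files to drop or add is our assumption; the patent's Algorithm 1 is given only as an image above.

```python
# Hedged sketch of the Step 3.2 action correction (tie-breaking by popularity
# is an assumption, not necessarily the patent's Algorithm 1).
import numpy as np

def correct_action(action, popularity, capa):
    """Return a copy of `action` with exactly `capa` ones: drop the least
    popular cached files on overflow, add the most popular uncached files
    on underflow, so the cache space is fully used."""
    action = action.copy()
    cached = np.flatnonzero(action == 1)
    if len(cached) > capa:
        drop = cached[np.argsort(popularity[cached])][: len(cached) - capa]
        action[drop] = 0
    elif len(cached) < capa:
        uncached = np.flatnonzero(action == 0)
        add = uncached[np.argsort(-popularity[uncached])][: capa - len(cached)]
        action[add] = 1
    return action

pop = np.array([0.4, 0.3, 0.2, 0.1])
a = correct_action(np.array([1, 1, 1, 0]), pop, capa=2)
assert a.sum() == 2 and a.tolist() == [1, 1, 0, 0]
```

After correction, every candidate action satisfies the capacity constraint and can be scored by throughput.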
Calculating the throughput of each group of discrete actions according to the content request, and selecting the group with the maximum throughput as the optimal action
Figure BDA0002941734750000072
One iteration is performed per time slot and then the content popularity of the next time slot is predicted.
Step 3.3: empirical playback
The DNN is trained using an experience replay technique. The content popularity and the optimal caching scheme form a set of experiences
Figure BDA0002941734750000073
stored in the experience memory D(t) = {ψ_1, ψ_2, ..., ψ_t}. With a training interval η, a batch of training samples Ψ is randomly selected from the experience memory every η iterations. The Adam algorithm updates the neural network parameters θ_t to reduce the average cross-entropy loss. The average cross-entropy loss L(θ_t) is calculated as follows:
Figure BDA0002941734750000074
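The experience-replay loop of Step 3.3 can be sketched as follows. To stay dependency-free, a single logistic layer trained by plain gradient descent on the average cross-entropy loss stands in for the DNN trained with Adam; the memory, sampling interval η, and batch mechanics follow the text, while all sizes and constants are assumed.

```python
# Hedged sketch of Step 3.3: experience memory + periodic replay training.
import random
from collections import deque
import numpy as np

F = 8                                   # number of content files (assumed)
W = np.zeros((F, F))                    # logistic layer: popularity -> relaxed action
memory = deque(maxlen=256)              # experience memory D(t)
ETA, BATCH, LR = 5, 16, 0.5             # interval eta, batch size, step size (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_cross_entropy(y, p):
    """Average cross-entropy loss L(theta) over a batch of binary actions."""
    return float(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

def replay_step(t, popularity, best_action):
    """Store the (popularity, optimal caching action) experience; every ETA
    slots, replay a random batch and take one gradient step on the loss."""
    global W
    memory.append((popularity, best_action))
    if t % ETA != 0 or len(memory) < BATCH:
        return None
    batch = random.sample(list(memory), BATCH)
    xs = np.stack([b[0] for b in batch])
    ys = np.stack([b[1] for b in batch])
    p = sigmoid(xs @ W)
    W = W - LR * xs.T @ (p - ys) / BATCH   # gradient of the cross-entropy loss
    return avg_cross_entropy(ys, sigmoid(xs @ W))
```

Feeding synthetic (popularity, best-action) pairs for a few hundred slots drives the loss down, mirroring how the policy DNN gradually imitates the best caching actions found so far.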
the overall algorithm flow is as follows:
Figure BDA0002941734750000075
Figure BDA0002941734750000081
drawings
Fig. 1 is a diagram of an edge cache model of the internet of vehicles according to the present invention.
Fig. 2 is a schematic diagram of an algorithm based on deep reinforcement learning.
Detailed Description
Examples and effects of the invention
The performance of the method of the present invention is then analyzed and compared using simulation results.
In the simulation, the hardware environment is a CPU-based server equipped with 128 GB of 1600 MHz DDR3 memory, a 2.2 GHz Intel Core i7, and a 4 TB hard disk. The software environment is Python 3.6.0 with TensorFlow 1.13.0. Both tools are in wide commercial and academic use. TensorFlow maintains the same server architecture and application programming interface when deploying different machine learning algorithms, so it is widely used to deploy new machine learning algorithms and experiments. Using this realistic simulation environment ensures that the simulated performance estimates and approximates the performance in a real scene.
For performance comparison, two baseline schemes are considered:
1) The LFU (least frequently used) scheme, in which the LFU algorithm evicts outdated content according to the historical frequency of content requests.
2) The static caching scheme, in which the most popular content is cached in the RSUs at the start and the cached content is never updated.
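A minimal sketch of the LFU baseline: per-file request counters decide which contents stay cached, and the least frequently used entry is considered for eviction on overflow. The tie-breaking and the admit-only-if-more-frequent rule are assumptions about how such a baseline is typically implemented.

```python
# Hedged sketch of the LFU comparison scheme (details are assumed).
from collections import Counter

class LFUCache:
    def __init__(self, capa):
        self.capa = capa          # number of files the cache can hold
        self.freq = Counter()     # historical request count per file
        self.cached = set()

    def request(self, f):
        """Return True on a cache hit, then update counters and contents."""
        hit = f in self.cached
        self.freq[f] += 1
        if not hit:
            if len(self.cached) < self.capa:
                self.cached.add(f)
            else:
                # evict the least frequently used file, but only if the
                # newcomer is now strictly more frequent (assumed policy)
                victim = min(self.cached, key=lambda g: self.freq[g])
                if self.freq[victim] < self.freq[f]:
                    self.cached.remove(victim)
                    self.cached.add(f)
        return hit

cache = LFUCache(capa=2)
hits = [cache.request(f) for f in [1, 2, 1, 1, 3, 1]]
assert hits == [False, False, True, True, False, True]
```

Unlike the proposed method, this baseline reacts only to raw frequency, ignoring the freshness signal that the Hawkes-based prediction captures.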

Claims (1)

1. A cooperative edge caching method based on deep reinforcement learning in the Internet of Vehicles, characterized in that the method comprises the following steps:
step 1, establishing a system cache model according to structures of a mobile vehicle, an RSU and a base station;
assuming a vehicular network consisting of a base station, a plurality of RSUs, and a plurality of mobile vehicles; the communication range of the base station covers all RSUs and vehicles on the map; the set of RSUs is denoted s ∈ {1, 2, ..., S}; the RSUs are placed evenly along the road, each equipped with a cache of limited size for storing content files, denoted f ∈ {1, 2, ..., F}; a vehicle v ∈ {1, 2, ..., V} moving along the road is interested in a set of content files and issues content requests to its connected RSU;
step 1.1: RSU cache model:
assuming that all requested files occupy the same storage space of L MB, each RSU uses a cache of the same size C_s MB, and the storage space of the base station is large enough to cache all content files; system time is divided into consecutive time frames of equal length, t = 1, 2, ..., G; at the beginning of each time frame, the RSUs make a caching decision based on historical content request records; the content caching decisions within all RSUs are represented by a matrix X_{s,f}, i.e.
Figure FDA0002941734740000012
the caching decision of each RSU for a single content is then expressed as
Figure FDA0002941734740000011
Step 1.2: content popularity prediction model:
the edge-device RSUs select and cache part of the content, assuming the initial popularity of requested content is known, i.e., the request rate obeys a Zipf popularity distribution whose skewness γ is known; the content popularity within all RSUs is represented by a matrix P_{s,f}, i.e.
Figure FDA0002941734740000013
Then the initial probability of each content being requested is
Figure FDA0002941734740000021
Predicting content popularity based on the Hawkes process;
step 2: system throughput calculation model:
the vehicles are considered to keep a constant speed within the RSU, and the arrival of vehicles entering the coverage area of the RSU follows a Poisson distribution
Figure FDA0002941734740000022
the speed is divided into Y discrete levels v_y ∈ {v_1, v_2, ..., v_Y}; the probability of each speed level can then be obtained
Figure FDA0002941734740000023
Figure FDA0002941734740000024
assuming the content request rate within each RSU obeys a Poisson distribution with rate λ_s; the time from a vehicle's request until it drives out of the RSU's coverage is the time during which the vehicle receives content data from the RSU, called the remaining time T_v; based on realistic considerations, T_v takes a value from the range
Figure FDA0002941734740000025
at random;
when a vehicle sends a content request to its connected RSU: if the content is already cached in the RSU, the RSU transmits it directly to the mobile vehicle; if the content is not cached in the connected RSU but is cached in a neighboring RSU, the connected RSU first obtains the content from the neighbor and then transmits it to the requesting vehicle; if no RSU caches the requested content, the vehicle cannot obtain it from the RSUs and the request is sent to the base station;
and step 3: the problem solving algorithm based on deep reinforcement learning comprises the following steps:
step 3.1: in each time slot, take the updated content popularity within all RSUs as the network input and analyze the state and action of the deep reinforcement learning problem; the deep neural network outputs a set of continuous actions and quantizes them into K sets of discrete actions to explore caching decisions;
step 3.2: the quantized actions may violate the cache-capacity constraint, while the cache space needs to be fully utilized, so each set of actions is corrected; then, for the content requests of the current time slot, the throughput of each set of actions is calculated;
step 3.3: training the DNN using an experience replay technique;
modeling the data throughput the vehicle obtains from the edge-device RSUs; predicting the change of content popularity at the end of each time slot, feeding the content popularity to the neural network, converting the network output into a caching decision, and solving with a deep reinforcement learning method.
CN202110182149.6A 2021-02-09 2021-02-09 Collaborative edge caching method based on deep reinforcement learning in Internet of vehicles Active CN113012013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110182149.6A CN113012013B (en) 2021-02-09 2021-02-09 Collaborative edge caching method based on deep reinforcement learning in Internet of vehicles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110182149.6A CN113012013B (en) 2021-02-09 2021-02-09 Collaborative edge caching method based on deep reinforcement learning in Internet of vehicles

Publications (2)

Publication Number Publication Date
CN113012013A true CN113012013A (en) 2021-06-22
CN113012013B CN113012013B (en) 2024-05-28

Family

ID=76402083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110182149.6A Active CN113012013B (en) 2021-02-09 2021-02-09 Collaborative edge caching method based on deep reinforcement learning in Internet of vehicles

Country Status (1)

Country Link
CN (1) CN113012013B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596160A (en) * 2021-07-30 2021-11-02 电子科技大学 Unmanned aerial vehicle content caching decision method based on transfer learning
CN114979145A (en) * 2022-05-23 2022-08-30 西安电子科技大学 Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN115277845A (en) * 2022-07-22 2022-11-01 南京理工大学 Multi-agent near-end strategy-based distributed edge cache decision method for Internet of vehicles
CN116363878A (en) * 2023-05-26 2023-06-30 云南大学 Traffic flow prediction system and method based on continuous dynamic ordinary differential equation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312231A (en) * 2019-06-28 2019-10-08 重庆邮电大学 Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN111047884A (en) * 2019-12-30 2020-04-21 西安理工大学 Traffic light control method based on fog calculation and reinforcement learning
CN111935303A (en) * 2020-08-21 2020-11-13 华北电力大学 Task unloading method based on intention perception in air-ground integrated Internet of vehicles
AU2020103384A4 (en) * 2020-11-11 2021-01-28 Beijing University Of Technology Method for Constructing Energy-efficient Network Content Distribution Mechanism Based on Edge Intelligent Caches

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312231A (en) * 2019-06-28 2019-10-08 重庆邮电大学 Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN111047884A (en) * 2019-12-30 2020-04-21 西安理工大学 Traffic light control method based on fog calculation and reinforcement learning
CN111935303A (en) * 2020-08-21 2020-11-13 华北电力大学 Task unloading method based on intention perception in air-ground integrated Internet of vehicles
AU2020103384A4 (en) * 2020-11-11 2021-01-28 Beijing University Of Technology Method for Constructing Energy-efficient Network Content Distribution Mechanism Based on Edge Intelligent Caches

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596160A (en) * 2021-07-30 2021-11-02 电子科技大学 Unmanned aerial vehicle content caching decision method based on transfer learning
CN114979145A (en) * 2022-05-23 2022-08-30 西安电子科技大学 Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN114979145B (en) * 2022-05-23 2023-01-20 西安电子科技大学 Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN115277845A (en) * 2022-07-22 2022-11-01 南京理工大学 Multi-agent near-end strategy-based distributed edge cache decision method for Internet of vehicles
CN116363878A (en) * 2023-05-26 2023-06-30 云南大学 Traffic flow prediction system and method based on continuous dynamic ordinary differential equation
CN116363878B (en) * 2023-05-26 2023-08-11 云南大学 Traffic flow prediction system and method based on continuous dynamic ordinary differential equation

Also Published As

Publication number Publication date
CN113012013B (en) 2024-05-28

Similar Documents

Publication Publication Date Title
CN113012013B (en) Collaborative edge caching method based on deep reinforcement learning in Internet of vehicles
CN111031102B (en) Multi-user, multi-task mobile edge computing system cacheable task migration method
CN110213627B (en) Streaming media cache allocation method based on multi-cell user mobility
CN112954385B (en) Self-adaptive shunt decision method based on control theory and data driving
CN114827198B (en) Multi-layer center asynchronous federal learning method applied to Internet of vehicles
CN113283177B (en) Mobile perception caching method based on asynchronous federated learning
Dai et al. Edge intelligence for adaptive multimedia streaming in heterogeneous internet of vehicles
CN112565377B (en) Content grading optimization caching method for user service experience in Internet of vehicles
CN115002113B (en) Mobile base station edge computing power resource scheduling method, system and electronic equipment
CN113282786B (en) Panoramic video edge collaborative cache replacement method based on deep reinforcement learning
WO2023159986A1 (en) Collaborative caching method in hierarchical network architecture
CN116249162A (en) Collaborative caching method based on deep reinforcement learning in vehicle-mounted edge network
Xing et al. Deep reinforcement learning for cooperative edge caching in vehicular networks
CN114973673A (en) Task unloading method combining NOMA and content cache in vehicle-road cooperative system
CN114979145B (en) Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN115297171A (en) Edge calculation unloading method and system for cellular Internet of vehicles hierarchical decision
CN114374949B (en) Information freshness optimization-based power control mechanism in Internet of vehicles
CN116566838A (en) Internet of vehicles task unloading and content caching method with cooperative blockchain and edge calculation
CN116405569A (en) Task unloading matching method and system based on vehicle and edge computing server
Ouyang Task offloading algorithm of vehicle edge computing environment based on Dueling-DQN
Jiang et al. Asynchronous federated and reinforcement learning for mobility-aware edge caching in IoVs
CN116634396A (en) Graph attention-based vehicle networking multi-agent edge computing content caching decision method
CN116233958A (en) Data content caching method for vehicle clusters
CN114980127A (en) Calculation unloading method based on federal reinforcement learning in fog wireless access network
Zhao et al. Intelligent Caching for Vehicular Dew Computing in Poor Network Connectivity Environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant