CN112565377B - Content grading optimization caching method for user service experience in Internet of vehicles

Info

Publication number: CN112565377B
Application number: CN202011370700.1A
Authority: CN (China)
Prior art keywords: content, MPS, vehicle, cache, station
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN112565377A
Inventors: 李曦, 王杭, 纪红, 张鹤立
Applicant and Assignee: Beijing University of Posts and Telecommunications

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0893 Assignment of logical groups to network elements
    • H04L 41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements

Abstract

The invention discloses a content grading optimization caching method for user service experience in the Internet of vehicles. The method adopts a caching strategy in which the vehicle-mounted mobile public server (MPS) and the station road side unit (RSU) cooperate to deliver content, defines the probability of successfully obtaining the complete content from the MPS and the RSU as the QoE hit rate, and optimizes the per-content slice caching strategy of the MPS with a deep reinforcement learning network (DQN) to maximize the QoE hit rate. The invention can adjust the content types and the proportion of content segments cached by the MPS to adapt to the content-preference changes caused by passenger flow, so that more passengers obtain the initial slices of the content they need while the cache replacement cost is reduced.

Description

Content grading optimization caching method for user service experience in Internet of vehicles
Technical Field
The invention belongs to the technical field of vehicular networks, and particularly relates to user-service-experience-oriented content grading optimization caching in an Internet-of-vehicles system.
Background
In recent years, with the rapid development of in-vehicle applications, entertainment content has become increasingly popular among passengers traveling in public vehicles. The large number of content requests generated by passengers has therefore led to rapidly growing communication demand in urban mass transit systems. In addition, the traditional cloud server is far from public transport vehicles, and the limited capacity of the backhaul link increases the delay with which users obtain content, seriously degrading the user experience. Caching hot content at vehicular edge nodes is considered an effective way to alleviate this problem.
Some researchers have proposed equipping public vehicles (buses, subways, etc.) in urban public transportation systems with a server, turning them into MPSs (Mobile Public Servers) with certain storage and computing resources that can provide content services to passengers in the vehicle. However, an MPS has limited resources and cannot handle large numbers of content requests during peak hours; congestion of the communication link causes long communication delays and greatly degrades the passenger experience. Furthermore, since public vehicles travel across multiple regions and in-vehicle passengers constantly change through boarding and alighting, the content preferences of in-vehicle passengers vary, which means the MPS must constantly adjust its cached content to meet the demand of more passengers.
At present, several content replacement strategies based on DRL (Deep Reinforcement Learning) have been proposed to address the limited storage space of edge devices. Reference [1] designs a multi-layer cache mechanism with a learning-based method: it predicts the content request distribution from vehicle mobility and combines a deep learning method to decide which content to cache in RSUs (Road Side Units), so that a mobile user can immediately obtain the target content on reaching RSU coverage, thereby reducing content acquisition delay and bandwidth consumption. Reference [2] proposes dynamically updating the content cache with a DRL method based only on time-varying requests and the content cached at the base station; the improved method achieves a higher cache hit rate than least-frequently-used, first-in-first-out, and deep Q-network based methods. Reference [3] designs an optimal content caching scheme based on content sharing between vehicles using a DRL method. Reference [4] proposes a cooperative edge caching strategy that jointly optimizes content replacement and content distribution among a macro-cellular base station, RSUs, and smart vehicles using a deep deterministic policy gradient approach. Existing methods that provide content services by combining mobile vehicles with RSUs/base stations mostly consider which kind of server should transmit the complete content to the user; they rarely account for the limited storage capacity of the MPS, and rarely let the MPS and the RSU provide different slices of the same content. Meanwhile, most existing research on vehicular content caching considers only the content update strategy of a single RSU/base station/vehicle, and seldom considers simultaneously the content-preference changes caused by both vehicle mobility and the mobility of the users on the vehicle.
Reference documents:
[1] Z. Zhao, L. Guardalben, M. Karimzadeh, J. Silva, T. Braun and S. Sargento, "Mobility Prediction-Assisted Over-the-Top Edge Prefetching for Hierarchical VANETs," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1786-1801, Aug. 2018.
[2] P. Wu, J. Li, L. Shi, M. Ding, K. Cai and F. Yang, "Dynamic Content Update for Wireless Edge Caching via Deep Reinforcement Learning," IEEE Communications Letters, vol. 23, no. 10, pp. 1773-1777, Oct. 2019.
[3] Y. Dai, D. Xu, K. Zhang, S. Maharjan and Y. Zhang, "Deep Reinforcement Learning and Permissioned Blockchain for Content Caching in Vehicular Edge Computing and Networks," IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4312-4324, April 2020.
[4] G. Qiao, S. Leng, S. Maharjan, Y. Zhang and N. Ansari, "Deep Reinforcement Learning for Cooperative Content Caching in Vehicular Edge Computing and Networks," IEEE Internet of Things Journal, vol. 7, no. 1, pp. 247-257, Jan. 2020.
Disclosure of Invention
To solve the problem of limited MPS storage capacity, the invention provides a user-service-experience-oriented content grading optimization caching method, which adopts a caching strategy of MPS-RSU cooperative transmission and uses the QoE-CSC (QoE-based Content Slice Caching) technique to maximize the system's QoE (Quality of Experience) hit rate, thereby optimizing the content caching strategy.
The invention provides a content grading optimization caching method for user service experience in the Internet of vehicles. In the Internet-of-vehicles scenario, an MPS is deployed in each public vehicle and a road side unit (RSU) is deployed at each station; the content files to be cached in the system are sliced, the MPS provides passengers with the beginning part of the content, and the RSU provides the remaining content slices.
To assess the user experience more accurately, the invention defines a QoE hit rate, which represents the probability that a passenger successfully obtains the complete content from the MPS and the RSU. Because passenger-flow changes lead to different content preferences, the content segments cached by the MPS must be replaced periodically, so the invention obtains the MPS caching strategy by maximizing the QoE hit rate. The method caches F content files, each of size D and viewable duration τ, and cuts each file into N slices of equal size; the MPS caches the first $z_f$ slices of content f, where

$$0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B.$$
the invention comprises the following steps:
step 1, modeling the movement of public vehicles, content requests of passengers and wireless channels in the Internet of vehicles, and then establishing a target function for maximizing the QoE (quality of experience) hit rate to obtain a content slice caching strategy;
the QoE hit rate refers to the probability of successful acquisition of the complete content from MPS and RSU by the passenger, and is represented as Hi QThe following are:
Figure GDA0003216247940000022
wherein K represents the number of stations; u shapeiTotal number of user requests, u, received for MPS between the ith stop and the (i + 1) th stopiIs the u-th ofiA secondary content request;
Figure GDA0003216247940000031
front for marking content f requested by passenger
Figure GDA0003216247940000032
Whether the slice has been cached in the MPS, and if so,
Figure GDA0003216247940000033
if not, the user can not select the specific application,
Figure GDA0003216247940000034
Figure GDA0003216247940000035
for marking whether a passenger can stop at the time of stopping at the ith station
Figure GDA0003216247940000036
The remaining slice of content f is obtained from the RSU and, if possible,
Figure GDA0003216247940000037
if not, the user can not select the specific application,
Figure GDA0003216247940000038
Figure GDA0003216247940000039
indicating the number of content slices that the MPS should provide to the passenger.
Step 2, optimizing a content slice caching strategy by using a deep reinforcement learning network DQN, wherein the method comprises the following steps:
(1) The DQN training phase: the initial input state is that the vehicle MPS stores content slices with a uniform caching strategy; the vehicle MPS receives passengers' content requests while traveling between stations; when the vehicle arrives at the next station, the MPS selects a feasible action, where an action updates the number of cached slices of each content file in the MPS; the MPS computes the immediate reward of the current action from the cache-hit result (namely the QoE hit rate), and stores the record of the current state, action, and immediate reward in a replay pool; a number of records are drawn from the replay pool, and the next state and future reward are computed to obtain the ideal Q value in the current state; taking the ideal Q value as reference, the Q network is trained with a gradient descent strategy until it converges.
(2) The vehicle continuously repeats the above process; after a certain number of iterations the network becomes stable, and the MPS can then input the current state to the Q network to obtain the best cache replacement action, improving the cache hit rate between stations 1 to K.
Compared with the prior art, the invention has the following advantages and positive effects. (1) By maximizing the system QoE hit rate through the content slice caching strategy, the method adjusts the content types and the proportion of content segments cached by the MPS to adapt to content-preference changes caused by passenger flow, so that more passengers obtain the initial slices of the content they need while the cache replacement cost is reduced. (2) The method improves the utilization efficiency of the MPS cache capacity: simulation results show that, for different MPS cache capacities, the cache hit rate of the method is higher than that of the existing comparison methods, i.e., the method makes better use of the MPS cache capacity. (3) The method improves the QoE hit rate of the MPS: simulation results show that, compared with the LRU and LFU reference methods, the method better learns and predicts the content popularity of each region, so that the MPS caches the corresponding content in advance, the QoE hit rate is improved, and the user service experience is better.
Drawings
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;
FIG. 2 is a flow chart of one implementation of the method of the present invention;
fig. 3 is a diagram illustrating QoE hit rates for MPS storing the same number of content slices, in an embodiment of the present invention;
fig. 4 is a graph illustrating the average QoE hit rate for MPS storage of varying numbers of content slices, in an embodiment of the present invention;
FIG. 5 is a diagram illustrating the average cache cost for MPS storing different numbers of content slices according to an embodiment of the present invention;
fig. 6 is a diagram illustrating the convergence performance of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Aiming at the limited MPS storage capacity and the strong mobility of passengers, the invention provides a user-service-experience-oriented content grading optimization caching method in the vehicular network. A caching strategy of MPS-RSU cooperative transmission is adopted: the MPS provides the beginning part of the content for passengers, and the RSU provides the remaining content slices. Considering that a passenger obtains content slices from the MPS and the RSU respectively, a QoE hit rate is defined, representing the probability that the MPS and the RSUs together successfully provide the passenger with the corresponding content segments; this yields the QoE-based Content Segments Caching (QoE-CSC) policy, which can accurately evaluate the user experience. The method optimizes the content slice caching strategy of the vehicle MPS with the deep reinforcement learning network DQN: after the trained DQN network converges, each time the vehicle travels from the current station to the next it collects the content requests of the passengers in the vehicle; after the vehicle arrives at the next station, the content-slice cache state of the current MPS is input to the DQN network, the action that maximizes the QoE hit rate is computed, and the content slice caching strategy of the MPS is updated.
The devices included in the application scenarios of the present invention are as follows:
a public vehicle: includes public transport means such as buses, trams and subways; a server with storage, computing, and communication capabilities, referred to as the MPS, is deployed in the vehicle. The MPS caches the beginning content slices of the various contents to provide content services for the in-car passengers.
station: a stop where the bus halts after traveling for a certain period of time so that passengers can get on and off. The stations are generally located in various areas of the city according to the plans of the traffic departments.
RSU: deployed at the station; it caches all content types in the system and all slices thereof. When the bus stops at the station, it can provide content services for the passengers in the bus and can communicate with the MPS.
As shown in fig. 1, a typical public transportation network scenario considered by the invention consists of public vehicles and RSUs: K stations are uniformly distributed along an urban highway trunk road at interval L and numbered $k = 1, 2, \dots, K$, bus station i being the i-th bus station. The Internet-of-vehicles system has F content files, each of size D, with content viewing duration τ. Any content file $f \in F$ can be cut into N slices of equal length, the file f being cut into $f_1, f_2, \dots, f_N$. The RSUs cache all kinds of content in the system; the MPS caches the beginning of part of the content and can cache at most $N_B$ content slices. If the MPS caches the first $z_f$ ($0 \le z_f \le N$) slices of content f, the contents of the MPS cache can be represented as

$$C = (z_1, z_2, \dots, z_F), \qquad \sum_{f=1}^{F} z_f \le N_B.$$
As shown in fig. 1, the MPS of a vehicle caches the front partial slices of the F files f = 1, 2, …, F and adjusts the cached contents at the stations using the method of the invention.
Firstly, a bus movement model is established. Taking a bus as an example, the bus is set to move in a uniform straight line at speed v between two adjacent bus stations, so the moving time $T_m$ between two stations is:

$$T_m = L / v \quad (1)$$

Let the time the bus dwells at a bus station be $T_s$, with $T_s > 0$ following a lognormal distribution. The dwell times of the bus at the K bus stations are denoted $T_s^1, T_s^2, \dots, T_s^K$, where $T_s^i$ represents the dwell time at the i-th bus station and obeys $T_s^i \sim \ln N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ the mathematical expectation and the variance, respectively. The probability density of $T_s^i$ is:

$$p(T_s^i) = \frac{1}{T_s^i \sigma \sqrt{2\pi}} \exp\!\left( -\frac{(\ln T_s^i - \mu)^2}{2\sigma^2} \right) \quad (2)$$
A content request model is then established according to Zipf's law. Generally, the probability $p_f$ that users in the same region request content f obeys a Zipf distribution, as follows:

$$p_f = \frac{e_f^{-\alpha}}{\sum_{j=1}^{F} e_j^{-\alpha}} \quad (3)$$

wherein $p_f$ is the probability of a user's request for content f, α is the Zipf distribution parameter, and $e_f$ is the popularity ranking of the requested content f.
In the existing literature, the probability of a user's content requests is often set to follow a Zipf distribution with a single parameter. In the present invention, however, the public vehicle repeatedly alternates between driving and stopping at stations and thus crosses different regions, while its passengers keep changing through boarding and alighting. Therefore, the content request distribution between two adjacent bus stations is assumed to follow Zipf distributions with different parameters α, as shown in fig. 1; that is, the passengers' content requests received by the bus between different pairs of bus stations follow Zipf distributions with different parameters α.
Considering the behavior of passengers getting on and off at the stations where the bus stops, the total number of passenger content requests received by the MPS per unit time varies. Let the total number of user requests received by the MPS between the i-th bus station and the (i+1)-th bus station be $U_i$, where $u_i \in [1, U_i]$ denotes the $u_i$-th passenger request. The probability that the $u_i$-th content request reaches the MPS at time $t_{u_i}$ obeys a Poisson distribution:

$$P(t_{u_i}) = \frac{\lambda^{t_{u_i}}}{t_{u_i}!} e^{-\lambda} \quad (4)$$

wherein λ is the Poisson distribution parameter and $t_{u_i}$ denotes the arrival time of the $u_i$-th content request.
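For concreteness, the mobility and request models of equations (1)-(4) can be simulated as in the following Python sketch. This is illustrative only and not part of the claimed method: the function names and the request rate lam (requests per second) are assumptions, and NumPy's standard lognormal, exponential, and categorical samplers stand in for the distributions above.

    import numpy as np

    rng = np.random.default_rng(0)

    def travel_time(L, v):
        # Equation (1): uniform linear motion between adjacent stations.
        return L / v

    def dwell_times(K, mu, sigma):
        # Dwell time T_s at each of the K stations follows a lognormal distribution.
        return rng.lognormal(mean=mu, sigma=sigma, size=K)

    def zipf_popularity(F, alpha):
        # Equation (3): request probability of the content ranked e_f = 1..F.
        ranks = np.arange(1, F + 1)
        weights = ranks ** (-alpha)
        return weights / weights.sum()

    def generate_requests(F, alpha, lam, T_m):
        # Requests arrive as a Poisson process (rate lam) during the travel time T_m;
        # each arrival requests a content drawn from the Zipf popularity distribution.
        p = zipf_popularity(F, alpha)
        t, requests = 0.0, []
        while True:
            t += rng.exponential(1.0 / lam)                # exponential inter-arrival gaps
            if t >= T_m:
                break
            requests.append((t, int(rng.choice(F, p=p))))  # (arrival time, content id)
        return requests

    requests = generate_requests(F=20, alpha=1.1, lam=0.05, T_m=travel_time(1000.0, 10.0))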
Wireless channels in the Internet of vehicles are subject to a Rayleigh distribution, so the invention models the channel between the passenger and the RSU as a Rayleigh fading channel. When the bus stops at the bus station, let d be the distance between them, with path loss $d^{-\beta}$, where β is the path loss exponent and h is the channel fading coefficient. The transmission rate $r_u$ between the passenger and the RSU at the bus station is then:

$$r_u = B \log_2\!\left(1 + \frac{P |h|^2 d^{-\beta}}{N_0}\right) \quad (5)$$

wherein B represents the transmission bandwidth, P represents the transmission power of the RSU at the bus station, and $N_0$ is the Gaussian white noise.
Likewise, the radio transmission channel between the MPS and the RSU follows a Rayleigh distribution, and the distance between them equals the distance d between the user and the RSU, so the transmission rate $r_b$ between the MPS and the RSU at the bus station is:

$$r_b = r_u \quad (6)$$
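Equations (5) and (6) can be evaluated as in the sketch below. This is a minimal illustration under assumed values: the Rayleigh scale of the fading coefficient and the path loss exponent beta are placeholders, not values taken from the invention.

    import numpy as np

    def transmission_rate(B, P, d, beta, N0, h=None, rng=np.random.default_rng(1)):
        # Equation (5): r = B * log2(1 + P * |h|^2 * d^(-beta) / N0),
        # with h drawn from a Rayleigh fading distribution if not supplied.
        if h is None:
            h = rng.rayleigh(scale=1.0)
        snr = P * (h ** 2) * d ** (-beta) / N0
        return B * np.log2(1.0 + snr)        # bits per second

    r_u = transmission_rate(B=1e6, P=1.3, d=3.0, beta=2.0, N0=3e-13)
    r_b = r_u                                # equation (6): MPS-RSU rate equals passenger-RSU rate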
With the method of the invention, a passenger first obtains the beginning slices of the content from the MPS. Suppose the passenger's request for content f reaches the MPS at time $t_{u_i}$ while the bus travels from the i-th bus station to the (i+1)-th bus station. Taking video content as an example, to ensure that the passenger's playback does not stall before the bus arrives at the (i+1)-th bus station, the length of content $\tau_{u_i}$ the MPS must provide for the passenger is the time the bus spends driving the remaining distance, namely:

$$\tau_{u_i} = T_m - t_{u_i} \quad (7)$$

Therefore, the number of content slices $\theta_{u_i}$ that the MPS should provide to the passenger is at least:

$$\theta_{u_i} = \left\lfloor \frac{(T_m - t_{u_i}) \cdot N}{\tau} \right\rfloor \quad (8)$$

wherein $\lfloor \cdot \rfloor$ represents the floor rounding function; τ represents the duration for which the user utilizes file f, e.g., if f is a video, τ is the duration for which video f can be played.
Considering that the number of slices of content f cached by the MPS at this time is $z_f$, there are two cases: 1) if $z_f \ge \theta_{u_i}$, the MPS can provide the passenger with content of sufficient length to watch until the bus station; 2) if $z_f < \theta_{u_i}$, the bus can provide the passenger with at most $z_f$ slices. In either case, when the bus arrives at the (i+1)-th bus station, the passenger's content request is still forwarded to the RSU at the bus station. Because the RSU stores all slices of content f while the bus only caches part of them, passengers can obtain more content slices from the RSU during the limited parking time.
Thus, the number of content slices $q_{u_i}$ requested by the passenger from the RSU is:

$$q_{u_i} = N - \min(z_f, \theta_{u_i}) \quad (9)$$
Combining the transmission rate and the parking time, the maximum transmission capacity $Q_i^{max}$ the passenger can obtain from the RSU at the bus station is:

$$Q_i^{max} = r_u \cdot T_s^i = B \log_2\!\left(1 + \frac{P_i |h|^2 d^{-\beta}}{N_0}\right) \cdot T_s^i \quad (10)$$

In the above formula, $P_i$ denotes the transmit power of the RSU at the i-th bus station. For the passenger to obtain the remaining content slices within the dwell time $T_s^i$, the following condition must be met:

$$q_{u_i} \cdot \frac{D}{N} \le Q_i^{max} \quad (11)$$
When the vehicle stops at the i-th bus station, the MPS on the vehicle checks and updates the stored content. Given the transmission rate $r_b$ between the MPS and the RSU, the maximum capacity $G_i^{max}$ the RSU can transmit to the MPS within the dwell time $T_s^i$ is:

$$G_i^{max} = r_b \cdot T_s^i \quad (12)$$

Then, if the MPS replaces $n_i$ content slices at the i-th bus station, the following condition must be satisfied:

$$n_i \cdot \frac{D}{N} \le G_i^{max} \quad (13)$$

At the same time, the number of replaced slices must not exceed the maximum storage space of the MPS:

$$n_i \le N_B \quad (14)$$
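The slice accounting of equations (7)-(14) can be collected into a few helper functions, as in this sketch; it assumes the reconstructed formulas above, with theta, q, Q_max, and G_max standing for $\theta_{u_i}$, $q_{u_i}$, $Q_i^{max}$, and $G_i^{max}$.

    def slices_needed(T_m, t_u, N, tau):
        # Equations (7)-(8): slices the MPS must supply to cover the remaining
        # ride time T_m - t_u, each slice playing for tau / N seconds (floored,
        # capped at the total slice count N).
        return min(int((T_m - t_u) * N / tau), N)

    def slices_from_rsu(N, z_f, theta):
        # Equation (9): slices still to be fetched from the RSU at the next station.
        return N - min(z_f, theta)

    def station_capacity(rate, T_s):
        # Equations (10) and (12): maximum data deliverable during the dwell time T_s.
        return rate * T_s

    def update_feasible(q, n_replaced, D, N, Q_max, G_max, N_B):
        # Equations (11), (13), (14): passenger download, MPS update, storage limits.
        return (q * D / N <= Q_max) and (n_replaced * D / N <= G_max) and (n_replaced <= N_B)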
In the application scenario of the invention, a passenger requests different slices of the same content from the MPS and the RSU respectively, so a cache hit can fail in two cases: 1) the number of content slices cached by the MPS is insufficient to support the passenger's playback; 2) the passenger cannot obtain the remaining content slices from the RSU during the parking time. Based on this, the invention proposes a user-experience (QoE) cache hit rate suited to the content-slice scenario. The QoE hit rate is defined as the probability that a passenger successfully obtains the complete content from the MPS and the RSU within a certain time.
First, the probability that the MPS hits a passenger request is defined as the initial hit rate $H_i^I$:

$$H_i^I = \frac{1}{U_i} \sum_{u_i=1}^{U_i} x_{u_i} \quad (15)$$

wherein the specific meaning of $x_{u_i}$ is as follows:

$$x_{u_i} = \begin{cases} 1, & z_f \ge \theta_{u_i} \\ 0, & \text{otherwise} \end{cases} \quad (16)$$

That is, when the first $\theta_{u_i}$ slices of the content f requested by the passenger are cached in the MPS, the initial cache hit succeeds and $x_{u_i} = 1$; otherwise the initial cache hit fails and $x_{u_i} = 0$. An initial hit indicates that the passenger can retrieve enough content from the MPS.
Secondly, when the bus stops at the i-th bus station, the passenger must obtain the rest of the content from the RSU within the limited dwell time $T_s^i$. Define the marker $y_{u_i}$ as follows:

$$y_{u_i} = \begin{cases} 1, & q_{u_i} \cdot \frac{D}{N} \le Q_i^{max} \\ 0, & \text{otherwise} \end{cases} \quad (17)$$

That is, if the passenger can obtain the remaining slices of content f from the RSU within the limited parking time $T_s^i$, then $y_{u_i} = 1$; otherwise $y_{u_i} = 0$.
Only when both of the above conditions are satisfied, i.e., $x_{u_i} \cdot y_{u_i} = 1$, can the passenger obtain the complete content, which is recorded as one QoE hit; otherwise it is recorded as a QoE miss. The probability that a passenger obtains the complete content is called the QoE hit rate $H_i^Q$, expressed as:

$$H_i^Q = \frac{1}{U_i} \sum_{u_i=1}^{U_i} x_{u_i} \cdot y_{u_i} \quad (18)$$
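Combining the two indicators, the QoE hit rate of equation (18) over one inter-station leg can be computed as in the following sketch, which reuses slices_needed, slices_from_rsu, and station_capacity from the sketch above; representing the MPS cache as a dict mapping a content id to its number of cached head slices z_f is an assumption for illustration.

    def qoe_hit_rate(requests, cache, N, T_m, tau, D, Q_max):
        # Equations (15)-(18): a request counts as a QoE hit only if the MPS holds
        # the first theta slices (x = 1) AND the RSU can deliver the rest in time (y = 1).
        hits = 0
        for t_u, f in requests:                      # (arrival time, content id)
            theta = slices_needed(T_m, t_u, N, tau)  # equation (8)
            z_f = cache.get(f, 0)
            x = 1 if z_f >= theta else 0             # initial-hit indicator, equation (16)
            q = slices_from_rsu(N, z_f, theta)       # remaining slices, equation (9)
            y = 1 if q * D / N <= Q_max else 0       # RSU indicator, equation (17)
            hits += x * y
        return hits / max(len(requests), 1)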
further, the present invention closely correlates QoE hit rates with the number of content slices of the MPS cache. Since MPS has limited storage space, it is necessary to periodically update its cache contents to better serve passengers. In the process of updating the content when the bus arrives at the bus station, neglecting the time of establishing communication connection between the passenger and the RSU and between the bus and the RSU, defining the objective function as the QoE hit rate after the MPS updates the slice content each time, as follows:
$$\max \sum_{i=1}^{K-1} H_i^Q \quad (19)$$

$$\text{s.t.} \quad 0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B, \qquad q_{u_i} \cdot \frac{D}{N} \le Q_i^{max}$$
the above objective function indicates that the cache policy of MPS is obtained when the QoE hit rate is maximized for a public vehicle traveling between two stations.
In order to maximize the QoE hit rate of the system, the invention uses the Deep Q-learning Network (DQN) method to search for the best caching strategy for the bus content slices. DQN fuses a neural network with Q-learning: it predicts the Q value with a neural network and learns the optimal policy by continually updating that network. DQN is grounded in a Markov decision process, generally described by a quadruple <S, A, P, R>, where the agent is the MPS, S represents the state space, A the action space, P the transition probability, and R the reward after an action is performed in the current state. Because the high mobility of passengers makes it difficult to describe state transitions with a deterministic transition probability P, the invention describes the transition probability indirectly with the state-action value function of value-based DQN.
Description of the state space S. The state is an objective description of the real environment in which the agent currently resides. In the invention, the state represents the cache proportion of each type of content slice in the MPS. The MPS cache state before the bus arrives at the (i+1)-th bus station is denoted $s_i \in S$, $i \in [1, K-1]$, and is represented as follows:

$$s_i = (z_1, z_2, \dots, z_f, \dots, z_F)$$

wherein $0 \le z_f \le N$ represents that the MPS currently caches the first $z_f$ content slices of content f.

In the initial stage of the system, since the bus is at the initial bus station and the server has not received any passenger content request information, the server adopts a uniform caching strategy, i.e., it caches the first $N_B / F$ content slices of each content f, so

$$s_1 = \left( \frac{N_B}{F}, \frac{N_B}{F}, \dots, \frac{N_B}{F} \right)$$
Description of the action space A. As the bus travels, the distribution and number of passengers differ between bus stations, and so do their content demands. To maximize the system QoE hit rate, the MPS needs to adjust the currently cached content slices, adding or deleting part of them each time it reaches a station. Thus, an action of the agent is defined as $a_i \in A$, where $a_i$ represents the numbers of slices of the different contents replaced when the MPS cache is updated at the (i+1)-th bus station:

$$a_i = (h_1, h_2, \dots, h_f, \dots, h_F)$$

wherein $0 \le h_f \le N - z_f$ represents adding $h_f$ slices of content f, and $-z_f \le h_f < 0$ represents deleting $h_f$ slices of content f, with

$$\sum_{f=1}^{F} h_f \le 0, \qquad \sum_{f=1}^{F} (z_f + h_f) \le N_B,$$

i.e., the number of added content slices must not exceed the number of deleted content slices, and the replaced content must not exceed the maximum storage space of the MPS. Further, from equation (13), it follows that:

$$\sum_{f=1}^{F} |h_f| \cdot \frac{D}{N} \le G_i^{max}$$
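Since the raw action space $(h_1, \dots, h_F)$ is combinatorially large, an implementation has to enumerate only a tractable subset of feasible actions. The sketch below restricts actions to single-pair swaps (delete k slices of one content, add k slices of another), which keeps the cache size unchanged and therefore automatically satisfies the two constraints above whenever the previous state satisfied them; max_swap mirrors the per-replacement limit used in the simulation section, and the function name is an assumption.

    def feasible_actions(state, N, G_max, D, max_swap=5):
        # state = (z_1, ..., z_F): number of head slices cached per content.
        F = len(state)
        actions = [tuple([0] * F)]                   # keeping the cache unchanged is feasible
        for f_del in range(F):                       # content whose slices are deleted
            for f_add in range(F):                   # content whose slices are added
                if f_add == f_del:
                    continue
                for k in range(1, max_swap + 1):     # number of swapped slices
                    if k > state[f_del] or state[f_add] + k > N:
                        continue                     # per-content bounds 0 <= z_f + h_f <= N
                    if k * D / N > G_max:
                        continue                     # equation (13): RSU-to-MPS download capacity
                    a = [0] * F
                    a[f_del], a[f_add] = -k, k       # sum(a) == 0: adds never exceed deletes
                    actions.append(tuple(a))
        return actions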
description of the reward function R. r isie.R represents the agent according to the current state siPerforming a specific action a at the i +1 th bus stationiThe immediate benefit later. The instant benefits in the present invention are expressed as having performed action aiPost QoE hit rate, as follows:
Figure GDA0003216247940000091
Based on the state space, action space, and reward, the DQN consists of a current Q network, a target Q network, and an experience replay module. The current and target Q networks have the same structure but different parameters: the current Q network is responsible for selecting the current action $a_i$ according to the current state $s_i$ and updating the model parameters ω, while the target Q network computes the target Q value, its network parameters ω' being periodically copied from the ω of the current Q network. Experience replay is used to store historical data: the caching agent builds tuples $\langle s_i, a_i, r_i, s_{i+1} \rangle$ and stores them in the experience replay pool, and each parameter update draws part of the data from the pool, which breaks the correlation between samples. $\langle s_i, a_i, r_i, s_{i+1} \rangle$ indicates that performing action $a_i$ in state $s_i$ leads to state $s_{i+1}$, with $r_i$ the reward after performing action $a_i$.
The user-service-experience-oriented content grading optimization caching method of the invention updates the cached content of the MPS through the QoE-CSC technique; the whole process comprises the following steps, as shown in FIG. 2.
Step 1, initializing the cache content of a public vehicle MPS in a DQN network training phase; setting that F content files need to be cached, caching the initial slices of the F content files in an MPS by adopting a uniform caching strategy, and initializing all parameters of the deep reinforcement learning network.
And 2, initializing the serial numbers and positions of the bus stations, wherein K bus stations are provided in the embodiment of the invention.
And 3, the bus runs from the current station to the next station, collects the content request information of passengers in the bus, and calculates the corresponding QoE hit rate. In the training phase, the content request of the passenger is described by using Zipf distribution and Poisson distribution. In the implementation phase of actually using the DQN network with stable training, the invention can count the content request information according to the actual situation.
And 4, after the next station is reached, updating the deep reinforcement learning network according to the content request and the QoE hit rate, and replacing the cache content of the MPS according to the deep reinforcement learning network. In the training phase, the Q network stores the current state, action and instant benefit into a playback pool.
And 5, in the training stage, the Q network randomly extracts a certain number of records from the replay pool, computes the next state from the state s and the action a, and computes the future reward to obtain the ideal Q value. After the computation, the extracted state records are fed into the Q network, and the network is trained by gradient descent with the ideal Q value as reference.
And 6, in the training stage, the vehicle drives away from the current station, and steps 3 to 5 are repeated continuously until the terminal station is reached.
And 7, in the training stage, continuously repeating the steps 2-6 until the set cycle number is reached, or stopping the training process when the network is judged to be stable, so as to obtain the stable DQN network.
The invention utilizes a pseudo code algorithm for optimizing the content slice caching strategy by using a deep reinforcement learning network DQN to realize the following steps:
Initialize F, N_B, T_m, K, α, d, B, P_i, N_0;
Initialize the experience replay pool G and the random weights θ of the action-value function Q;
Initialize the current state s_1 = (N_B/F, N_B/F, …, N_B/F);
Randomly initialize the parameters ω of the current Q network;
Initialize the parameters ω' of the target Q network;
For episode = 1, 2, …, M:
    Preprocess the sequence φ_1 = φ(s_1);
    for i = 1, 2, …, K-1:
        Feed φ(s_i) into the current Q network;
        Enumerate all feasible action sets A_i satisfying the action-space conditions and equations (13)-(14);
        With probability ε randomly select an action a_i from A_i; otherwise select a_i = argmax_a Q(s_i, A_i; θ);
        Replace the cached content of the MPS according to action a_i to obtain the next state s_{i+1};
        Generate content requests with the Zipf and Poisson distributions to obtain the immediate reward r_i;
        Update the state s_i = s_{i+1}, preprocess φ_{i+1} = φ(s_{i+1}), and store (s_i, a_i, r_i, s_{i+1}) in the replay pool G;
        Randomly draw a small batch of samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool G;
        Set y_i = r_i if station i+1 is the terminal station, and y_i = r_i + γ max_{a'} Q(s_{i+1}, a'; θ) otherwise;
        Update θ with gradient descent on (y_i − Q(s_i, a_i; θ))^2;
    End for
End for

In the above pseudo code: M represents the set maximum number of cycles; φ(s_1) refers to the set of slice replacement actions that can be performed in the current state; Q(s_i, A_i; θ) is the action-value function Q depending on the policy θ; a_i = argmax_a Q(s_i, A_i; θ) selects the action that maximizes the Q value; y_i represents the Q value of the DQN network after the action is performed, accounting for both the immediate reward and the future reward; r_i is the immediate reward; the terminal station is station K in the embodiment of the invention; γ is a discount coefficient representing the importance of the future reward to the current decision; γ max_{a'} Q(s_{i+1}, a'; θ) represents the future reward, i.e., the Q value of state s_{i+1} after performing the action. The network parameters are updated with (y_i − Q(s_i, a_i; θ))^2 as the loss function using gradient descent.
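As a concrete counterpart to the pseudo code, the following is a minimal DQN training sketch in Python/PyTorch. It is illustrative only: the env object (whose reset() returns the uniform initial cache state and whose step(a) returns the next state, the QoE-hit-rate reward, and a terminal flag at station K), the network width, and all hyperparameters are assumptions, and the feasible action set is abstracted to an integer index into a fixed action list.

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    GAMMA, EPS = 0.9, 0.1

    class QNet(nn.Module):
        # Maps an MPS cache state s_i = (z_1, ..., z_F) to one Q value per action.
        def __init__(self, n_contents, n_actions, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_contents, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_actions))
        def forward(self, s):
            return self.net(s)

    def train_dqn(env, n_contents, n_actions, episodes=500, batch=32, sync_every=20):
        q, q_target = QNet(n_contents, n_actions), QNet(n_contents, n_actions)
        q_target.load_state_dict(q.state_dict())     # omega' copied from omega
        opt = torch.optim.Adam(q.parameters(), lr=1e-3)
        replay = deque(maxlen=10_000)                # experience replay pool
        step = 0
        for _ in range(episodes):
            s, done = env.reset(), False             # uniform initial cache state s_1
            while not done:
                if random.random() < EPS:            # epsilon-greedy action selection
                    a = random.randrange(n_actions)
                else:
                    with torch.no_grad():
                        a = int(q(torch.tensor(s, dtype=torch.float32)).argmax())
                s2, r, done = env.step(a)            # r is the QoE hit rate of eq. (18)
                replay.append((s, a, r, s2, done))
                s = s2
                if len(replay) >= batch:
                    ss, aa, rr, ss2, dd = map(list, zip(*random.sample(replay, batch)))
                    ss = torch.tensor(ss, dtype=torch.float32)
                    ss2 = torch.tensor(ss2, dtype=torch.float32)
                    rr = torch.tensor(rr, dtype=torch.float32)
                    dd = torch.tensor(dd, dtype=torch.float32)
                    with torch.no_grad():            # y_i = r + gamma * max_a' Q'(s', a')
                        y = rr + GAMMA * (1 - dd) * q_target(ss2).max(dim=1).values
                    pred = q(ss).gather(1, torch.tensor(aa).unsqueeze(1)).squeeze(1)
                    loss = nn.functional.mse_loss(pred, y)  # (y_i - Q(s_i, a_i; theta))^2
                    opt.zero_grad(); loss.backward(); opt.step()
                step += 1
                if step % sync_every == 0:           # periodically copy omega to omega'
                    q_target.load_state_dict(q.state_dict())
        return q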
The method of the invention was evaluated in simulation experiments. The system simulation parameter settings are shown in Table 1. The content popularity between two adjacent public transportation stations follows Zipf distributions with different parameters α = [1.1, 2.5, 1.6, 1.7, 1.3, 2.7, 1.6, 2.1, 1.9, 1.1]. After traveling from the first bus station to the tenth, the bus immediately returns to the first bus station. Furthermore, to reduce computational complexity, the number of content types that the MPS replaces at a time is limited to at most three. In addition, from the related data in Table 1, it is calculated that the number of content slices per replacement does not exceed 5.
TABLE 1 System simulation parameter settings

    Parameter                                              Value
    Content category F                                     20
    Number of slices N of one content file                 10
    Playback duration of one content slice                 3 minutes
    Data size of one content file                          12 Mb
    Maximum number of cache slices N_B for MPS             100
    Travel time T_m between two neighboring bus stations   600 seconds
    Bus dwell-time distribution range                      4-60 seconds
    Distance d between passenger and RSU                   3 m
    Communication bandwidth B                              1 MHz
    Gaussian white noise N_0                               3×10^-13
    Transmission power P of RSU                            1.3 W
For comparison, the Least Frequently Used (LFU) and Least Recently Used (LRU) caching strategies are used as the comparison methods of the invention. The LFU method caches the most frequently requested content and each time replaces the least frequently used content. The LRU method caches the most recently requested content and each time replaces the least recently used content. To compare the hit performance of the caching methods fairly, LRU and LFU are also limited to at most 5 replaced slices per replacement. The simulation results are shown in figures 3 to 6.
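For reference, the two baselines can be sketched at whole-content granularity as follows; the patent's comparison operates on content slices, so this simplified sketch (with assumed class names) only illustrates the replacement rules themselves.

    from collections import Counter, OrderedDict

    class LFUCache:
        # Keep the most frequently requested contents; evict the least frequently used.
        def __init__(self, capacity):
            self.capacity, self.freq, self.cached = capacity, Counter(), set()
        def request(self, f):
            self.freq[f] += 1
            if f in self.cached:
                return True                          # hit
            if len(self.cached) >= self.capacity:    # evict least frequently used
                victim = min(self.cached, key=lambda g: self.freq[g])
                self.cached.discard(victim)
            self.cached.add(f)
            return False                             # miss

    class LRUCache:
        # Keep the most recently requested contents; evict the least recently used.
        def __init__(self, capacity):
            self.capacity, self.order = capacity, OrderedDict()
        def request(self, f):
            hit = f in self.order
            if hit:
                self.order.move_to_end(f)            # refresh recency
            else:
                if len(self.order) >= self.capacity:
                    self.order.popitem(last=False)   # drop least recently used
                self.order[f] = True
            return hit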
Fig. 3 shows the QoE hit rate across the ten bus stations when the MPS stores 100 content slices. The abscissa of Fig. 3 represents each bus station, and the ordinate represents the QoE hit rate. The QoE-CSC strategy of the invention has a slightly lower QoE hit rate than the LFU and LRU caching strategies at bus stations 1-5, but a significantly higher one at stations 6-10. Moreover, computed from the simulation data, the ten-station total QoE hit rate of the QoE-CSC caching method of the invention is 6.955, while those of the LFU and LRU caching methods are 5.49104 and 4.64264 respectively, far below the method of the invention. The specific reason is that the method of the invention aims to maximize the total QoE hit rate over the ten bus stations, so in the beginning stage it chooses to cache as many of the content slices that will be requested throughout the whole journey as possible. In contrast, the LRU and LFU methods aim for the maximum QoE hit rate at the current station, so their hit performance is better at the beginning; at later stages, because the number of content slices replaceable at a time is limited, the QoE hit rates of the LFU and LRU methods drop rapidly.
Fig. 4 shows the average QoE hit rate over the ten bus stations when the MPS caches 40, 60, 80, 100, and 120 content slices, respectively. The abscissa of Fig. 4 represents the cache capacity of the vehicle MPS, and the ordinate represents the average QoE hit rate at the bus stations. The average QoE hit rate of the QoE-CSC caching method of the invention lies between 0.6 and 0.7; it first increases with the cache capacity and then levels off, because a larger cache lets the MPS store more content slices and thus satisfy more passengers' content demand, while the later flattening occurs because the MPS can replace only 5 content slices at a time, so its performance gradually stabilizes. However, the average QoE hit rates of the LFU and LRU caching methods do not grow with the MPS cache capacity, because these methods rely mainly on the passengers' content request pattern to execute the corresponding slice replacement policy and thus carry a certain randomness, which also indicates that their performance is weaker and their caching effect unstable. In addition, the average QoE hit rate of the QoE-CSC caching method is always higher than that of the two comparison methods: the LFU method stays below 0.6 at best and even falls below 0.3, while the LRU method reaches at most 0.648 but remains lower than the caching method of the invention in most cases. The main reason is that the caching method of the invention continuously learns the content popularity of each region and thus selects the optimal content slice replacement strategy, so it utilizes the cache capacity better than the LFU and LRU caching methods.
In the case that the MPS caches 40, 60, 80, 100, and 120 content slices, respectively, fig. 5 shows the average caching cost of the MPS for content replacement at different bus stops using three different methods. The abscissa of fig. 5 represents bus stops and the ordinate represents average cache cost, i.e. the number of slices replaced per stop. It is found that the LFU and LRU methods choose to replace 5 slices per station, since the optimization goal of both methods is to improve the hit rate in the current state, so that as many slices of content as possible need to be replaced. However, the average caching cost of the QoE-CSC caching method of the present invention is low, and is on average 3.6 slices, because through training of historical request data, the method of the present invention selects some content slices that are most popular in the whole process, so it does not need to adjust the cached content frequently. Therefore, the method can obtain better hit performance by using lower communication resources.
Figure 6 shows the performance of the QoE-CSC caching method of the invention versus the number of training iterations. In Fig. 6, the abscissa represents the number of training iterations, the left ordinate the loss, and the right ordinate the reward. The training loss refers to the loss of the neural network in the DQN; it decreases as the number of training iterations increases, indicating that the network becomes increasingly accurate. The reward reflects the accumulation of state-action rewards; it increases with training, indicating better performance of the method. As seen in Fig. 6, the loss of the caching method of the invention decreases and the reward increases with the number of training iterations, and both the training loss and the reward stabilize after a certain point.

Claims (9)

1. A content grading optimization caching method for user service experience in an Internet of vehicles, characterized in that the content files to be cached in the Internet-of-vehicles system are sliced, the beginning slices of each content file are cached in a mobile public server MPS arranged in a public vehicle, and the remaining content slices of each content file are cached in a road side unit RSU arranged at a station; F content files need to be cached, each content file has size D and content viewing duration τ, and each content file is cut into N slices of equal size; the MPS caches the first $z_f$ slices of content f, where

$$0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B;$$
the method comprises the following steps:
step 1, modeling the movement of public vehicles, content requests of passengers and wireless channels in the Internet of vehicles, and then establishing a target function for maximizing the QoE (quality of experience) hit rate to obtain a content slice caching strategy;
the QoE hit rate refers to the probability of successful acquisition of the complete content from MPS and RSU by the passenger, and is expressed as
Figure FDA00032162479300000117
The following were used:
Figure FDA0003216247930000012
wherein K represents the number of stations; u shapeiTotal number of user requests, u, received for MPS between the ith stop and the (i + 1) th stopiIs the u-th ofiA secondary content request;
Figure FDA0003216247930000013
front for marking content f requested by passenger
Figure FDA0003216247930000014
Whether the slice has been cached in the MPS, and if so,
Figure FDA0003216247930000015
if not, the user can not select the specific application,
Figure FDA0003216247930000016
for marking whether a passenger can stop at the time of stopping at the ith station
Figure FDA0003216247930000017
The remaining slice of content f is obtained from the RSU and, if possible,
Figure FDA0003216247930000018
if not, the user can not select the specific application,
Figure FDA0003216247930000019
represents the number of content slices that the MPS should provide to the passenger;
the objective function is established as follows:

$$\max \sum_{i=1}^{K-1} H_i^Q$$

$$\text{s.t.} \quad 0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B, \qquad q_{u_i} \cdot \frac{D}{N} \le Q_i^{max}$$

wherein $Q_i^{max}$ is the maximum transmission capacity the passenger obtains from the RSU at the i-th stop, and $q_{u_i}$ is the number of content slices requested from the RSU;
step 2, optimizing a content slice caching strategy of the vehicle MPS by using the deep reinforcement learning network DQN, and collecting content requests of passengers in the vehicle when the vehicle runs from the current station to the next station after the DQN is trained to converge; and after the vehicle arrives at the station, inputting the content slice caching state of the current MPS into the DQN network, calculating the action with the maximum QoE hit rate, and updating the content slice caching strategy of the MPS.
2. The method according to claim 1, wherein in step 1, a public vehicle movement model is established, in particular: the vehicle is set to move in a uniform straight line between two adjacent vehicle stations, giving the moving time $T_m$ between the two stations; the dwell times of a vehicle at the K vehicle stations are expressed as $T_s^1, T_s^2, \dots, T_s^K$, where $T_s^i$ represents the residence time of the vehicle at the i-th vehicle station and follows a lognormal distribution.
3. The method according to claim 1, wherein in step 1, a content request model is established, specifically: the request probabilities of users in the same area for different contents are described according to Zipf's law, and the content request probabilities between two adjacent vehicle stations are set to follow Zipf distributions with different parameters α; the total number of user requests received by the MPS of a vehicle between the i-th stop and the (i+1)-th stop is $U_i$, and the arrival of the $u_i$-th passenger content request at the MPS obeys a Poisson distribution, $u_i \in [1, U_i]$.
4. A method according to claim 1, 2 or 3, wherein in step 1, the moving time of the vehicle between two adjacent stations is $T_m$, during which the $u_i$-th content request received by the MPS arrives at time $t_{u_i}$; the MPS caches $z_f$ slices of file f; the number of content slices $\theta_{u_i}$ that the MPS should provide to the passenger is at least:

$$\theta_{u_i} = \left\lfloor \frac{(T_m - t_{u_i}) \cdot N}{\tau} \right\rfloor$$

then the variable $x_{u_i}$ is calculated as follows:

$$x_{u_i} = \begin{cases} 1, & z_f \ge \theta_{u_i} \\ 0, & \text{otherwise} \end{cases}$$
5. A method according to claim 1, 2 or 3, wherein in step 1, the variable $y_{u_i}$ is calculated as follows:

$$y_{u_i} = \begin{cases} 1, & q_{u_i} \cdot \frac{D}{N} \le Q_i^{max} \\ 0, & \text{otherwise} \end{cases}$$

wherein, when the vehicle stops at the i-th station, the number $q_{u_i}$ of remaining content slices the passenger requests from the RSU is calculated as follows:

$$q_{u_i} = N - \min(z_f, \theta_{u_i})$$

where N is the total number of slices of file f, $T_m$ is the moving time of the vehicle between two adjacent stations, the $u_i$-th content request received by the MPS during that movement arrives at time $t_{u_i}$, and $z_f$ is the number of slices of file f cached by the MPS; $Q_i^{max}$, the maximum transmission capacity obtained by the passenger from the RSU at the station, is calculated as follows:

$$Q_i^{max} = B \log_2\!\left(1 + \frac{P_i |h|^2 d^{-\beta}}{N_0}\right) \cdot T_s^i$$

wherein B denotes the transmission bandwidth, $P_i$ the transmission power of the RSU at the i-th station, $N_0$ the Gaussian white noise, h the channel fading coefficient, d the distance between the passenger and the station RSU, β the path loss exponent, and $T_s^i$ the dwell time of the vehicle at the i-th station.
6. The method as claimed in claim 1, wherein in step 2, when the DQN is used for the optimization, a state in the state space represents the cache proportion of each content slice in the MPS; the MPS cache state before reaching the (i+1)-th bus station is denoted $s_i \in S$, $i \in [1, K-1]$, with $s_i = (z_1, z_2, \dots, z_f, \dots, z_F)$; an action $a_i$ in the action space represents the content slice cache update quantities of the MPS at station i+1, $a_i = (h_1, h_2, \dots, h_f, \dots, h_F)$, where $h_f$ denotes adding or deleting $h_f$ slices of file f, subject to

$$\sum_{f=1}^{F} h_f \le 0, \qquad \sum_{f=1}^{F} (z_f + h_f) \le N_B;$$

the reward function $r_i$ is the QoE hit rate after performing action $a_i$ at the (i+1)-th stop according to the current state $s_i$.
7. The method according to claim 1 or 6, wherein said step 2, in the stage of DQN training, comprises: the initial input state is that the vehicle MPS adopts a uniform cache strategy to store each content slice; the vehicle MPS receiving a content request of a passenger on the way to travel between stations; when the vehicle arrives at the next station, the MPS selects a feasible action, wherein the action is to update the slice cache number of each content file in the MPS; MPS calculates the instant profit of the current action according to the QoE hit rate, and stores the current state, action and instant profit record into a playback pool; extracting a set number of records from the playback pool, and calculating the next state and future benefits to obtain an ideal Q value in the current state; training the network according to a gradient descent method by taking the ideal Q value as a reference so as to enable the network to be converged; the benefit refers to QoE hit rate.
8. The method of claim 7, wherein in step 2, when selecting a feasible action, all feasible actions are listed, and each action is required to satisfy the following condition:

$$n_i \le N_B$$

wherein $n_i$ represents the number of content slices that the MPS replaces at the i-th bus station, and $N_B$ represents the maximum number of content slices that the MPS can store.
9. The method of claim 8, wherein in step 2, when selecting a feasible action, the action with the largest QoE hit rate after the action is executed is preferentially selected.
CN202011370700.1A 2020-11-30 2020-11-30 Content grading optimization caching method for user service experience in Internet of vehicles Active CN112565377B (en)

Priority Applications (1)

Application Number: CN202011370700.1A; Priority/Filing Date: 2020-11-30; Title: Content grading optimization caching method for user service experience in Internet of vehicles

Publications (2)

Publication Number   Publication Date
CN112565377A (en)    2021-03-26
CN112565377B (en)    2021-09-21

Family

ID: 75045341

Family Applications (1)

CN202011370700.1A (Active), filed 2020-11-30: Content grading optimization caching method for user service experience in Internet of vehicles

Country Status (1)

CN: CN112565377B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094982B (en) * 2021-03-29 2022-12-16 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN113596160B (en) * 2021-07-30 2022-09-13 电子科技大学 Unmanned aerial vehicle content caching decision method based on transfer learning
CN114979145B (en) * 2022-05-23 2023-01-20 西安电子科技大学 Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN115208952B (en) * 2022-07-20 2023-09-26 北京交通大学 Intelligent collaborative content caching method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312231A (en) * 2019-06-28 2019-10-08 重庆邮电大学 Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN110546958A (en) * 2017-05-18 2019-12-06 利弗有限公司 Apparatus, system and method for wireless multilink vehicle communication

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9313766B2 (en) * 2012-06-01 2016-04-12 Interdigital Patent Holdings, Inc. Bandwidth management (BWM) operation with opportunistic networks
US10172009B1 (en) * 2018-04-05 2019-01-01 Netsia, Inc. System and method for a vehicular network service over a 5G network
CN108923949A (en) * 2018-04-20 2018-11-30 西南交通大学 A kind of ambulant network edge cache regulation means of user oriented
CN111629218A (en) * 2020-04-29 2020-09-04 南京邮电大学 Accelerated reinforcement learning edge caching method based on time-varying linearity in VANET
CN111629443B (en) * 2020-06-10 2022-07-26 中南大学 Optimization method and system for dynamic spectrum slicing frame in super 5G Internet of vehicles


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Survey of caching technology research in information-centric networking; Zhang Tiankui et al.; Journal of Beijing University of Posts and Telecommunications; 2016-06-15 (No. 03); full text *


Similar Documents

Publication Publication Date Title
CN112565377B (en) Content grading optimization caching method for user service experience in Internet of vehicles
Wu et al. Mobility-aware cooperative caching in vehicular edge computing based on asynchronous federated and deep reinforcement learning
CN109218747B (en) Video service classification caching method based on user mobility in super-dense heterogeneous network
CN110312231A (en) Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN111385734B (en) Internet of vehicles content caching decision optimization method
Yu et al. Proactive content caching for internet-of-vehicles based on peer-to-peer federated learning
CN113094982B (en) Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN112995950B (en) Resource joint allocation method based on deep reinforcement learning in Internet of vehicles
CN116156455A (en) Internet of vehicles edge content caching decision method based on federal reinforcement learning
CN113283177B (en) Mobile perception caching method based on asynchronous federated learning
CN115297170A (en) Cooperative edge caching method based on asynchronous federation and deep reinforcement learning
CN113012013A (en) Cooperative edge caching method based on deep reinforcement learning in Internet of vehicles
CN115314944A (en) Internet of vehicles cooperative caching method based on mobile vehicle social relation perception
Xu et al. Distributed online caching for high-definition maps in autonomous driving systems
CN115297171A (en) Edge calculation unloading method and system for cellular Internet of vehicles hierarchical decision
CN114973673A (en) Task unloading method combining NOMA and content cache in vehicle-road cooperative system
Liu et al. Mobility-aware coded edge caching in vehicular networks with dynamic content popularity
CN113158544B (en) Edge pre-caching strategy based on federal learning under vehicle-mounted content center network
CN114374949A (en) Power control mechanism based on information freshness optimization in Internet of vehicles
Li et al. User dynamics-aware edge caching and computing for mobile virtual reality
CN112203258A (en) Internet of vehicles cache deployment method under freeflow state of highway
CN116634442A (en) Base station resource scheduling method based on traffic and communication characteristic complementation prediction
Jiang et al. Asynchronous federated and reinforcement learning for mobility-aware edge caching in IoVs
Lyu et al. Service-driven resource management in vehicular networks based on deep reinforcement learning
CN114979145B (en) Content distribution method integrating sensing, communication and caching in Internet of vehicles

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant