CN112565377B - Content grading optimization caching method for user service experience in Internet of vehicles

Info

Publication number: CN112565377B
Application number: CN202011370700.1A
Authority: CN (China)
Prior art keywords: content, MPS, vehicle, cache, station
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN112565377A
Inventors: 李曦, 王杭, 纪红, 张鹤立
Applicant and Assignee: Beijing University of Posts and Telecommunications

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0893 Assignment of logical groups to network elements
    • H04L 41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements

Abstract

The invention discloses a content grading optimization caching method for user service experience in the Internet of vehicles. The method adopts a caching strategy in which the vehicle-mounted mobile public server (MPS) and the station road side unit (RSU) cooperate to deliver content, defines the probability of successfully obtaining the complete content from the MPS and the RSU as the QoE hit rate, and optimizes the per-content slice caching strategy of the MPS with a deep reinforcement learning network (DQN) to maximize the QoE hit rate. The invention can adjust the content types and the proportion of content segments cached by the MPS to adapt to the content-preference changes caused by passenger flow, so that more passengers obtain the initial slices of the content they need while the cache replacement cost is reduced.

Description

Content grading optimization caching method for user service experience in Internet of vehicles
Technical Field
The invention belongs to the technical field of vehicular networks, and particularly relates to user-service-experience-oriented content grading optimization caching in an Internet-of-vehicles system.
Background
In recent years, with the rapid development of in-vehicle applications, entertainment content has become increasingly popular among passengers traveling in public vehicles. The large number of content requests generated by passengers has therefore led to rapidly growing communication demand in urban mass transit systems. In addition, the traditional cloud server is far from public transport vehicles, and the limited capacity of the backhaul link increases the delay with which users obtain content, seriously degrading the user experience. Caching hot content at vehicular edge nodes is considered an effective way to alleviate this problem.
Some researchers have proposed equipping public vehicles (buses, subways, etc.) in urban public transportation systems with a server, turning them into MPSs (Mobile Public Servers) with certain storage and computing resources that can provide content services to passengers in the vehicle. However, an MPS has limited resources and cannot handle large numbers of content requests during peak hours; congestion of the communication link causes long communication delays and greatly degrades the passenger experience. Furthermore, since public vehicles travel across multiple regions and in-vehicle passengers constantly change through boarding and alighting, the content preferences of in-vehicle passengers vary, which means the MPS must constantly adjust its cached content to meet the demand of more passengers.
At present, several content replacement strategies based on DRL (Deep Reinforcement Learning) have been proposed to address the limited storage space of edge devices. Reference [1] designs a multi-layer cache mechanism with a learning-based method: it predicts the content request distribution from vehicle mobility and combines a deep learning method to decide which content to cache in RSUs (Road Side Units), so that a mobile user can immediately obtain the target content on reaching RSU coverage, thereby reducing content acquisition delay and bandwidth consumption. Reference [2] proposes dynamically updating the content cache with a DRL method based only on time-varying requests and the content cached at the base station; the improved method achieves a higher cache hit rate than least-frequently-used, first-in-first-out, and deep Q-network based methods. Reference [3] designs an optimal content caching scheme based on content sharing between vehicles using a DRL method. Reference [4] proposes a cooperative edge caching strategy that jointly optimizes content replacement and content distribution among a macro-cellular base station, RSUs, and smart vehicles using a deep deterministic policy gradient approach. Existing methods that provide content services by combining mobile vehicles with RSUs/base stations mostly consider which kind of server should transmit the complete content to the user; they rarely account for the limited storage capacity of the MPS, and rarely let the MPS and the RSU provide different slices of the same content. Meanwhile, most existing research on vehicular content caching considers only the content update strategy of a single RSU/base station/vehicle, and seldom considers simultaneously the content-preference changes caused by both vehicle mobility and the mobility of the users on the vehicle.
Reference documents:
[1] Z. Zhao, L. Guardalben, M. Karimzadeh, J. Silva, T. Braun and S. Sargento, "Mobility Prediction-Assisted Over-the-Top Edge Prefetching for Hierarchical VANETs," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1786-1801, Aug. 2018.
[2] P. Wu, J. Li, L. Shi, M. Ding, K. Cai and F. Yang, "Dynamic Content Update for Wireless Edge Caching via Deep Reinforcement Learning," IEEE Communications Letters, vol. 23, no. 10, pp. 1773-1777, Oct. 2019.
[3] Y. Dai, D. Xu, K. Zhang, S. Maharjan and Y. Zhang, "Deep Reinforcement Learning and Permissioned Blockchain for Content Caching in Vehicular Edge Computing and Networks," IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4312-4324, April 2020.
[4] G. Qiao, S. Leng, S. Maharjan, Y. Zhang and N. Ansari, "Deep Reinforcement Learning for Cooperative Content Caching in Vehicular Edge Computing and Networks," IEEE Internet of Things Journal, vol. 7, no. 1, pp. 247-257, Jan. 2020.
Disclosure of Invention
To solve the problem of limited MPS storage capacity, the invention provides a user-service-experience-oriented content grading optimization caching method, which adopts a caching strategy of MPS-RSU cooperative transmission and uses the QoE-CSC (QoE-based Content Slice Caching) technique to maximize the system's QoE (Quality of Experience) hit rate, thereby optimizing the content caching strategy.
The invention provides a content grading optimization caching method for user service experience in the Internet of vehicles. In the Internet-of-vehicles scenario, an MPS is deployed in each public vehicle and a road side unit (RSU) is deployed at each station; the content files to be cached in the system are sliced, the MPS provides passengers with the beginning part of the content, and the RSU provides the remaining content slices.
To assess the user experience more accurately, the invention defines a QoE hit rate, which represents the probability that a passenger successfully obtains the complete content from the MPS and the RSU. Because passenger-flow changes lead to different content preferences, the content segments cached by the MPS must be replaced periodically, so the invention obtains the MPS caching strategy by maximizing the QoE hit rate. The method caches F content files, each of size D and viewable duration τ, and cuts each file into N slices of equal size; the MPS caches the first $z_f$ slices of content f, where

$$0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B.$$
the invention comprises the following steps:
step 1, modeling the movement of public vehicles, content requests of passengers and wireless channels in the Internet of vehicles, and then establishing a target function for maximizing the QoE (quality of experience) hit rate to obtain a content slice caching strategy;
the QoE hit rate refers to the probability of successful acquisition of the complete content from MPS and RSU by the passenger, and is represented as Hi QThe following are:
Figure GDA0003216247940000022
wherein K represents the number of stations; u shapeiTotal number of user requests, u, received for MPS between the ith stop and the (i + 1) th stopiIs the u-th ofiA secondary content request;
Figure GDA0003216247940000031
front for marking content f requested by passenger
Figure GDA0003216247940000032
Whether the slice has been cached in the MPS, and if so,
Figure GDA0003216247940000033
if not, the user can not select the specific application,
Figure GDA0003216247940000034
Figure GDA0003216247940000035
for marking whether a passenger can stop at the time of stopping at the ith station
Figure GDA0003216247940000036
The remaining slice of content f is obtained from the RSU and, if possible,
Figure GDA0003216247940000037
if not, the user can not select the specific application,
Figure GDA0003216247940000038
Figure GDA0003216247940000039
indicating the number of content slices that the MPS should provide to the passenger.
Step 2, optimizing a content slice caching strategy by using a deep reinforcement learning network DQN, wherein the method comprises the following steps:
(1) The DQN training phase: the initial input state is that the vehicle MPS stores content slices with a uniform caching strategy; the vehicle MPS receives passengers' content requests while traveling between stations; when the vehicle arrives at the next station, the MPS selects a feasible action, where an action updates the number of cached slices of each content file in the MPS; the MPS computes the immediate reward of the current action from the cache-hit result (namely the QoE hit rate), and stores the record of the current state, action, and immediate reward in a replay pool; a number of records are drawn from the replay pool, and the next state and future reward are computed to obtain the ideal Q value in the current state; taking the ideal Q value as reference, the Q network is trained with a gradient descent strategy until it converges.
(2) The vehicle continuously repeats the above process; after a certain number of iterations the network becomes stable, and the MPS can then input the current state to the Q network to obtain the best cache replacement action, improving the cache hit rate between stations 1 to K.
Compared with the prior art, the invention has the following advantages and positive effects. (1) By maximizing the system QoE hit rate through the content slice caching strategy, the method adjusts the content types and the proportion of content segments cached by the MPS to adapt to content-preference changes caused by passenger flow, so that more passengers obtain the initial slices of the content they need while the cache replacement cost is reduced. (2) The method improves the utilization efficiency of the MPS cache capacity: simulation results show that, for different MPS cache capacities, the cache hit rate of the method is higher than that of the existing comparison methods, i.e., the method makes better use of the MPS cache capacity. (3) The method improves the QoE hit rate of the MPS: simulation results show that, compared with the LRU and LFU reference methods, the method better learns and predicts the content popularity of each region, so that the MPS caches the corresponding content in advance, the QoE hit rate is improved, and the user service experience is better.
Drawings
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;
FIG. 2 is a flow chart of one implementation of the method of the present invention;
fig. 3 is a diagram illustrating QoE hit rates for MPS storing the same number of content slices, in an embodiment of the present invention;
fig. 4 is a graph illustrating the average QoE hit rate for MPS storage of varying numbers of content slices, in an embodiment of the present invention;
FIG. 5 is a diagram illustrating the average cache cost for MPS storing different numbers of content slices according to an embodiment of the present invention;
fig. 6 is a diagram illustrating the convergence performance of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Aiming at the limited MPS storage capacity and the strong mobility of passengers, the invention provides a user-service-experience-oriented content grading optimization caching method in the vehicular network. A caching strategy of MPS-RSU cooperative transmission is adopted: the MPS provides the beginning part of the content for passengers, and the RSU provides the remaining content slices. Considering that a passenger obtains content slices from the MPS and the RSU respectively, a QoE hit rate is defined, representing the probability that the MPS and the RSUs together successfully provide the passenger with the corresponding content segments; this yields the QoE-based Content Segments Caching (QoE-CSC) policy, which can accurately evaluate the user experience. The method optimizes the content slice caching strategy of the vehicle MPS with the deep reinforcement learning network DQN: after the trained DQN network converges, each time the vehicle travels from the current station to the next it collects the content requests of the passengers in the vehicle; after the vehicle arrives at the next station, the content-slice cache state of the current MPS is input to the DQN network, the action that maximizes the QoE hit rate is computed, and the content slice caching strategy of the MPS is updated.
The devices included in the application scenarios of the present invention are as follows:
a public vehicle: includes public transport means such as buses, trams and subways; a server with storage, computing, and communication capabilities, referred to as the MPS, is deployed in the vehicle. The MPS caches the beginning content slices of the various contents to provide content services for the in-car passengers.
station: a stop where the bus halts after traveling for a certain period of time so that passengers can get on and off. The stations are generally located in various areas of the city according to the plans of the traffic departments.
RSU: deployed at the station; it caches all content types in the system and all slices thereof. When the bus stops at the station, it can provide content services for the passengers in the bus and can communicate with the MPS.
As shown in fig. 1, a typical public transportation network scenario considered by the invention consists of public vehicles and RSUs: K stations are uniformly distributed along an urban highway trunk road at interval L and numbered $k = 1, 2, \dots, K$, bus station i being the i-th bus station. The Internet-of-vehicles system has F content files, each of size D, with content viewing duration τ. Any content file $f \in F$ can be cut into N slices of equal length, the file f being cut into $f_1, f_2, \dots, f_N$. The RSUs cache all kinds of content in the system; the MPS caches the beginning of part of the content and can cache at most $N_B$ content slices. If the MPS caches the first $z_f$ ($0 \le z_f \le N$) slices of content f, the contents of the MPS cache can be represented as

$$C = (z_1, z_2, \dots, z_F), \qquad \sum_{f=1}^{F} z_f \le N_B.$$
As shown in fig. 1, the MPS of a vehicle caches the front partial slices of the F files f = 1, 2, …, F and adjusts the cached contents at the stations using the method of the invention.
Firstly, a bus movement model is established. Taking a bus as an example, the bus is set to move in a uniform straight line at speed v between two adjacent bus stations, so the moving time $T_m$ between two stations is:

$$T_m = L / v \quad (1)$$

Let the time the bus dwells at a bus station be $T_s$, with $T_s > 0$ following a lognormal distribution. The dwell times of the bus at the K bus stations are denoted $T_s^1, T_s^2, \dots, T_s^K$, where $T_s^i$ represents the dwell time at the i-th bus station and obeys $T_s^i \sim \ln N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ the mathematical expectation and the variance, respectively. The probability density of $T_s^i$ is:

$$p(T_s^i) = \frac{1}{T_s^i \sigma \sqrt{2\pi}} \exp\!\left( -\frac{(\ln T_s^i - \mu)^2}{2\sigma^2} \right) \quad (2)$$
A content request model is then established according to Zipf's law. Generally, the probability $p_f$ that users in the same region request content f obeys a Zipf distribution, as follows:

$$p_f = \frac{e_f^{-\alpha}}{\sum_{j=1}^{F} e_j^{-\alpha}} \quad (3)$$

wherein $p_f$ is the probability of a user's request for content f, α is the Zipf distribution parameter, and $e_f$ is the popularity ranking of the requested content f.
In the existing literature, the probability of a user's content requests is often set to follow a Zipf distribution with a single parameter. In the present invention, however, the public vehicle repeatedly alternates between driving and stopping at stations and thus crosses different regions, while its passengers keep changing through boarding and alighting. Therefore, the content request distribution between two adjacent bus stations is assumed to follow Zipf distributions with different parameters α, as shown in fig. 1; that is, the passengers' content requests received by the bus between different pairs of bus stations follow Zipf distributions with different parameters α.
Considering the behavior of passengers getting on and off at the stations where the bus stops, the total number of passenger content requests received by the MPS per unit time varies. Let the total number of user requests received by the MPS between the i-th bus station and the (i+1)-th bus station be $U_i$, where $u_i \in [1, U_i]$ denotes the $u_i$-th passenger request. The probability that the $u_i$-th content request reaches the MPS at time $t_{u_i}$ obeys a Poisson distribution:

$$P(t_{u_i}) = \frac{\lambda^{t_{u_i}}}{t_{u_i}!} e^{-\lambda} \quad (4)$$

wherein λ is the Poisson distribution parameter and $t_{u_i}$ denotes the arrival time of the $u_i$-th content request.
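For concreteness, the mobility and request models of equations (1)-(4) can be simulated as in the following Python sketch. This is illustrative only and not part of the claimed method: the function names and the request rate lam (requests per second) are assumptions, and NumPy's standard lognormal, exponential, and categorical samplers stand in for the distributions above.

    import numpy as np

    rng = np.random.default_rng(0)

    def travel_time(L, v):
        # Equation (1): uniform linear motion between adjacent stations.
        return L / v

    def dwell_times(K, mu, sigma):
        # Dwell time T_s at each of the K stations follows a lognormal distribution.
        return rng.lognormal(mean=mu, sigma=sigma, size=K)

    def zipf_popularity(F, alpha):
        # Equation (3): request probability of the content ranked e_f = 1..F.
        ranks = np.arange(1, F + 1)
        weights = ranks ** (-alpha)
        return weights / weights.sum()

    def generate_requests(F, alpha, lam, T_m):
        # Requests arrive as a Poisson process (rate lam) during the travel time T_m;
        # each arrival requests a content drawn from the Zipf popularity distribution.
        p = zipf_popularity(F, alpha)
        t, requests = 0.0, []
        while True:
            t += rng.exponential(1.0 / lam)                # exponential inter-arrival gaps
            if t >= T_m:
                break
            requests.append((t, int(rng.choice(F, p=p))))  # (arrival time, content id)
        return requests

    requests = generate_requests(F=20, alpha=1.1, lam=0.05, T_m=travel_time(1000.0, 10.0))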
Wireless channels in the Internet of vehicles are subject to a Rayleigh distribution, so the invention models the channel between the passenger and the RSU as a Rayleigh fading channel. When the bus stops at the bus station, let d be the distance between them, with path loss $d^{-\beta}$, where β is the path loss exponent and h is the channel fading coefficient. The transmission rate $r_u$ between the passenger and the RSU at the bus station is then:

$$r_u = B \log_2\!\left(1 + \frac{P |h|^2 d^{-\beta}}{N_0}\right) \quad (5)$$

wherein B represents the transmission bandwidth, P represents the transmission power of the RSU at the bus station, and $N_0$ is the Gaussian white noise.
Likewise, the radio transmission channel between the MPS and the RSU follows a Rayleigh distribution, and the distance between them equals the distance d between the user and the RSU, so the transmission rate $r_b$ between the MPS and the RSU at the bus station is:

$$r_b = r_u \quad (6)$$
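Equations (5) and (6) can be evaluated as in the sketch below. This is a minimal illustration under assumed values: the Rayleigh scale of the fading coefficient and the path loss exponent beta are placeholders, not values taken from the invention.

    import numpy as np

    def transmission_rate(B, P, d, beta, N0, h=None, rng=np.random.default_rng(1)):
        # Equation (5): r = B * log2(1 + P * |h|^2 * d^(-beta) / N0),
        # with h drawn from a Rayleigh fading distribution if not supplied.
        if h is None:
            h = rng.rayleigh(scale=1.0)
        snr = P * (h ** 2) * d ** (-beta) / N0
        return B * np.log2(1.0 + snr)        # bits per second

    r_u = transmission_rate(B=1e6, P=1.3, d=3.0, beta=2.0, N0=3e-13)
    r_b = r_u                                # equation (6): MPS-RSU rate equals passenger-RSU rate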
With the method of the invention, a passenger first obtains the beginning slices of the content from the MPS. Suppose the passenger's request for content f reaches the MPS at time $t_{u_i}$ while the bus travels from the i-th bus station to the (i+1)-th bus station. Taking video content as an example, to ensure that the passenger's playback does not stall before the bus arrives at the (i+1)-th bus station, the length of content $\tau_{u_i}$ the MPS must provide for the passenger is the time the bus spends driving the remaining distance, namely:

$$\tau_{u_i} = T_m - t_{u_i} \quad (7)$$

Therefore, the number of content slices $\theta_{u_i}$ that the MPS should provide to the passenger is at least:

$$\theta_{u_i} = \left\lfloor \frac{(T_m - t_{u_i}) \cdot N}{\tau} \right\rfloor \quad (8)$$

wherein $\lfloor \cdot \rfloor$ represents the floor rounding function; τ represents the duration for which the user utilizes file f, e.g., if f is a video, τ is the duration for which video f can be played.
Considering that the number of slices of content f cached by the MPS at this time is $z_f$, there are two cases: 1) if $z_f \ge \theta_{u_i}$, the MPS can provide the passenger with content of sufficient length to watch until the bus station; 2) if $z_f < \theta_{u_i}$, the bus can provide the passenger with at most $z_f$ slices. In either case, when the bus arrives at the (i+1)-th bus station, the passenger's content request is still forwarded to the RSU at the bus station. Because the RSU stores all slices of content f while the bus only caches part of them, passengers can obtain more content slices from the RSU during the limited parking time.
Thus, the number of content slices $q_{u_i}$ requested by the passenger from the RSU is:

$$q_{u_i} = N - \min(z_f, \theta_{u_i}) \quad (9)$$
Combining the transmission rate and the parking time, the maximum transmission capacity $Q_i^{max}$ the passenger can obtain from the RSU at the bus station is:

$$Q_i^{max} = r_u \cdot T_s^i = B \log_2\!\left(1 + \frac{P_i |h|^2 d^{-\beta}}{N_0}\right) \cdot T_s^i \quad (10)$$

In the above formula, $P_i$ denotes the transmit power of the RSU at the i-th bus station. For the passenger to obtain the remaining content slices within the dwell time $T_s^i$, the following condition must be met:

$$q_{u_i} \cdot \frac{D}{N} \le Q_i^{max} \quad (11)$$
When the vehicle stops at the i-th bus station, the MPS on the vehicle checks and updates the stored content. Given the transmission rate $r_b$ between the MPS and the RSU, the maximum capacity $G_i^{max}$ the RSU can transmit to the MPS within the dwell time $T_s^i$ is:

$$G_i^{max} = r_b \cdot T_s^i \quad (12)$$

Then, if the MPS replaces $n_i$ content slices at the i-th bus station, the following condition must be satisfied:

$$n_i \cdot \frac{D}{N} \le G_i^{max} \quad (13)$$

At the same time, the number of replaced slices must not exceed the maximum storage space of the MPS:

$$n_i \le N_B \quad (14)$$
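The slice accounting of equations (7)-(14) can be collected into a few helper functions, as in this sketch; it assumes the reconstructed formulas above, with theta, q, Q_max, and G_max standing for $\theta_{u_i}$, $q_{u_i}$, $Q_i^{max}$, and $G_i^{max}$.

    def slices_needed(T_m, t_u, N, tau):
        # Equations (7)-(8): slices the MPS must supply to cover the remaining
        # ride time T_m - t_u, each slice playing for tau / N seconds (floored,
        # capped at the total slice count N).
        return min(int((T_m - t_u) * N / tau), N)

    def slices_from_rsu(N, z_f, theta):
        # Equation (9): slices still to be fetched from the RSU at the next station.
        return N - min(z_f, theta)

    def station_capacity(rate, T_s):
        # Equations (10) and (12): maximum data deliverable during the dwell time T_s.
        return rate * T_s

    def update_feasible(q, n_replaced, D, N, Q_max, G_max, N_B):
        # Equations (11), (13), (14): passenger download, MPS update, storage limits.
        return (q * D / N <= Q_max) and (n_replaced * D / N <= G_max) and (n_replaced <= N_B)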
In the application scenario of the invention, a passenger requests different slices of the same content from the MPS and the RSU respectively, so a cache hit can fail in two cases: 1) the number of content slices cached by the MPS is insufficient to support the passenger's playback; 2) the passenger cannot obtain the remaining content slices from the RSU during the parking time. Based on this, the invention proposes a user-experience (QoE) cache hit rate suited to the content-slice scenario. The QoE hit rate is defined as the probability that a passenger successfully obtains the complete content from the MPS and the RSU within a certain time.
First, the probability that the MPS hits a passenger request is defined as the initial hit rate $H_i^I$:

$$H_i^I = \frac{1}{U_i} \sum_{u_i=1}^{U_i} x_{u_i} \quad (15)$$

wherein the specific meaning of $x_{u_i}$ is as follows:

$$x_{u_i} = \begin{cases} 1, & z_f \ge \theta_{u_i} \\ 0, & \text{otherwise} \end{cases} \quad (16)$$

That is, when the first $\theta_{u_i}$ slices of the content f requested by the passenger are cached in the MPS, the initial cache hit succeeds and $x_{u_i} = 1$; otherwise the initial cache hit fails and $x_{u_i} = 0$. An initial hit indicates that the passenger can retrieve enough content from the MPS.
Secondly, when the bus stops at the i-th bus station, the passenger must obtain the rest of the content from the RSU within the limited dwell time $T_s^i$. Define the marker $y_{u_i}$ as follows:

$$y_{u_i} = \begin{cases} 1, & q_{u_i} \cdot \frac{D}{N} \le Q_i^{max} \\ 0, & \text{otherwise} \end{cases} \quad (17)$$

That is, if the passenger can obtain the remaining slices of content f from the RSU within the limited parking time $T_s^i$, then $y_{u_i} = 1$; otherwise $y_{u_i} = 0$.
Only when both of the above conditions are satisfied, i.e., $x_{u_i} \cdot y_{u_i} = 1$, can the passenger obtain the complete content, which is recorded as one QoE hit; otherwise it is recorded as a QoE miss. The probability that a passenger obtains the complete content is called the QoE hit rate $H_i^Q$, expressed as:

$$H_i^Q = \frac{1}{U_i} \sum_{u_i=1}^{U_i} x_{u_i} \cdot y_{u_i} \quad (18)$$
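Combining the two indicators, the QoE hit rate of equation (18) over one inter-station leg can be computed as in the following sketch, which reuses slices_needed, slices_from_rsu, and station_capacity from the sketch above; representing the MPS cache as a dict mapping a content id to its number of cached head slices z_f is an assumption for illustration.

    def qoe_hit_rate(requests, cache, N, T_m, tau, D, Q_max):
        # Equations (15)-(18): a request counts as a QoE hit only if the MPS holds
        # the first theta slices (x = 1) AND the RSU can deliver the rest in time (y = 1).
        hits = 0
        for t_u, f in requests:                      # (arrival time, content id)
            theta = slices_needed(T_m, t_u, N, tau)  # equation (8)
            z_f = cache.get(f, 0)
            x = 1 if z_f >= theta else 0             # initial-hit indicator, equation (16)
            q = slices_from_rsu(N, z_f, theta)       # remaining slices, equation (9)
            y = 1 if q * D / N <= Q_max else 0       # RSU indicator, equation (17)
            hits += x * y
        return hits / max(len(requests), 1)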
further, the present invention closely correlates QoE hit rates with the number of content slices of the MPS cache. Since MPS has limited storage space, it is necessary to periodically update its cache contents to better serve passengers. In the process of updating the content when the bus arrives at the bus station, neglecting the time of establishing communication connection between the passenger and the RSU and between the bus and the RSU, defining the objective function as the QoE hit rate after the MPS updates the slice content each time, as follows:
$$\max \sum_{i=1}^{K-1} H_i^Q \quad (19)$$

$$\text{s.t.} \quad 0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B, \qquad q_{u_i} \cdot \frac{D}{N} \le Q_i^{max}$$
the above objective function indicates that the cache policy of MPS is obtained when the QoE hit rate is maximized for a public vehicle traveling between two stations.
In order to maximize the QoE hit rate of the system, the invention uses the Deep Q-learning Network (DQN) method to search for the best caching strategy for the bus content slices. DQN fuses a neural network with Q-learning: it predicts the Q value with a neural network and learns the optimal policy by continually updating that network. DQN is grounded in a Markov decision process, generally described by a quadruple <S, A, P, R>, where the agent is the MPS, S represents the state space, A the action space, P the transition probability, and R the reward after an action is performed in the current state. Because the high mobility of passengers makes it difficult to describe state transitions with a deterministic transition probability P, the invention describes the transition probability indirectly with the state-action value function of value-based DQN.
Description of the state space S. The state is an objective description of the real environment in which the agent currently resides. In the invention, the state represents the cache proportion of each type of content slice in the MPS. The MPS cache state before the bus arrives at the (i+1)-th bus station is denoted $s_i \in S$, $i \in [1, K-1]$, and is represented as follows:

$$s_i = (z_1, z_2, \dots, z_f, \dots, z_F)$$

wherein $0 \le z_f \le N$ represents that the MPS currently caches the first $z_f$ content slices of content f.

In the initial stage of the system, since the bus is at the initial bus station and the server has not received any passenger content request information, the server adopts a uniform caching strategy, i.e., it caches the first $N_B / F$ content slices of each content f, so

$$s_1 = \left( \frac{N_B}{F}, \frac{N_B}{F}, \dots, \frac{N_B}{F} \right)$$
Description of the action space A. As the bus travels, the distribution and number of passengers differ between bus stations, and so do their content demands. To maximize the system QoE hit rate, the MPS needs to adjust the currently cached content slices, adding or deleting part of them each time it reaches a station. Thus, an action of the agent is defined as $a_i \in A$, where $a_i$ represents the numbers of slices of the different contents replaced when the MPS cache is updated at the (i+1)-th bus station:

$$a_i = (h_1, h_2, \dots, h_f, \dots, h_F)$$

wherein $0 \le h_f \le N - z_f$ represents adding $h_f$ slices of content f, and $-z_f \le h_f < 0$ represents deleting $h_f$ slices of content f, with

$$\sum_{f=1}^{F} h_f \le 0, \qquad \sum_{f=1}^{F} (z_f + h_f) \le N_B,$$

i.e., the number of added content slices must not exceed the number of deleted content slices, and the replaced content must not exceed the maximum storage space of the MPS. Further, from equation (13), it follows that:

$$\sum_{f=1}^{F} |h_f| \cdot \frac{D}{N} \le G_i^{max}$$
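Since the raw action space $(h_1, \dots, h_F)$ is combinatorially large, an implementation has to enumerate only a tractable subset of feasible actions. The sketch below restricts actions to single-pair swaps (delete k slices of one content, add k slices of another), which keeps the cache size unchanged and therefore automatically satisfies the two constraints above whenever the previous state satisfied them; max_swap mirrors the per-replacement limit used in the simulation section, and the function name is an assumption.

    def feasible_actions(state, N, G_max, D, max_swap=5):
        # state = (z_1, ..., z_F): number of head slices cached per content.
        F = len(state)
        actions = [tuple([0] * F)]                   # keeping the cache unchanged is feasible
        for f_del in range(F):                       # content whose slices are deleted
            for f_add in range(F):                   # content whose slices are added
                if f_add == f_del:
                    continue
                for k in range(1, max_swap + 1):     # number of swapped slices
                    if k > state[f_del] or state[f_add] + k > N:
                        continue                     # per-content bounds 0 <= z_f + h_f <= N
                    if k * D / N > G_max:
                        continue                     # equation (13): RSU-to-MPS download capacity
                    a = [0] * F
                    a[f_del], a[f_add] = -k, k       # sum(a) == 0: adds never exceed deletes
                    actions.append(tuple(a))
        return actions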
description of the reward function R. r isie.R represents the agent according to the current state siPerforming a specific action a at the i +1 th bus stationiThe immediate benefit later. The instant benefits in the present invention are expressed as having performed action aiPost QoE hit rate, as follows:
Figure GDA0003216247940000091
Based on the state space, action space, and reward, the DQN consists of a current Q network, a target Q network, and an experience replay module. The current and target Q networks have the same structure but different parameters: the current Q network is responsible for selecting the current action $a_i$ according to the current state $s_i$ and updating the model parameters ω, while the target Q network computes the target Q value, its network parameters ω' being periodically copied from the ω of the current Q network. Experience replay is used to store historical data: the caching agent builds tuples $\langle s_i, a_i, r_i, s_{i+1} \rangle$ and stores them in the experience replay pool, and each parameter update draws part of the data from the pool, which breaks the correlation between samples. $\langle s_i, a_i, r_i, s_{i+1} \rangle$ indicates that performing action $a_i$ in state $s_i$ leads to state $s_{i+1}$, with $r_i$ the reward after performing action $a_i$.
The user-service-experience-oriented content grading optimization caching method of the invention updates the cached content of the MPS through the QoE-CSC technique; the whole process comprises the following steps, as shown in FIG. 2.
Step 1, initializing the cache content of a public vehicle MPS in a DQN network training phase; setting that F content files need to be cached, caching the initial slices of the F content files in an MPS by adopting a uniform caching strategy, and initializing all parameters of the deep reinforcement learning network.
And 2, initializing the serial numbers and positions of the bus stations, wherein K bus stations are provided in the embodiment of the invention.
And 3, the bus runs from the current station to the next station, collects the content request information of passengers in the bus, and calculates the corresponding QoE hit rate. In the training phase, the content request of the passenger is described by using Zipf distribution and Poisson distribution. In the implementation phase of actually using the DQN network with stable training, the invention can count the content request information according to the actual situation.
And 4, after the next station is reached, updating the deep reinforcement learning network according to the content request and the QoE hit rate, and replacing the cache content of the MPS according to the deep reinforcement learning network. In the training phase, the Q network stores the current state, action and instant benefit into a playback pool.
And 5, in the training stage, the Q network randomly extracts a certain number of records from the replay pool, computes the next state from the state s and the action a, and computes the future reward to obtain the ideal Q value. After the computation, the extracted state records are fed into the Q network, and the network is trained by gradient descent with the ideal Q value as reference.
And 6, in the training stage, the vehicle drives away from the current station, and steps 3 to 5 are repeated continuously until the terminal station is reached.
And 7, in the training stage, continuously repeating the steps 2-6 until the set cycle number is reached, or stopping the training process when the network is judged to be stable, so as to obtain the stable DQN network.
The invention utilizes a pseudo code algorithm for optimizing the content slice caching strategy by using a deep reinforcement learning network DQN to realize the following steps:
Initialize F, N_B, T_m, K, α, d, B, P_i, N_0;
Initialize the experience replay pool G and the random weights θ of the action-value function Q;
Initialize the current state s_1 = (N_B/F, N_B/F, …, N_B/F);
Randomly initialize the parameters ω of the current Q network;
Initialize the parameters ω' of the target Q network;
For episode = 1, 2, …, M:
    Preprocess the sequence φ_1 = φ(s_1);
    for i = 1, 2, …, K-1:
        Feed φ(s_i) into the current Q network;
        Enumerate all feasible action sets A_i satisfying the action-space conditions and equations (13)-(14);
        With probability ε randomly select an action a_i from A_i; otherwise select a_i = argmax_a Q(s_i, A_i; θ);
        Replace the cached content of the MPS according to action a_i to obtain the next state s_{i+1};
        Generate content requests with the Zipf and Poisson distributions to obtain the immediate reward r_i;
        Update the state s_i = s_{i+1}, preprocess φ_{i+1} = φ(s_{i+1}), and store (s_i, a_i, r_i, s_{i+1}) in the replay pool G;
        Randomly draw a small batch of samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool G;
        Set y_i = r_i if station i+1 is the terminal station, and y_i = r_i + γ max_{a'} Q(s_{i+1}, a'; θ) otherwise;
        Update θ with gradient descent on (y_i − Q(s_i, a_i; θ))^2;
    End for
End for

In the above pseudo code: M represents the set maximum number of cycles; φ(s_1) refers to the set of slice replacement actions that can be performed in the current state; Q(s_i, A_i; θ) is the action-value function Q depending on the policy θ; a_i = argmax_a Q(s_i, A_i; θ) selects the action that maximizes the Q value; y_i represents the Q value of the DQN network after the action is performed, accounting for both the immediate reward and the future reward; r_i is the immediate reward; the terminal station is station K in the embodiment of the invention; γ is a discount coefficient representing the importance of the future reward to the current decision; γ max_{a'} Q(s_{i+1}, a'; θ) represents the future reward, i.e., the Q value of state s_{i+1} after performing the action. The network parameters are updated with (y_i − Q(s_i, a_i; θ))^2 as the loss function using gradient descent.
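As a concrete counterpart to the pseudo code, the following is a minimal DQN training sketch in Python/PyTorch. It is illustrative only: the env object (whose reset() returns the uniform initial cache state and whose step(a) returns the next state, the QoE-hit-rate reward, and a terminal flag at station K), the network width, and all hyperparameters are assumptions, and the feasible action set is abstracted to an integer index into a fixed action list.

    import random
    from collections import deque

    import torch
    import torch.nn as nn

    GAMMA, EPS = 0.9, 0.1

    class QNet(nn.Module):
        # Maps an MPS cache state s_i = (z_1, ..., z_F) to one Q value per action.
        def __init__(self, n_contents, n_actions, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_contents, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_actions))
        def forward(self, s):
            return self.net(s)

    def train_dqn(env, n_contents, n_actions, episodes=500, batch=32, sync_every=20):
        q, q_target = QNet(n_contents, n_actions), QNet(n_contents, n_actions)
        q_target.load_state_dict(q.state_dict())     # omega' copied from omega
        opt = torch.optim.Adam(q.parameters(), lr=1e-3)
        replay = deque(maxlen=10_000)                # experience replay pool
        step = 0
        for _ in range(episodes):
            s, done = env.reset(), False             # uniform initial cache state s_1
            while not done:
                if random.random() < EPS:            # epsilon-greedy action selection
                    a = random.randrange(n_actions)
                else:
                    with torch.no_grad():
                        a = int(q(torch.tensor(s, dtype=torch.float32)).argmax())
                s2, r, done = env.step(a)            # r is the QoE hit rate of eq. (18)
                replay.append((s, a, r, s2, done))
                s = s2
                if len(replay) >= batch:
                    ss, aa, rr, ss2, dd = map(list, zip(*random.sample(replay, batch)))
                    ss = torch.tensor(ss, dtype=torch.float32)
                    ss2 = torch.tensor(ss2, dtype=torch.float32)
                    rr = torch.tensor(rr, dtype=torch.float32)
                    dd = torch.tensor(dd, dtype=torch.float32)
                    with torch.no_grad():            # y_i = r + gamma * max_a' Q'(s', a')
                        y = rr + GAMMA * (1 - dd) * q_target(ss2).max(dim=1).values
                    pred = q(ss).gather(1, torch.tensor(aa).unsqueeze(1)).squeeze(1)
                    loss = nn.functional.mse_loss(pred, y)  # (y_i - Q(s_i, a_i; theta))^2
                    opt.zero_grad(); loss.backward(); opt.step()
                step += 1
                if step % sync_every == 0:           # periodically copy omega to omega'
                    q_target.load_state_dict(q.state_dict())
        return q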
The method of the invention was evaluated in simulation experiments. The system simulation parameter settings are shown in Table 1. The content popularity between two adjacent public transportation stations follows Zipf distributions with different parameters α = [1.1, 2.5, 1.6, 1.7, 1.3, 2.7, 1.6, 2.1, 1.9, 1.1]. After traveling from the first bus station to the tenth, the bus immediately returns to the first bus station. Furthermore, to reduce computational complexity, the number of content types that the MPS replaces at a time is limited to at most three. In addition, from the related data in Table 1, it is calculated that the number of content slices per replacement does not exceed 5.
TABLE 1 System simulation parameter settings

    Parameter                                              Value
    Content category F                                     20
    Number of slices N of one content file                 10
    Playback duration of one content slice                 3 minutes
    Data size of one content file                          12 Mb
    Maximum number of cache slices N_B for MPS             100
    Travel time T_m between two neighboring bus stations   600 seconds
    Bus dwell-time distribution range                      4-60 seconds
    Distance d between passenger and RSU                   3 m
    Communication bandwidth B                              1 MHz
    Gaussian white noise N_0                               3×10^-13
    Transmission power P of RSU                            1.3 W
For comparison, the Least Frequently Used (LFU) and Least Recently Used (LRU) caching strategies are used as the comparison methods of the invention. The LFU method caches the most frequently requested content and each time replaces the least frequently used content. The LRU method caches the most recently requested content and each time replaces the least recently used content. To compare the hit performance of the caching methods fairly, LRU and LFU are also limited to at most 5 replaced slices per replacement. The simulation results are shown in figures 3 to 6.
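For reference, the two baselines can be sketched at whole-content granularity as follows; the patent's comparison operates on content slices, so this simplified sketch (with assumed class names) only illustrates the replacement rules themselves.

    from collections import Counter, OrderedDict

    class LFUCache:
        # Keep the most frequently requested contents; evict the least frequently used.
        def __init__(self, capacity):
            self.capacity, self.freq, self.cached = capacity, Counter(), set()
        def request(self, f):
            self.freq[f] += 1
            if f in self.cached:
                return True                          # hit
            if len(self.cached) >= self.capacity:    # evict least frequently used
                victim = min(self.cached, key=lambda g: self.freq[g])
                self.cached.discard(victim)
            self.cached.add(f)
            return False                             # miss

    class LRUCache:
        # Keep the most recently requested contents; evict the least recently used.
        def __init__(self, capacity):
            self.capacity, self.order = capacity, OrderedDict()
        def request(self, f):
            hit = f in self.order
            if hit:
                self.order.move_to_end(f)            # refresh recency
            else:
                if len(self.order) >= self.capacity:
                    self.order.popitem(last=False)   # drop least recently used
                self.order[f] = True
            return hit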
Fig. 3 shows the QoE hit rate across the ten bus stations when the MPS stores 100 content slices. The abscissa of Fig. 3 represents each bus station, and the ordinate represents the QoE hit rate. The QoE-CSC strategy of the invention has a slightly lower QoE hit rate than the LFU and LRU caching strategies at bus stations 1-5, but a significantly higher one at stations 6-10. Moreover, computed from the simulation data, the ten-station total QoE hit rate of the QoE-CSC caching method of the invention is 6.955, while those of the LFU and LRU caching methods are 5.49104 and 4.64264 respectively, far below the method of the invention. The specific reason is that the method of the invention aims to maximize the total QoE hit rate over the ten bus stations, so in the beginning stage it chooses to cache as many of the content slices that will be requested throughout the whole journey as possible. In contrast, the LRU and LFU methods aim for the maximum QoE hit rate at the current station, so their hit performance is better at the beginning; at later stages, because the number of content slices replaceable at a time is limited, the QoE hit rates of the LFU and LRU methods drop rapidly.
Fig. 4 shows the average QoE hit rate over the ten bus stations when the MPS caches 40, 60, 80, 100, and 120 content slices, respectively. The abscissa of Fig. 4 represents the cache capacity of the vehicle MPS, and the ordinate represents the average QoE hit rate at the bus stations. The average QoE hit rate of the QoE-CSC caching method of the invention lies between 0.6 and 0.7; it first increases with the cache capacity and then levels off, because a larger cache lets the MPS store more content slices and thus satisfy more passengers' content demand, while the later flattening occurs because the MPS can replace only 5 content slices at a time, so its performance gradually stabilizes. However, the average QoE hit rates of the LFU and LRU caching methods do not grow with the MPS cache capacity, because these methods rely mainly on the passengers' content request pattern to execute the corresponding slice replacement policy and thus carry a certain randomness, which also indicates that their performance is weaker and their caching effect unstable. In addition, the average QoE hit rate of the QoE-CSC caching method is always higher than that of the two comparison methods: the LFU method stays below 0.6 at best and even falls below 0.3, while the LRU method reaches at most 0.648 but remains lower than the caching method of the invention in most cases. The main reason is that the caching method of the invention continuously learns the content popularity of each region and thus selects the optimal content slice replacement strategy, so it utilizes the cache capacity better than the LFU and LRU caching methods.
In the case that the MPS caches 40, 60, 80, 100, and 120 content slices, respectively, fig. 5 shows the average caching cost of the MPS for content replacement at different bus stops using three different methods. The abscissa of fig. 5 represents bus stops and the ordinate represents average cache cost, i.e. the number of slices replaced per stop. It is found that the LFU and LRU methods choose to replace 5 slices per station, since the optimization goal of both methods is to improve the hit rate in the current state, so that as many slices of content as possible need to be replaced. However, the average caching cost of the QoE-CSC caching method of the present invention is low, and is on average 3.6 slices, because through training of historical request data, the method of the present invention selects some content slices that are most popular in the whole process, so it does not need to adjust the cached content frequently. Therefore, the method can obtain better hit performance by using lower communication resources.
Figure 6 shows the performance of the QoE-CSC caching method of the invention versus the number of training iterations. In Fig. 6, the abscissa represents the number of training iterations, the left ordinate the loss, and the right ordinate the reward. The training loss refers to the loss of the neural network in the DQN; it decreases as the number of training iterations increases, indicating that the network becomes increasingly accurate. The reward reflects the accumulation of state-action rewards; it increases with training, indicating better performance of the method. As seen in Fig. 6, the loss of the caching method of the invention decreases and the reward increases with the number of training iterations, and both the training loss and the reward stabilize after a certain point.

Claims (9)

1. A content grading optimization caching method for user service experience in an Internet of vehicles, characterized in that the content files to be cached in the Internet-of-vehicles system are sliced, the beginning slices of each content file are cached in a mobile public server MPS arranged in a public vehicle, and the remaining content slices of each content file are cached in a road side unit RSU arranged at a station; F content files need to be cached, each content file has size D and content viewing duration τ, and each content file is cut into N slices of equal size; the MPS caches the first $z_f$ slices of content f, where

$$0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B;$$
the method comprises the following steps:
step 1, modeling the movement of public vehicles, content requests of passengers and wireless channels in the Internet of vehicles, and then establishing a target function for maximizing the QoE (quality of experience) hit rate to obtain a content slice caching strategy;
the QoE hit rate refers to the probability of successful acquisition of the complete content from MPS and RSU by the passenger, and is expressed as
Figure FDA00032162479300000117
The following were used:
Figure FDA0003216247930000012
wherein K represents the number of stations; u shapeiTotal number of user requests, u, received for MPS between the ith stop and the (i + 1) th stopiIs the u-th ofiA secondary content request;
Figure FDA0003216247930000013
front for marking content f requested by passenger
Figure FDA0003216247930000014
Whether the slice has been cached in the MPS, and if so,
Figure FDA0003216247930000015
if not, the user can not select the specific application,
Figure FDA0003216247930000016
for marking whether a passenger can stop at the time of stopping at the ith station
Figure FDA0003216247930000017
The remaining slice of content f is obtained from the RSU and, if possible,
Figure FDA0003216247930000018
if not, the user can not select the specific application,
Figure FDA0003216247930000019
represents the number of content slices that the MPS should provide to the passenger;
the objective function is established as follows:

$$\max \sum_{i=1}^{K-1} H_i^Q$$

$$\text{s.t.} \quad 0 \le z_f \le N, \qquad \sum_{f=1}^{F} z_f \le N_B, \qquad q_{u_i} \cdot \frac{D}{N} \le Q_i^{max}$$

wherein $Q_i^{max}$ is the maximum transmission capacity the passenger obtains from the RSU at the i-th stop, and $q_{u_i}$ is the number of content slices requested from the RSU;
step 2, optimizing a content slice caching strategy of the vehicle MPS by using the deep reinforcement learning network DQN, and collecting content requests of passengers in the vehicle when the vehicle runs from the current station to the next station after the DQN is trained to converge; and after the vehicle arrives at the station, inputting the content slice caching state of the current MPS into the DQN network, calculating the action with the maximum QoE hit rate, and updating the content slice caching strategy of the MPS.
2. The method according to claim 1, wherein in step 1, a public vehicle movement model is established, in particular: the vehicle is set to move in a uniform straight line between two adjacent vehicle stations, giving the moving time $T_m$ between the two stations; the dwell times of a vehicle at the K vehicle stations are expressed as $T_s^1, T_s^2, \dots, T_s^K$, where $T_s^i$ represents the residence time of the vehicle at the i-th vehicle station and follows a lognormal distribution.
3. The method according to claim 1, wherein in step 1, a content request model is established, specifically: the request probabilities of users in the same area for different contents are described according to Zipf's law, and the content request probabilities between two adjacent vehicle stations are set to follow Zipf distributions with different parameters α; the total number of user requests received by the MPS of a vehicle between the i-th stop and the (i+1)-th stop is $U_i$, and the arrival of the $u_i$-th passenger content request at the MPS obeys a Poisson distribution, $u_i \in [1, U_i]$.
4. A method according to claim 1, 2 or 3, wherein in step 1, the moving time of the vehicle between two adjacent stations is $T_m$, during which the $u_i$-th content request received by the MPS arrives at time $t_{u_i}$; the MPS caches $z_f$ slices of file f; the number of content slices $\theta_{u_i}$ that the MPS should provide to the passenger is at least:

$$\theta_{u_i} = \left\lfloor \frac{(T_m - t_{u_i}) \cdot N}{\tau} \right\rfloor$$

then the variable $x_{u_i}$ is calculated as follows:

$$x_{u_i} = \begin{cases} 1, & z_f \ge \theta_{u_i} \\ 0, & \text{otherwise} \end{cases}$$
5. A method according to claim 1, 2 or 3, wherein in step 1, the variable $y_{u_i}$ is calculated as follows:

$$y_{u_i} = \begin{cases} 1, & q_{u_i} \cdot \frac{D}{N} \le Q_i^{max} \\ 0, & \text{otherwise} \end{cases}$$

wherein, when the vehicle stops at the i-th station, the number $q_{u_i}$ of remaining content slices the passenger requests from the RSU is calculated as follows:

$$q_{u_i} = N - \min(z_f, \theta_{u_i})$$

where N is the total number of slices of file f, $T_m$ is the moving time of the vehicle between two adjacent stations, the $u_i$-th content request received by the MPS during that movement arrives at time $t_{u_i}$, and $z_f$ is the number of slices of file f cached by the MPS; $Q_i^{max}$, the maximum transmission capacity obtained by the passenger from the RSU at the station, is calculated as follows:

$$Q_i^{max} = B \log_2\!\left(1 + \frac{P_i |h|^2 d^{-\beta}}{N_0}\right) \cdot T_s^i$$

wherein B denotes the transmission bandwidth, $P_i$ the transmission power of the RSU at the i-th station, $N_0$ the Gaussian white noise, h the channel fading coefficient, d the distance between the passenger and the station RSU, β the path loss exponent, and $T_s^i$ the dwell time of the vehicle at the i-th station.
6. The method as claimed in claim 1, wherein in step 2, when the DQN is used for the optimization, a state in the state space represents the cache proportion of each content slice in the MPS; the MPS cache state before reaching the (i+1)-th bus station is denoted $s_i \in S$, $i \in [1, K-1]$, with $s_i = (z_1, z_2, \dots, z_f, \dots, z_F)$; an action $a_i$ in the action space represents the content slice cache update quantities of the MPS at station i+1, $a_i = (h_1, h_2, \dots, h_f, \dots, h_F)$, where $h_f$ denotes adding or deleting $h_f$ slices of file f, subject to

$$\sum_{f=1}^{F} h_f \le 0, \qquad \sum_{f=1}^{F} (z_f + h_f) \le N_B;$$

the reward function $r_i$ is the QoE hit rate after performing action $a_i$ at the (i+1)-th stop according to the current state $s_i$.
7. The method according to claim 1 or 6, wherein said step 2, in the stage of DQN training, comprises: the initial input state is that the vehicle MPS adopts a uniform cache strategy to store each content slice; the vehicle MPS receiving a content request of a passenger on the way to travel between stations; when the vehicle arrives at the next station, the MPS selects a feasible action, wherein the action is to update the slice cache number of each content file in the MPS; MPS calculates the instant profit of the current action according to the QoE hit rate, and stores the current state, action and instant profit record into a playback pool; extracting a set number of records from the playback pool, and calculating the next state and future benefits to obtain an ideal Q value in the current state; training the network according to a gradient descent method by taking the ideal Q value as a reference so as to enable the network to be converged; the benefit refers to QoE hit rate.
8. The method of claim 7, wherein in step 2, when selecting a feasible action, all feasible actions are listed, and each action is required to satisfy the following condition:

$$n_i \le N_B$$

wherein $n_i$ represents the number of content slices that the MPS replaces at the i-th bus station, and $N_B$ represents the maximum number of content slices that the MPS can store.
9. The method of claim 8, wherein in step 2, when selecting a feasible action, the action with the largest QoE hit rate after the action is executed is preferentially selected.
CN202011370700.1A 2020-11-30 2020-11-30 Content grading optimization caching method for user service experience in Internet of vehicles Active CN112565377B (en)

Priority Applications (1)

Application Number: CN202011370700.1A; Priority/Filing Date: 2020-11-30; Title: Content grading optimization caching method for user service experience in Internet of vehicles

Publications (2)

Publication Number   Publication Date
CN112565377A (en)    2021-03-26
CN112565377B (en)    2021-09-21

Family

ID: 75045341

Family Applications (1)

CN202011370700.1A (Active), filed 2020-11-30: Content grading optimization caching method for user service experience in Internet of vehicles

Country Status (1)

CN: CN112565377B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094982B (en) * 2021-03-29 2022-12-16 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN113596160B (en) * 2021-07-30 2022-09-13 电子科技大学 Unmanned aerial vehicle content caching decision method based on transfer learning
CN114979145B (en) * 2022-05-23 2023-01-20 西安电子科技大学 Content distribution method integrating sensing, communication and caching in Internet of vehicles
CN115208952B (en) * 2022-07-20 2023-09-26 北京交通大学 Intelligent collaborative content caching method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312231A (en) * 2019-06-28 2019-10-08 重庆邮电大学 Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN110546958A (en) * 2017-05-18 2019-12-06 利弗有限公司 Apparatus, system and method for wireless multilink vehicle communication

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9313766B2 (en) * 2012-06-01 2016-04-12 Interdigital Patent Holdings, Inc. Bandwidth management (BWM) operation with opportunistic networks
US10172009B1 (en) * 2018-04-05 2019-01-01 Netsia, Inc. System and method for a vehicular network service over a 5G network
CN108923949A (en) * 2018-04-20 2018-11-30 西南交通大学 A kind of ambulant network edge cache regulation means of user oriented
CN111629218A (en) * 2020-04-29 2020-09-04 南京邮电大学 Accelerated reinforcement learning edge caching method based on time-varying linearity in VANET
CN111629443B (en) * 2020-06-10 2022-07-26 中南大学 Optimization method and system for dynamic spectrum slicing frame in super 5G Internet of vehicles


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Survey of caching technology research in information-centric networking; Zhang Tiankui et al.; Journal of Beijing University of Posts and Telecommunications; 2016-06-15 (No. 03); full text *


Similar Documents

Publication Publication Date Title
CN112565377B (en) Content grading optimization caching method for user service experience in Internet of vehicles
Wu et al. Mobility-aware cooperative caching in vehicular edge computing based on asynchronous federated and deep reinforcement learning
CN109218747B (en) Video service classification caching method based on user mobility in super-dense heterogeneous network
CN110312231A (en) Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN111385734B (en) Internet of vehicles content caching decision optimization method
Yu et al. Proactive content caching for internet-of-vehicles based on peer-to-peer federated learning
CN113094982B (en) Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN112995950B (en) Resource joint allocation method based on deep reinforcement learning in Internet of vehicles
CN116156455A (en) Internet of vehicles edge content caching decision method based on federal reinforcement learning
CN113283177B (en) Mobile perception caching method based on asynchronous federated learning
CN115297170A (en) Cooperative edge caching method based on asynchronous federation and deep reinforcement learning
CN113012013A (en) Cooperative edge caching method based on deep reinforcement learning in Internet of vehicles
CN115314944A (en) Internet of vehicles cooperative caching method based on mobile vehicle social relation perception
Xu et al. Distributed online caching for high-definition maps in autonomous driving systems
CN115297171A (en) Edge calculation unloading method and system for cellular Internet of vehicles hierarchical decision
CN114973673A (en) Task unloading method combining NOMA and content cache in vehicle-road cooperative system
Liu et al. Mobility-aware coded edge caching in vehicular networks with dynamic content popularity
CN113158544B (en) Edge pre-caching strategy based on federal learning under vehicle-mounted content center network
CN114374949A (en) Power control mechanism based on information freshness optimization in Internet of vehicles
Li et al. User dynamics-aware edge caching and computing for mobile virtual reality
CN112203258A (en) Internet of vehicles cache deployment method under freeflow state of highway
CN116634442A (en) Base station resource scheduling method based on traffic and communication characteristic complementation prediction
Jiang et al. Asynchronous federated and reinforcement learning for mobility-aware edge caching in IoVs
Lyu et al. Service-driven resource management in vehicular networks based on deep reinforcement learning
CN114979145B (en) Content distribution method integrating sensing, communication and caching in Internet of vehicles

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant