CN112565377A - Content grading optimization caching method for user service experience in Internet of vehicles - Google Patents
- Publication number: CN112565377A
- Application number: CN202011370700.1A
- Authority: CN (China)
- Prior art keywords: content, MPS, vehicle, cache, station
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
- H04L41/0893: Assignment of logical groups to network elements
- H04L41/50: Network service management, e.g. ensuring proper service fulfilment according to agreements
Abstract
The invention discloses a content grading optimization caching method oriented to user service experience in the Internet of Vehicles. The method adopts a caching strategy in which the vehicle-mounted mobile public server (MPS) and the station roadside units (RSU) cooperatively transmit content, defines the probability of successfully obtaining the complete content from the MPS and the RSU as the QoE hit rate, and optimizes the content slice caching strategy of the MPS with a deep reinforcement learning network (DQN) to maximize the QoE hit rate. The invention can adjust the content types and the proportions of content segments cached by the MPS to adapt to the content preference changes caused by passenger flow changes, so that more passengers can obtain the initial slices of the content they need while the cache replacement cost is reduced.
Description
Technical Field
The invention belongs to the technical field of vehicular networks, and particularly relates to user-service-experience-oriented content grading optimization caching in an Internet of Vehicles system.
Background
In recent years, with the rapid development of in-vehicle applications, entertainment content has become increasingly popular with passengers traveling in public vehicles, and the large number of content requests generated by passengers has led to a rapid increase in communication demand in urban mass transit systems. In addition, the traditional cloud server is far away from public transport vehicles, and the limited capacity of the backhaul link increases the delay with which users obtain content, seriously degrading the user experience. Caching hot content at vehicle edge nodes is considered an effective way to alleviate this problem.
Some researchers have proposed that, in urban public transportation systems, public vehicles (buses, subways, etc.) be equipped with a server with certain storage and computing resources, called a Mobile Public Server (MPS), so that it can provide content services for passengers in the vehicle. However, the MPS has limited resources and cannot handle large numbers of content requests during peak hours; congestion of the communication link then results in long communication delays, greatly degrading the passenger experience. Furthermore, as a public vehicle travels across multiple regions while its passengers keep boarding and alighting, the content preferences of the in-vehicle passengers vary, which means that the MPS needs to continually adjust its cached content to meet more passengers' demands.
At present, several content replacement strategies based on DRL (Deep Reinforcement Learning) have been proposed to address the limited storage space of edge devices. Reference [1] designs a multi-layer caching mechanism with a learning-based method: it predicts the content request distribution based on vehicle mobility and uses deep learning to decide which content to cache in RSUs (Road Side Units), so that a mobile user can immediately obtain the target content upon entering RSU coverage, reducing both content acquisition delay and bandwidth consumption. Reference [2] proposes dynamically updating the content cache with a DRL method based only on time-varying requests and the content cached at the base station; the improved method achieves a higher cache hit rate than least-frequently-used, first-in-first-out, and deep Q-network based baselines. Reference [3] designs an optimal content caching scheme based on content sharing between vehicles using a DRL method. Reference [4] proposes a cooperative edge caching strategy that jointly optimizes content replacement and content distribution among a macro-cellular base station, RSUs, and smart vehicles using a deep deterministic policy gradient method. Existing methods that combine a mobile vehicle with an RSU/base station to provide content services mostly consider which server should transmit the complete content to the user; they rarely account for the limited storage capacity of the MPS, or for the MPS and the RSU providing different slices of the same content. Meanwhile, existing research on vehicular content caching mostly considers only the content update strategy of a single RSU/base station/vehicle, and rarely considers simultaneously the content preference changes caused by the mobility of the vehicle and of the passengers on it.
Reference documents:
[1] Z. Zhao, L. Guardalben, M. Karimzadeh, J. Silva, T. Braun and S. Sargento, "Mobility Prediction-Assisted Over-the-Top Edge Prefetching for Hierarchical VANETs," IEEE Journal on Selected Areas in Communications, vol. 36, no. 8, pp. 1786-1801, Aug. 2018.
[2] P. Wu, J. Li, L. Shi, M. Ding, K. Cai and F. Yang, "Dynamic Content Update for Wireless Edge Caching via Deep Reinforcement Learning," IEEE Communications Letters, vol. 23, no. 10, pp. 1773-1777, Oct. 2019.
[3] Y. Dai, D. Xu, K. Zhang, S. Maharjan and Y. Zhang, "Deep Reinforcement Learning and Permissioned Blockchain for Content Caching in Vehicular Edge Computing and Networks," IEEE Transactions on Vehicular Technology, vol. 69, no. 4, pp. 4312-4324, April 2020.
[4] G. Qiao, S. Leng, S. Maharjan, Y. Zhang and N. Ansari, "Deep Reinforcement Learning for Cooperative Content Caching in Vehicular Edge Computing and Networks," IEEE Internet of Things Journal, vol. 7, no. 1, pp. 247-257, Jan. 2020.
Disclosure of Invention
To address the small storage capacity of the MPS, the invention provides a content grading optimization caching method oriented to user service experience: it adopts a caching strategy of MPS-RSU cooperative transmission and uses a QoE-based content slice caching (QoE-CSC) technique to maximize the system's QoE (Quality of Experience) hit rate, thereby optimizing the content caching strategy.
The invention provides a content grading optimization caching method for user service experience in the Internet of Vehicles. In the considered Internet of Vehicles scenario, an MPS is deployed in each public vehicle and a roadside unit (RSU) is deployed at each station; the content files to be cached are sliced, the MPS provides the initial part of the content for passengers, and the RSUs provide the remaining content slices.
To more accurately assess the user experience, the invention defines a QoE hit rate, which represents the probability of a passenger successfully obtaining the complete content from the MPS and the RSU. Since changes in passenger flow cause different content preferences, the content segments cached by the MPS need to be replaced periodically, so the invention derives the caching policy of the MPS by maximizing the QoE hit rate. The method caches F content files, each of size D and viewable duration τ, and cuts each content file into N equally sized slices; let the MPS cache the first z_f slices of content f, with 0 ≤ z_f ≤ N. The invention comprises the following steps:
the QoE hit rate refers to the probability of successful acquisition of the complete content from MPS and RSU by the passenger, and is expressed asThe method comprises the following steps:
wherein K represents the number of stations; u shapeiTotal number of user requests, u, received for MPS between the ith stop and the (i + 1) th stopiIs the u-th ofiA secondary content request;front for marking content f requested by passengerWhether the slice is already cached in the MPS, if so,if not, the user can not select the specific application, for marking whether a passenger can stop at the time of stopping at the ith stationThe remaining slice of content f is obtained from the RSU and, if possible,if not, the user can not select the specific application, indicating the number of content slices that the MPS should provide to the passenger.
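To make the definition concrete, the following is a minimal sketch of this hit-rate computation; the function name and the list-of-lists encoding of the indicators are illustrative assumptions, not part of the patent.

```python
from typing import List

def qoe_hit_rate(x: List[List[int]], y: List[List[int]]) -> float:
    """QoE hit rate over all station-to-station segments.

    x[i][u] = 1 if the first n slices of the requested content were cached
              in the MPS for request u on segment i, else 0.
    y[i][u] = 1 if the remaining slices could be fetched from the RSU during
              the stop, else 0.
    """
    hits = sum(xi * yi
               for seg_x, seg_y in zip(x, y)
               for xi, yi in zip(seg_x, seg_y))
    total = sum(len(seg) for seg in x)
    return hits / total if total else 0.0

# Two segments with three and two requests: 2 of 5 requests hit both stages.
print(qoe_hit_rate([[1, 0, 1], [1, 1]], [[1, 1, 0], [1, 0]]))  # 0.4
```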
(1) The DQN training phase comprises: the initial input state is that the vehicle MPS stores each content slice according to a uniform caching strategy; the vehicle MPS receives passenger content requests while driving between stations; when the vehicle arrives at the next station, the MPS selects a feasible action, namely updating the number of cached slices of each content file in the MPS; the MPS computes the immediate benefit of the current action from the cache hit situation (namely the QoE hit rate) and stores the record of current state, action, and immediate benefit into the replay pool; a certain number of records are drawn from the replay pool, the next state and the future benefit are computed to obtain the ideal Q value for the current state, and the Q network is trained by gradient descent with the ideal Q value as the reference so that the network converges.
(2) The vehicle keeps repeating the above process; after a certain number of iterations the network tends to be stable, and the MPS can then input the current state into the Q network to obtain the best cache replacement action, improving the cache hit rate between stations 1 to K.
Compared with the prior art, the invention has the following advantages and positive effects. (1) The method maximizes the system's QoE hit rate through a content slice caching strategy, and can adjust the content types and the proportions of content segments cached by the MPS to adapt to content preference changes caused by passenger flow changes, so that more passengers can obtain the initial slices of the content they need while the cache replacement cost is reduced. (2) The method improves the utilization of the MPS cache capacity; simulation results show that for different MPS cache capacities, the cache hit rate of the method is higher than that of the existing comparison methods, i.e., the method makes better use of the MPS cache capacity. (3) The method improves the QoE hit rate of the MPS; simulation results show that, compared with the LRU and LFU baseline methods, the method better learns and predicts the content popularity of each region, so that the MPS caches the corresponding content in advance, the QoE hit rate is improved, and the user service experience is better.
Drawings
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present invention;
FIG. 2 is a flow chart of one implementation of the method of the present invention;
fig. 3 is a diagram illustrating QoE hit rates for MPS storing the same number of content slices, in an embodiment of the present invention;
fig. 4 is a graph illustrating the average QoE hit rate for MPS storage of varying numbers of content slices, in an embodiment of the present invention;
FIG. 5 is a diagram illustrating the average cache cost for MPS storing different numbers of content slices according to an embodiment of the present invention;
fig. 6 is a diagram illustrating the convergence performance of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Aiming at problems such as the limited storage capacity of the MPS and the strong mobility of passengers, the invention provides a content grading optimization caching method oriented to user service experience in vehicular networks, adopting a caching strategy of MPS-RSU cooperative transmission in which the MPS provides the beginning part of the content for passengers and the RSU provides the remaining content slices. Considering that a passenger obtains content slices from the MPS and the RSU respectively, a QoE hit rate is defined as the probability that the MPS and the RSUs together successfully provide the passenger with the corresponding content segments, yielding a QoE-based content slice caching (QoE-CSC) policy that can accurately evaluate the user experience. The method optimizes the content slice caching strategy of the vehicle MPS with the deep reinforcement learning network DQN: after the trained DQN converges, the MPS collects the content requests of in-vehicle passengers each time the vehicle travels from the current station to the next; upon arrival, the current MPS content slice caching state is input to the DQN, the action yielding the maximum QoE hit rate is computed, and the MPS content slice caching strategy is updated.
The devices included in the application scenarios of the present invention are as follows:
Public vehicle: includes buses, trams, subways, and other public transport; a server with storage, computing, and communication capabilities, called the MPS, is deployed in the vehicle. The MPS caches the beginning content slices of various contents to provide content services for in-vehicle passengers.
Station: a stop where the vehicle halts for a certain period of time so that passengers can board and alight. Stations are generally placed in various areas of the city according to the transportation department's planning.
RSU: deployed at each station, caching all kinds of content in the system and all of their slices; when the bus stops at the station, the RSU provides content services for in-vehicle passengers and communicates with the MPS.
As shown in fig. 1, the invention considers a typical public transportation network scenario composed of public vehicles and RSUs. K stations are uniformly distributed along an urban trunk road at interval L and numbered k = 1, 2, ..., K; bus station i is the i-th bus station. The Internet of Vehicles system has F content files, each of size D and viewable duration τ. Any content file f ∈ F can be cut into N equal-length slices f_1, f_2, ..., f_N. The RSUs cache all kinds of content in the system, while the MPS caches the beginnings of some contents and can cache at most N_B content slices. If the MPS caches the first z_f slices of content f, 0 ≤ z_f ≤ N, then the contents of the MPS cache can be represented as (z_1, z_2, ..., z_F) with Σ_{f=1}^{F} z_f ≤ N_B. As shown in fig. 1, the vehicle's MPS caches the front partial slices of the F files f-1, f-2, ..., f-F and adjusts the cached content at the stations using the method of the invention.
First, a bus movement model is established. Taking a bus as an example, the bus is assumed to move uniformly in a straight line at speed v between two adjacent bus stations, so the travel time T_m between two stations is:

T_m = L / v (1)
Let the time for which the bus stops at a bus station be T_s, with T_s > 0 obeying a lognormal distribution. The dwell times of the bus at the K bus stations are denoted T_s^1, T_s^2, ..., T_s^K, where T_s^i is the dwell time at the i-th bus station and follows a lognormal distribution, i.e., ln T_s^i ~ N(μ, σ²), with μ and σ² the mathematical expectation and the variance respectively. The probability density of T_s^i is:

$$f\left(t\right) = \frac{1}{t \sigma \sqrt{2\pi}} \exp\!\left(-\frac{(\ln t - \mu)^2}{2\sigma^2}\right), \quad t > 0 \qquad (2)$$
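A minimal sketch of sampling dwell times under this model; numpy's lognormal uses exactly the (μ, σ) parameterization of equation (2), and the numeric values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, K = np.log(20.0), 0.5, 10  # assumed: ~20 s median dwell, 10 stations

# One lognormal dwell time per station, clipped to the 4-60 s range used
# later in the patent's simulation settings (Table 1).
dwell = np.clip(rng.lognormal(mean=mu, sigma=sigma, size=K), 4.0, 60.0)
print(dwell.round(1))
```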
A content request model is then established according to Zipf's law. In general, the probability p_f that users in the same region request content f obeys Zipf's law (the Zipf distribution):

$$p_f = \frac{e_f^{-\alpha}}{\sum_{j=1}^{F} j^{-\alpha}} \qquad (3)$$

where p_f is the probability of a user requesting content f, α is the Zipf distribution parameter, and e_f is the popularity rank of content f.
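A minimal sketch of equation (3); the function name is illustrative.

```python
import numpy as np

def zipf_probs(F: int, alpha: float) -> np.ndarray:
    """Request probabilities for contents ranked 1..F under Zipf's law, eq. (3)."""
    ranks = np.arange(1, F + 1, dtype=float)
    weights = ranks ** (-alpha)
    return weights / weights.sum()

print(zipf_probs(F=20, alpha=1.1)[:5])  # the five most popular contents
```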
In the existing literature, the probability of a user requesting content is often assumed to follow a Zipf distribution with a single parameter. In the invention, however, a public vehicle continually alternates between driving and stopping, so it crosses different regions, and its passengers keep changing as they board and alight. Therefore, the content request distribution between each pair of adjacent bus stops follows a Zipf distribution with a different parameter α, as shown in fig. 1; that is, the content requests received by the bus between different bus stops follow Zipf distributions with different parameters α.
Considering the boarding and alighting behavior of passengers at the stops where the bus halts, the total number of passenger content requests received by the MPS per unit time may vary. Let the total number of user requests received by the MPS between the i-th and the (i+1)-th bus stations be U_i, where u_i ∈ [1, U_i] denotes the u_i-th passenger request. The arrival of content requests at the MPS is modeled as a Poisson process: the probability that k requests arrive in a unit interval is

$$P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!} \qquad (4)$$

where λ is the Poisson distribution parameter and t_{u_i}^f denotes the arrival time of the u_i-th content request.
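A minimal sketch of generating a request stream under this model, reusing zipf_probs from the sketch above; exponential inter-arrival times are the standard realization of a Poisson arrival process, and all names and rates here are illustrative assumptions.

```python
import numpy as np

def generate_requests(rate, horizon, probs, rng):
    """(arrival_time, content_id) pairs on [0, horizon): Poisson arrivals
    with parameter `rate`, contents drawn from the Zipf probabilities."""
    requests, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)   # Poisson-process inter-arrival gap
        if t >= horizon:
            return requests
        requests.append((t, int(rng.choice(len(probs), p=probs))))

rng = np.random.default_rng(1)
# e.g. about 0.05 requests/s over a 600 s segment between two stations
reqs = generate_requests(0.05, 600.0, zipf_probs(20, 1.1), rng)
print(len(reqs), reqs[:2])
```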
Wireless channels in the Internet of Vehicles obey a Rayleigh distribution, so the invention models the channel between a passenger and the RSU as a Rayleigh fading channel. When the bus stops at a bus station, let the distance between the bus and the RSU be d and model the path loss as d^{-β}, where β is the path loss exponent and h is the channel fading coefficient; the transmission rate r_u between a passenger and the RSU at the bus station is then:

$$r_u = B \log_2\!\left(1 + \frac{P \, h \, d^{-\beta}}{N_0}\right) \qquad (5)$$

where B represents the transmission bandwidth, P the transmit power of the RSU at the bus stop, and N_0 the Gaussian white noise.
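A minimal numeric sketch of equation (5) using the simulation values from Table 1 (B = 1 MHz, P = 1.3 W, N0 = 3×10⁻¹³, d = 3 m); the values β = 2 and h = 1 are illustrative assumptions, since the patent does not list them.

```python
import math

def rsu_rate(B, P, h, d, beta, N0):
    """Transmission rate of the Rayleigh-faded passenger-RSU link, eq. (5)."""
    return B * math.log2(1.0 + P * h * d ** (-beta) / N0)

# B=1 MHz, P=1.3 W, N0=3e-13, d=3 m from Table 1; beta=2 and h=1 assumed.
print(f"{rsu_rate(1e6, 1.3, 1.0, 3.0, 2.0, 3e-13) / 1e6:.1f} Mbit/s")
```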
Likewise, the wireless channel between the MPS and the RSU follows a Rayleigh distribution, and their distance is taken to be the same distance d as between user and RSU, so the transmission rate r_b between the MPS and the RSU at the bus station is:

r_b = r_u (6)
With the method, a passenger first obtains the beginning slices of the content from the MPS. While the bus travels from the i-th to the (i+1)-th bus stop, the passenger's request for content f reaches the MPS at time t_{u_i}^f. Taking video content as an example, to ensure that the passenger's playback does not stall before the bus arrives at the (i+1)-th bus stop, the MPS must supply the passenger with content covering the time spent driving the remaining distance, i.e., the number of slices the MPS should provide is:

$$n_{u_i}^{f} = \left\lfloor \frac{\left(T_m - t_{u_i}^{f}\right) N}{\tau} \right\rfloor \qquad (7)$$

where ⌊·⌋ represents the floor function and τ is the duration for which the user utilizes file f; e.g., if f is a video, τ is the playable duration of video f.
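A minimal sketch of equation (7); the clamp to [0, N] is an added safety assumption, and the names are illustrative.

```python
import math

def slices_needed(T_m, t_req, N, tau):
    """Slices the MPS must supply so playback covers the remaining travel
    time (T_m - t_req), with N slices spanning tau seconds of content;
    clamped to [0, N] (an added assumption)."""
    return max(0, min(N, math.floor((T_m - t_req) * N / tau)))

# A request arriving 200 s into a 600 s segment, 10 slices per 180 s file:
print(slices_needed(600.0, 200.0, 10, 180.0))  # floor(400*10/180)=22, clamped to 10
```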
Considering that the MPS caches z_f slices of content f at this time, there are two cases: 1) if z_f ≥ n_{u_i}^f, the MPS can provide the passenger with enough content to view until the bus stop; 2) if z_f < n_{u_i}^f, the bus can provide the passenger with at most z_f slices. In either case, when the bus arrives at the (i+1)-th bus stop, the passenger's content request is still forwarded to the RSU at the bus stop. Because the RSU stores all slices of content f while the bus buffers only some of them, passengers can obtain more slices of the content from the RSU during the limited parking time.
Thus, the number of content slices the passenger requests from the RSU is:

$$m_{u_i}^{f} = N - \min\!\left(z_f, \, n_{u_i}^{f}\right) \qquad (8)$$
Combining the transmission rate and the parking time, the maximum amount of data a passenger can obtain from the RSU at the bus stop is:

$$C_{u_i} = B \log_2\!\left(1 + \frac{P_i \, h \, d^{-\beta}}{N_0}\right) T_s^{i} \qquad (9)$$

where P_i is the transmit power of the RSU at the i-th bus stop. For the passenger to obtain the remaining content slices within the stop time, the following condition must be met:

$$m_{u_i}^{f} \cdot \frac{D}{N} \le C_{u_i} \qquad (10)$$
When the vehicle stops at the i-th bus stop, the on-board MPS checks and updates its stored content. Given the transmission rate r_b between the MPS and the RSU, the maximum amount of data the RSU can transmit to the MPS is:

$$C_{MPS} = r_b \cdot T_s^{i} \qquad (11)$$
Then, if the MPS replaces h_f slices of each content f at the i-th bus station, the replaced content slices need to satisfy the following condition:

$$\sum_{f=1}^{F} \left|h_f\right| \cdot \frac{D}{N} \le C_{MPS} \qquad (12)$$
At the same time, the number of cached slices after replacement must not exceed the maximum storage space of the MPS:

$$\sum_{f=1}^{F} \left(z_f + h_f\right) \le N_B \qquad (13)$$
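A minimal sketch checking a candidate replacement vector h against constraints (12) and (13), plus the per-content bound 0 ≤ z_f + h_f ≤ N; the function name and argument layout are illustrative.

```python
def action_feasible(z, h, N, N_B, D, C_mps):
    """True if replacement h respects per-content bounds, link capacity (12),
    and MPS storage (13)."""
    if any(not 0 <= zf + hf <= N for zf, hf in zip(z, h)):
        return False
    if sum(abs(hf) for hf in h) * D / N > C_mps:          # eq. (12)
        return False
    return sum(zf + hf for zf, hf in zip(z, h)) <= N_B    # eq. (13)

# Three contents with N=10 slices each, storage for 100 slices:
print(action_feasible([5, 3, 2], [2, -1, 0], 10, 100, 12.0, 50.0))  # True
```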
in the application scenario of the present invention, since the passenger requests different slices of the same content from the MPS and RSU, respectively. Therefore, there are two cases of cache hit failure: 1) when the number of content slices of the MPS cache is insufficient to support passenger playback, the cache hit fails; 2) a cache hit fails when the passenger cannot get the remaining content slices from the RSU during the parking time. Based on the above, the invention provides a cache hit rate of user experience QoE according with the current content slice scene. The QoE hit rate is defined as follows: probability of successful passenger acquisition of the complete content from the MPS and RSU within a certain time.
That is, when the first n_{u_i}^f slices of the content f requested by the passenger are cached in the MPS, the first cache hit succeeds and x_{u_i}^f = 1; otherwise the first cache hit fails and x_{u_i}^f = 0. A first hit indicates that the passenger can retrieve enough content from the MPS.
Second, when the bus stops at the i-th bus stop, the passenger must retrieve the remaining content from the RSU within the limited stop time T_s^i; the marker y_{u_i}^f is defined as follows: if the passenger can obtain the remaining slices of content f from the RSU within the stop time, i.e., condition (10) holds, then y_{u_i}^f = 1; otherwise y_{u_i}^f = 0.
Then, only when both conditions are satisfied, i.e., x_{u_i}^f · y_{u_i}^f = 1, can the passenger obtain the complete content, which is recorded as one QoE hit; otherwise it is recorded as a failure. The probability that a passenger obtains the complete content between the i-th and (i+1)-th stops is the QoE hit rate, expressed as:

$$\bar{P}_{QoE}^{i} = \frac{1}{U_i} \sum_{u_i=1}^{U_i} x_{u_i}^{f} \, y_{u_i}^{f} \qquad (14)$$
further, the present invention closely correlates QoE hit rates with the number of content slices of the MPS cache. Since MPS has limited storage space, it is necessary to periodically update its cache contents to better serve passengers. In the process of updating the content when the bus arrives at the bus station, neglecting the time of establishing communication connection between the passenger and the RSU and between the bus and the RSU, defining the objective function as the QoE hit rate after the MPS updates the slice content each time, as follows:
the above objective function indicates that the cache policy of MPS is obtained when the QoE hit rate is maximized for a public vehicle traveling between two stations.
To maximize the system's QoE hit rate, the invention uses a Deep Q-learning Network (DQN) to search for the best caching strategy for the bus's content slices. DQN fuses a neural network with Q-learning: it predicts Q values with the neural network and learns the optimal policy by continually updating the network. The basis of DQN is a Markov decision process, generally described by a quadruple ⟨S, A, P, R⟩, where the agent is the MPS, S is the state space, A is the action space, P is the transition probability, and R is the reward after performing an action in the current state. Since the high mobility of passengers makes it difficult to describe state changes with a deterministic transition probability P, the invention describes the transitions indirectly through the state-action value function of value-based DQN.
Description of the state space S. A state is an objective description of the real environment in which the agent currently finds itself. In the invention, the state represents the cache proportion of each type of content slice in the MPS. Before the bus arrives at the (i+1)-th bus stop, the MPS cache state is denoted s_i ∈ S, i ∈ [1, K−1], as follows:
si=(z1,z2,...,zf,...,zF)
where 0 ≤ z_f ≤ N, indicating that the MPS currently caches the first z_f content slices of content f.
In the initial stage of the system, the bus is at the initial bus station and the server has not yet received any passenger content request information, so the server stores content with a uniform caching policy, i.e., it caches the first N_B/F content slices of each type of content f; then s_1 = (N_B/F, N_B/F, ..., N_B/F).
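A minimal sketch of this uniform initial caching state (illustrative names):

```python
def initial_state(F, N_B):
    """Uniform caching policy: the same number of leading slices per content."""
    return [N_B // F] * F

print(initial_state(F=20, N_B=100))  # 5 leading slices of each of 20 contents
```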
Description of the action space A. While the bus is driving, the distribution and number of passengers differ between bus stops, and so do the content demands. To maximize the system's QoE hit rate, the MPS needs to adjust the currently cached content slices, adding or deleting some content slices each time it reaches a station. The action of the agent is therefore defined as a_i ∈ A, where a_i gives the number of replaced slices of each content when the MPS cache is updated at the (i+1)-th bus station:
ai=(h1,h2,...,hf,...,hF)
where 0 ≤ h_f ≤ N − z_f denotes adding h_f slices of content f and −z_f ≤ h_f < 0 denotes deleting |h_f| slices of content f, with Σ_{f=1}^{F} h_f ≤ 0, i.e., the number of added content slices must not exceed the number of deleted ones, and the replaced content must not exceed the maximum storage space of the MPS; further, from equation (13), Σ_{f=1}^{F} (z_f + h_f) ≤ N_B.
description of the reward function R. r isie.R represents the agent according to the current state siPerforming a specific action a at the i +1 th bus stationiThe immediate benefit later. The instant benefits in the present invention are expressed as having performed action aiPost QoE hit rate, as follows:
Based on the state space, action space, and reward, the DQN consists of three modules: the current Q network, the target Q network, and experience replay. The current and target Q networks share the same structure but have different parameters; the current Q network selects the current action a_i according to the current state s_i and updates the model parameters ω, while the target Q network computes the target Q value, with its parameters ω′ periodically copied from the ω of the current Q network. Experience replay stores historical data: the caching agent constructs tuples ⟨s_i, a_i, r_i, s_{i+1}⟩ and stores them in the replay pool, and each parameter update draws a portion of data from the pool, which breaks the correlation between samples. ⟨s_i, a_i, r_i, s_{i+1}⟩ means that performing action a_i in state s_i leads to state s_{i+1} with reward r_i.
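A minimal sketch of the experience replay module described above; the fixed-capacity ring buffer with uniform sampling is the standard DQN construction, and the class name and capacity are illustrative assumptions.

```python
import random
from collections import deque

class ReplayPool:
    """Stores <s_i, a_i, r_i, s_{i+1}> tuples and samples uniform batches,
    which breaks the correlation between consecutive records."""
    def __init__(self, capacity=10_000):
        self.pool = deque(maxlen=capacity)  # oldest records drop off first

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))
```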
The content grading optimization caching method oriented to user service experience implemented by the invention updates the cached content of the MPS through the QoE-CSC technique; the whole process comprises the following steps, as shown in fig. 2.

Step 1, initializing the parameters of the deep reinforcement learning network.

Step 2, initializing the numbers and positions of the bus stations; in the embodiment of the invention there are K bus stations.
Step 3, the bus travels from the current station to the next station, collects the content request information of in-vehicle passengers, and computes the corresponding QoE hit rate. In the training phase, passenger content requests are described by the Zipf and Poisson distributions; in the implementation phase, when the trained DQN is stable, the content request information can be collected according to the actual situation.
Step 4, after arriving at the next station, update the deep reinforcement learning network according to the content requests and the QoE hit rate, and replace the cached content of the MPS accordingly. In the training phase, the Q network stores the current state, action, and immediate benefit into the replay pool.
Step 5, in the training phase, the Q network randomly extracts a certain number of records from the replay pool, computes the next state from the state s and the action a, and computes the future benefit to obtain the ideal Q value. After the calculation, the extracted state records are fed into the Q network, and the network is trained by gradient descent with the ideal Q value as the reference.
Step 6, in the training phase, drive away from the current station and keep repeating steps 3 and 4 until the terminal station is reached.
Step 7, in the training phase, keep repeating steps 2 to 6 until the set number of cycles is reached, or stop the training process when the network is judged to be stable, thereby obtaining a stable DQN network.
The invention optimizes the content slice caching strategy with the deep reinforcement learning network DQN according to the following pseudocode:
Initialize F, N_B, T_m, K, α, d, B, P_i, N_0;
Randomly initialize the parameter ω of the current Q network;
Initialize the parameter ω′ of the target Q network;
For episode = 1, 2, …, M:
    Preprocess the initial sequence φ_1 = φ(s_1);
    For i = 1, 2, …, K−1:
        Input φ(s_i) into the current Q network;
        With probability ε randomly select an action a_i from A_i;
        otherwise select a_i = argmax_a Q(s_i, A_i; θ);
        Replace the cached content of the MPS according to action a_i to obtain the next state s_{i+1};
        Generate content requests using the Zipf and Poisson distributions to obtain the immediate reward r_i;
        Store (s_i, a_i, r_i, s_{i+1}) in the experience replay pool;
        Randomly draw a mini-batch of samples (s_i, a_i, r_i, s_{i+1}) from the experience replay pool;
        Update by gradient descent on (y_i − Q(s_i, a_i; θ))²;
    End for
End for
In the above pseudocode: M is the set maximum number of cycles; φ(s_1) is the set of slice replacement actions that can be performed in the current state; Q(s_i, A_i; θ) is the action value function depending on the policy θ; a_i = argmax_a Q(s_i, A_i; θ) selects the action that maximizes the Q value of the function; the target value

$$y_i = r_i + \gamma \max_{a'} Q\!\left(s_{i+1}, a'; \omega'\right)$$

is the Q value of the DQN after the action is performed, accounting for both the immediate benefit r_i and the future benefit, where γ is a discount coefficient representing the importance of the future benefit to the current decision and max_{a'} Q(s_{i+1}, a′; ω′) is the Q value of state s_{i+1}; for the terminal station, which is station K in the embodiment of the invention, y_i = r_i. The network parameters are updated by gradient descent using (y_i − Q(s_i, a_i; θ))² as the loss function.
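A condensed sketch of the pseudocode above in PyTorch (a framework choice assumed here, not specified by the patent). It reuses ReplayPool and initial_state from the earlier sketches; enumerate_feasible_actions, apply_action, and simulate_segment_qoe are assumed helper stubs for the environment, and all hyperparameters are illustrative.

```python
import random
import torch
import torch.nn as nn

F_FILES, N_B, K, GAMMA, EPS, M = 20, 100, 10, 0.9, 0.1, 200

def make_qnet(n_actions):
    # Input: cached-slice counts per content; output: one Q value per action.
    return nn.Sequential(nn.Linear(F_FILES, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

actions = enumerate_feasible_actions()        # assumed helper: list of h vectors
q_net, target_net = make_qnet(len(actions)), make_qnet(len(actions))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
pool = ReplayPool()

for episode in range(M):
    s = torch.tensor(initial_state(F_FILES, N_B), dtype=torch.float32)
    for i in range(K - 1):
        if random.random() < EPS:                 # epsilon-greedy exploration
            a = random.randrange(len(actions))
        else:
            a = int(q_net(s).argmax())
        s_next = apply_action(s, actions[a])      # assumed helper: z_f + h_f
        r = simulate_segment_qoe(s_next)          # assumed helper: reward, eq. (16)
        pool.store(s, a, r, s_next)
        for s_b, a_b, r_b, sn_b in pool.sample(32):
            y = r_b + GAMMA * target_net(sn_b).max().detach()  # target Q value
            loss = (y - q_net(s_b)[a_b]) ** 2     # (y_i - Q(s_i, a_i; θ))²
            opt.zero_grad(); loss.backward(); opt.step()
        s = s_next
    target_net.load_state_dict(q_net.state_dict())   # periodic ω′ ← ω copy
```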
Simulation experiments were performed on the method of the invention. The system simulation parameters are set as shown in Table 1. The content popularity between two adjacent public transportation stations follows Zipf distributions with different parameters α = [1.1, 2.5, 1.6, 1.7, 1.3, 2.7, 1.6, 2.1, 1.9, 1.1]. The bus returns to the first bus station immediately after traveling from the first to the tenth bus station. Furthermore, to reduce computational complexity, each MPS replacement is limited to at most three content types, and from the data in Table 1 the number of content slices per replacement is limited to at most 5.
TABLE 1 System simulation parameter settings

| Parameter | Value |
|---|---|
| Number of content files F | 20 |
| Number of slices N of one content | 10 |
| Playback duration τ of one content | 3 minutes |
| Data size D of one content file | 12 Mb |
| Maximum number of cached slices N_B of the MPS | 100 |
| Travel time T_m between two neighboring bus stations | 600 seconds |
| Bus station stop-time distribution range | 4-60 seconds |
| Distance d between passenger and RSU | 3 m |
| Communication bandwidth B | 1 MHz |
| White noise N_0 | 3×10⁻¹³ |
| Transmission power P of RSU | 1.3 W |
For comparison, the Least Frequently Used (LFU) and Least Recently Used (LRU) caching strategies are used as baselines. The LFU method caches the most-requested content and each time replaces the least frequently used content; the LRU method caches the most recently requested content and each time replaces the least recently used content. For a fair comparison of hit performance, LRU and LFU are also limited to replacing no more than 5 slices at a time. The simulation results are shown in figs. 3 to 6.
Fig. 3 shows the QoE hit rate between the ten bus stops for the case where the MPS stores 100 content slices; the abscissa represents each bus stop and the ordinate the QoE hit rate. The QoE-CSC strategy of the invention has a slightly lower QoE hit rate than the LFU and LRU caching strategies at bus stops 1-5, but a significantly higher QoE hit rate at bus stops 6-10. Moreover, computed from the simulation data, the ten-station total QoE hit rate of the QoE-CSC caching method is 6.955, while those of the LFU and LRU caching methods are 5.49104 and 4.64264 respectively, far lower than the method of the invention. The reason is that the method of the invention aims to maximize the total QoE hit rate over the ten bus stops, so in the beginning stage it selects content slices that will be requested as much as possible over the entire journey. In contrast, the LRU and LFU methods aim at the maximum QoE hit rate at the current stop, so their hit performance is better at the beginning; later, because the number of content slices that can be replaced at a time is limited, the QoE hit rate of the LRU and LFU methods drops rapidly.
Fig. 4 shows the average QoE hit rates between the ten bus stops for the cases where the MPS caches 40, 60, 80, 100, and 120 content slices respectively; the abscissa represents the cache capacity of the vehicle MPS and the ordinate the average QoE hit rate at the bus stops. The average QoE hit rate of the QoE-CSC caching method lies between 0.6 and 0.7, rising with the cache capacity and then leveling off: with a larger cache the MPS can store more content slices and meet more passengers' demands, while the later flattening occurs because the MPS can replace only 5 content slices at a time, so its performance gradually saturates. In contrast, the average QoE hit rates of the LFU and LRU caching methods do not grow with the MPS cache capacity, because those methods execute their slice replacement policies mainly according to the passengers' request patterns and carry a certain randomness, indicating poorer and less stable caching performance. In addition, the average QoE hit rate of the QoE-CSC method is always higher than those of the two baselines: the LFU method peaks below 0.6 and drops below 0.3 at its lowest, while the LRU method reaches at most 0.648 but is lower than the method of the invention in most cases. The main reason is that the caching method of the invention continually learns the content popularity of each region and thus selects the optimal content slice replacement policy, so it utilizes the cache capacity better than the LFU and LRU caching methods.
For the cases where the MPS caches 40, 60, 80, 100, and 120 content slices respectively, fig. 5 shows the average caching cost of content replacement by the MPS at different bus stops for the three methods; the abscissa represents the bus stops and the ordinate the average caching cost, i.e., the number of slices replaced per stop. The LFU and LRU methods replace 5 slices at every station, since both aim to increase the hit rate in the current state and therefore replace as many content slices as possible. The average caching cost of the QoE-CSC caching method is lower, 3.6 slices on average, because through training on historical request data the method selects content slices that remain popular throughout the journey, so it does not need to adjust the cached content frequently. The method therefore achieves better hit performance with lower communication resource usage.
Figure 6 shows the performance of the QoE-CSC caching method over the training iterations; the abscissa represents the number of training iterations, the left ordinate the loss, and the right ordinate the reward. The training loss is the loss of the neural network in the DQN; it decreases as training proceeds, indicating that the accuracy of the neural network grows. The reward reflects the accumulation of state-action rewards; it increases with training, indicating better performance of the method. As seen in fig. 6, the loss of the caching method decreases and the reward increases with the number of training iterations, and both the training loss and the reward stabilize after a certain point.
Claims (10)
1. A content grading optimization caching method oriented to user service experience in the Internet of Vehicles, characterized in that the content files to be cached in the Internet of Vehicles system are sliced, the beginning slices of each content file are cached in a mobile public server MPS deployed in a public vehicle, and the remaining content slices of each content file are cached in roadside units RSU deployed at the stations; F content files need to be cached, each content file has size D and viewable duration τ, and each content file is cut into N equally sized slices; let the MPS cache the first z_f slices of content f, with 0 ≤ z_f ≤ N; the method comprises the following steps:
step 1, modeling the movement of public vehicles, content requests of passengers and wireless channels in the Internet of vehicles, and then establishing a target function for maximizing the QoE (quality of experience) hit rate to obtain a content slice caching strategy;
the QoE hit rate refers to the probability that a passenger successfully obtains the complete content from the MPS and the RSU, expressed as

$$\bar{P}_{QoE} = \frac{\sum_{i=1}^{K-1} \sum_{u_i=1}^{U_i} x_{u_i}^{f} \, y_{u_i}^{f}}{\sum_{i=1}^{K-1} U_i}$$

wherein K represents the number of stations; U_i is the total number of user requests received by the MPS between the i-th stop and the (i+1)-th stop, and u_i indexes the u_i-th content request; x_{u_i}^f marks whether the first n_{u_i}^f slices of the content f requested by the passenger are already cached in the MPS, with x_{u_i}^f = 1 if so and x_{u_i}^f = 0 otherwise; y_{u_i}^f marks whether the passenger can obtain the remaining slices of content f from the RSU during the stop at the i-th station, with y_{u_i}^f = 1 if so and y_{u_i}^f = 0 otherwise; n_{u_i}^f represents the number of content slices that the MPS should provide to the passenger;
step 2, optimizing the content slice caching strategy of the vehicle MPS using the deep reinforcement learning network DQN, and, after the DQN is trained to convergence, collecting the content requests of in-vehicle passengers while the vehicle travels from the current station to the next station; after the vehicle arrives at the station, inputting the current MPS content slice caching state into the DQN, computing the action with the maximum QoE hit rate, and updating the content slice caching strategy of the MPS.
2. The method according to claim 1, wherein in step 1 a public vehicle movement model is established, specifically: the vehicle is set to move uniformly in a straight line between two adjacent vehicle stations, yielding the travel time T_m between two stations; the dwell times of a vehicle at the K vehicle stations are denoted T_s^1, ..., T_s^K, where T_s^i represents the dwell time of the vehicle at the i-th vehicle station and follows a lognormal distribution.
3. The method according to claim 1, wherein in step 1 a content request model is established, specifically: the request probabilities of users in the same area for different contents are described by Zipf's law, and the content request probabilities between two adjacent vehicle stations are set to follow Zipf distributions with different parameters α; let U_i be the total number of user requests received by the MPS of a vehicle between the i-th stop and the (i+1)-th stop; the probability that the u_i-th passenger's content request reaches the MPS obeys a Poisson distribution, u_i ∈ [1, U_i].
4. The method according to claim 1, 2 or 3, wherein in step 1 the travel time of the vehicle between two adjacent stations is T_m, during which the MPS receives the u_i-th content request at time t_{u_i}^f, and the MPS caches z_f slices of file f; the number of content slices that the MPS should provide to the passenger is at least:

$$n_{u_i}^{f} = \left\lfloor \frac{\left(T_m - t_{u_i}^{f}\right) N}{\tau} \right\rfloor$$

where τ represents the duration for which the user utilizes file f.
5. The method according to claim 1, 2 or 3, wherein in step 1 the variable y_{u_i}^f is calculated as follows: y_{u_i}^f = 1 if the remaining content slices can be transmitted within the stop time, i.e., m_{u_i}^f · D/N ≤ C_{u_i}, and y_{u_i}^f = 0 otherwise;

wherein, when the vehicle stops at the i-th station, the number of remaining content slices that the passenger requests from the RSU is calculated as:

$$m_{u_i}^{f} = N - \min\!\left(z_f, \, n_{u_i}^{f}\right)$$

where N is the total number of slices of file f, T_m is the travel time of the vehicle between two adjacent stations, the MPS receives the u_i-th content request at time t_{u_i}^f during the travel between the two adjacent stations, and z_f is the number of slices of file f cached by the MPS;

the maximum transmission capacity obtained by the passenger from the RSU at the station is calculated as:

$$C_{u_i} = B \log_2\!\left(1 + \frac{P_i \, h \, d^{-\beta}}{N_0}\right) T_s^{i}$$

wherein B denotes the transmission bandwidth, P_i the transmit power of the RSU at the i-th station, N_0 the Gaussian white noise, h the channel fading coefficient, d the distance between passenger and station RSU, β the path loss exponent, and T_s^i the dwell time of the vehicle at the i-th stop.
7. The method according to claim 1, wherein in step 2, when the DQN is used for the optimization, a state in the state space represents the cache proportion of each content slice in the MPS; the MPS cache state before reaching the (i+1)-th bus stop is denoted s_i ∈ S, i ∈ [1, K−1], with s_i = (z_1, z_2, ..., z_f, ..., z_F); an action a_i in the action space represents the content slice cache update quantities of the MPS at station i+1, a_i = (h_1, h_2, ..., h_f, ..., h_F), where h_f denotes adding (h_f > 0) or deleting (h_f < 0) |h_f| slices of file f, with Σ_{f=1}^{F} h_f ≤ 0; the reward function r_i is the QoE hit rate after performing action a_i at the (i+1)-th stop in the current state s_i.
8. The method according to claim 1 or 7, wherein in step 2 the DQN training stage comprises: the initial input state is that the vehicle MPS stores each content slice with a uniform caching strategy; the vehicle MPS receives passenger content requests on the way between stations; when the vehicle arrives at the next station, the MPS selects a feasible action, namely updating the number of cached slices of each content file in the MPS; the MPS computes the immediate benefit of the current action from the QoE hit rate and stores the record of current state, action, and immediate benefit into the replay pool; a set number of records are extracted from the replay pool, and the next state and the future benefit are computed to obtain the ideal Q value for the current state; the network is trained by gradient descent with the ideal Q value as the reference so that the network converges; the benefit refers to the QoE hit rate.
9. The method of claim 8, wherein in step 2, when selecting a feasible action, all feasible actions are listed, and each action is required to satisfy the following conditions: 0 ≤ z_f + h_f ≤ N for each content f, Σ_{f=1}^{F} h_f ≤ 0, and Σ_{f=1}^{F} (z_f + h_f) ≤ N_B.
10. The method of claim 9, wherein in step 2, when selecting a feasible action, the action with the largest QoE hit rate after the action is executed is preferentially selected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011370700.1A (CN112565377B) | 2020-11-30 | 2020-11-30 | Content grading optimization caching method for user service experience in Internet of vehicles
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011370700.1A (CN112565377B) | 2020-11-30 | 2020-11-30 | Content grading optimization caching method for user service experience in Internet of vehicles
Publications (2)
Publication Number | Publication Date |
---|---|
CN112565377A | 2021-03-26
CN112565377B CN112565377B (en) | 2021-09-21 |
Family
ID=75045341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011370700.1A (CN112565377B, Expired - Fee Related) | Content grading optimization caching method for user service experience in Internet of vehicles | 2020-11-30 | 2020-11-30
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112565377B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130329598A1 (en) * | 2012-06-01 | 2013-12-12 | Interdigital Patent Holdings, Inc. | Bandwidth management (bwm) operation with opportunistic networks |
CN110546958A (en) * | 2017-05-18 | 2019-12-06 | 利弗有限公司 | Apparatus, system and method for wireless multilink vehicle communication |
US10172009B1 (en) * | 2018-04-05 | 2019-01-01 | Netsia, Inc. | System and method for a vehicular network service over a 5G network |
CN109547275A (en) * | 2018-04-20 | 2019-03-29 | 西南交通大学 | A kind of ambulant network edge cache regulation means of user oriented |
CN110312231A (en) * | 2019-06-28 | 2019-10-08 | 重庆邮电大学 | Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking |
CN111629218A (en) * | 2020-04-29 | 2020-09-04 | 南京邮电大学 | Accelerated reinforcement learning edge caching method based on time-varying linearity in VANET |
CN111629443A (en) * | 2020-06-10 | 2020-09-04 | 中南大学 | Optimization method and system for dynamic spectrum slicing frame in super 5G vehicle networking |
Non-Patent Citations (2)

- Xiaojing Han, Xi Li, Changqing Luo, Hong Ji, Heli Zhang, "Incentive Mechanism with the Caching Strategy for Content Sharing in Vehicular Networks," 2019 IEEE Globecom Workshops (GC Wkshps). *
- Zhang Tiankui et al., "A Survey of Caching Technology Research in Information-Centric Networking," Journal of Beijing University of Posts and Telecommunications. *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113094982A (en) * | 2021-03-29 | 2021-07-09 | 天津理工大学 | Internet of vehicles edge caching method based on multi-agent deep reinforcement learning |
CN113094982B (en) * | 2021-03-29 | 2022-12-16 | 天津理工大学 | Internet of vehicles edge caching method based on multi-agent deep reinforcement learning |
CN113596160A (en) * | 2021-07-30 | 2021-11-02 | 电子科技大学 | Unmanned aerial vehicle content caching decision method based on transfer learning |
CN113596160B (en) * | 2021-07-30 | 2022-09-13 | 电子科技大学 | Unmanned aerial vehicle content caching decision method based on transfer learning |
CN114979145A (en) * | 2022-05-23 | 2022-08-30 | 西安电子科技大学 | Content distribution method integrating sensing, communication and caching in Internet of vehicles |
CN114979145B (en) * | 2022-05-23 | 2023-01-20 | 西安电子科技大学 | Content distribution method integrating sensing, communication and caching in Internet of vehicles |
CN115208952A (en) * | 2022-07-20 | 2022-10-18 | 北京交通大学 | Intelligent collaborative content caching method |
CN115208952B (en) * | 2022-07-20 | 2023-09-26 | 北京交通大学 | Intelligent collaborative content caching method |
Also Published As
Publication number | Publication date |
---|---|
CN112565377B (en) | 2021-09-21 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210921 |