WO2021212444A1 - VOD service cache replacement method based on random forest algorithm in edge network environment - Google Patents

Info

Publication number
WO2021212444A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
cache replacement
model
cache
edge server
Prior art date
Application number
PCT/CN2020/086550
Other languages
French (fr)
Chinese (zh)
Inventor
张晖
孙叶钧
赵海涛
孙雁飞
倪艺洋
朱洪波
Original Assignee
南京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京邮电大学
Priority to JP2021520158A (patent JP7098204B2)
Publication of WO2021212444A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2183 Cache memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/222 Secondary servers, e.g. proxy server, cable television Head-end
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data
    • H04N21/25891 Management of end-user data being end-user preferences

Definitions

  • a_i represents the i-th video in the edge server.
  • b_j represents the j-th video in the cloud.
  • The formula (image not reproduced) represents the edge server replacement cost-effectiveness when access duration is the replacement criterion.
  • The defining formula (image not reproduced) indicates the cache replacement cost-effectiveness of video i, and the purpose of the optimization is to maximize the cost-effectiveness of video cache replacement. Similarly, the formula (image not reproduced) represents the cloud cache replacement cost-effectiveness of video j, with the same physical meaning as above.
  • The first constraint states that the total volume of the videos moved from the cloud into the edge server cannot exceed the total volume of the videos evicted from the edge server cache; otherwise the edge server would not have enough cache space for the incoming videos.
  • The second constraint states that the total volume of the videos that stay in the edge server plus the videos moved in from the cloud cannot exceed the cache space of the edge server.
  • The above model is essentially a 0-1 integer programming problem.
  • The implicit enumeration method is used to solve it.
  • Only some of the 0/1 combinations of the variables are checked, and their objective function values are compared to find the optimal solution.
  • The filter condition is an added constraint requiring the objective function value to be better than that of the feasible solutions already computed.
  • The new total access duration obtained in each calculation is denoted TC'.
  • Let the initial condition be (image not reproduced), where the {a_1, a_2, ..., a_K} part of the set is the set of videos cached before the replacement and the {b_1, b_2, ..., b_Q} part is the initial set of videos in the cloud. Substituting the initial condition into Equation (2) gives the initial total access duration cost-effectiveness TC_0, and a new constraint is added:
  • TC is the total access duration cost-effectiveness obtained after each iteration.
  • The optimization variables are arranged in order of their coefficients: the variables in the set {a_1, a_2, ..., a_K} are arranged in descending order of cost-effectiveness coefficient, and the variables in the set {b_1, b_2, ..., b_Q} are arranged in descending order of cost-effectiveness coefficient.
  • Both parts of the set are traversed from right to left. The purpose of this ordering is to replace the videos with lower cost-effectiveness first and to start from the videos with higher cost-effectiveness in the cloud, which achieves the pruning effect.
  • The constraints in the cache replacement model (1) are constraint 2 and constraint 3 in sequence, and the calculation process is as follows:
  • the video in the set {b_1, b_2, ..., b_Q} that changes from 1 to 0 at the same time represents the replacement set {a_1, a_2, ..., a_K} from 1 to
  • A video may be replaced by two, three, or more videos at the same time due to its large size. Therefore, the situation of replacing one video with multiple videos is not considered.
  • This embodiment uses simulation results on existing data to illustrate the cache replacement effect of the present invention.
  • The first result is the prediction performance of the random forest algorithm.
  • The prediction accuracy of the weekly average access duration is:
  • The second term of the above formula is the ratio of the prediction error of the access duration to the actual total access duration. The smaller the value, the better the prediction.
  • The prediction accuracy of the weekly average access duration cost-effectiveness is defined as:
  • Equation (7) is the ratio of the difference between the sum of the weekly average access durations of the cached videos after cache replacement and the same sum before cache replacement, to the sum before cache replacement. If P_t ≤ 0, the access duration of the cached videos after replacement is less than or equal to that before replacement; that is, the load the edge server takes off the core network has not increased or has decreased, and the cache replacement has performed very poorly.
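The ratio described for Equation (7) can be sketched in a few lines. This is a minimal illustration that follows only the verbal description above; the function name and the sample numbers are hypothetical and do not come from the patent.

```python
def access_duration_increase_rate(before, after):
    """P_t as described in words for Equation (7).

    before / after: weekly average access durations of the videos held in
    the edge cache before and after replacement (any consistent time unit).
    """
    total_before = sum(before)
    return (sum(after) - total_before) / total_before


# Hypothetical numbers: the cache held 150 units of access duration before
# replacement and 180 after, so the rate is positive (replacement helped).
rate = access_duration_increase_rate([100, 50], [120, 60])
```

A value at or below zero corresponds to the poor-replacement case described above.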

Abstract

Disclosed is a VOD service cache replacement method based on a random forest algorithm in an edge network environment. The method comprises the following steps: collecting video data; processing a missing value of the video data using a random forest filling method, and establishing a prediction model; predicting an average access duration by means of the prediction model; establishing a cache replacement model according to a prediction result; and solving the cache replacement model using an implicit enumeration method to obtain a final replacement scheme. According to the present invention, an edge server needing to process a large amount of video information and machine learning having an excellent analysis capability in terms of big data processing are taken into consideration, and a random forest algorithm in machine learning is first used to predict a weekly average access duration for a video. Therefore, on this basis, a new video cache replacement model is provided, and the model is solved using an implicit enumeration method, such that the load of a core network is reduced to the greatest extent by an edge server. Moreover, the scheme is very simple and is easily implemented, and has very good application prospects.

Description

VOD service cache replacement method based on random forest algorithm in edge network environment
Technical field
The invention belongs to the technical field of edge networks, and in particular relates to a VOD service cache replacement method based on a random forest algorithm in an edge network environment.
Background
With the development of science and technology, ports and devices of various standards, together with a wide variety of services and applications, have been connected to the Internet. Service requests in the network have grown explosively, and data traffic has surged with them, driven mainly by video traffic. The core network is an important part of distributing business and providing services. One of its main functions is to route requests that enter the network through devices and interfaces of different standards to the appropriate service networks according to business requirements, so that each request receives the service it needs. Its other main function is to act as a service provider that processes the business requests submitted through the various interfaces. The core network itself contains several different service networks. When a business request arrives, the core network must serve it, and as business volume explodes, the volume of service the core network provides grows sharply. The core network therefore bears a heavy load both in processing business requests and in providing business services.
The edge network is the part of the network closest to the user. On the one hand, it shares the core network's burden of processing service requests; on the other hand, service provision is also delegated to it, so that any service the edge network is capable of handling is processed at the edge. However, because the computing power of the edge network is limited, the key to offloading as much traffic as possible from the core network is improving service efficiency, and edge caching is central to that. Edge caching means caching frequently used resources on the edge server. When a related request arrives again, the resource is fetched directly from the cache; requests that the edge server cannot satisfy are then served from the core network.
In addition, with the advent of the big data era, acquiring knowledge efficiently through machine learning has gradually become one of the main driving forces of technological development in many fields, and the edge network field is no exception. As data grows explosively, new kinds of data that need analysis keep emerging, such as semantic understanding, image analysis, and network data analysis, which gives machine learning an extremely important role in the big data environment.
Most existing cache replacement schemes still use video popularity as one of the main criteria, supplemented by auxiliary criteria such as video similarity to reduce redundant caching of similar, low-popularity videos. Video popularity reflects the number of visits per unit of time. For video services, a high total number of visits to the videos cached in an edge server does not mean that the server takes a large load off the core network. Video access duration, by contrast, measures how long a video is actually used, so it better reflects the load borne by the edge server. Combining it with auxiliary criteria such as video volume should give a better cache replacement result.
Summary of the invention
Purpose of the invention: to overcome the shortcomings of the prior art by providing a VOD service cache replacement method based on a random forest algorithm in an edge network environment.
Technical solution: to achieve the above purpose, the present invention provides a VOD service cache replacement method based on a random forest algorithm in an edge network environment, comprising the following steps:
S1: Collect video data;
S2: Handle missing values in the video data with the random forest filling method and establish a prediction model;
S3: Predict the average access duration with the prediction model;
S4: Establish a cache replacement model from the prediction results;
S5: Solve the cache replacement model with the implicit enumeration method to obtain the final replacement scheme.
Further, the prediction model in step S2 is established as follows:
Regression training is performed with the average access duration as the dependent variable and the remaining features as independent variables, and the data set is split. The importance ranking of the features is output, features are pruned according to the ranking to obtain the final modeling features, and the prediction model is built from these features.
Further, the cache replacement model in step S4 is established as follows:
Assume that the cache space of the edge server is S and that the videos in the test set that cannot be cached by the edge server are stored in the cloud. The set of predicted access durations of all videos in the test set is T = {t_1, t_2, ..., t_K}, and the set of video volumes is V = {v_1, v_2, ..., v_K}, where K is the total number of videos in the test set. Before cache replacement there are R cached videos in the edge server and Q videos in the cloud, with K = R + Q. The cache replacement model is given by:
Figure PCTCN2020086550-appb-000001
where [Figure PCTCN2020086550-appb-000002] is the optimal cache replacement scheme for the videos. a_i denotes the i-th video in the edge server: a_i = 0 means that video i needs to be replaced, and a_i = 1 means that it does not. b_j denotes the j-th video in the cloud: b_j = 0 means that video j no longer stays in the cloud and needs to be moved into the edge server, and b_j = 1 means that video j remains in the cloud and is not moved into the edge server. The expression [Figure PCTCN2020086550-appb-000003] is the edge server replacement cost-effectiveness when access duration is the replacement criterion. It has two possible values: when a_i = 0 it is 0 and has no practical meaning; when a_i = 1 it is the ratio of the access duration of video i to the volume of video i.
The expression [Figure PCTCN2020086550-appb-000004] is defined as the cache replacement cost-effectiveness of video i. Similarly, the expression [Figure PCTCN2020086550-appb-000005] is the cloud cache replacement cost-effectiveness of video j; when b_j = 1 it is 0 and has no practical meaning.
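Because the formula images are not reproduced in this text, the following is one possible reading of model (1), assembled only from the verbal definitions of a_i, b_j, the cost-effectiveness ratios, and the two volume constraints. It is a hedged reconstruction, not the patent's own typesetting; in particular, indexing the cloud videos as t_{R+j}, v_{R+j} is an assumption made here so that the R edge videos and Q cloud videos share the sets T and V.

```latex
\begin{aligned}
\max_{a,\,b}\quad
  & \sum_{i=1}^{R} a_i \,\frac{t_i}{v_i}
    \;+\; \sum_{j=1}^{Q} (1-b_j)\,\frac{t_{R+j}}{v_{R+j}} \\
\text{s.t.}\quad
  & \sum_{j=1}^{Q} (1-b_j)\, v_{R+j} \;\le\; \sum_{i=1}^{R} (1-a_i)\, v_i , \\
  & \sum_{i=1}^{R} a_i\, v_i \;+\; \sum_{j=1}^{Q} (1-b_j)\, v_{R+j} \;\le\; S , \\
  & a_i,\; b_j \in \{0,1\}.
\end{aligned}
```

Under this reading, each term a_i t_i / v_i is 0 when a_i = 0 and equals the duration-to-volume ratio when a_i = 1, and each term (1 - b_j) t_{R+j} / v_{R+j} is 0 when b_j = 1, which matches the two cases described in the text.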
Further, the cache replacement model in step S5 is solved as follows:
Let the total access duration cost-effectiveness be:
Figure PCTCN2020086550-appb-000006
Assume that the capacity of the edge server is S, and denote the new total access duration obtained in each calculation by TC'. To reduce the number of enumerations, let the initial condition be
Figure PCTCN2020086550-appb-000007
where the {a_1, a_2, ..., a_K} part is the set of videos cached before the replacement and the {b_1, b_2, ..., b_Q} part is the initial set of videos in the cloud. Substituting the initial condition into formula (2) gives the initial total access duration cost-effectiveness TC_0, and a new constraint is added:
TC > TC_0             (3)
Constraint (3) and the two constraints in the cache replacement model are evaluated iteratively to obtain the optimal [Figure PCTCN2020086550-appb-000008] replacement scheme.
Further, the iterative calculation is as follows:
Take constraint (3) as constraint ①, and the two constraints in the cache replacement model as constraints ② and ③ respectively. The calculation proceeds as follows:
1) Working from back to front, replace one cached video in the set {a_1, a_2, ..., a_K}, that is, change that video's a_i = 1 to a_i = 0;
2) Traverse the set {b_1, b_2, ..., b_Q} from back to front and calculate the new total access duration TC;
3) Compare TC with TC_0. If TC ≥ TC_0, set TC_0 to the new value TC (TC_0 = TC) and continue to step 4; otherwise return to step 1 for the next iteration, leaving TC_0 unchanged;
4) Check constraint ②. If it is satisfied, go to step 5; otherwise return to step 1 for the next iteration, leaving TC_0 unchanged;
5) Check constraint ③. If it is satisfied, this iteration satisfies all constraints and TC_0 takes the new value. Prune here, that is, stop traversing the set {b_1, b_2, ..., b_Q}, and start the next iteration from step 1.
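The filtering idea behind these steps can be illustrated with a small implicit enumeration over the 0-1 variables. The sketch below is not the patent's exact step order: it is a generic depth-first enumeration that keeps an incumbent value (playing the role of TC_0) and discards any branch that cannot beat it, as constraint (3) requires. The objective and the two volume constraints follow one reading of the verbal model description; the function name, the tuple layout, and the assumption that the initial cache already fits within the capacity are all hypothetical.

```python
from typing import List, Tuple

Video = Tuple[float, float]  # (predicted weekly access duration t, volume v)


def solve_replacement(edge: List[Video], cloud: List[Video], capacity: float):
    """Return (a, b, value): a_i = 1 keeps edge video i, b_j = 0 moves cloud video j in."""
    videos = edge + cloud
    n_edge, n = len(edge), len(edge) + len(cloud)
    ratio = [t / v for t, v in videos]  # cost-effectiveness t / v
    # Visit high-ratio videos first so a good incumbent is found early.
    order = sorted(range(n), key=lambda k: -ratio[k])
    # suffix[p] = sum of ratios from position p on: an optimistic bound.
    suffix = [0.0] * (n + 1)
    for p in range(n - 1, -1, -1):
        suffix[p] = suffix[p + 1] + ratio[order[p]]
    edge_volume = sum(v for _, v in edge)

    # Incumbent TC_0: the cache as it is before replacement (assumed feasible).
    best_value = sum(ratio[:n_edge])
    best_x = [1] * n_edge + [0] * len(cloud)
    x = [0] * n  # x[k] = 1 means video k is in the edge cache afterwards

    def feasible() -> bool:
        moved_in = sum(videos[k][1] for k in range(n_edge, n) if x[k])
        evicted = sum(videos[k][1] for k in range(n_edge) if not x[k])
        kept = edge_volume - evicted
        # First constraint: incoming volume fits in the space freed by evictions.
        # Second constraint: kept plus incoming volume fits in the cache.
        return moved_in <= evicted and kept + moved_in <= capacity

    def search(pos: int, value: float, used: float) -> None:
        nonlocal best_value, best_x
        if value + suffix[pos] <= best_value:  # filter: cannot exceed TC_0
            return
        if pos == n:
            if feasible():
                best_value, best_x = value, x[:]
            return
        k = order[pos]
        if used + videos[k][1] <= capacity:  # early check of the capacity limit
            x[k] = 1
            search(pos + 1, value + ratio[k], used + videos[k][1])
            x[k] = 0
        search(pos + 1, value, used)

    search(0, 0.0, 0.0)
    a = best_x[:n_edge]
    b = [1 - flag for flag in best_x[n_edge:]]
    return a, b, best_value


# Hypothetical data: two cached videos, two cloud videos, cache size 9.
a, b, value = solve_replacement(
    edge=[(10, 5), (4, 4)], cloud=[(12, 3), (2, 4)], capacity=9
)
```

In this toy instance the second cached video (ratio 1.0) is evicted and the first cloud video (ratio 4.0) is moved in, raising the total from 3.0 to 6.0. Branches whose optimistic bound does not exceed the incumbent are never expanded, which is the pruning effect the steps above aim for.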
The present invention takes into account that edge servers need to process large amounts of video information and that machine learning has excellent analytical capability in big data processing. It uses the random forest algorithm to predict the weekly average access duration of videos and, on that basis, proposes a new video cache replacement scheme. On the one hand, the scheme models with the random forest algorithm and achieves high prediction accuracy; on the other hand, it is simple, easy to implement, and has good application prospects.
Beneficial effects: compared with the prior art, the present invention takes into account that edge servers need to process large amounts of video information and that machine learning has excellent analytical capability in big data processing. The random forest algorithm is first used to predict the weekly average access duration of videos; on that basis a new video cache replacement model is proposed and solved with the implicit enumeration method. For a given edge server capacity, the model maximizes the weekly average access duration of the videos cached in the edge server. Because access duration represents the load that the edge server takes off the core network, the replacement model reduces the core network load as far as possible for a given capacity. The scheme is also simple, easy to implement, and has good application prospects.
Description of the drawings
Figure 1 is a schematic flow diagram of the method of the present invention;
Figure 2 is a schematic diagram of cache replacement;
Figure 3 compares the weekly average total access duration of videos with the actual weekly average access duration;
Figure 4 compares the weekly average access duration cost-effectiveness of videos with the actual weekly average access duration cost-effectiveness;
Figure 5 shows how the prediction accuracy of the weekly average video access duration and of the weekly average access duration cost-effectiveness change over time;
Figure 6 shows how the cache replacement rate and the weekly access duration increase rate change over time.
Detailed description
The present invention is further explained below with reference to the drawings and specific embodiments.
As shown in Figure 1, the present invention provides a VOD service cache replacement method based on a random forest algorithm in an edge network environment. It consists of three main parts: first, modeling and predicting video access duration with a random forest; second, proposing a cache replacement model based on the prediction results; third, solving the cache replacement model with the implicit enumeration method. The specific process is as follows:
1. Regression modeling and prediction of the weekly average access duration of VOD videos using the random forest algorithm
(1) Sample video data collection and data preprocessing
Information on 100,000 videos is randomly collected from the movie library of a video playback platform to form the sample data set, and the video data in the sample data set is preprocessed: taking one week as the unit, each video's data is averaged over the week. The video information includes online time, movie popularity chart ranking, popularity, number of likes, number of comments, rating, and video access duration. Values are kept to one decimal place; for quantities that cannot be fractional, such as the popularity chart ranking and the number of days online, the average is rounded to the nearest integer. For videos that have been online for less than a week, the data for the remaining days is filled with 0. Access duration means continuous access duration: if the interval between two accesses in the access log is less than 60 seconds, the user is assumed to have clicked by mistake or skipped an advertisement rather than stopped playback, so that gap is not counted as an interruption.
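The two preprocessing rules above can be sketched as follows. This is a minimal illustration under stated assumptions: the access log is assumed to be a list of (start, end) timestamps in seconds, a merged visit is assumed to count from the first start to the last end (gap included), and both function names are hypothetical. Rounding uses half-up, since Python's built-in `round` rounds halves to even.

```python
from decimal import Decimal, ROUND_HALF_UP


def continuous_durations(intervals, max_gap=60):
    """Merge accesses separated by less than max_gap seconds into one visit."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap:
            # Short gap (mis-click, skipped advertisement): same visit.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [end - start for start, end in merged]


def weekly_average(daily, integer_valued=False):
    """Average one feature over 7 days, zero-filling days not yet online."""
    padded = list(daily) + [0] * (7 - len(daily))
    average = Decimal(str(sum(padded))) / Decimal(7)
    # One decimal place in general; whole numbers for rankings and day counts.
    step = Decimal("1") if integer_valued else Decimal("0.1")
    return float(average.quantize(step, rounding=ROUND_HALF_UP))


# Hypothetical log: a 30 s gap is merged, a 200 s gap starts a new visit.
visits = continuous_durations([(0, 100), (130, 200), (400, 450)])
# Hypothetical video online for 3 days with 70 likes per day.
likes = weekly_average([70, 70, 70])
```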
(2) Modeling and prediction using the random forest algorithm
Missing values are then handled with the random forest filling method. If a feature has missing values, that feature is treated as the label and the remaining features form a new feature matrix. If other features also have missing values, all features are traversed starting with the one that has the fewest missing values, because the fewer values a feature is missing, the less accurate information is needed to fill it. When one feature is being filled, the missing values of the other features are first replaced with 0. After each pass, the number of features with missing values decreases by one.
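The column ordering and zero-placeholder loop described above can be sketched as follows. To keep the sketch free of third-party dependencies, the regressor is passed in as a callable; the method described here would use a random forest regressor in that position, and the nearest-neighbour function below is only a stand-in so that the example runs. All names are hypothetical.

```python
def impute(rows, fit_predict):
    """Fill None entries column by column, fewest-missing column first.

    fit_predict(X_train, y_train, X_new) must return one prediction per row
    of X_new; a random forest regressor would normally be plugged in here.
    """
    n_cols = len(rows[0])
    data = [row[:] for row in rows]
    missing = {c: [r for r, row in enumerate(rows) if row[c] is None]
               for c in range(n_cols)}
    order = sorted((c for c in range(n_cols) if missing[c]),
                   key=lambda c: len(missing[c]))
    for c in order:
        others = [k for k in range(n_cols) if k != c]
        # Other features that are still missing get a 0 placeholder.
        feats = [[0 if data[r][k] is None else data[r][k] for k in others]
                 for r in range(len(data))]
        to_fill = set(missing[c])
        known = [r for r in range(len(data)) if r not in to_fill]
        predictions = fit_predict([feats[r] for r in known],
                                  [data[r][c] for r in known],
                                  [feats[r] for r in missing[c]])
        for r, value in zip(missing[c], predictions):
            data[r][c] = value  # this column is now complete
    return data


def nearest_neighbour(X_train, y_train, X_new):
    """Stand-in regressor (NOT a random forest): copy the closest row's label."""
    def closest(x):
        return min(range(len(X_train)),
                   key=lambda i: sum((p - q) ** 2 for p, q in zip(X_train[i], x)))
    return [y_train[closest(x)] for x in X_new]


# Hypothetical feature rows: [likes, comments, rating-like score].
table = [[1, 10, 100], [2, 20, None], [3, None, 300], [4, 40, None]]
filled = impute(table, nearest_neighbour)
```

Here the second column (one missing value) is filled before the third (two missing values), so the third column's model already sees a complete second column.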
60% of the data set is used as the training set and 40% as the test set. A prediction model is built with online time, movie popularity ranking, popularity, number of likes, number of comments, and rating as the independent variables and the weekly average access duration as the prediction target, yielding predicted values. The feature importances are output, features of low importance are removed to reduce model complexity, and the parameters are tuned until the prediction accuracy reaches a satisfactory value. The resulting final model is used to predict each video's weekly average access duration for the following week.
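As a rough illustration of the data split and the feature screening, the following sketch partitions samples 60/40 and drops features whose importance falls below a cut-off. The function names, the random seed, the importance scores, and the threshold value are all hypothetical; the source does not state how the cut-off is chosen.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_60_40(samples: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples and split them 60% train / 40% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10        # integer arithmetic avoids float drift
    return shuffled[:cut], shuffled[cut:]


def keep_important(importances: Dict[str, float], threshold: float) -> List[str]:
    """Return feature names with importance >= threshold, most important first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]
```

In practice the importance scores would come from the trained random forest; here they are simply passed in as a dictionary.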
2. Establishing the cache replacement model
Assume that the cache space of an edge server is S and that the videos in the test set that cannot be cached by the edge server are stored in the cloud. The set of predicted access durations of all videos in the test set is T = {t_1, t_2, …, t_K} and the set of video volumes is V = {v_1, v_2, …, v_K}, where K is the total number of videos in the test set. Before cache replacement, the edge server holds R cached videos and the cloud holds Q videos, with K = R + Q. Figure 2 is a schematic diagram of cache replacement; the replacement order shown in the figure does not imply that actual replacement proceeds in that order. The cache replacement model is established as follows:
Figure PCTCN2020086550-appb-000009
where
Figure PCTCN2020086550-appb-000010
is the optimal cache replacement scheme for the videos. a_i denotes the i-th video in the edge server: a_i = 0 means that video i needs to be replaced, and a_i = 1 means that it does not. b_j denotes the j-th video in the cloud: b_j = 0 means that video j no longer stays in the cloud and is to be replaced into the edge server, and b_j = 1 means that video j remains in the cloud and is not replaced into the edge server. The expression
Figure PCTCN2020086550-appb-000011
is the edge-server replacement cost-effectiveness when access duration is the replacement criterion. It has two cases: when a_i = 0 the expression is 0 and has no practical meaning; when a_i = 1 it is the ratio of the access duration of video i to the volume of video i, a value that weighs access duration against video volume.
Suppose the predicted access duration of video i is high but the video is also very large, so that it occupies a large share of the edge server's cache. If there are many such videos, the number of videos the edge server can cache drops sharply and the effect of cache replacement can no longer be guaranteed. The expression
Figure PCTCN2020086550-appb-000012
is therefore defined as the cache replacement cost-effectiveness of video i, and the optimization goal is to maximize the cache replacement cost-effectiveness of the videos. Similarly, the expression
Figure PCTCN2020086550-appb-000013
is the cloud cache replacement cost-effectiveness of video j: when b_j = 1 the expression is 0 and has no practical meaning; when b_j = 0 its physical meaning is the same as above. The first constraint states that the total volume of the videos replaced from the cloud into the edge server must not exceed the total volume of the videos replaced out of the edge server; otherwise the edge server cannot hold the incoming videos. The second constraint states that the total volume of the videos that stay in the edge server plus the videos replaced in from the cloud must not exceed the cache space of the edge server.
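The objective and the two constraints described above can be sketched in Python. The formulas themselves appear only as images in the source, so the expressions below are a reading of the surrounding prose and should be treated as an assumption: a kept edge video contributes a_i·t_i/v_i, a cloud video moved to the edge contributes (1 − b_j)·t_j/v_j. The names `Video`, `objective`, and `feasible` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Video:
    duration: float   # predicted weekly average access duration t
    volume: float     # storage volume v


def objective(edge: Sequence[Video], a: Sequence[int],
              cloud: Sequence[Video], b: Sequence[int]) -> float:
    """Total cost-effectiveness of the videos that end up on the edge server."""
    kept = sum(ai * v.duration / v.volume for v, ai in zip(edge, a))
    moved_in = sum((1 - bj) * v.duration / v.volume for v, bj in zip(cloud, b))
    return kept + moved_in


def feasible(edge: Sequence[Video], a: Sequence[int],
             cloud: Sequence[Video], b: Sequence[int], capacity: float) -> bool:
    """Check the two volume constraints described in the text."""
    evicted = sum((1 - ai) * v.volume for v, ai in zip(edge, a))
    incoming = sum((1 - bj) * v.volume for v, bj in zip(cloud, b))
    kept = sum(ai * v.volume for v, ai in zip(edge, a))
    # Constraint 1: incoming volume fits in the space freed by evictions.
    # Constraint 2: everything on the edge server fits in the cache space S.
    return incoming <= evicted and kept + incoming <= capacity
```

With a_i = 1 for every edge video and b_j = 1 for every cloud video, the objective reduces to the cost-effectiveness of the cache before replacement.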
3. Solving the cache replacement model with the implicit enumeration method
The model above is essentially a 0-1 integer programming problem. It is solved with the implicit enumeration method, which examines only part of the 0/1 combinations of the variables and compares objective function values to find the optimal solution.
First, a feasible solution is found and used to generate a filter condition, that is, a constraint requiring the objective function value to be better than that of every feasible solution already evaluated. Let the total access-duration cost-effectiveness be:
Figure PCTCN2020086550-appb-000014
Assume that the capacity of the edge server is S, and let TC' denote the new total access duration obtained in each computation. To reduce the number of enumerations, let the initial condition be
Figure PCTCN2020086550-appb-000015
where the {a_1, a_2, …, a_K} part is the set of videos cached before cache replacement and the {b_1, b_2, …, b_Q} part is the set of videos initially stored in the cloud. Substituting the initial condition into Equation (2) gives the initial total access-duration cost-effectiveness TC_0, and a new constraint is added:
TC > TC_0               (3)
Here TC is the total access-duration cost-effectiveness obtained after each iteration. To prune effectively during the iteration and make replacement as efficient as possible, the terms of the optimization objective are ordered by coefficient: the variables in the set {a_1, a_2, …, a_K} are arranged in descending order of cost-effectiveness coefficient, and the variables in the set {b_1, b_2, …, b_Q} are arranged in ascending order of cost-effectiveness coefficient. Both sets are traversed from right to left. With this ordering, the cached videos with the lowest cost-effectiveness are replaced first, and replacement starts from the cloud videos with the highest cost-effectiveness, which produces the pruning effect.
The new constraint (3) is taken as constraint ①, and the two constraints of the cache replacement model (1) are taken, in order, as constraints ② and ③. The calculation proceeds as follows:
(1) Replace one cached video in the set {a_1, a_2, …, a_K}, working from back to front, that is, change that video's a_i = 1 to a_i = 0;
(2) Traverse the set {b_1, b_2, …, b_Q} from back to front and compute the new total access duration TC;
(3) Compare TC with TC_0. If TC ≥ TC_0, set TC_0 to the new value TC, that is, let TC_0 = TC, and continue with step (4); otherwise return to step (1) for the next iteration, with TC_0 unchanged;
(4) Evaluate constraint ②. If it is satisfied, go to step (5); otherwise return to step (1) for the next iteration, with TC_0 unchanged;
(5) Evaluate constraint ③. If it is satisfied, this iteration satisfies all constraints and TC_0 takes the new value; pruning is performed here, that is, traversal of the set {b_1, b_2, …, b_Q} stops, and the next iteration starts from step (1).
In the iteration above, the video in the set {b_1, b_2, …, b_Q} whose variable changes from 1 to 0 replaces the video in the set {a_1, a_2, …, a_K} whose variable changes from 1 to 0 in the same step. In actual video replacement it is very rare for one video, because of its large volume, to be replaced by two, three, or more videos at once, so the case of several videos replacing one video is not considered; that is, when traversing the set {b_1, b_2, …, b_Q}, cases in which two or more positions of the set change simultaneously are ignored. This greatly reduces the number of iterations and the amount of computation, and finally yields the optimal
Figure PCTCN2020086550-appb-000016
replacement scheme.
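The one-for-one search described above can be sketched as follows. This is an interpretation, not the patent's exact procedure, and it rests on several labeled assumptions: accepted swaps are kept cumulatively across iterations, the strict inequality of constraint (3) is used as the filter, and TC_0 is updated only after both volume constraints pass. The tuple layout and the name `single_swap_search` are hypothetical.

```python
from typing import List, Tuple

Video = Tuple[str, float, float]   # (name, predicted access duration t, volume v)


def ratio(video: Video) -> float:
    """Cost-effectiveness coefficient t / v."""
    return video[1] / video[2]


def single_swap_search(edge: List[Video], cloud: List[Video], capacity: float):
    """Greedy one-for-one replacement with the filter TC > TC_0."""
    edge_sorted = sorted(edge, key=ratio, reverse=True)   # back = worst cached video
    available = sorted(cloud, key=ratio)                  # back = best cloud video
    cached = list(edge_sorted)
    tc0 = sum(ratio(v) for v in cached)                   # initial feasible solution
    evicted_vol = incoming_vol = 0.0
    swaps: List[Tuple[str, str]] = []

    for out in reversed(edge_sorted):                     # step (1): worst first
        for cand in reversed(available):                  # step (2): best first
            tc = tc0 - ratio(out) + ratio(cand)
            if tc <= tc0:                                 # constraint 1 (filter)
                break          # every remaining candidate has a lower ratio
            new_in = incoming_vol + cand[2]
            new_out = evicted_vol + out[2]
            new_total = sum(v[2] for v in cached) - out[2] + cand[2]
            # Constraints 2 and 3: volume freed and total cache space.
            if new_in <= new_out and new_total <= capacity:
                cached.remove(out)
                cached.append(cand)
                available.remove(cand)
                incoming_vol, evicted_vol, tc0 = new_in, new_out, tc
                swaps.append((out[0], cand[0]))
                break          # prune: stop traversing the cloud set
    return cached, swaps, tc0
```

Because the cloud set is visited from the highest coefficient downward, the first candidate that fails the filter ends the inner loop, which is where the ordering pays off as pruning.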
This embodiment uses simulation results on existing data to illustrate the cache replacement effect of the invention. The prediction performance of the random forest algorithm is considered first. Let the test video set be c = {c_1, c_2, …, c_K}, the set of its predicted weekly average access durations be t = {t_1, t_2, …, t_K}, and the set of actual weekly average access durations be t' = {t'_1, t'_2, …, t'_K}. The prediction accuracy of the weekly average access duration is then:
Figure PCTCN2020086550-appb-000017
The second term of the formula above is the ratio of the error in the predicted access durations to the actual total access duration; the smaller this value, the better the prediction. Figure 3 compares the predicted weekly average access durations with the actual weekly average access durations; the calculation gives P_at = 95.1%.
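A small sketch of this accuracy measure follows. The formula is an image in the source, so the form used here, one minus the summed absolute error divided by the actual total, is an assumption inferred from the description of its second term. The function name is hypothetical and the numbers in the usage note are made up for illustration.

```python
from typing import Sequence


def prediction_accuracy(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """1 - (sum of absolute errors) / (sum of actual durations)."""
    error = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 1.0 - error / sum(actual)
```

For instance, predictions of 10, 20, and 30 against actual values of 12, 18, and 30 give a total error of 4 over an actual total of 60.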
Let the set of predicted weekly average access-duration cost-effectiveness values be tp = {tp_1, tp_2, …, tp_K} and the set of actual values be tp' = {tp'_1, tp'_2, …, tp'_K}. The prediction accuracy of the weekly average access-duration cost-effectiveness is defined as:
Figure PCTCN2020086550-appb-000018
Figure 4 compares the predicted weekly average access-duration cost-effectiveness with the actual values; the calculation gives P_tp = 94.7%.
These results show that the random forest predictions in the invention are highly accurate. The replacement effect of the cache replacement model is verified by simulation next. Let the video set cached before cache replacement be c, where u is the number of videos cached in the edge server, and let the video set after cache replacement be c'. The cache replacement rate of the videos is defined as:
Figure PCTCN2020086550-appb-000019
The calculation gives P_re = 11.6%.
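One plausible reading of the cache replacement rate is sketched below: the share of the u originally cached videos that are no longer cached after replacement. The defining formula is an image in the source, so this form and the function name are assumptions.

```python
from typing import Iterable


def replacement_rate(before: Iterable[str], after: Iterable[str]) -> float:
    """Fraction of originally cached videos that were replaced."""
    before_set = set(before)
    replaced = before_set - set(after)
    return len(replaced) / len(before_set)
```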
Let the weekly average access durations of the videos cached in the edge server before cache replacement be t_c = {t_1, t_2, …, t_u}, and those of the videos cached after cache replacement be t_c' = {t_1, t_2, …, t_u}. The access-duration increase rate is defined as:
Figure PCTCN2020086550-appb-000020
Equation (7) is the ratio of the difference between the sum of the weekly average access durations of the videos after cache replacement and the corresponding sum before cache replacement to the sum before cache replacement. If P_t ≤ 0, the access duration of the videos after cache replacement is no greater than before; the load that the edge server shares for the core network has not increased or has even decreased, and the cache replacement is very poor. If P_t > 0, the access duration of the videos after cache replacement is greater than before, so the edge server shares more load for the core network; the larger P_t is, the more load the edge server shares after cache replacement. The calculation gives P_t = 8.7%, which shows that the cache replacement model of the invention effectively increases the load that the edge server shares for the core network.
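The access-duration increase rate described in words above can be written as a short function. The function name is hypothetical, and the example values in the test are illustrative only, not data from the simulation.

```python
from typing import Sequence


def duration_increase_rate(before: Sequence[float], after: Sequence[float]) -> float:
    """(sum after replacement - sum before replacement) / sum before replacement."""
    total_before = sum(before)
    return (sum(after) - total_before) / total_before
```

A positive result means the cached videos attract more access time after replacement; zero or a negative result means the replacement did not help.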
Figures 5 and 6 show simulations of how the weekly prediction model and the cache replacement model change over time. The prediction accuracy of the weekly average access duration and of its cost-effectiveness declines over time, while the cache replacement rate and the access-duration increase rate rise, with the cache replacement rate rising faster. The curves as a whole change smoothly over time without large fluctuations, so in practical applications the algorithm does not need to be updated frequently, which saves computing resources.

Claims (5)

  1. A VOD service cache replacement method based on the random forest algorithm in an edge network environment, characterized by comprising the following steps:
    S1: collecting video data;
    S2: handling missing values in the video data with random-forest imputation and building a prediction model;
    S3: predicting the average access duration with the prediction model;
    S4: establishing a cache replacement model from the prediction results;
    S5: solving the cache replacement model with the implicit enumeration method to obtain the final replacement scheme.
  2. The VOD service cache replacement method based on the random forest algorithm in an edge network environment according to claim 1, characterized in that the prediction model in step S2 is built as follows:
    regression training is performed with the average access duration as the dependent variable and the remaining features as independent variables, the data set is partitioned, the importance ranking of the features is output, the features are screened according to the ranking to obtain the final modeling features, and the prediction model is built from those features.
  3. The VOD service cache replacement method based on the random forest algorithm in an edge network environment according to claim 1, characterized in that the cache replacement model in step S4 is established as follows:
    assume that the cache space of the edge server is S and that the videos in the test set that cannot be cached by the edge server are stored in the cloud; the set of predicted access durations of all videos in the test set is T = {t_1, t_2, …, t_K} and the set of video volumes is V = {v_1, v_2, …, v_K}, where K is the total number of videos in the test set; before cache replacement, the edge server holds R cached videos and the cloud holds Q videos, with K = R + Q; the cache replacement model is established as follows:
    Figure PCTCN2020086550-appb-100001
    where
    Figure PCTCN2020086550-appb-100002
    is the optimal cache replacement scheme for the videos; a_i denotes the i-th video in the edge server, a_i = 0 means that video i needs to be replaced, and a_i = 1 means that video i does not need to be replaced; b_j denotes the j-th video in the cloud, b_j = 0 means that video j no longer stays in the cloud and is to be replaced into the edge server, and b_j = 1 means that video j remains in the cloud and is not replaced into the edge server; the expression
    Figure PCTCN2020086550-appb-100003
    is the edge-server replacement cost-effectiveness when access duration is the replacement criterion, and has two cases: when a_i = 0 the expression is 0 and has no practical meaning; when a_i = 1 it is the ratio of the access duration of video i to the volume of video i;
    the expression
    Figure PCTCN2020086550-appb-100004
    is defined as the cache replacement cost-effectiveness of video i; similarly, the expression
    Figure PCTCN2020086550-appb-100005
    is the cloud cache replacement cost-effectiveness of video j; when b_j = 1 the expression is 0 and has no practical meaning.
  4. The VOD service cache replacement method based on the random forest algorithm in an edge network environment according to claim 3, characterized in that the cache replacement model in step S5 is solved as follows:
    let the total access-duration cost-effectiveness be:
    Figure PCTCN2020086550-appb-100006
    assume that the capacity of the edge server is S, and let TC' denote the new total access duration obtained in each computation; to reduce the number of enumerations, let the initial condition be
    Figure PCTCN2020086550-appb-100007
    where the {a_1, a_2, …, a_K} part is the set of videos cached before cache replacement and the {b_1, b_2, …, b_Q} part is the set of videos initially stored in the cloud; substituting the initial condition into Equation (2) gives the initial total access-duration cost-effectiveness TC_0, and a new constraint is added:
    TC > TC_0   (3)
    constraint (3) and the two constraints of the cache replacement model are evaluated iteratively to obtain the optimal
    Figure PCTCN2020086550-appb-100008
    replacement scheme.
  5. The VOD service cache replacement method based on the random forest algorithm in an edge network environment according to claim 4, characterized in that the iterative calculation is specifically:
    constraint (3) is taken as constraint ①, and the two constraints of the cache replacement model are taken as constraints ② and ③ respectively; the calculation proceeds as follows:
    1) replace one cached video in the set {a_1, a_2, …, a_K}, working from back to front, that is, change that video's a_i = 1 to a_i = 0;
    2) traverse the set {b_1, b_2, …, b_Q} from back to front and compute the new total access duration TC;
    3) compare TC with TC_0; if TC ≥ TC_0, set TC_0 to the new value TC, that is, let TC_0 = TC, and continue with step 4; otherwise return to step 1 for the next iteration, with TC_0 unchanged;
    4) evaluate constraint ②; if it is satisfied, go to step 5; otherwise return to step 1 for the next iteration, with TC_0 unchanged;
    5) evaluate constraint ③; if it is satisfied, this iteration satisfies all constraints and TC_0 takes the new value; pruning is performed here, that is, traversal of the set {b_1, b_2, …, b_Q} stops, and the next iteration starts from step 1.
PCT/CN2020/086550 2020-04-20 2020-04-24 Vod service cache replacement method based on random forest algorithm in edge network environment WO2021212444A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021520158A JP7098204B2 (en) 2020-04-20 2020-04-24 VOD service cache replacement method based on random forest algorithm in edge network environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010311152.9 2020-04-20
CN202010311152.9A CN111629216B (en) 2020-04-20 2020-04-20 VOD service cache replacement method based on random forest algorithm under edge network environment

Publications (1)

Publication Number Publication Date
WO2021212444A1 true WO2021212444A1 (en) 2021-10-28

Family

ID=72273187

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086550 WO2021212444A1 (en) 2020-04-20 2020-04-24 Vod service cache replacement method based on random forest algorithm in edge network environment

Country Status (3)

Country Link
JP (1) JP7098204B2 (en)
CN (1) CN111629216B (en)
WO (1) WO2021212444A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584801A (en) * 2022-01-13 2022-06-03 北京理工大学 Video resource caching method based on graph neural network recommendation algorithm

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112073752B (en) * 2020-09-08 2022-04-22 北京一起教育信息咨询有限责任公司 Multi-line flow distribution method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996996A (en) * 2006-12-19 2007-07-11 北京邮电大学 The method for stream media file buffer for the mobile stream media proxy server
US20150201223A1 (en) * 2010-11-02 2015-07-16 InnFlicks Media Group, LLC Processing, storing, and delivering digital content
CN104822068A (en) * 2015-04-29 2015-08-05 四达时代通讯网络技术有限公司 Streaming media proxy cache replacing method and device
KR20180105351A (en) * 2017-03-15 2018-09-28 한국전자통신연구원 Transmission Service Apparatus Of Customized Advertisement Image And Method Of Thereof
CN108833468A (en) * 2018-04-27 2018-11-16 广州西麦科技股份有限公司 Method for processing video frequency, device, equipment and medium based on mobile edge calculations
CN109788319A (en) * 2017-11-14 2019-05-21 中国科学院声学研究所 A kind of data cache method

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008234352A (en) 2007-03-20 2008-10-02 Nec Corp Deficit value complementing method and device
US8589477B2 (en) 2008-03-12 2013-11-19 Nec Corporation Content information display device, system, and method used for creating content list information based on a storage state of contents in a cache
US9147129B2 (en) * 2011-11-18 2015-09-29 Honeywell International Inc. Score fusion and training data recycling for video classification
US10235605B2 (en) * 2013-04-10 2019-03-19 Microsoft Technology Licensing, Llc Image labeling using geodesic features
CN104053024B (en) * 2014-06-19 2017-02-01 华东师范大学 Short-period video-on-demand volume prediction system based on small number of data
TWI524695B (en) 2014-08-20 2016-03-01 國立清華大學 Node-based sequential implicit enumeration method and system thereof
CN108510096A (en) * 2017-02-24 2018-09-07 百度在线网络技术(北京)有限公司 Trade company's attrition prediction method, apparatus, equipment and storage medium
CN108259929B (en) * 2017-12-22 2020-03-06 北京交通大学 Prediction and caching method for video active period mode
CN108322819B (en) * 2018-01-18 2020-07-21 北京奇艺世纪科技有限公司 Method and device for predicting user behavior
CN108600836B (en) * 2018-04-03 2020-11-13 北京奇艺世纪科技有限公司 Video processing method and device
CN109523086B (en) * 2018-11-26 2021-08-24 浙江蓝卓工业互联网信息技术有限公司 Quality prediction method and system for chemical products based on random forest
CN110891283A (en) * 2019-11-22 2020-03-17 超讯通信股份有限公司 Small base station monitoring device and method based on edge calculation model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUANG DAN, SONG RONGFANG: "Cache replacement strategy based on content value", DIANXIN KEXUE - TELECOMMUNICATIONS SCIENCE, RENMIN YOUDIAN CHUBANSHE, BEIJING, CN, no. 11, 30 November 2018 (2018-11-30), CN , pages 59 - 66, XP055860769, ISSN: 1000-0801, DOI: 10.11959/j.issn.1000−0801.2018265 *
ZHANG HUI; SUN YEJUN; LOU YAXIANG; ZHAO HAITAO; ZHU HONGBO; SUN YANFEI: "Modelling and Optimization Algorithm for Dynamic Volume of Access to VOD Business in Ubiquitous Wireless Environment", 2019 IEEE 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN), IEEE, 12 June 2019 (2019-06-12), pages 245 - 250, XP033662046, DOI: 10.1109/ICCSN.2019.8905386 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584801A (en) * 2022-01-13 2022-06-03 北京理工大学 Video resource caching method based on graph neural network recommendation algorithm
CN114584801B (en) * 2022-01-13 2022-12-09 北京理工大学 Video resource caching method based on graph neural network recommendation algorithm

Also Published As

Publication number Publication date
CN111629216B (en) 2021-04-06
JP7098204B2 (en) 2022-07-11
JP2022530175A (en) 2022-06-28
CN111629216A (en) 2020-09-04

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021520158

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932521

Country of ref document: EP

Kind code of ref document: A1