CN108965949A

CN108965949A - Bitrate adaptation method for satisfying personalized user experience in video services

Info

Publication number: CN108965949A
Application number: CN201810844053.XA
Authority: CN
Inventors: 崔勇; 王莫为; 左旭彤; 杨啖
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2018-12-07
Anticipated expiration: 2038-07-27
Also published as: CN108965949B

Abstract

The invention provides a scheme for personalized user experience in video services, a technique for improving the viewing experience during video playback. A neural network is designed as a function approximator that predicts how each bitrate choice affects subsequent video playback performance indicators, so that different user experience requirements can be met. The design has two steps. 1) Evaluation: a neural network evaluates the impact of each bitrate choice on the different meta-performance indicators. 2) Decision: the estimates of the meta-performance indicators obtained in the evaluation step are explicitly multiplied by the optimization target g, and the bitrate corresponding to the maximum value is selected. The invention can maximize user experience under different optimization targets and, when the optimization target of the user experience changes, generalizes to the new user target quickly and with low overhead.

Description

A Bit Rate Adaptive Method to Satisfy User's Personalized Experience in Video Service

Technical Field

The invention belongs to the technical field of streaming video, relates to the optimization of user experience, and in particular to a bitrate adaptation method that satisfies personalized user experience in video services.

Background

In recent years, video services on the Internet have grown rapidly, and video is expected to account for nearly 80% of all Internet traffic by 2019. Video performance is becoming increasingly important because it directly affects how users perceive the service, which in turn affects how long they watch and ultimately the revenue of content providers. Users expect sharper video, playback without stalling, smooth playback and low latency. These performance indicators, however, conflict with and constrain one another. With the emergence of new scenarios and new forms of presentation, such as live streaming and virtual reality (VR), meeting user experience requirements becomes even more challenging.

Quality of experience (QoE) is the tool used to describe and quantify user experience and users' requirements for video. Adaptive bitrate (ABR) algorithms are a common way to improve user QoE: they maximize user experience by selecting a suitable bitrate for the next video chunk to be played. User QoE generally comprises the following meta-indicators: bitrate, stall duration, bitrate switching and delay. Different users and different viewing scenarios place different demands on the individual QoE indicators. In live game streaming, for example, users want high-definition video without stalling but are less demanding about latency. In highly interactive scenarios, users may care more about latency than about clarity. It is therefore worthwhile to provide a method that meets each user's personalized experience requirements. Balancing the different performance indicators to maximize user experience has become a key focus of research in academia and industry.

Summary of the Invention

In view of the fundamental difficulty of improving user experience in video services and the wish to satisfy personalized user experience, the invention proposes a bitrate adaptation method for video services that satisfies personalized user experience, a model with generalization ability, in order to achieve personalized user experience during video playback. The invention is a reinforcement-learning-based bitrate adaptation algorithm that selects the bitrate best suited to the current network conditions and optimizes the performance indicators of the video service so as to meet users' personalized experience requirements. The algorithm outperforms previous bitrate adaptation algorithms, that is, it provides the best user experience for a given user QoE target. Moreover, when the user or the content being played changes, the algorithm generalizes across user preferences quickly and with low overhead, ultimately improving the viewing experience during playback and maximizing user experience under different optimization targets.

To achieve the above object, the invention adopts the following technical solution:

A bitrate adaptation method for satisfying personalized user experience in video services, characterized in that a neural network is used as an evaluation function Q(s,a,m,g) to evaluate the impact of each bitrate choice a on the different meta-performance indicators m; the estimates of the meta-performance indicators obtained in the evaluation process are explicitly multiplied by the optimization target weight value, that is, the given user preference g, and the bitrate corresponding to the maximum value is selected, thereby meeting different user experience requirements; wherein the evaluation function Q(s,a,m,g) represents how each bitrate choice a affects each meta-performance indicator m under different network states s and a given user preference g.

The input of the evaluation process consists of a state value s and an optimization target weight value g, where the state value s describes the network conditions and the buffer occupancy, and the optimization target weight value g represents the video performance requirements of different users;

The output of the evaluation process is the cumulative sum of the QoE observations up to the end of video playback, denoted $Q_\infty(s,a,m,g)$, where ∞ denotes the end of video playback.
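As an illustration of this output, the regression target for $Q_\infty$ at each chunk can be formed by summing the per-chunk observations of the meta-performance indicators from that chunk to the end of playback. The following is a minimal sketch; the function name `cumulative_targets`, the per-chunk tuple layout (quality, stall, smoothness, delay) and the sample numbers are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Sequence


def cumulative_targets(observations: Sequence[Sequence[float]]) -> List[List[float]]:
    """For each chunk n, sum every metric from chunk n to the last chunk.

    observations[n] holds the per-chunk values of the meta-performance
    indicators (assumed order: quality, stall time, smoothness, delay).
    """
    if not observations:
        return []
    running = [0.0] * len(observations[0])
    targets: List[List[float]] = []
    # Walk backwards so each step reuses the suffix sum computed so far.
    for obs in reversed(observations):
        running = [r + o for r, o in zip(running, obs)]
        targets.append(list(running))
    targets.reverse()
    return targets


# Hypothetical three-chunk session.
trace = [
    (1.0, 0.0, 0.0, 2.0),
    (2.0, 1.0, 1.0, 3.0),
    (2.0, 0.0, 0.0, 1.0),
]
targets = cumulative_targets(trace)
print(targets[0])  # totals over the whole session: [5.0, 1.0, 1.0, 6.0]
```

The first entry covers the whole session, while the last entry equals the final chunk's own observation, since nothing remains to be played after it.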

User QoE is expressed as a linear combination of the meta-performance indicators m and the user preference g:

where N is the number of chunks in the video being played, $R_n$ is the bitrate of the n-th chunk, $q(R_n)$ is the quality of the n-th chunk, $T_n$ is the stall time of the n-th chunk, $|q(R_{n+1}) - q(R_n)|$ is the bitrate difference between two adjacent chunks during playback and represents the smoothness of the video, $D_n$ is the delay of downloading the n-th chunk, and α, β, γ, μ are the four components of the optimization target g.
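The formula itself appears as an image in the original publication and is not reproduced in this text. A form consistent with the definitions above would be the following. The signs (quality rewarded; stalling, bitrate switching and delay penalized) and the summation limits are assumptions based on common ABR practice and are not confirmed by the source.

```latex
\mathrm{QoE} = \alpha \sum_{n=1}^{N} q(R_n)
             - \beta \sum_{n=1}^{N} T_n
             - \gamma \sum_{n=1}^{N-1} \left| q(R_{n+1}) - q(R_n) \right|
             - \mu \sum_{n=1}^{N} D_n
```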

The two inputs of the evaluation process, the state value s and the optimization target weight value g, are each processed by a separate neural network, and the outputs of these two modules are concatenated to form the input of the next neural network. The future QoE values are predicted from this concatenated input, and the neural network outputs the future observations for all actions simultaneously. The network is divided into two modules. One is the expectation module, which predicts the average of the future QoE observations; this value depends only on the state value s and not on the action. The other is the action module, which predicts the QoE observations corresponding to the different actions taken in a given state. The outputs of the two modules are added to form the output of the whole neural network, namely, for a given state, the values of the four QoE meta-performance indicators up to the end of video playback for each action.
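The structure described above can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions: the class name `EvaluationNetwork`, the layer sizes, the random stand-in weights and the omission of biases are all hypothetical. Centering the action stream per metric, so that the expectation module carries the average over actions, is also an assumption and is not stated in the patent.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def make_dense(rng: random.Random, n_in: int, n_out: int) -> Matrix:
    """Random (n_out x n_in) weight matrix; stands in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights: Matrix, x: Vector, relu: bool = False) -> Vector:
    """Compute y = W x, optionally followed by ReLU (biases omitted)."""
    y = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, v) for v in y] if relu else y


class EvaluationNetwork:
    """Hypothetical evaluator returning Q[a][m] for every bitrate a and metric m."""

    def __init__(self, state_dim: int, n_actions: int, n_metrics: int = 4,
                 hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_actions = n_actions
        self.n_metrics = n_metrics
        self.w_state = make_dense(rng, state_dim, hidden)   # branch for s
        self.w_goal = make_dense(rng, n_metrics, hidden)    # branch for g
        self.w_expect = make_dense(rng, 2 * hidden, n_metrics)
        self.w_action = make_dense(rng, 2 * hidden, n_actions * n_metrics)

    def forward(self, s: Vector, g: Vector) -> Matrix:
        # Process s and g separately, then concatenate the two feature vectors.
        joint = dense(self.w_state, s, relu=True) + dense(self.w_goal, g, relu=True)
        expectation = dense(self.w_expect, joint)            # one value per metric
        flat = dense(self.w_action, joint)
        k = self.n_metrics
        action = [flat[a * k:(a + 1) * k] for a in range(self.n_actions)]
        # Assumption: subtract the per-metric mean over actions before adding.
        means = [sum(row[m] for row in action) / self.n_actions for m in range(k)]
        return [[expectation[m] + row[m] - means[m] for m in range(k)]
                for row in action]


net = EvaluationNetwork(state_dim=6, n_actions=3)
q = net.forward(s=[0.5, 0.2, 0.1, 0.9, 0.3, 0.4], g=[1.0, 4.0, 1.0, 0.5])
print(len(q), len(q[0]))  # 3 4
```

Each row of `q` holds the four predicted meta-performance indicators for one candidate bitrate, so one forward pass yields the predictions for all actions at once.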

When online, the estimates of the meta-performance indicators obtained in the evaluation process are explicitly multiplied by the optimization target weight value g as follows:

$$a = \arg\max_{a} \, g^{T} Q_{\infty}(s, a, m, g)$$

The optimal bitrate under a specific target can be selected according to the above formula: the optimal target value is obtained when the product of the Q value and the optimization target g is largest, and the corresponding bitrate a is the bitrate to select for this chunk.
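The selection rule can be sketched as follows. The function name `select_bitrate`, the sample predictions and the two preference vectors are hypothetical. The sketch also assumes that the penalty indicators (stall, smoothness, delay) are stored as negative values, so that the weights in g are all non-negative.

```python
from typing import Sequence


def select_bitrate(q_values: Sequence[Sequence[float]], g: Sequence[float]) -> int:
    """Return the index of the bitrate whose weighted score g^T Q is largest."""
    scores = [sum(w * q for w, q in zip(g, row)) for row in q_values]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical predictions for three bitrates (low, medium, high).
# Columns: quality, stall, smoothness, delay; penalties stored as negatives.
q_values = [
    [10.0, 0.0, -1.0, -2.0],
    [20.0, -1.0, -2.0, -4.0],
    [30.0, -5.0, -4.0, -8.0],
]
quality_first = [1.0, 1.0, 1.0, 0.1]
stall_averse = [1.0, 8.0, 1.0, 1.0]

print(select_bitrate(q_values, quality_first))  # 2 (highest bitrate)
print(select_bitrate(q_values, stall_averse))   # 0 (lowest bitrate)
```

The same predictions lead to different choices once g changes, which is the point of multiplying by g at decision time rather than fixing the objective inside the network.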

When the neural network model is trained, randomly generated optimization target weight values g are used.

Compared with the prior art, the beneficial effects of the invention are as follows:

The output dimension of the neural network is increased. The output of a traditional reinforcement learning algorithm is a scalar reward value representing the reward obtained after taking an action, but a scalar carries little information. Increasing the output dimension makes the algorithm more flexible to operate. In addition, different values of g can be set to meet the personalized QoE requirements of different users.

Description of the Drawings

Figure 1 shows the model of the evaluation process, in which the inputs are the state and the optimization target, and the output is the cumulative impact of each bitrate choice on the meta-performance indicators.

Detailed Description

Embodiments of the invention are described in detail below with reference to the drawing and examples.

The invention is a method for improving user experience in video services. Its goal is to achieve personalized user experience by means of a model with generalization ability. User QoE generally comprises the following meta-indicators: bitrate, stall time, bitrate switching and delay. Different viewers have different requirements for the video performance indicators. When different video optimization targets exist, the invention can optimize performance quickly and with low overhead.

The design of the invention is as follows:

(1) Design overview: the method is designed within a deep reinforcement learning framework. By explicitly introducing the user preference g, the evaluation process and the decision process of ordinary reinforcement learning are decoupled. A neural network is used as the evaluation function Q(s,a,m,g), which represents the impact of each bitrate choice on each meta-performance indicator m under different network states and a given user preference g. This evaluation function is used to select the bitrate of the next chunk.

(2) Evaluation process: drawing on the idea of universal value function approximation, the goal is to construct a function approximator that predicts the future values of the meta-performance indicators.

Evaluation process input: the input consists of two parts, the state s and the optimization target weight value g. The state value describes the network conditions and the buffer occupancy. g is the weight value corresponding to the optimization target and represents the different preferences of different users regarding video performance.

Evaluation process output: the output is the QoE observation value at the end of video playback. The traditional reward value Q(s,a) is divided into A action measurement values Q(s,a,m), where A is the number of selectable bitrates. User QoE can be expressed as a linear combination of the meta-performance indicator values m and the user preference g, which in compact form is:

$$\mathrm{QoE} = g^{T} Q$$

Therefore, the QoE of each action under any preference g can be obtained by computation.
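To make the linear combination concrete, the sketch below aggregates a hypothetical chunk trace into the four meta-performance indicators and then applies a preference vector. The function names `meta_metrics` and `qoe`, the sample trace and the weights are assumptions for illustration. The sign convention (penalty terms negated so that all weights in g are non-negative) is likewise an assumption and is not stated in the patent.

```python
from typing import List, Sequence


def meta_metrics(quality: Sequence[float], stall: Sequence[float],
                 delay: Sequence[float]) -> List[float]:
    """Aggregate a chunk trace into [quality, stall, smoothness, delay]."""
    # Smoothness term: total absolute quality change between adjacent chunks.
    switching = sum(abs(b - a) for a, b in zip(quality, quality[1:]))
    # Assumed convention: penalties are negated so g holds non-negative weights.
    return [sum(quality), -sum(stall), -switching, -sum(delay)]


def qoe(g: Sequence[float], m: Sequence[float]) -> float:
    """QoE as the inner product g^T m."""
    return sum(w * v for w, v in zip(g, m))


# Hypothetical four-chunk session.
quality = [1.0, 2.0, 2.0, 4.0]
stall = [0.0, 0.5, 0.0, 0.0]
delay = [1.0, 1.0, 2.0, 2.0]

m = meta_metrics(quality, stall, delay)
print(m)                             # [9.0, -0.5, -3.0, -6.0]
print(qoe([1.0, 4.0, 1.0, 0.5], m))  # 1.0
```

Because the indicators are kept as a vector, the same trace can be rescored under any other preference g without recomputing the indicators.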

Evaluation process model: the two inputs, the state and the optimization target, are each processed by a separate neural network, and the outputs of these two modules are concatenated to form the input of the next layer of the neural network. The future QoE observations are predicted from this concatenated input, and the neural network outputs the future observations for all actions simultaneously. The network is divided into two modules. One is the expectation module, which predicts the average of the future QoE observations; this value depends only on the state value and not on the action. The other is the action module, which predicts the QoE observations corresponding to the different actions taken in a given state. The outputs of the two modules are added to form the output of the whole neural network, namely, for a given state, the values of the four QoE meta-performance indicators up to the end of video playback for each action.

(3) Decision process: when online, the algorithm takes the meta-performance indicators at the end of video playback (clarity, stalling, smoothness, delay) obtained by the evaluation process, together with the optimization target, and computes

$$a = \arg\max_{a} \, g^{T} Q_{\infty}(s, a, m, g)$$

The optimal bitrate under a specific target is selected according to the above formula.

In summary, the invention proposes a bitrate adaptation algorithm capable of delivering personalized user experience. A neural network is used to construct a function approximator that predicts the impact of bitrate selection on subsequent video playback performance indicators, so that different user experience requirements can be met. The scheme can select different bitrates according to the content being played, the user and the user's behavior, maximizing user experience under different optimization targets. When the optimization target of the user experience changes, it generalizes to the new user target quickly and with low overhead, thereby meeting the need for personalized user experience.

Claims

1. A bitrate adaptation method for satisfying personalized user experience in video services, characterized in that a neural network is used as an evaluation function Q(s,a,m,g) to evaluate the impact of each bitrate choice a on the different meta-performance indicators m; the estimates of the meta-performance indicators obtained in the evaluation process are explicitly multiplied by the optimization target weight value, that is, the given user preference g, and the bitrate corresponding to the maximum value is selected, thereby meeting different user experience requirements; wherein the evaluation function Q(s,a,m,g) represents how each bitrate choice a affects each meta-performance indicator m under different network states s and a given user preference g.

2. The bitrate adaptation method for satisfying personalized user experience in video services according to claim 1, characterized in that the input of the evaluation process consists of a state value s and an optimization target weight value g, where the state value s describes the network conditions and the buffer occupancy, and the optimization target weight value g represents the video performance requirements of different users;

the output of the evaluation process is the cumulative sum of the QoE observations up to the end of video playback, denoted $Q_\infty(s,a,m,g)$, where ∞ denotes the end of video playback.

3. The bitrate adaptation method for satisfying personalized user experience in video services according to claim 2, characterized in that user QoE is expressed as a linear combination of the meta-performance indicators m and the user preference g:

where N is the number of chunks in the video being played, $R_n$ is the bitrate of the n-th chunk, $q(R_n)$ is the quality of the n-th chunk, $T_n$ is the stall time of the n-th chunk, $|q(R_{n+1}) - q(R_n)|$ is the bitrate difference between two adjacent chunks during playback and represents the smoothness of the video, $D_n$ is the delay of downloading the n-th chunk, and α, β, γ, μ are the four components of the optimization target g.

4. The bitrate adaptation method for satisfying personalized user experience in video services according to claim 2, characterized in that the two inputs of the evaluation process, the state value s and the optimization target weight value g, are each processed by a separate neural network; the outputs of these two modules are concatenated to form the input of the next neural network; the future QoE values are predicted from the concatenated input, and the neural network outputs the future observations for all actions simultaneously; the neural network is divided into two modules, one being the expectation module, which predicts the average of the future QoE observations, this value depending only on the state value s and not on the action, and the other being the action module, which predicts the QoE observations corresponding to the different actions taken in a given state; the outputs of the two modules are added to form the output of the whole neural network, namely, for a given state, the values of the four QoE meta-performance indicators up to the end of video playback for each action.

5. The bitrate adaptation method for satisfying personalized user experience in video services according to claim 2, characterized in that, when online, the estimates of the meta-performance indicators obtained in the evaluation process are explicitly multiplied by the optimization target weight value g as follows:

$$a = \arg\max_{a} \, g^{T} Q_{\infty}(s, a, m, g)$$

The optimal bitrate under a specific target can be selected according to the above formula: the optimal target value is obtained when the product of the Q value and the optimization target g is largest, and the corresponding bitrate a is the bitrate to select for this chunk.

6. The bitrate adaptation method for satisfying personalized user experience in video services according to claim 2, characterized in that randomly generated optimization target weight values g are used when training the neural network model.