CN108965949B

CN108965949B - Code rate self-adaption method for satisfying user personalized experience in video service

Info

Publication number: CN108965949B
Application number: CN201810844053.XA
Authority: CN
Inventors: 崔勇; 王莫为; 左旭彤; 杨啖
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2020-06-16
Anticipated expiration: 2038-07-27
Also published as: CN108965949A

Abstract

This scheme for personalized user experience in video services is a technique for improving the viewing experience during video playback. It constructs a function approximator with a neural network and predicts the effect of each bitrate choice on subsequent video playback performance metrics, thereby meeting different user experience requirements. The design flow is: 1) Evaluation: a neural network evaluates the effect of each bitrate choice on the different meta-performance metrics (meta-metrics). 2) Decision: the meta-metric estimates obtained in the evaluation step are explicitly multiplied by the optimization target g, and the bitrate corresponding to the maximum value is selected. The invention can maximize user experience under different optimization targets and, when the optimization target of the user experience changes, can generalize to the new user target quickly and at low cost.

Description

Code rate self-adaption method for satisfying user personalized experience in video service

Technical Field

The invention belongs to the technical field of streaming media video, relates to user experience optimization, and particularly relates to a code rate self-adaption method for meeting user personalized experience in video service.

Background

In recent years, video traffic on the internet has surged, and video was expected to account for nearly 80% of all internet traffic in 2019. Video performance is therefore increasingly important: it directly affects user experience, which in turn affects how long users watch and, ultimately, the revenue of content providers. Users expect video that is sharp, plays without stalling, switches quality smoothly, and has low latency. However, these performance metrics conflict with and constrain one another. With the emergence of new scenarios and new forms of presentation, such as live streaming and virtual reality (VR), meeting user experience requirements becomes more challenging.

User quality of experience (QoE) is a tool for describing and quantifying user experience and user demands on video. The adaptive bitrate (ABR) algorithm is a common way to improve user QoE: it selects an appropriate bitrate (code rate) for the next video block to be played so as to maximize user experience. User QoE generally comprises the following meta-metrics: bitrate, stall time, bitrate switching, and latency. Different users and different viewing scenarios place different requirements on each of these metrics. For example, when watching a live game broadcast, a user wants high-definition video with no stalls but is less demanding about latency. In a highly interactive scenario, the user is more demanding about latency and less demanding about sharpness. It is therefore worthwhile to provide a way to meet personalized experience requirements across different users. Balancing the different performance metrics to maximize user experience has become a focus of both academic and industrial research.

Disclosure of Invention

To address the inherent difficulty of improving user experience in video services and the desire to satisfy personalized user experience, the invention provides a bitrate adaptation method that satisfies personalized user experience in video services. The method is a model with generalization capability, aimed at personalized user experience during video playback. It is a bitrate adaptation algorithm based on reinforcement learning that selects the most suitable bitrate for the current network conditions and optimizes the various performance metrics of the video service to meet each user's individual requirements. The algorithm outperforms prior bitrate adaptation algorithms, that is, it provides the best user experience for a given user QoE target. Moreover, when the user or the content being played changes, the algorithm generalizes to the new user preference quickly and at low cost, ultimately improving the viewing experience and maximizing user experience under different optimization targets.

To achieve this purpose, the invention adopts the following technical scheme:

A bitrate adaptation method that satisfies personalized user experience in video services uses a neural network as an evaluation function Q(s, a, m, g) to evaluate the effect of each bitrate choice a on the different meta-metrics m. The meta-metric estimates obtained in the evaluation process are explicitly multiplied by the optimization target weight value, that is, the given user preference g, and the bitrate corresponding to the maximum value is selected, thereby meeting different user experience requirements. The evaluation function Q(s, a, m, g) represents how each meta-metric m is affected by each bitrate choice a under different network states s and a given user preference g.

The input of the evaluation process consists of a state value s and an optimization target weight value g. The state value s describes the network condition and the buffer occupancy; the optimization target weight value g represents the video performance requirements of different users.

The output of the evaluation process is the cumulative sum of the QoE observations up to the end of video playback, written Q_∞(s, a, m, g), where the subscript ∞ denotes the end of video playback.
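The cumulative output described above can be illustrated with a small sketch. It assumes, hypothetically, that each downloaded block yields one observation vector of four meta-metrics and that the target for block t is the undiscounted sum of observations from block t to the last block; the function name and the sample trace are illustrative and do not come from the source.

```python
from typing import List, Sequence


def cumulative_future_metrics(per_block: Sequence[Sequence[float]]) -> List[List[float]]:
    """For each block index t, sum every meta-metric from block t to the last block."""
    if not per_block:
        return []
    running = [0.0] * len(per_block[0])
    targets: List[List[float]] = []
    # Walk backwards so each step reuses the sum over the later blocks.
    for row in reversed(per_block):
        running = [acc + value for acc, value in zip(running, row)]
        targets.append(running)
    targets.reverse()
    return targets


# Hypothetical per-block observations: [quality, stall_s, switch, delay_s]
trace = [
    [1.0, 0.0, 0.0, 0.5],
    [2.0, 0.5, 1.0, 0.5],
    [2.0, 0.0, 0.0, 0.25],
]
targets = cumulative_future_metrics(trace)
print(targets[0])  # cumulative meta-metrics from the first block to the end
```

Vectors of this kind are what a network approximating Q_∞ would be regressed toward under these assumptions.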

The user's QoE is represented as a linear combination of the meta-metrics m and the user preference g:

QoE = α Σ_{n=1}^{N} q(R_n) − β Σ_{n=1}^{N} T_n − γ Σ_{n=1}^{N−1} |q(R_{n+1}) − q(R_n)| − μ Σ_{n=1}^{N} D_n

where N is the number of blocks in the video being played, R_n is the bitrate of the nth block, q(R_n) is the quality of the nth video block, T_n is the stall time of the nth block, |q(R_{n+1}) − q(R_n)| is the difference between two adjacent blocks during playback and represents the smoothness of the video, D_n is the delay in downloading the nth block, and α, β, γ, μ are the four components of the optimization target g.
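As a minimal sketch of this linear QoE, the function below assumes the conventional form in which quality is rewarded and stall time, quality switches, and download delay are penalized. The function name, the argument layout, and the sample numbers are hypothetical; the quality list stands for the values q(R_n).

```python
from typing import Sequence, Tuple


def linear_qoe(
    quality: Sequence[float],   # q(R_n) for each block
    stall: Sequence[float],     # T_n, stall time per block (seconds)
    delay: Sequence[float],     # D_n, download delay per block (seconds)
    weights: Tuple[float, float, float, float],  # (alpha, beta, gamma, mu)
) -> float:
    alpha, beta, gamma, mu = weights
    quality_sum = sum(quality)
    stall_sum = sum(stall)
    # Smoothness term: absolute quality change between adjacent blocks.
    switch_sum = sum(abs(nxt - cur) for cur, nxt in zip(quality, quality[1:]))
    delay_sum = sum(delay)
    # Assumed sign convention: only the quality term contributes positively.
    return alpha * quality_sum - beta * stall_sum - gamma * switch_sum - mu * delay_sum


qoe = linear_qoe(
    quality=[1.0, 2.0, 2.0, 4.0],
    stall=[0.0, 0.5, 0.0, 0.0],
    delay=[0.5, 0.5, 0.25, 0.25],
    weights=(1.0, 4.0, 1.0, 0.5),
)
print(qoe)
```

Changing only the weights tuple re-scores the same playback trace for a different user preference.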

The two inputs of the evaluation process, the state value s and the optimization target weight value g, are each processed by a separate neural network. The outputs of these two modules are concatenated and used as the input of the next neural network, which predicts future QoE values from the concatenated input and outputs the future observations for all actions simultaneously. This network is divided into two modules. One is the expectation module, whose prediction is the average of the future QoE observations; this part depends only on the state value s and not on the action. The other is the action module, which predicts the QoE observations corresponding to the different actions that can be taken in a given state. The outputs of the two modules are added to form the output of the whole neural network, namely the values of the four QoE meta-metrics, accumulated until the end of video playback, for each action taken in a given state.
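The final addition of the two modules can be sketched as follows. The source states only that the two outputs are added; subtracting the per-metric mean of the action stream first, so that the expectation stream really is the average over actions, is an assumption borrowed from common dueling-style networks. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def combine_streams(
    expectation: Sequence[float],      # one value per meta-metric
    action: Sequence[Sequence[float]]  # one row per action, one column per meta-metric
) -> List[List[float]]:
    num_actions = len(action)
    num_metrics = len(expectation)
    # Assumption: centre the action stream so it averages to zero over actions.
    means = [sum(row[m] for row in action) / num_actions for m in range(num_metrics)]
    return [
        [expectation[m] + row[m] - means[m] for m in range(num_metrics)]
        for row in action
    ]


expectation_out = [4.0, 1.0, 2.0, 1.0]
action_out = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.5],
    [1.0, 1.0, 2.0, 1.0],
]
q_values = combine_streams(expectation_out, action_out)
print(q_values)  # one row of four meta-metric predictions per bitrate
```

With the centring step, averaging the combined rows over actions returns the expectation vector, which matches the description of the expectation module as the average of the future observations.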

When online, the meta-metric estimates obtained in the evaluation process are explicitly multiplied by the optimization target weight value g, using the formula:

a = argmax_a g^T Q_∞(s, a, m, g)

This formula selects the optimal bitrate for a given target: the optimum is reached when the product of the Q value and the optimization target g is largest, and the corresponding bitrate a is the bitrate to select for the block.
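The decision step can be sketched as a dot product followed by an argmax. The sketch assumes the predicted values are stored so that larger is better for every meta-metric (penalty metrics such as stall time are stored negated), which lets all weights in g be non-negative. The function name and the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_bitrate(
    q_values: Sequence[Sequence[float]],  # row per bitrate, column per meta-metric
    preference: Sequence[float],          # user preference g
) -> Tuple[int, List[float]]:
    # Score of each action: g^T Q(s, a, ., g)
    scores = [sum(w * q for w, q in zip(preference, row)) for row in q_values]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical predictions: [quality, -stall, -switch, -delay] for three bitrates
predicted = [
    [2.0, 0.0, -0.5, -0.5],
    [6.0, -0.5, -1.0, -1.0],
    [9.0, -2.0, -2.0, -2.0],
]
g = [1.0, 2.0, 1.0, 1.0]
best_index, scores = select_bitrate(predicted, g)
print(best_index, scores)
```

The returned index identifies the row, and therefore the bitrate, to request for the next block.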

When training the neural network model, randomly generated optimization target weight values g are used.

Compared with the prior art, the invention has the following beneficial effects:

The output dimension of the neural network is increased. The output of a conventional reinforcement learning algorithm is a scalar reward value representing the reward obtained after an action is taken, and a scalar carries little information. The higher-dimensional output makes the algorithm more flexible to operate. In addition, the personalized QoE requirements of different users can be met by setting different values of g.

Drawings

FIG. 1 shows the model of the evaluation process: the inputs are the state and the optimization target, and the output is the cumulative effect of selecting each bitrate on the meta-metrics.

Detailed Description

The embodiments of the present invention will be described in detail below with reference to the drawings and examples.

The invention relates to a method for improving user experience in video services and aims to achieve personalized user experience using a model with generalization capability. User QoE generally comprises the following meta-metrics: bitrate, stall time, bitrate switching, and latency. Different users have different requirements for these video performance metrics. When the video optimization target differs, the invention can optimize performance quickly and at low cost.

The design idea of the invention is as follows:

(1) Design overview: the design is built on a deep reinforcement learning framework. By explicitly introducing the user preference g, the evaluation process and the decision process of ordinary reinforcement learning are decoupled. A neural network is used as the evaluation function Q(s, a, m, g), which represents the effect of each bitrate choice a on each meta-metric m under different network states and a given user preference g. The evaluation function is then used to select the bitrate of the next block.

(2) Evaluation process: using the idea of universal value function approximation, a function approximator is constructed to predict the future values of the meta-metrics.

Evaluation process input: the input consists of two parts, the state s and the optimization target weight value g. The state value describes the network condition and the buffer occupancy. g is the weight vector corresponding to the optimization target and represents the different preferences of different users regarding video performance.
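The source states that g is randomly generated during training but does not say from which distribution. The sketch below is one possible choice, not the patented procedure: it draws four non-negative weights uniformly and normalizes them to sum to 1. The function name, the normalization, and the seeding are all assumptions.

```python
import random
from typing import List, Optional


def sample_preference(num_metrics: int = 4, rng: Optional[random.Random] = None) -> List[float]:
    """Draw one hypothetical preference vector g for a training episode."""
    rng = rng or random.Random()
    raw = [rng.random() for _ in range(num_metrics)]
    total = sum(raw)
    if total == 0.0:
        # Degenerate draw: fall back to equal weights.
        return [1.0 / num_metrics] * num_metrics
    # Assumption: normalise so the weights sum to 1.
    return [value / total for value in raw]


g_train = sample_preference(rng=random.Random(0))
print(g_train)
```

Sampling a fresh g for each training episode exposes the network to many optimization targets, which is what allows a new user preference to be handled at decision time without retraining.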

Evaluation process output: the output is the QoE observation accumulated up to the end of video playback. The traditional scalar reward value Q(s, a) is decomposed into per-metric values Q(s, a, m) for each action, where the number of actions equals the number of selectable bitrates. The user experience QoE can be expressed as a linear combination of the meta-metric values m and the user preference g.

In compact form:

QoE = g^T Q

Thus, the QoE of each action under any preference g can be obtained by calculation.
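A small worked example, with made-up numbers, shows how the same predicted values lead to different choices under different preferences. For illustration the predictions are held fixed across the two preferences (in the method, Q also takes g as input), and the penalty metrics are assumed to be stored with a negative sign so that every weight is non-negative. Rows of Q correspond to three selectable bitrates a_1, a_2, a_3; columns correspond to quality, stall, smoothness, and delay.

```latex
Q = \begin{pmatrix}
  2 & 0    & -0.5 & -0.5 \\
  6 & -0.5 & -1   & -1   \\
  9 & -2   & -2   & -2
\end{pmatrix},
\qquad
g_A = (1,\ 0.5,\ 0.5,\ 0.5)^{T},
\qquad
g_B = (1,\ 4,\ 1,\ 1)^{T}

Q\,g_A = (1.5,\ 4.75,\ 6)^{T} \;\Rightarrow\; a = a_3
\qquad
Q\,g_B = (1,\ 2,\ -3)^{T} \;\Rightarrow\; a = a_2
```

A viewer who weights stalls lightly (g_A) is given the highest bitrate, while a viewer who weights stalls heavily (g_B) is given the middle one.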

Evaluation process model: the two inputs, the state and the optimization target, are each processed by a separate neural network, and the outputs of these two modules are concatenated as the input of the next neural network layer. Future QoE observations are predicted from the concatenated input, and the network outputs the future observations for all actions simultaneously. The network is divided into two modules. One is the expectation module, whose prediction is the average of the future QoE observations; this part depends only on the state value and not on the action. The other is the action module, which predicts the QoE observations corresponding to the different actions that can be taken in a given state. The outputs of the two modules are added to form the output of the whole neural network, namely the values of the four QoE meta-metrics, accumulated until the end of video playback, for each action taken in a given state.

(3) Decision process: when online, the algorithm combines the meta-metric values obtained in the evaluation process (sharpness, stalling, smoothness, and latency, each accumulated to the end of video playback) with the optimization target:

a = argmax_a g^T Q_∞(s, a, m, g)

The optimal bitrate for a given target is selected according to this formula.

In summary, the invention provides a bitrate adaptation algorithm that achieves personalized user experience. A function approximator is constructed with a neural network to predict the effect of bitrate selection on subsequent video playback performance metrics, thereby meeting different user experience requirements. The scheme can select different bitrates according to different content, users, and user behaviors, maximizing user experience under different optimization targets; when the optimization target changes, it generalizes to the new user target quickly and at low cost, thus meeting the requirement of personalized user experience.

Claims

1. A bitrate adaptation method for satisfying personalized user experience in video services, which uses a neural network as an evaluation function Q(s, a, m, g) to evaluate the effect of each bitrate choice a on the different meta-metrics m, explicitly multiplies the meta-metric estimates obtained in the evaluation process by an optimization target weight value, namely a given user preference g, and selects the bitrate corresponding to the maximum value, thereby meeting different user experience requirements, wherein the evaluation function Q(s, a, m, g) represents how each bitrate choice a affects each meta-metric m under different network states s and the given user preference g; the input of the evaluation process consists of a state value s and the optimization target weight value g, wherein the state value s describes the network condition and the buffer occupancy, and the optimization target weight value g represents the video performance requirements of different users;

the output of the evaluation process is the cumulative sum of the QoE observations up to the end of video playback, written Q_∞(s, a, m, g), where ∞ denotes the end of video playback;

the method is characterized in that the QoE (quality of experience) of the user is expressed as a linear combination of the meta-metrics m and the user preference g:

QoE = α Σ_{n=1}^{N} q(R_n) − β Σ_{n=1}^{N} T_n − γ Σ_{n=1}^{N−1} |q(R_{n+1}) − q(R_n)| − μ Σ_{n=1}^{N} D_n

where N is the number of blocks in the video being played, R_n is the bitrate of the nth block, q(R_n) is the quality of the nth video block, T_n is the stall time of the nth block, |q(R_{n+1}) − q(R_n)| is the difference between two adjacent blocks during playback and represents the smoothness of the video, D_n is the delay in downloading the nth block, and α, β, γ, μ are the four components of the optimization target g;

the two inputs of the evaluation process, the state value s and the optimization target weight value g, are each processed by a separate neural network; the outputs of these two modules are concatenated as the input of the next neural network; future QoE values are predicted from the concatenated input, and the neural network outputs the future QoE observations for all actions simultaneously; the neural network is divided into two modules, one being an expectation module whose prediction is the average of the future QoE observations and depends only on the state value s and not on the action, the other being an action module that predicts the future QoE observations corresponding to the different actions taken in a given state; the outputs of the two modules are added to form the output of the whole neural network, namely the values of the four QoE meta-metrics, accumulated until the end of video playback, for each action taken in a given state.

2. The bitrate adaptation method for satisfying personalized user experience in video services according to claim 1, wherein, when online, the meta-metric estimates obtained in the evaluation process are explicitly multiplied by the optimization target weight value g using the following formula:

a = argmax_a g^T Q_∞(s, a, m, g)

3. The bitrate adaptation method according to claim 1, wherein the optimization target weight value g is randomly generated when training the neural network model.