CN112637631A - Code rate determining method and device, electronic equipment and storage medium

Info

Publication number: CN112637631A
Application number: CN202011497179.8A
Authority: CN
Inventors: 周超; 王博; 孔啸; 徐明伟
Original assignee: Tsinghua University; Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Tsinghua University; Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2021-04-09
Anticipated expiration: 2040-12-17
Also published as: CN112637631B

Abstract

The present disclosure provides a code rate determination method, apparatus, electronic device, and storage medium. The method comprises: predicting the download time for downloading the next video block of a video according to the buffer occupancy and the buffer variation when the current video block of the video is downloaded; predicting the download rate of the next video block according to the download rate status of the current video block; and determining a code rate for the next video block according to the predicted download time of the next video block, the duration of the next video block, and the predicted download rate of the next video block, wherein the video is divided into a plurality of video blocks and every video block has the same duration.

Description

Code rate determining method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of internet technologies, and in particular, to a code rate determining method and apparatus, an electronic device, and a storage medium.

Background

Live video applications are a very important part of today's internet. Live video content providers strive to reduce live latency and improve users' quality of experience (QoE) in order to provide a better viewing and interactive experience. Designing a better adaptive bitrate (ABR, i.e., code rate adaptation) algorithm is a main way to improve QoE and guarantee low latency.

Nowadays, most live video streaming is based on the Hypertext Transfer Protocol (HTTP), whose advantages are low deployment cost, high scalability, and ease of implementing multi-rate adaptation. During a live broadcast, the video is encoded block by block at several different code rates, and the client-side ABR algorithm selects which code rate's video blocks to download according to conditions such as buffer occupancy and network throughput.

Streaming media transmission schemes in the related art mostly take video on demand as the research object. Video on demand and live video place different requirements on the ABR algorithm. For video on demand, the video buffer can be as long as several minutes, videos are mostly downloaded in units of video blocks, bandwidth measurement is simple and direct, and the ABR algorithm is mainly concerned with high code rate, low stall rate, and smooth code rate switching. For live video, the low-latency requirement means that only a few seconds of video are buffered, and video is mostly downloaded at the level of short video blocks or frames. The ABR algorithm is mainly concerned with high code rate, low stall rate, and average delay. Because the amount of video data downloaded each time is small, bandwidth prediction becomes difficult, and control of the video buffer becomes more important.

In addition, because mobile devices differ in their computing, storage, and network resources, live streaming quality of service is harder to guarantee on them. Mobile devices have weak computing capability and limited storage, which makes it difficult to deploy complex algorithms or algorithms that require fine-grained sampling. Moreover, the abrupt changes in bandwidth and delay that are common in mobile networks greatly reduce the bandwidth prediction accuracy of traditional adaptive code rate algorithms.

For example, the buffer level diagram in Fig. 1, taken from a video session driven by the Model Predictive Control (MPC) algorithm, shows that MPC carries a serious risk of stalling: the buffer level in the player fluctuates widely and is often low, so once a prediction error or throughput jitter occurs, the buffer is easily emptied and playback stalls.

Another typical ABR algorithm for video is Pensieve, which uses the Actor-Critic algorithm from reinforcement learning to make code rate decisions; its working mechanism is shown in Fig. 2. The input of the neural network comprises six kinds of data: the network throughput when the past several video blocks were downloaded, the download times of the past several video blocks, the sizes of the next several video blocks, the current buffer size, the number of remaining video blocks not yet downloaded, and the code rate of the last video block. Before deployment, Pensieve must be trained in a simulation environment using data collected online. However, some of Pensieve's inputs (for example, the sizes of future video blocks and the number of remaining video blocks) are not suited to a live scenario, where video content is generated in real time. The neural network usually consumes a large amount of computing resources, so it is difficult to achieve a comparable effect when it is deployed on a mobile device. Moreover, Pensieve requires a data set for pre-training with a reinforcement learning algorithm, yet comprehensive training data is hard to obtain, and incomplete training data may cause abnormal behavior in the algorithm that can be neither predicted nor explained.

Disclosure of Invention

The present disclosure provides a code rate determination method, apparatus, storage medium, and electronic device that at least solve the problem of video stalling in the related art, although they are not required to solve any of the problems described above.

According to a first aspect of the present disclosure, there is provided a code rate determination method, the method comprising: predicting the download time for downloading the next video block of a video according to the buffer occupancy and the buffer variation when the current video block of the video is downloaded; predicting the download rate of the next video block according to the download rate status of the current video block; and determining a code rate for the next video block according to the predicted download time of the next video block, the duration of the next video block, and the predicted download rate of the next video block.

According to the first aspect of the present disclosure, predicting the download time for downloading the next video block of the video according to the buffer occupancy and the buffer variation when the current video block is downloaded includes: predicting the download time for downloading the next video block of the video using a sliding mode control model established based on the relation between the buffer occupancy and buffer variation when a video block of the video is downloaded and the download time of the video block.

According to a first aspect of the present disclosure, the sliding mode control model is designed such that the buffer occupancy of the video converges to a target value and is not affected by network throughput variations and parameter settings of the sliding mode control model.

According to a first aspect of the present disclosure, the sliding mode control model is constructed as:

wherein $B_f(k)$ represents the buffer occupancy when the $k$-th video block is downloaded, $\Delta B(k)$ represents the buffer variation when the $k$-th video block is downloaded, and $\lambda$ is a constant greater than 0,

wherein the download time $T_k$ of the $k$-th video block is calculated as:

$$T_k = U(k) + T_{k-1},$$

$E_\mu$ is the maximum likelihood estimate of the systematic error from historical statistics, and $k$ is an integer greater than 0.

According to a first aspect of the present disclosure, $\lambda = 1$.

According to a first aspect of the present disclosure, the predicting a download rate of a next video block according to a download rate status of a current video block comprises: measuring the download rate of the last M video frames of the current video block; determining a download rate of a next video block based on the download rates of the last M video frames, wherein M is a positive integer.

According to a first aspect of the disclosure, the measuring a download rate of a last M video frames of a current video block comprises: recording the receiving completion time of each video frame in the last M video frames of the current video block; and calculating the downloading rate of the video frame according to the size of the video frame and the difference between the receiving completion time of the video frame and the receiving completion time of the previous video frame of the video frame.

According to a first aspect of the disclosure, the determining a download rate of a next video block based on the download rate of the last M video frames comprises: determining the calculated download rate of the video frame as an effective download rate in response to the calculated download rate of the video frame having a significant difference from an average download rate of N frames preceding the video frame; determining the calculated download rate of the video frame as an ignored download rate in response to the calculated download rate of the video frame not having a significant difference from an average download rate of N frames preceding the video frame; and determining the average value of the downloading rates of the video frames with the effective downloading rates in the last M video frames as the downloading rate of the next video block, wherein N is the frame number of the video block.

According to a second aspect of the present disclosure, there is provided a code rate determination apparatus, the apparatus comprising: a download time prediction module configured to predict the download time for downloading the next video block of a video according to the buffer occupancy and the buffer variation when the current video block of the video is downloaded; a download rate prediction module configured to predict the download rate of the next video block according to the download rate status of the current video block; and a code rate determination module configured to determine a code rate for the next video block according to the predicted download time of the next video block, the duration of the next video block, and the predicted download rate of the next video block.

According to a second aspect of the present disclosure, the download time prediction module is configured to predict the download time for downloading the next video block of the video using a sliding mode control model established based on the relation between the buffer occupancy and buffer variation when a video block of the video is downloaded and the download time of the video block.

According to a second aspect of the present disclosure, the sliding mode control model is designed such that the buffer occupancy of the video converges to a target value and is not affected by network throughput variations and parameter settings of the sliding mode control model.

According to a second aspect of the present disclosure, the sliding mode control model is constructed as:

wherein the download time $T_k$ of the $k$-th video block is calculated as:

$$T_k = U(k) + T_{k-1}.$$

According to a second aspect of the present disclosure, $\lambda = 1$.

According to a second aspect of the disclosure, the download rate prediction module comprises: a measurement module configured to measure a download rate of a last M video frames of a current video block; a prediction module configured to determine a download rate of a next video block based on the download rates of the last M video frames.

According to a second aspect of the disclosure, the measurement module is configured to record the reception completion time of each of the last M video frames of the current video block, and to calculate the download rate of the video frame according to the size of the video frame and the difference between the reception completion time of the video frame and the previous video frame of the video frame.

According to a second aspect of the disclosure, the prediction module is configured to: determine the calculated download rate of the video frame as an effective download rate in response to the calculated download rate of the video frame having a significant difference from the average download rate of the N frames preceding the video frame; determine the calculated download rate of the video frame as an ignored download rate in response to the calculated download rate of the video frame not having a significant difference from the average download rate of the N frames preceding the video frame; and determine the average of the download rates of those of the last M video frames that have an effective download rate as the download rate of the next video block.

According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a code rate determination method as described above.

According to a fourth aspect of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method as described above.

According to a fifth aspect of the present disclosure, there is provided a computer program product in which instructions are executed by at least one processor in an electronic device to perform the code rate determination method as described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

Exemplary embodiments of the present disclosure achieve high QoE and low latency for live video on a terminal device by accurately adjusting the video buffer. They strengthen control through frame-level network bandwidth measurement and through control of both the size and the variation of the video buffer, and they convert the multi-target control problem into an equivalent single-target problem, which improves the robustness of the algorithm. Exemplary embodiments of the present disclosure can keep the video buffer size near a target level under uncertain network conditions, thereby reducing video stalling.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a diagram illustrating the amount of buffer change in rate control using a Model Predictive Control (MPC) algorithm.

Fig. 2 is an architecture diagram illustrating rate control using the Pensieve algorithm.

Fig. 3 is a diagram illustrating a system environment in which a code rate determination method and apparatus are implemented according to an exemplary embodiment.

Fig. 4 is a flowchart illustrating a code rate determination method according to an exemplary embodiment.

Fig. 5 is a diagram illustrating a downloading situation of a video block according to an exemplary embodiment.

Fig. 6 is a block diagram illustrating a code rate determination apparatus according to an exemplary embodiment.

Fig. 7 is a diagram illustrating an electronic device for code rate determination according to an example embodiment.

Fig. 8 is a schematic diagram comparing the QoE metrics of a code rate determination method according to an exemplary embodiment of the present disclosure with those of rate control algorithms of the related art.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Here, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. As another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.

Fig. 3 illustrates a system environment for a code rate determination method according to an exemplary embodiment of the present disclosure. As shown in Fig. 3, the system environment may include a plurality of terminal devices 100-1, 100-2, … 100-n, a server 200, and a network connecting the terminal devices 100 and the server 200. Here, the terminal device 100 may be any terminal device having a communication function and a video playing function; for example, the terminal device 100 in the embodiments of the present disclosure may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, a netbook, a Personal Digital Assistant (PDA), or an Augmented Reality (AR)/Virtual Reality (VR) device. Various applications or functions that use video playback, such as a short video application, a live streaming application, a social application, a video conference application, or an online education application, may run on the terminal device 100. When the terminal device 100 runs such an application (e.g., opens a live streaming application), it connects to and communicates with the server 200 through the network and uses the corresponding video service provided by the server 200. The terminal device 100 may determine an optimal code rate for the video to be downloaded according to factors such as the current video buffer amount, the buffer variation, and the network condition, and request video at the determined code rate in subsequent downloads. In this way, video at the code rate best suited to the current conditions can be downloaded from the server, which ensures stable, smooth, high-quality playback and improves the user's quality of experience (QoE).

A code rate determination method according to an exemplary embodiment of the present disclosure implemented on the terminal device 100 will be described below with reference to fig. 4.

Fig. 4 shows a flow diagram of code rate determination according to an example embodiment of the present disclosure. In the following description, a smartphone is described as an example of a terminal device, but the present disclosure is not limited thereto. The method of controlling a bitrate of a video according to an exemplary embodiment of the present disclosure may be performed by an application installed in a terminal device when acquiring (i.e., downloading) and playing a video from a server.

Specifically, first, in step S410, when the terminal device 100 acquires a video from the server 200, the download time for downloading the next video block of the video is predicted from the buffer occupancy and the buffer variation when the current video block of the video is downloaded.

A key to providing high QoE under low-delay constraints is accurate control of the video buffer occupancy. Because the future state of the buffer occupancy varies with fluctuations in network throughput, a rate controller of the related art, which adjusts the instantaneous state of the buffer occupancy, may suffer large control errors. Intuitively, if the dynamic behavior of the buffer is independent of throughput, the future state of the buffer occupancy becomes predictable and robust control is easier to achieve. That is, by establishing a control model in which the buffer occupancy and buffer variation when the current video block is downloaded have a specific relationship with the download time available for the next video block, the download time of the next video block can be predicted from the model, and a video block with the corresponding code rate can be selected for download according to the predicted time.
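As a rough illustration of how a predicted download time could translate into a code rate choice, the following Python sketch picks the highest code rate whose block can be fetched within the predicted time at the predicted download rate. The function name `select_code_rate`, the code rate ladder, and the fallback to the lowest code rate are assumptions made for illustration; they are not taken from the disclosure.

```python
def select_code_rate(ladder_kbps, download_time_s, block_seconds, download_rate_kbps):
    """Pick the highest code rate that fits the predicted download time.

    Hypothetical helper: a block encoded at r kbps and lasting block_seconds
    holds r * block_seconds kbits, and download_time_s seconds at
    download_rate_kbps can transfer download_time_s * download_rate_kbps kbits.
    """
    budget_kbits = download_time_s * download_rate_kbps
    affordable = [r for r in ladder_kbps if r * block_seconds <= budget_kbits]
    # Assumed fallback: if nothing fits, use the lowest available code rate.
    return max(affordable) if affordable else min(ladder_kbps)


ladder = [500, 1000, 2000, 4000]  # illustrative code rate ladder in kbps
chosen = select_code_rate(ladder, download_time_s=1.5,
                          block_seconds=2.0, download_rate_kbps=2500)
print(chosen)
```

With these sample numbers the transfer budget is 3750 kbits, so the 1000 kbps rendition (2000 kbits per block) is the largest that fits.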

According to an exemplary embodiment of the present disclosure, a sliding mode control (SMC) method may be employed to control the dynamic behavior of the buffer. Sliding mode control is a special type of nonlinear control in which the nonlinearity takes the form of a discontinuous control action. It differs from other control strategies in that the "structure" of the system is not fixed but is changed purposefully and continually during the dynamic process according to the current state of the system (such as the deviation and its derivatives), forcing the system to move along the state trajectory of a predetermined "sliding mode". Because the sliding mode can be designed and is independent of the parameters of the controlled object and of disturbances, sliding mode control offers fast response, insensitivity to parameter changes and disturbances, no need for online system identification, and simple physical implementation.

In the control of video transmission to which the exemplary embodiments of the present disclosure apply, the buffer dynamic control problem may be defined as an SMC problem in a state space. Here, a state space model based on the video buffer amount and the video buffer variation can be constructed, i.e., the current state is $(B_f(k), \Delta B(k))$. Accordingly, the ideal buffer dynamics model is designed to correspond to a line in the state space. By controlling the dynamic behavior of the buffer to follow this ideal model, the buffer occupancy evolves to a target value in a deterministic manner, which avoids the problem that arises when the buffer occupancy itself is the controlled object, namely that the future buffer state and control performance are affected by throughput jitter.

Based on this idea, the video buffer occupancy can be controlled by predicting the download time of the next video block of the video using a sliding mode control model (sliding mode controller) established based on the relation between the buffer occupancy and buffer variation when a video block of the video is downloaded and the time available for downloading the next video block.

According to an exemplary embodiment of the present disclosure, the sliding mode control model may be designed such that the buffer occupancy of the video converges to a target value and is not affected by network throughput variation and parameter settings of the sliding mode control model.

A process of predicting (calculating) a download time for a next video block from a video buffer amount and a video buffer variation at a current time using the sliding mode control model according to an exemplary embodiment of the present disclosure will be described below.

Specifically, assume that the video to be downloaded is divided into K video blocks, each L seconds long. For the $k$-th video block, let the code rate be $r(k)$, the download duration be $t(k)$, the download rate be $c(k)$, and the buffer amount of the video at the start of the download be $b(k)$. The change in the buffer at the start of the download of each video block is then given by equation (1) below:

Let the buffer occupancy when the $k$-th video block ($k > 0$, an integer) is downloaded be $B_f(k)$, and let the change in the buffer occupancy be $\Delta B(k)$. From equation (1), the state dynamics of the buffer system can be obtained as equation (2) below:

where $U(k)$ is the difference between the download times of two adjacent video blocks, the $k$-th and the $(k-1)$-th, i.e., $U(k) = T_k - T_{k-1}$, and $E(k)$ represents the systematic error caused by the prediction error of the download time.
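For orientation, the sketch below simulates a block-level buffer under a model that is common in the ABR literature: downloading a block drains the buffer by the download duration, and the finished block then adds L seconds of video. This buffer model, the function names, and the sample numbers are assumptions for illustration and are not claimed to match equations (1) and (2) of the disclosure. Only the last helper, which computes $U(k) = T_k - T_{k-1}$, follows the definition given in the text.

```python
def download_duration(code_rate_kbps, block_seconds, download_rate_kbps):
    """Seconds needed to fetch one block: block size divided by download rate."""
    return code_rate_kbps * block_seconds / download_rate_kbps


def next_buffer(buffer_seconds, duration_seconds, block_seconds):
    """Assumed model: drain while downloading (never below zero), then add one block."""
    return max(buffer_seconds - duration_seconds, 0.0) + block_seconds


def download_time_differences(durations):
    """U(k) = T_k - T_{k-1} for each pair of adjacent video blocks."""
    return [cur - prev for prev, cur in zip(durations, durations[1:])]


# Three one-second blocks as (code rate kbps, download rate kbps); values are illustrative.
blocks = [(1000, 2000), (2000, 2000), (2000, 1000)]
durations = [download_duration(r, 1.0, c) for r, c in blocks]

buffer_level = 2.0
for t in durations:
    buffer_level = next_buffer(buffer_level, t, 1.0)

print(durations, download_time_differences(durations), buffer_level)
```

In this toy run the download durations grow as the download rate drops, so every $U(k)$ is positive and the buffer ends lower than its peak, which is the kind of drift the controller is meant to counteract.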

The sliding function P and the switching surface Λ in the sliding mode controller can be designed as in equation (3) below:

Here, $\lambda$ is a constant used to adjust the convergence target value of the switching surface. Optionally, $\lambda = 1$.

That is, by definition, the control objective is to drive the system state $(B_f(k), \Delta B(k))$ onto the switching surface $\Lambda$. To describe the position of the system state $(B_f(k), \Delta B(k))$ relative to the switching surface $\Lambda$, the sliding function $P(k) = \lambda B_f(k) + \Delta B(k)$ is defined. $P(k) < 0$ indicates that the current state $(B_f(k), \Delta B(k))$ lies below $\Lambda$, and $P(k) > 0$ indicates that it lies above $\Lambda$.
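The sign test on the sliding function can be written in a few lines. The sketch below evaluates $P(k) = \lambda B_f(k) + \Delta B(k)$ and reports where the state lies relative to the switching surface. The function names and the numerical tolerance `tol` are illustrative assumptions.

```python
def sliding_value(buffer_occupancy, buffer_change, lam=1.0):
    """P(k) = lambda * B_f(k) + delta_B(k), with lambda > 0."""
    if lam <= 0:
        raise ValueError("lambda must be greater than 0")
    return lam * buffer_occupancy + buffer_change


def position_relative_to_surface(p, tol=1e-9):
    """Classify a state by the sign of P(k); tol is an assumed tolerance."""
    if p > tol:
        return "above"
    if p < -tol:
        return "below"
    return "on"


p = sliding_value(buffer_occupancy=2.0, buffer_change=-0.5)
print(p, position_relative_to_surface(p))
```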

To cope with the limitation caused by the discreteness of the code rate and to take into account the system state changes over several future steps, the method of the present disclosure minimizes the difference between the sliding function $P$ and the switching surface $\Lambda$ over the next $N$ steps. The prediction and calculation process is given by equation (4) below:

Here, $B_{\max}$ denotes the maximum buffer occupancy, and $R$ is the value range of the code rate. The difference $U(k)$ between the download times of adjacent video blocks can be solved from equation (4), which gives the available download time of the next video block as $T_k = U(k) + T_{k-1}$.

To minimize equation (4), the sliding state of the system must first be predicted several steps into the future. From equations (2) and (3), the sliding state $P(k+i)$ at step $k+i$ can be obtained as equation (5) below:

where $E_\mu$ is the maximum likelihood estimate of the historical systematic error. The sliding function state after $i$ steps is then given by equation (6) below:

From equation (6), setting the partial derivative of $J$ with respect to $U$ in equation (4) to 0, the value of $U(k)$ can be solved as equation (7) below:

That is, as shown in equation (7) above, according to an exemplary embodiment of the present disclosure, the download time for the next video block may be calculated (predicted) from the video buffer amount and the video buffer variation at the current time using a sliding mode control model established for the video buffer amount and the video buffer variation.

It should be understood that the above sliding mode control model is only an example. According to exemplary embodiments of the present disclosure, a sliding mode control model that achieves the required convergence targets for the buffer amount and the buffer variation may be designed according to the actual situation.

Next, in step S420, the download rate of the next video block is predicted according to the download rate status of the current video block. Download rate prediction according to an exemplary embodiment of the present disclosure will be explained with reference to fig. 5.

It is generally believed that the download rate status of the video frames of the current video block has a specific relationship with the download rate of the next video block. For example, the last few frames of the current video block are strongly correlated with the next video block in time, space, and network environment. Thus, according to an exemplary embodiment of the present disclosure, the download rates of the last M video frames of the current video block may be measured, and the download rate of the next video block may be determined based on the download rates of those last M video frames. That is, the download rate of the next video block may be predicted from the download rates of the M video frames closest to the next video block. Optionally, the download rate of the next video block may be calculated as the average of the download rates of the last M video frames of the current video block. It should be understood that this prediction method is only an example, and other methods can be adopted to predict the download rate of the next video block from the download rate status of the current video block. For example, the average download rate of a portion of the video frames in the current video block may be used as the predicted download rate of the next video block.

According to an example embodiment of the present disclosure, measuring a download rate of a last M video frames of a current video block may include: recording the receiving completion time of each video frame in the last M video frames of the current video block, and calculating the downloading rate of the video frame according to the size of the video frame and the difference between the receiving completion time of the video frame and the previous video frame of the video frame.

Specifically, Fig. 5 shows the relationship between the time at which video frames are received and the size of the frames. Frames 1 to 7 are received in sequence, and their reception completion times $\mathrm{end}(k)$ are $\mathrm{end}_1, \mathrm{end}_2, \dots, \mathrm{end}_7$, respectively. As shown in Fig. 5, the download rate $R_{frame}(2)$ of frame 2, for example, can be calculated as:

$$R_{frame}(2) = \frac{\text{size of frame 2}}{\mathrm{end}_2 - \mathrm{end}_1}$$
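The per-frame measurement generalizes directly to every frame after the first. The following Python sketch is a minimal illustration that divides each frame's size by the gap between its reception completion time and that of the preceding frame. The function name, the units (bits and seconds), and the sample values are assumptions.

```python
def frame_download_rates(frame_sizes_bits, end_times_s):
    """Download rate of each frame from the second one onward.

    rate(i) = size(i) / (end(i) - end(i-1)), following the frame 2 example.
    The first frame has no preceding completion time, so it is skipped.
    """
    if len(frame_sizes_bits) != len(end_times_s):
        raise ValueError("sizes and completion times must have the same length")
    rates = []
    for i in range(1, len(frame_sizes_bits)):
        elapsed = end_times_s[i] - end_times_s[i - 1]
        if elapsed <= 0:
            raise ValueError("completion times must be strictly increasing")
        rates.append(frame_sizes_bits[i] / elapsed)
    return rates


sizes = [8000, 4000, 2000]   # bits, illustrative
ends = [1.0, 1.5, 2.0]       # seconds, illustrative
print(frame_download_rates(sizes, ends))
```

Note that the gap between completion times includes any idle time before the frame started arriving, so a rate computed this way can understate the available bandwidth.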

In addition, in a live video scenario the latency of the video becomes critical, and frame-level transmission becomes common. Frame-level transmission means that the live content is generated in real time at the broadcaster side and, after passing through a transcoding server, is transmitted to the client frame by frame. Because the amount of data per transmission is greatly reduced and the frame content is generated in real time, there is idle time between the transmissions of frames. As can be seen in Fig. 5, frame 1 is the largest and takes the longest to receive, and there is idle time between the reception of some frames, such as frame 6 and frame 7. Therefore, if the download rates of frame 6 and frame 7 are used to predict the download rate of the next video block, errors may occur. Because video is still encoded in units of video blocks, the first frame of a video block is often a large key frame (i.e., an I frame), so the transmission idle time for the first frame of each video block is much shorter than the idle time between other frames. Exemplary embodiments of the present disclosure use this property to measure the download rate of video frames (i.e., the bandwidth used in downloading the video frames) more accurately.

According to an exemplary embodiment of the present disclosure, in predicting the download rate of the next video block based on the download rates of the M video frames closest to the next video block, the calculated download rate of a video frame may be determined to be an effective download rate in response to the calculated download rate having a significant difference from the average download rate of the N frames before the video frame, and may be determined to be an ignored download rate in response to the calculated download rate not having a significant difference from the average download rate of the N frames before the video frame, where N is the number of frames of the video block. Here, whether there is a significant difference may be determined according to the ratio of the difference between the calculated download rate of the video frame and the average download rate of the N preceding frames to that average download rate. For example, if the difference between the download rate R_frame of the video frame and the average download rate R of the previous N frames satisfies

or

a significant difference between the two can be determined. Here, α and β can be set according to actual online conditions. For example, α = 0.1 and β = 0.15 can be set.

Accordingly, in determining the download rate of the next video block based on the download rates of the last M video frames, the average of the download rates of those video frames among the last M video frames that have an effective download rate may be determined as the download rate of the next video block. When the average download rate of the N frames preceding a video frame is calculated, the total download time of the N video frames may be calculated from the download completion time of the first video frame and the download completion time of the Nth video frame among the N video frames, and the average download rate may then be calculated from the total size of the N video frames. For example, if a video block includes N video frames, the last M video frames of the video block may be selected to predict the download rate of the next video block. Assuming that the download rates of the M video frames are denoted by C(k), k = 1, 2, 3, …, M, the harmonic mean of the M video frames calculated using the sliding window strategy is as shown in equation (8) below:

Here, if the download rate of some of the M video frames is determined to be an ignored download rate by the above-described comparison method, the download rate of those frames is not used in the calculation of equation (8). For example, assuming that a video block has 30 frames and the last 8 frames are used to predict the download rate of the next video block, if the download rate of the 4th frame is higher than the average download rate of the 30 frames before the 4th frame and the difference between the two is less than 10% of the average download rate, it may be determined that the 4th frame is a valid video frame that can be used for prediction.

In this way, inaccurate download rates caused by idle time between frames can be removed, so that a more accurate download rate is obtained for the last M video frames and the download rate of the next video block can be predicted more accurately.
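The filtering and averaging steps described above can be sketched as follows. This is a hedged illustration, not the implementation of the disclosure: the function names are hypothetical, the acceptance rule is passed in by the caller because the threshold test depends on the α and β chosen in deployment, and the fallback used when every frame is ignored is an assumption added here to keep the sketch total.

```python
from typing import Callable, Sequence


def window_average_rate(sizes: Sequence[float],
                        end_times: Sequence[float]) -> float:
    """Average rate of a window of N frames: total size of the N frames
    divided by the time between the first and the Nth completion time."""
    elapsed = end_times[-1] - end_times[0]
    if elapsed <= 0:
        raise ValueError("window must span a positive amount of time")
    return sum(sizes) / elapsed


def relative_deviation(rate: float, avg_rate: float) -> float:
    """Signed difference between a frame rate and the window average,
    expressed as a fraction of the window average."""
    return (rate - avg_rate) / avg_rate


def predict_block_rate(rates: Sequence[float],
                       avg_rates: Sequence[float],
                       is_effective: Callable[[float, float], bool]) -> float:
    """Harmonic mean of the rates that the caller's rule marks as effective.

    rates[i] is the measured rate of one of the last M frames and
    avg_rates[i] is the average rate of the N frames preceding it.
    """
    kept = [r for r, a in zip(rates, avg_rates) if is_effective(r, a)]
    if not kept:
        # Assumption: fall back to all M frames if none is marked effective.
        kept = list(rates)
    return len(kept) / sum(1.0 / r for r in kept)
```

A caller might pass, for instance, `lambda r, a: abs(relative_deviation(r, a)) > 0.1` as the rule; that particular test is only an example of the interface and is not asserted to be the comparison used in the disclosure.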

Referring back to fig. 4, in step S430, a code rate for the next video block is determined according to the predicted download time of the next video block, the time length of the next video block, and the predicted download rate of the next video block.

That is, based on the predicted download time T(k) of the next video block and the predicted network throughput obtained from equations (7) and (8),

the selected code rate R(k) can be obtained as in equation (9) below:

As described above, the download time of a video block can be predicted from the video buffer level and the buffer variation, so that an appropriate code rate can be selected. The current state of the video player system can thus be evaluated more accurately, which improves control accuracy. Through the design of the control model, the multi-target control of the buffer level and the buffer variation is converted into a single-target control problem, which improves control robustness.
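As an illustration of how the three inputs could be combined, the sketch below assumes that a block of duration L encoded at code rate R has size R·L and that its download time is that size divided by the throughput, which gives a target rate of T·C/L. This relation, the rounding down to a discrete code rate ladder, and all names in the code are assumptions made for illustration; they are not a reproduction of equation (9). The ladder values are the six code rates used in the experiment described later in this document.

```python
from typing import Sequence

# Code rates in Mbps, taken from the experiment setup in this document.
LADDER_MBPS = (0.30, 0.75, 1.20, 1.85, 2.85, 4.30)


def select_code_rate(download_time_s: float,
                     throughput_mbps: float,
                     block_duration_s: float,
                     ladder: Sequence[float] = LADDER_MBPS) -> float:
    """Pick the highest available code rate whose block is expected to
    download within the predicted download time.

    Assumed model: block size = rate * duration, and
    download time = block size / throughput.
    """
    if block_duration_s <= 0:
        raise ValueError("block duration must be positive")
    target = download_time_s * throughput_mbps / block_duration_s
    feasible = [r for r in ladder if r <= target]
    # Assumption: fall back to the lowest rate if nothing fits the target.
    return max(feasible) if feasible else min(ladder)
```

Under this assumed model, a longer permitted download time (the controller wants the buffer to shrink) raises the target rate, and a shorter one lowers it.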

Fig. 6 is a block diagram illustrating a code rate determination apparatus according to an exemplary embodiment of the present disclosure. The apparatus may be implemented in a mobile terminal device, such as a smartphone, and executed when the mobile terminal device runs an application such as a live streaming application.

As shown in Fig. 6, the code rate determination apparatus 600 includes a download time prediction module 610, a download rate prediction module 620, and a code rate determination module 630.

The download time prediction module 610 is configured to predict a download time for downloading the next video block of a video based on the buffer occupancy and the buffer variation when the current video block of the video is downloaded. Here, the video is divided into a plurality of video blocks, each of which has the same time length.

The download time prediction module 610 is configured to predict the download time for downloading the next video block of the video using a sliding mode control model established based on the relationship between the buffer occupancy and buffer variation when a video block of the video is downloaded and the download time of that video block. According to an exemplary embodiment of the present disclosure, the sliding mode control model is designed such that the buffer occupancy of the video converges to a target value and is not affected by network throughput variations or by the parameter settings of the sliding mode control model.

According to an exemplary embodiment of the present disclosure, the sliding mode control model may be constructed as:

where B_f(k) represents the buffer occupancy when the kth video block is downloaded, Δb(k) represents the buffer variation when the kth video block is downloaded, and λ is a constant greater than 0. Optionally, λ = 1.

The download time T_k of the kth video block is calculated as:

T_k = U(k) − T_{k−1},

Download rate prediction module 620 is configured to predict the download rate of the next video block based on the download rate status of the current video block.

According to an exemplary embodiment of the present disclosure, the download rate prediction module 620 includes: a measuring module 621 configured to measure a download rate of the last M video frames of the current video block; a prediction module 623 configured to determine a download rate of a next video block based on the download rate of the last M video frames.

According to an exemplary embodiment of the present disclosure, the measurement module 621 is configured to record the reception completion time of each of the last M video frames of the current video block, and calculate the download rate of the video frame according to the size of the video frame and the difference between the reception completion time of the video frame and the previous video frame of the video frame.

According to an exemplary embodiment of the present disclosure, the prediction module 623 is configured to determine the calculated download rate of the video frame as an effective download rate in response to the calculated download rate of the video frame having a significant difference from an average download rate of N frames before the video frame; determining the calculated download rate of the video frame as an ignored download rate in response to the calculated download rate of the video frame not having a significant difference from an average download rate of N frames preceding the video frame, wherein N is the number of frames of the video block; determining an average of the download rates of the video frames having the effective download rate of the last M video frames as a download rate of a next video block.

Code rate determination module 630 is configured to determine a code rate for the next video block based on the predicted download time of the next video block, the length of time of the next video block, and the predicted download rate of the next video block.

Fig. 7 is a block diagram illustrating an electronic device for determining a code rate according to an exemplary embodiment of the present disclosure. The electronic device 700 may be, for example: a smart phone, a tablet computer, an MP4(Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer. The electronic device 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.

In general, the electronic device 700 includes: a processor 701 and a memory 702.

The processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 702 may include one or more computer-readable storage media, which may be non-transitory. Memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one instruction for execution by processor 701 to implement a method of determining a code rate provided by a method embodiment of the present disclosure as shown in fig. 4.

In some embodiments, the electronic device 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 703 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, touch screen display 705, camera 706, audio circuitry 707, positioning components 708, and power source 709.

The peripheral interface 703 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 701 and the memory 702. In some embodiments, processor 701, memory 702, and peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The Radio Frequency circuit 704 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 704 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 704 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.

The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 705, disposed on the front panel of the electronic device 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the electronic device 700 or in a folded design; in still other embodiments, the display screen 705 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 700. Furthermore, the display screen 705 may be arranged in a non-rectangular irregular shape, i.e., a specially shaped screen. The display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuitry 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing or inputting the electric signals to the radio frequency circuit 704 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 707 may also include a headphone jack.

The positioning component 708 is used to determine the current geographic location of the electronic device 700 to implement navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

The power supply 709 is used to supply power to various components in the electronic device 700. The power source 709 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power source 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, the electronic device 700 also includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyro sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.

The acceleration sensor 711 can detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 701 may control the touch screen 705 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 711. The acceleration sensor 711 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may cooperate with the acceleration sensor 711 to acquire a 3D motion of the terminal 700 by the user. From the data collected by the gyro sensor 712, the processor 701 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

Pressure sensors 713 may be disposed on a side bezel of terminal 700 and/or an underlying layer of touch display 705. When the pressure sensor 713 is disposed on a side frame of the terminal 700, a user's grip signal on the terminal 700 may be detected, and the processor 701 performs right-left hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at a lower layer of the touch display 705, the processor 701 controls the operability control on the UI according to the pressure operation of the user on the touch display 705. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 714 is used for collecting a fingerprint of a user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 714 may be disposed on the front, back, or side of the electronic device 700. When a physical button or vendor Logo is provided on the electronic device 700, the fingerprint sensor 714 may be integrated with the physical button or vendor Logo.

The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the touch display 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 705 is increased; when the ambient light intensity is low, the display brightness of the touch display 705 is turned down. In another embodiment, processor 701 may also dynamically adjust the shooting parameters of camera assembly 706 based on the ambient light intensity collected by optical sensor 715.

The proximity sensor 716, also referred to as a distance sensor, is typically disposed on the front panel of the electronic device 700. The proximity sensor 716 is used to capture the distance between the user and the front of the electronic device 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually decreases, the processor 701 controls the touch display 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 716 detects that the distance between the user and the front surface of the electronic device 700 gradually increases, the processor 701 controls the touch display 705 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in fig. 7 does not constitute a limitation of the electronic device 700 and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.

Fig. 8 is a schematic diagram comparing the QoE index of the code rate determination method of an exemplary embodiment of the present disclosure with that of related-art code rate control algorithms.

As shown in fig. 8, the method of controlling a bitrate of a video according to an exemplary embodiment of the present disclosure and the other 7 methods were respectively run and compared in a DASH simulation system.

DASH simulation system: the DASH system used in the experiment consists of a web server and a dash.js-based player. The video is divided into 2-second video blocks and is encoded with the H.264 standard at six code rates: 0.30, 0.75, 1.20, 1.85, 2.85, and 4.30 Mbps. The maximum buffer of the client is 6 seconds. The experiment used test bandwidth traces generated from the HSDPA network data set and the Kuaishou (Kwai) network data set. The two data sets contain throughput data measured continuously by mobile devices while playing video in two different scenarios.

QoE calculation method: the QoE model adopted is as follows:

where r(k) is the code rate of the kth video block and f(k) is the stall duration of the kth video block. The QoE index consists of three parts: video quality, stall time, and video quality fluctuation. Based on test analysis, the parameter β₁ = 4.3 is selected in the test so that it equals the maximum code rate, and β₂ = 0.25 is selected so that the code rate and stalling carry a larger weight in the live scenario.
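A QoE score with the three components named above can be sketched as follows. The linear form used here (sum of code rates, minus β₁ times the total stall time, minus β₂ times the sum of absolute code rate changes between consecutive blocks) is the form commonly used in ABR evaluations and is an assumption about the model; the function name `qoe` is also hypothetical.

```python
from typing import Sequence


def qoe(code_rates_mbps: Sequence[float],
        stalls_s: Sequence[float],
        beta1: float = 4.3,
        beta2: float = 0.25) -> float:
    """Assumed linear QoE: quality - beta1 * stalls - beta2 * fluctuation."""
    if len(code_rates_mbps) != len(stalls_s):
        raise ValueError("need one stall duration per video block")
    quality = sum(code_rates_mbps)
    stall_penalty = beta1 * sum(stalls_s)
    # Fluctuation: absolute code rate change between consecutive blocks.
    fluctuation = sum(abs(b - a)
                      for a, b in zip(code_rates_mbps, code_rates_mbps[1:]))
    return quality - stall_penalty - beta2 * fluctuation
```

With β₁ equal to the maximum code rate, one second of stalling costs as much as one block at the highest quality gains, which is why stalls dominate the score when they occur.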

Tested algorithms: in the experiments, the QoE achieved by the method of controlling the code rate of video according to the present disclosure (Cratus and Cratus-HM) was compared with that of seven other algorithms in three categories on the two network bandwidth data sets. The seven algorithms are:

I. Download-rate-based algorithms

RB: selects the maximum code rate that does not exceed the predicted bandwidth.

Akamai: after bandwidth prediction, adjusts the code rate heuristically; it is more conservative than RB.

II. QoE-based algorithms

MPC: selects the code rate by maximizing the QoE function over the next 5 blocks based on the current buffer level and the predicted throughput.

Robust MPC (RMPC): uses the same code rate selection method as MPC. To mitigate the impact of prediction error, Robust MPC divides the throughput prediction by the maximum prediction error observed over the past 5 blocks, yielding a conservative prediction.

LOLYPOP: selects the code rate by predicting the network throughput distribution and minimizing the probability of stalling.

III. Buffer-based algorithms

BBA: maps the buffer level directly to the code rate decision through a three-segment piecewise function.

Hybrid Control (HB): selects the code rate that achieves the minimum latency while ensuring that the buffer level is not less than 3 s.
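To make two of the simpler baselines concrete, the sketch below shows a rate-based rule (RB) and a buffer-based three-segment mapping in the spirit of BBA. The `reservoir_s` and `cushion_s` thresholds, the linear middle segment, and the function names are assumptions chosen for illustration; they are not the settings used in the experiment.

```python
from typing import Sequence

# Code rates in Mbps from the DASH simulation setup described above.
LADDER = (0.30, 0.75, 1.20, 1.85, 2.85, 4.30)


def highest_not_exceeding(ladder: Sequence[float], limit: float) -> float:
    """Highest ladder entry <= limit, or the lowest entry if none fits."""
    feasible = [r for r in ladder if r <= limit]
    return max(feasible) if feasible else min(ladder)


def rb_select(predicted_bw_mbps: float) -> float:
    """RB: the maximum code rate that does not exceed the predicted bandwidth."""
    return highest_not_exceeding(LADDER, predicted_bw_mbps)


def bba_select(buffer_s: float,
               reservoir_s: float = 1.0,
               cushion_s: float = 4.0) -> float:
    """Three-segment buffer-to-rate map (thresholds are hypothetical):
    lowest rate below the reservoir, highest rate above reservoir + cushion,
    and a linear ramp rounded down to the ladder in between."""
    lowest, highest = min(LADDER), max(LADDER)
    if buffer_s <= reservoir_s:
        return lowest
    if buffer_s >= reservoir_s + cushion_s:
        return highest
    ramp = lowest + (highest - lowest) * (buffer_s - reservoir_s) / cushion_s
    return highest_not_exceeding(LADDER, ramp)
```

RB reacts only to the bandwidth estimate and BBA only to the buffer level, which is the contrast the three categories above are meant to capture.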

Performance test results: the performance test results of the seven algorithms described above and the method of the present disclosure are shown in Fig. 8. The method of the present disclosure achieves the highest total QoE on both network throughput data sets; its average QoE is 27.3% higher than Akamai, 28.6% higher than RB, 17.1% higher than MPC, 24.2% higher than RMPC, 22.4% higher than LOLYPOP, 12.3% higher than HB, and 19.9% higher than BBA. The main advantage of the method of the present disclosure is that it effectively reduces the number of stalls under different network conditions.

According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a code rate determination method according to the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc memory, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an extreme digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.

According to an embodiment of the present disclosure, there may also be provided a computer program product, in which instructions are executable by a processor of a computer device to perform the above-mentioned method.

According to the method and apparatus for controlling the code rate of video, the electronic device, and the computer-readable storage medium of the present disclosure, high QoE and low latency of live video can be achieved on the terminal device by accurately adjusting the video buffer. The exemplary embodiments of the present disclosure enhance control capability through frame-level network bandwidth measurement and through control of both the size and the variation of the video buffer, and convert the multi-target control problem into an equivalent single-target problem, improving the robustness of the algorithm. Exemplary embodiments of the present disclosure can keep the video buffer size near a target level under uncertain network conditions, thereby reducing video stalling.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for determining a code rate, comprising:

predicting the downloading time for downloading the next video block of the video according to the cache occupation amount and the cache variation when the current video block of the video is downloaded;

predicting the downloading rate of the next video block according to the downloading rate condition of the current video block;

determining a code rate for the next video block according to the predicted download time of the next video block, the time length of the next video block, and the predicted download rate of the next video block.

2. The method of claim 1, wherein predicting a download time for downloading a next video block of the video based on the buffer occupancy and the buffer variance at the current time comprises:

predicting the download time for downloading the next video block of the video by using a sliding mode control model established based on the relationship between the cache occupancy and cache variation when a video block of the video is downloaded and the download time of that video block.

3. The method of claim 2, wherein the sliding mode control model is designed such that the buffer occupancy of the video converges to a target value and is not affected by network throughput variations and parameter settings of the sliding mode control model.

4. The method of claim 3, wherein the sliding-mode control model is constructed as:

wherein B_f(k) represents the buffer occupancy when the kth video block is downloaded, Δb(k) represents the buffer variation when the kth video block is downloaded, and λ is a constant greater than 0,

wherein the download time T_k of the kth video block is calculated as:

T_k = U(k) − T_{k−1},

5. The method of claim 1, wherein predicting the download rate of the next video block based on the download rate status of the current video block comprises:

measuring the download rate of the last M video frames of the current video block;

determining a download rate of a next video block based on the download rates of the last M video frames, wherein M is a positive integer.

6. The method of claim 5, wherein the measuring a download rate of a last M video frames of a current video block comprises:

recording the receiving completion time of each video frame in the last M video frames of the current video block;

and calculating the downloading rate of the video frame according to the size of the video frame and the difference between the receiving completion time of the video frame and the receiving completion time of the previous video frame of the video frame.

7. The method of claim 5, wherein determining the download rate of the next video block based on the download rates of the last M video frames comprises:

determining the calculated download rate of the video frame as an effective download rate in response to the calculated download rate of the video frame having a significant difference from an average download rate of N frames preceding the video frame;

determining the calculated download rate of the video frame as an ignored download rate in response to the calculated download rate of the video frame not having a significant difference from an average download rate of N frames preceding the video frame;

determining the average of the download rates of the video frames having an effective download rate among the last M video frames as the download rate of the next video block,

where N is the number of frames of the video block.
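
As a hedged sketch, the filtering and averaging of claim 7 could look like the Python below. The claim does not define what counts as a "significant difference"; the relative threshold `rel_threshold`, the function name `predict_next_block_rate`, and the choice to return `None` when no frame qualifies are all assumptions introduced for this example.

```python
from typing import List, Optional


def predict_next_block_rate(
    rates: List[float],
    m: int,
    n: int,
    rel_threshold: float = 0.5,
) -> Optional[float]:
    """Average the 'effective' rates among the last m frames.

    rates: per-frame download rates in arrival order, ending with the
        last frame of the current block (earlier frames give history).
    n: number of preceding frames used for the baseline average.
    rel_threshold: ASSUMED test for 'significant difference': a rate is
        effective if it deviates from the baseline by more than this
        fraction of the baseline.
    """
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive integers")
    effective = []
    start = max(len(rates) - m, 0)
    for i in range(start, len(rates)):
        window = rates[max(i - n, 0):i]
        if not window:
            continue  # no preceding frames to compare against
        baseline = sum(window) / len(window)
        if abs(rates[i] - baseline) > rel_threshold * baseline:
            effective.append(rates[i])  # effective download rate
        # otherwise the rate is ignored
    if not effective:
        return None  # fallback behaviour is not specified in the claim
    return sum(effective) / len(effective)
```

A caller would need its own fallback (for example, reusing the previous prediction) when the function returns `None`; that choice is outside what the claim states.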

8. A code rate determination apparatus, comprising:

a download time prediction module configured to predict the download time for downloading the next video block of the video according to the cache occupancy and the cache variation when the current video block of the video is downloaded;

a download rate prediction module configured to predict the download rate of the next video block according to the download rate condition of the current video block;

a code rate determination module configured to determine a code rate for the next video block according to the predicted download time of the next video block, the time length of the next video block, and the predicted download rate of the next video block.
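
One possible reading of the code rate determination step is sketched below in Python. The claims do not state the selection rule, so the sketch assumes that a block encoded at a given code rate occupies roughly code rate × block duration bits, and that the chosen code rate is the highest entry of a bitrate ladder whose block can be downloaded at the predicted rate within the predicted download time. The name `select_bitrate`, the ladder, and the fallback to the lowest rate are hypothetical.

```python
from typing import Sequence


def select_bitrate(
    ladder_bps: Sequence[float],
    predicted_download_time_s: float,
    block_duration_s: float,
    predicted_rate_bps: float,
) -> float:
    """Pick the highest ladder entry that fits the download-time budget."""
    if not ladder_bps:
        raise ValueError("ladder must not be empty")
    if block_duration_s <= 0:
        raise ValueError("block duration must be positive")
    # bitrate * duration / rate <= time  <=>  bitrate <= rate * time / duration
    budget_bps = predicted_rate_bps * predicted_download_time_s / block_duration_s
    feasible = [b for b in ladder_bps if b <= budget_bps]
    # ASSUMPTION: fall back to the lowest code rate when nothing fits.
    return max(feasible) if feasible else min(ladder_bps)
```

With a ladder of 0.5, 1 and 2 Mbit/s, a 2 s block, a predicted download time of 1.5 s and a predicted rate of 2 Mbit/s, the budget is 1.5 Mbit/s and the sketch selects 1 Mbit/s.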

9. An electronic device, comprising:

at least one processor;

at least one memory storing computer-executable instructions,

wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the code rate determination method of any of claims 1 to 7.

10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the code rate determination method of any of claims 1 to 7.