CN114040230A - Video code rate determining method and device, electronic equipment and storage medium thereof - Google Patents


Info

Publication number
CN114040230A
CN114040230A (application CN202111315458.2A; granted publication CN114040230B)
Authority
CN
China
Prior art keywords: video, code rate, information, downloaded, player
Prior art date
Legal status: Granted (status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202111315458.2A
Other languages: Chinese (zh)
Other versions: CN114040230B (en)
Inventors: 杨啖, 周超
Current Assignee: Beijing Dajia Internet Information Technology Co Ltd (listed assignees may be inaccurate; not verified by legal analysis)
Original Assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111315458.2A priority Critical patent/CN114040230B/en
Publication of CN114040230A publication Critical patent/CN114040230A/en
Application granted granted Critical
Publication of CN114040230B publication Critical patent/CN114040230B/en
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25: Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266: Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662: Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24: Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2402: Monitoring of the downstream path of the transmission network, e.g. bandwidth available

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure provides a video bitrate determination method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining information about the video slices of a video, where each video slice is transcoded into a plurality of code rate gears and each gear has a corresponding quality score, the information including the quality score of each slice at each gear; acquiring network state information and player information during the download period of an already-downloaded video slice, and inputting the acquired network state information, player information, and slice information into a video bitrate determination model to obtain the code rate gear selected for the video slice to be downloaded; and requesting download of the video slice at the selected code rate gear. Because the method switches the video bitrate adaptively based on video quality perception information, it provides a smoother video viewing experience.

Description

Video code rate determining method and device, electronic equipment and storage medium thereof
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method and an apparatus for determining a bit rate of a video to be downloaded, an electronic device, and a storage medium.
Background
In recent years, with the further development of the mobile internet and of 4G and 5G technologies, streaming media services have become increasingly popular, and a number of mature Video On Demand (VOD) platform companies have emerged in the multimedia industry. Given the extremely rapid growth of on-demand services, research into how to improve the viewing experience of VOD users is of great significance.
In VOD services, multi-bitrate techniques are typically used to ensure that the user enjoys high-definition viewing quality at an appropriate bitrate. A multi-bitrate system provides different resolution levels (e.g., super definition, high definition, standard definition, and smooth), from which the user can select a definition suited to the network environment. Because manual selection requires user interaction, VOD service providers develop automatic algorithms that adapt the definition for the user, known as adaptive bitrate (ABR) algorithms. Research on ABR algorithms is therefore important to the user experience.
However, ABR methods in the related art often leave the user watching a stalled video after severe network jitter, switch the bitrate frequently, lack an understanding of video quality, and cannot cope with unpredictable bandwidth.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for determining a video bitrate and for training a video bitrate determination model, so as to at least solve the problems of adaptive video bitrate switching in the related art; alternatively, the disclosed solutions need not solve any of the above problems.
According to a first aspect of the present disclosure, there is provided a training method for a video bitrate determination model, comprising: acquiring information about video fragments included in a training sample video, wherein the video fragments of the training sample video are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, and the information about the video fragments comprises the quality scores of the video fragments in each code rate gear; acquiring network state information and player information during a downloading period for downloading video fragments through a pre-constructed playing environment, and inputting the acquired network state information, player information and information about the video fragments into a video code rate determination model to obtain code rate gears selected for the video fragments to be downloaded; and constructing a reward function based on the quality score, the playing pause condition and the code rate gear switching times corresponding to the code rate gear selected for each video fragment, and adjusting the parameters of the video code rate determination model by using the reward function.
According to the first aspect of the disclosure, the quality score of a video slice is determined by an objective video coding index of the video slice and/or a subjective perception evaluation by users, where the objective video coding index includes at least one of the following: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), or Video Multi-method Assessment Fusion (VMAF).
According to a first aspect of the present disclosure, the pre-constructed playing environment includes a client player, a client buffer, and a content distribution server, where the client player downloads a video segment from the content distribution server and stores the downloaded video segment in the client buffer, and there is an interface for transferring a quality score of the video segment between the client player, the client buffer, and the content distribution server.
According to a first aspect of the disclosure, the training sample video is obtained from a video data set comprising an on-demand video and a short video.
According to a first aspect of the disclosure, the video bitrate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, where the policy-based deep neural network is configured to output a code rate gear selection for a video slice, and the value-based deep neural network is configured to score the code rate gear selection action of the policy-based deep neural network.
According to a first aspect of the disclosure, the policy-based deep neural network has fully connected layers and one-dimensional convolutional layers and employs a Softmax function as the output activation function, where the policy-based deep neural network outputs a selection probability for each of the plurality of code rate gears.
According to a first aspect of the disclosure, the network status information comprises bandwidth information; the player information includes a current buffer size of the player; and the information about the video slices comprises the code rate of the already-downloaded video slices, the data volume and quality score of the video slices to be downloaded, and the number of remaining video slices of the training sample video.
According to a first aspect of the disclosure, network state information, player information and information about video slices during a predetermined number of download periods prior to a current download period are input to a video bitrate determination model to determine bitrate gears for the video slices to be downloaded.
According to a second aspect of the present disclosure, there is provided a training apparatus for a video bitrate determination model, comprising: a video slicing information obtaining unit configured to obtain information about video slices included in a training sample video, wherein the video slices of the training sample video are transcoded into a plurality of rate steps and each rate step has a corresponding quality score, and the information about the video slices includes the quality scores of the video slices at each rate step; a code rate determination unit configured to acquire network state information and player information during a download period in which video slices are downloaded through a pre-constructed play environment, and input the acquired network state information, player information, and information on the video slices into a video code rate determination model to obtain a code rate gear selected for the video slices to be downloaded; a training unit configured to construct a reward function based on the quality score, the playing stuck condition and the number of times of code rate gear switching corresponding to the selected code rate gear for each video slice, and to adjust parameters of the video code rate determination model using the reward function.
According to a second aspect of the disclosure, the quality score of a video slice is determined by an objective video coding index of the video slice and/or a subjective perception evaluation by users, where the objective video coding index includes at least one of the following: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), or Video Multi-method Assessment Fusion (VMAF).
According to a second aspect of the present disclosure, the pre-constructed playing environment includes a client player, a client buffer, and a content distribution server, where the client player downloads a video segment from the content distribution server and stores the downloaded video segment in the client buffer, and there is an interface between the client player, the client buffer, and the content distribution server for transferring a quality score of the video segment.
According to a second aspect of the disclosure, the training sample video is obtained from a video data set comprising an on-demand video and a short video.
According to a second aspect of the disclosure, the video bitrate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, where the policy-based deep neural network is configured to output a code rate gear selection for a video slice, and the value-based deep neural network is configured to score the code rate gear selection action of the policy-based deep neural network.
According to a second aspect of the disclosure, the policy-based deep neural network has fully connected layers and one-dimensional convolutional layers and employs a Softmax function as the output activation function, where the policy-based deep neural network outputs a selection probability for each of the plurality of code rate gears.
According to a second aspect of the disclosure, the network status information comprises bandwidth information; the player information includes a current buffer size of the player; and the information about the video slices comprises the code rate of the already-downloaded video slices, the data volume and quality score of the video slices to be downloaded, and the number of remaining video slices of the training sample video.
According to a second aspect of the disclosure, network state information, player information and information about video slices during a predetermined number of download periods prior to a current download period are input to a video bitrate determination model to determine bitrate gears for the video slices to be downloaded.
According to a third aspect of the present disclosure, there is provided a video bitrate determination method, including: obtaining information about video segments of a video, wherein the video segments are transcoded into a plurality of code rate steps and each code rate step has a corresponding quality score, and the information about the video segments comprises the quality scores of the video segments in each code rate step; acquiring network state information and player information during a downloading period of a downloaded video fragment, and inputting the acquired network state information, player information and information about the video fragment into a video code rate determination model to obtain a code rate gear selected for the video fragment to be downloaded; and requesting to download the video fragment corresponding to the code rate gear of the selected video fragment to be downloaded, wherein the video code rate determination model is obtained by training based on the method.
According to a third aspect of the disclosure, the network status information comprises bandwidth information; the player information includes a current buffer size of the player; and the information about the video slices comprises the bitrate of the already-downloaded video slices, the data volume and quality score of the video slices to be downloaded, and the number of remaining video slices of the video.
According to a fourth aspect of the present disclosure, there is provided a video bitrate determination apparatus, including: the information acquisition unit is configured to acquire information about video fragments of a video, wherein the video fragments are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, and the information about the video fragments comprises the quality scores of the video fragments in each code rate gear; a code rate determination unit configured to acquire network state information and player information during a download period of the downloaded video slice, and input the acquired network state information, player information and information on the video slice into a video code rate determination model to obtain a code rate gear selected for the video slice to be downloaded; and the video fragment downloading unit is configured to request to download the video fragments corresponding to the code rate gears of the selected video fragments to be downloaded, wherein the video code rate determination model is obtained by training based on the method.
According to a fourth aspect of the disclosure, the network status information comprises bandwidth information; the player information includes a current buffer size of the player; and the information about the video slices comprises the bitrate of the already-downloaded video slices, the data volume and quality score of the video slices to be downloaded, and the number of remaining video slices of the video.
According to a fifth aspect of the present disclosure, there is provided an electronic apparatus, comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the training method and the code rate determination method as described above.
According to a sixth aspect of the present disclosure, there is provided a storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the training method and the code rate determination method as described above.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising instructions which, when executed by at least one processor of an electronic device, cause the electronic device to perform the training method and the bitrate determination method described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: the training method and/or bitrate determination method of the disclosed exemplary embodiments fully consider scene conditions such as the network and the player and perform multi-bitrate switching based on video quality perception, improving the user's viewing experience so that the user enjoys a clearer and smoother viewing effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a system environment illustrating an implementation of a method of determining a video bitrate according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method of training a video bitrate determination model according to an example embodiment.
FIG. 3 is a schematic diagram illustrating an on-demand environment for training a video bitrate determination model according to an exemplary embodiment.
Fig. 4 is a schematic diagram illustrating a reinforcement learning process of a video bitrate determination model according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an apparatus for training a video bitrate determination model according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating a method of determining a video bitrate according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating an apparatus for determining a video bitrate according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating an electronic device according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plural ones of the items", and "all of the items". For example, "includes at least one of A and B" covers the following three parallel cases: (1) includes A; (2) includes B; (3) includes A and B. Likewise, "perform at least one of step one and step two" covers: (1) perform step one; (2) perform step two; (3) perform step one and step two.
Fig. 1 illustrates a system environment implementing a method of determining a video bitrate according to an exemplary embodiment of the present disclosure.
In the exemplary embodiments of the present disclosure, a scenario in which an on-demand video service is provided (e.g., a short video application service) is explained as an example.
As shown in fig. 1, the system environment may include a video server 200 and a video client 100. The video server 200 transmits the video requested by the video client 100 to the video client 100. The video may be video slices having different sizes, each of which may have a different bitrate (resolution). The video client 100 may be implemented on a plurality of terminal devices. Here, the terminal device may be a terminal device having a communication function and a video playing function, for example, the terminal device in the embodiments of the present disclosure may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, a netbook, a Personal Digital Assistant (PDA), an Augmented Reality (AR)/Virtual Reality (VR) device. A video-on-demand application, such as a short video application, a live application, an online education application, etc., may be executable on the terminal device, and the user may play the video downloaded from the server using the application on the terminal device.
It should be understood that the server 200 may be implemented in various ways; for example, it may be implemented in a distributed manner as a server cluster, and the method of determining the video bitrate according to an exemplary embodiment of the present disclosure may be implemented on a distributed device rather than locally on the server storing the video.
A method of training a video bitrate determination model according to an exemplary embodiment of the present disclosure will be described below with reference to fig. 2. The training method may be implemented, for example, on a video service providing server or on another electronic device connected to and communicating with the server. The trained video bitrate determination model can then be deployed on the client device that plays the video, so that adaptive bitrate determination is realized on the client device.
The adaptive bitrate determination method according to the present disclosure is implemented by the design of a video bitrate determination model based on reinforcement learning. The design may include two parts, a video quality perception and a reinforcement learning framework. The training process will be described in detail below with reference to fig. 2-5.
First, in step S201, information about video slices included in a training sample video is obtained, where the video slices of the training sample video are transcoded into multiple rate steps and each rate step has a corresponding quality score, and the information about video slices includes the quality scores of the video slices in each rate step. According to an exemplary embodiment of the present disclosure, a training sample video may be selected from a set of short videos and on-demand videos that are actually used. For example, the training sample video may be obtained from a short video website or a streaming video website.
According to an exemplary embodiment of the disclosure, video slices at a plurality of different code rate (resolution) gears can be obtained by transcoding the videos in the training sample video set. For example, one video may have 500 video slices, and each video slice may have code rate gears corresponding to the four resolutions smooth, standard definition, high definition, and super definition. It should be understood that the code rate gears are not limited to this; fewer or more gears may be employed.
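As a hypothetical illustration of the per-slice data described above — each slice transcoded into several gears, each gear carrying a bitrate, a data volume, and a quality score — the metadata might be organized as follows. All names and numbers here are invented for illustration and are not from the patent.

```python
from dataclasses import dataclass

LEVELS = ["smooth", "sd", "hd", "uhd"]  # the four example code rate gears

@dataclass
class SliceVariant:
    level: str         # code rate gear name
    bitrate_kbps: int  # encoded bitrate of this variant
    size_bytes: int    # data volume of this variant
    quality: float     # quality score (e.g. PSNR/SSIM/VMAF based)

@dataclass
class VideoSlice:
    index: int
    variants: dict     # gear name -> SliceVariant

def make_slice(index, bitrates, sizes, scores):
    # Build one slice with a variant per gear.
    return VideoSlice(index, {
        lvl: SliceVariant(lvl, br, sz, q)
        for lvl, br, sz, q in zip(LEVELS, bitrates, sizes, scores)
    })

# Illustrative slice: bitrates, sizes, and quality scores are made up.
slice0 = make_slice(0, [500, 1200, 2400, 4800],
                    [300_000, 700_000, 1_400_000, 2_800_000],
                    [40.0, 60.0, 90.0, 105.0])
```

A model input can then read, per slice, the quality score of every gear alongside the data volume it would cost to download.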
According to an exemplary embodiment of the present disclosure, the quality score of a video slice is determined by an objective video coding index of the video slice and/or a subjective perception evaluation by users, where the objective index includes at least one of the following: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), or Video Multi-method Assessment Fusion (VMAF, developed by Netflix). The subjective perception evaluation may be a quality score manually marked by users for the look and feel of the video. The quality score reflects the relative difference in quality between video slices at different code rate gears.
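As a sketch of the first objective metric listed, PSNR can be computed from the mean squared error between a reference frame and a distorted frame. This is an illustrative implementation assuming 8-bit frames held as numpy arrays, not code from the patent.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray) -> float:
    # Mean squared error in float64 to avoid uint8 overflow.
    mse = np.mean((reference.astype(np.float64) -
                   distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: PSNR is unbounded
    # 255 is the peak value for 8-bit samples.
    return 10.0 * np.log10((255.0 ** 2) / mse)

ref = np.full((4, 4), 128, dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 138  # introduce a small single-pixel error
```

SSIM and VMAF are more involved (structural and fused learned metrics, respectively), but they slot into the same role: mapping each transcoded gear to a comparable quality score.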
For example, for video A, the quality scores at the three gears of standard definition, high definition, and super definition may be determined from PSNR values as 60, 90, and 105, respectively. For video B, the quality scores at the same three gears may be determined from the PSNR values together with subjective quality scores as 40, 90, and 105. Such differences in quality score arise from differences in video content and video coding. That is, even where quality scores at the same gear are consistent, the data volumes of the video slices at that gear can differ hugely, and the specific data volume is related to the content of the video.
It should be understood that, regarding the manner of determining the quality score, the embodiments of the present disclosure are not limited as long as the video quality of different bitrate levels can be objectively and subjectively reflected.
By adding quality perception information into the information of the video fragments of the training sample video, the encoding characteristics of the video can be understood based on the video content in subsequent reinforcement learning, so that the viewing experience which is more suitable for the user who views the video is presented.
Next, in step S202, network status information and player information are acquired during a download cycle in which video slices are downloaded through a pre-constructed playback environment, and the acquired network status information, player information, and information about the video slices are input into a video bitrate determination model to obtain bitrate positions selected for the video slices to be downloaded.
According to an exemplary embodiment of the present disclosure, as shown in fig. 3, the pre-constructed playing environment may include a client player, a client buffer, and a content delivery CDN server, wherein the client player downloads a video segment from the content delivery server and stores the downloaded video segment in the client buffer, and the client player, the client buffer, and the content delivery server have an interface therebetween for transferring a quality score of the video segment.
In an on-demand service scenario, a video source is acquired, encoded, and uploaded to a server; the server transcodes the video into versions such as high definition, standard definition, and smooth, and forwards them to a CDN server. The client player downloads video content into the local client buffer by periodically sending download requests to the CDN server until the whole video has been played. During playback, if a change in network speed is observed, the player switches automatically through a multi-bitrate algorithm model (for example, the ABR algorithm shown in FIG. 3), realizing adaptive bitrate decisions. The on-demand scenario is thus simplified, and by unifying the interface between the on-demand environment and the multi-bitrate algorithm in code, a foundation is provided for training the multi-bitrate model by simulating different network environments and the download and playback behavior of video sources.
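The simplified playback loop described above — fixed-length slices downloaded into a client buffer that drains in real time, with stalls when the buffer runs dry — can be sketched as follows. All numbers and names are illustrative assumptions, not the patent's simulator.

```python
def simulate_session(slice_sizes_bytes, bandwidth_bps, slice_seconds=2.0):
    """Toy download/playback loop: returns (final buffer, total stall time)."""
    buffer_s = 0.0    # seconds of video buffered at the client
    rebuffer_s = 0.0  # accumulated stall (rebuffering) time
    for size in slice_sizes_bytes:
        download_s = size * 8 / bandwidth_bps   # time to fetch this slice
        if download_s > buffer_s:               # buffer ran dry: playback stalls
            rebuffer_s += download_s - buffer_s
            buffer_s = 0.0
        else:                                   # playback continues while downloading
            buffer_s -= download_s
        buffer_s += slice_seconds               # the fetched slice joins the buffer
    return buffer_s, rebuffer_s
```

A training environment of this shape lets the model observe bandwidth, buffer size, and stall events for each candidate gear choice without a real network.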
When the playing environment shown in fig. 3 is constructed, besides mechanisms such as playing, downloading, and pausing, a related interface of the video quality score is added, so that the video quality information is conveniently utilized.
The network and video data sets used for reinforcement learning in the related art are public open-source data sets, whereas the data set according to the exemplary embodiment of the present disclosure is largely drawn from actual on-demand and short-video services, and therefore better matches video-on-demand and short-video service scenarios.
After the training sample video is obtained and the playing environment is constructed as described above, the reinforcement learning of the video bitrate determination model according to the exemplary embodiment of the present disclosure will be explained below with reference to fig. 4.
FIG. 4 is a diagram illustrating a reinforcement learning process of a video bitrate determination model according to an exemplary embodiment. As shown in fig. 4, reinforcement learning can generally be divided into five parts: Agent, Environment, Action, Reward function, and observed State. In reinforcement learning, the agent takes actions through interaction with the environment, and the environment gives corresponding feedback; for example, the reward function tells the agent whether an action was beneficial or harmful, so that the agent is adjusted to train in the right direction. In an exemplary embodiment according to the present disclosure, the playing environment described above is the environment in fig. 4, and the agent is the adaptive bitrate determination model residing in the player; the model selects a bitrate gear by observing the states of the network, the player, and so on, and the reward function then gives feedback on that selection action based on a predetermined principle.
According to an example embodiment of the present disclosure, the video bitrate determination model may be a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output a bitrate notch selection for a video slice, and the value-based deep neural network is configured to score a bitrate notch selection action for the policy-based deep neural network.
In recent years, reinforcement learning has incorporated deep neural networks and is collectively referred to as deep reinforcement learning. Deep reinforcement learning with neural networks falls into two branches: value-based and policy-based. Value-based methods are suitable for discrete decision problems with small decision spaces, while policy-based methods are suitable for continuous decision spaces. Exemplary embodiments of the present disclosure employ a Proximal Policy Optimization (PPO) reinforcement learning structure of the Actor-Critic type. According to the exemplary embodiment of the disclosure, the Actor network outputs the bitrate gear to be selected (decided) using a combination of fully connected layers and one-dimensional convolution, and its activation function may be a Softmax function. The Critic network, also combining fully connected layers with one-dimensional convolution, outputs a score for the Actor's action so that the Actor's policy tends to converge. In an exemplary embodiment of the present disclosure, status information about the playing environment, information about downloaded video slices, and information about video slices to be downloaded may be input to the Actor and Critic networks, thereby yielding a selection action for the bitrate gear of the video slice to be downloaded.
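As a rough illustration of the Actor-Critic forward pass described above, the NumPy sketch below runs a shared one-dimensional convolution over the 5 × 7 state matrix and then separate fully connected heads: a Softmax head for the Actor's gear probabilities and a scalar head for the Critic's score. All dimensions, the untrained random weights, and the assumed four bitrate gears are illustrative; a real PPO implementation would also include the clipped-surrogate training loop, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def conv1d_valid(x, kernel):
    """Valid 1-D convolution along the time axis, shared across features."""
    t, f = x.shape
    k = kernel.shape[0]
    out = np.empty((t - k + 1, f))
    for i in range(t - k + 1):
        out[i] = (x[i:i + k] * kernel[:, None]).sum(axis=0)
    return out

class TinyActorCritic:
    """Assumed dimensions: 5 past periods x 7 features, 4 bitrate gears."""

    def __init__(self, n_gears=4, t=5, f=7, k=3):
        hidden = (t - k + 1) * f
        self.kernel = rng.normal(size=k)
        self.w_actor = 0.1 * rng.normal(size=(hidden, n_gears))
        self.w_critic = 0.1 * rng.normal(size=(hidden, 1))

    def forward(self, state):
        h = conv1d_valid(state, self.kernel).ravel()
        probs = softmax(h @ self.w_actor)        # Actor: one probability per gear
        value = (h @ self.w_critic).item()       # Critic: scalar score of the state
        return probs, value
```

Selecting `probs.argmax()` corresponds to picking the gear with the maximum probability, as described later for the model's output.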
According to an exemplary embodiment of the present disclosure, the network status information may include bandwidth information, and the bandwidth may be determined from the data amount of the most recently downloaded video slice and its download time. The player information may include the current buffer size of the player. The information about the video slices may include the bitrate of a downloaded video slice, the data amount and quality score of a video slice to be downloaded, and the number of remaining video slices of the training sample video. It should be appreciated that the above is merely an example, and other suitable indicators may be employed to reflect information regarding the network status, the player status, and the video slices.
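For instance, the bandwidth sample mentioned above can be computed directly from the last download; this helper assumes byte counts and seconds as inputs and reports megabits per second:

```python
def estimate_bandwidth_mbps(downloaded_bytes, download_seconds):
    """Throughput sample from the most recent slice download, in Mbps."""
    return downloaded_bytes * 8.0 / 1e6 / download_seconds
```

A production player would typically smooth several such samples (e.g., a moving average) rather than use a single one.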
For example, the state information shown in table 1 below — the data amount of the downloaded video slice, the download time of the downloaded video slice, the size of each bitrate gear of the video slice to be downloaded, the current buffer size, the number of remaining video slices of the video, and the bitrate of the previous video slice — may be assembled into a 1 × 7 vector. The vectors for a predetermined number (e.g., 5) of past download periods then form a 5 × 7 input matrix, which is input to the bitrate determination model. The bitrate determination model outputs an action matrix of selection probability values for each bitrate gear, where the probability values of each action sum to 1. In each action, the bitrate gear with the maximum probability is selected as the bitrate gear at which the player downloads the video slice.
[Table 1: the per-period state features enumerated above, arranged as a 1 × 7 vector]
It should be understood that the above information is only an example, and those skilled in the art can add other information or reduce the vector dimension of the information of the input model according to actual needs.
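The assembly of per-period vectors into the model input can be sketched as follows. The feature order, and the inclusion of a quality-score element as the seventh feature, are assumptions made here for illustration, since Table 1 enumerates the exact layout:

```python
from collections import deque

import numpy as np

HISTORY = 5   # number of past download periods fed to the model
FEATURES = 7  # per-period state vector length, as in Table 1

history = deque(maxlen=HISTORY)  # automatically drops periods older than HISTORY

def push_period(vec):
    """Record one download period's 1x7 state vector."""
    assert len(vec) == FEATURES
    history.append(list(vec))

def model_input():
    """Stack the last HISTORY periods into the 5x7 input matrix,
    zero-padding before enough periods have elapsed."""
    pad = [[0.0] * FEATURES] * (HISTORY - len(history))
    return np.array(pad + list(history))
```

Each new download period pushes one vector; the model always sees a fixed-shape 5 × 7 matrix.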
Finally, in step S203, a reward function is constructed based on the quality score, the playback stall condition, and the number of bitrate gear switches corresponding to the bitrate gear selected for each video slice, and the parameters of the video bitrate determination model are adjusted using the reward function.
The reward function is the most important design in reinforcement learning and determines whether the agent can be guided to select correct actions. For on-demand scenarios, the reward function must correctly reflect the user's viewing experience; for example, users prefer videos that are sharper and play more smoothly. Therefore, in an exemplary embodiment according to the present disclosure, definition and stalling are used as important indicators of the reward function. In addition, users generally dislike playback that fluctuates between sharp and blurred pictures, so the reward function according to an exemplary embodiment of the present disclosure also factors the number of bitrate gear switches into the reward. Accordingly, the reward function may be designed, for example, as a linear function as follows:
r_n = Q(Score_n) − β · T_n − γ · |Q(Score_n) − Q(Score_{n−1})|

Reward = Σ_{n=1}^{N} r_n
where n denotes the index of the video slice, N denotes the total number of video slices of the video, Q(Score_n) denotes the quality score of the bitrate gear selected for the current video slice, Q(Score_{n−1}) denotes the quality score of the gear of the previous video slice, T_n denotes the stall duration of the current video slice as recorded by the player, β denotes a penalty factor for stalling, and γ denotes a penalty factor for switching frequency.
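A direct transcription of the reward described by these definitions might look like the following; the default values of beta and gamma are illustrative placeholders, not the patent's tuned penalty factors:

```python
def episode_reward(scores, stalls, beta=4.0, gamma=1.0):
    """Sum of per-slice rewards r_n = Q_n - beta*T_n - gamma*|Q_n - Q_{n-1}|.

    scores: quality score of the gear chosen for each slice.
    stalls: stall duration recorded by the player for each slice (seconds).
    """
    total = 0.0
    for n, q in enumerate(scores):
        switch_penalty = abs(q - scores[n - 1]) if n > 0 else 0.0
        total += q - beta * stalls[n] - gamma * switch_penalty
    return total
```

The smoothness term penalizes jumps in quality score rather than jumps in nominal bitrate, matching the design rationale below.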
The main difference between this reward function and related-art reinforcement-learning multi-bitrate algorithms is the following: traditional reinforcement learning pursues high definition, few stalls, and smoothness of the overall bitrate, but in practice the actual quality can differ within the same bitrate gear. The reward function according to the exemplary embodiment of the present disclosure therefore instead pursues a high quality score, few stalls, and smoothness of the quality score, thereby ensuring viewing consistency during video playback.
The method of training a code rate determination model according to an exemplary embodiment of the present disclosure has the following advantages:
1. The content-aware reinforcement learning algorithm fully understands the coding characteristics of the video content, and uses the video quality score to present a viewing experience that better matches viewer expectations.
2. Compared with traditional modeling methods: by adopting reinforcement learning, the algorithm trained through computation with a neural network has stronger expressive power than traditional lookup-table methods, so its decision rules are finer and more accurate. After deep reinforcement learning succeeds, the characteristics of the service scenario are learned from massive data in a data-driven manner, making bandwidth prediction and bitrate selection more accurate and stable.
3. Compared with traditional reinforcement learning: the video quality score is added as an input feature and a PPO network structure is used to train the agent, improving the sampling of important features; for the environment, an actual on-demand service data set is used instead of a public data set; in the reward function, a high video quality score and smoothness of the quality score are pursued throughout. These improvements make the multi-bitrate algorithm model more accurate and stable.
Fig. 5 is a block diagram illustrating a code rate determination model training apparatus according to an exemplary embodiment of the present disclosure. The code rate determination model training apparatus according to the exemplary embodiments of the present disclosure may be implemented in a computer device in hardware, software, or a combination of hardware and software.
As shown in fig. 5, the bitrate determination model training apparatus 500 according to an exemplary embodiment of the present disclosure may include a video slice information acquisition unit 510, a bitrate determination unit 520, and a training unit 530.
According to an example embodiment of the present disclosure, the video slice information obtaining unit 510 may be configured to obtain information on video slices included in a training sample video, where a video slice of the training sample video is transcoded into a plurality of rate steps and each rate step has a corresponding quality score, and the information on video slices includes the quality scores of the video slice in each rate step.
According to an example embodiment of the present disclosure, the bitrate determination unit 520 may be configured to obtain the network status information and the player information during a download period in which the video slices are downloaded through a pre-constructed play environment, and input the obtained network status information, player information, and information on the video slices into the video bitrate determination model to obtain a bitrate position selected for the video slices to be downloaded.
According to an example embodiment of the present disclosure, the training unit 530 may be configured to construct a reward function based on the quality score, the playback stall condition, and the number of bitrate gear switches corresponding to the bitrate gear selected for each video slice, and to adjust the parameters of the video bitrate determination model using the reward function.
According to an exemplary embodiment of the present disclosure, the quality score of a video slice may be determined by a video coding objective index of the video slice and/or a subjective perception evaluation of a user, wherein the video coding objective index includes at least one of the following indexes: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), or Video Multi-method Assessment Fusion (VMAF).
According to an example embodiment of the present disclosure, the pre-constructed playing environment may include a client player, a client buffer, and a content distribution server, wherein the client player downloads a video segment from the content distribution server and stores the downloaded video segment in the client buffer, and the client player, the client buffer, and the content distribution server have an interface therebetween for transferring a quality score of the video segment.
According to an exemplary embodiment of the present disclosure, the training sample video may be obtained from a video data set including an on-demand video and a short video.
According to an example embodiment of the present disclosure, the video bitrate determination model may have a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output a bitrate notch selection for a video slice, and the value-based deep neural network is configured to score a bitrate notch selection action for the policy-based deep neural network.
According to an example embodiment of the present disclosure, the policy-based deep neural network may combine fully connected layers with one-dimensional convolutional layers and employ a Softmax function as the activation function, wherein the policy-based deep neural network outputs selection probabilities corresponding to the plurality of bitrate gears.
According to an example embodiment of the present disclosure, the network status information may include bandwidth information.
According to an exemplary embodiment of the present disclosure, the player information may include a current buffer size of the player.
According to an example embodiment of the present disclosure, the information about the video slices may include a bitrate of a downloaded video slice, a data amount and a quality score of a video slice to be downloaded, and a number of remaining video slices of the training sample video.
According to an example embodiment of the present disclosure, network status information, player information, and information about video slices during a predetermined number of download periods prior to a current download period may be input to a video bitrate determination model to determine bitrate gears for the video slices to be downloaded.
The process for training the code rate determination model has already been described above with reference to fig. 2 to 4, and the description is not repeated here.
Fig. 6 shows a flowchart of a method of determining a video bitrate according to an exemplary embodiment of the present disclosure. The method may be performed on a client device, such as for playing video on demand, to enable adaptive video bitrate switching.
As shown in fig. 6, first, in step S601, information about a video slice of a video is obtained, wherein the video slice is transcoded into a plurality of rate steps and each rate step has a corresponding quality score, and the information about the video slice includes the quality score of the video slice in each rate step. When playing a VOD video, the client device may request and obtain information about video segments of the VOD video from the CDN server. The related information of the video slices according to the exemplary embodiment of the present disclosure may further include information of the number of video slices, the data amount at each rate step, and the like.
Next, in step S603, network status information and player information are acquired during a download period in which video slices are downloaded through a pre-constructed playback environment, and the acquired network status information, player information, and information on the video slices are input into a video bitrate determination model to obtain a bitrate notch selected for the video slice to be downloaded. Here, the video bitrate determination model is trained by the training method described above with reference to fig. 2 to 4.
According to an exemplary embodiment of the present disclosure, the network status information may include bandwidth information, and the network bandwidth may be determined by the data amount and the download time of the downloaded video clips.
The player information may include the current buffer size of the player.
The information about the video segments may include a bitrate of the downloaded video segments, a data amount and a quality score of the video segments to be downloaded, and a number of remaining video segments of the VOD video.
Then, in step S605, a request may be made to download the video segment corresponding to the code rate gear of the selected video segment to be downloaded. That is, after determining a bitrate allocation for a video slice to be downloaded, the client device may request the CDN server to download the video slice at the bitrate allocation.
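The request in step S605 reduces to addressing the slice at the chosen gear. The URL scheme below is purely hypothetical — a real CDN defines its own path layout — but it shows the shape of the request the client would issue after the model picks a gear:

```python
def next_slice_url(base_url, slice_index, gear):
    """Build the download URL for one slice at one bitrate gear.

    The segment-naming scheme here is an invented example, not a real
    CDN convention; an actual service would publish its own layout
    (e.g., via a DASH or HLS manifest).
    """
    return f"{base_url}/seg_{slice_index:05d}_level{gear}.m4s"
```

The client would pass this URL to its HTTP downloader, store the response in the buffer, and then re-run the model for the next slice.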
With the above method for determining the bitrate gear, various conditions such as the network and the player are fully considered in a video-on-demand scenario, and multi-bitrate switching is performed based on video quality perception, which improves the user's viewing experience and lets the user enjoy clearer and smoother playback.
Fig. 7 is a block diagram illustrating a code rate determination apparatus according to an exemplary embodiment.
As shown in fig. 7, the bitrate determination apparatus 700 may include an information acquisition unit 710, a bitrate determination unit 720, and a video slice download unit 730.
The information obtaining unit 710 is configured to obtain information on a video slice of a video, wherein the video slice is transcoded into a plurality of rate steps and each rate step has a corresponding quality score, and the information on the video slice comprises the quality score of the video slice in each rate step.
The bitrate determination unit 720 is configured to obtain the network status information and the player information during a download period of the downloaded video slice, and input the obtained network status information, player information and information on the video slice into the video bitrate determination model to obtain a bitrate position selected for the video slice to be downloaded. Here, the video bitrate determination model is trained by the training method described above with reference to fig. 2 to 4.
The video segment downloading unit 730 is configured to request to download the video segment corresponding to the selected code rate gear of the video segment to be downloaded.
Fig. 8 is a block diagram illustrating an electronic device 800 for determining a video bitrate according to an exemplary embodiment of the disclosure. The electronic device 800 may be, for example: a smart phone, a tablet computer, an MP4 (MPEG-4 Part 14) player, a notebook computer, or a desktop computer. The electronic device 800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and so forth.
In general, the electronic device 800 includes: a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In an exemplary embodiment of the present disclosure, the processor 801 may further include an AI (Artificial Intelligence) processor for processing a computing operation related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 802 is used to store at least one instruction for execution by processor 801 to implement a method of training a rate determination model and/or a method of rate determination of an example embodiment of the present disclosure.
In some embodiments, the electronic device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a touch screen display 805, a camera 806, an audio circuit 807, a positioning component 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 804 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 804 converts an electrical signal into an electromagnetic signal to be transmitted, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.
The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to capture touch signals on or above the surface of the display 805. The touch signal may be input to the processor 801 as a control signal for processing. At this point, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 805 may be one, disposed on the front panel of the electronic device 800; in other embodiments, the display 805 may be at least two, respectively disposed on different surfaces of the terminal 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display disposed on a curved surface or a folded surface of the terminal 800. Even further, the display 805 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 805 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.
The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 801 for processing or inputting the electric signals to the radio frequency circuit 804 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 800. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 807 may also include a headphone jack.
The positioning component 808 is configured to locate a current geographic Location of the electronic device 800 to implement navigation or LBS (Location Based Service). The Positioning component 808 may be a Positioning component based on the GPS (Global Positioning System) in the united states, the beidou System in china, the graves System in russia, or the galileo System in the european union.
The power supply 809 is used to power the various components in the electronic device 800. The power supply 809 can be ac, dc, disposable or rechargeable. When the power source 809 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
The acceleration sensor 811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration on the three coordinate axes. The processor 801 may control the touch screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for the acquisition of game or user motion data.
The gyro sensor 812 may detect a body direction and a rotation angle of the terminal 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the terminal 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 813 may be disposed on the side bezel of terminal 800 and/or underneath touch display 805. When the pressure sensor 813 is disposed on the side frame of the terminal 800, the holding signal of the user to the terminal 800 can be detected, and the processor 801 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at a lower layer of the touch display screen 805, control of an operability control on the UI is realized by the processor 801 according to a pressure operation of the user on the touch display screen 805. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 814 is used for collecting a fingerprint of the user, and the processor 801 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying for and changing settings, etc. Fingerprint sensor 814 may be disposed on the front, back, or side of electronic device 800. When a physical button or vendor Logo is provided on the electronic device 800, the fingerprint sensor 814 may be integrated with the physical button or vendor Logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch screen 805 based on the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 805 is increased; when the ambient light intensity is low, the display brightness of the touch display 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
A proximity sensor 816, also known as a distance sensor, is typically disposed on the front panel of the electronic device 800. The proximity sensor 816 is used to capture the distance between the user and the front of the electronic device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually decreases, the processor 801 controls the touch display 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front surface of the electronic device 800 gradually increases, the processor 801 controls the touch display 805 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 8 does not constitute a limitation of electronic device 800, and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the training method and/or the code rate determination method of a code rate determination model according to the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program.
The computer program in the computer-readable storage medium described above can run in an environment deployed on a computer apparatus such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an embodiment of the present disclosure, there may also be provided a computer program product whose instructions are executable by a processor of a computer device to implement a method of training a code rate determination model and/or a code rate determination method according to an exemplary embodiment of the present disclosure.
According to the method of training a code rate determination model and/or the code rate determination method of the exemplary embodiments of the present disclosure, scene conditions such as the network and the player state are fully considered, and multi-code-rate switching is performed based on perceived video quality, improving the viewing experience so that the user enjoys clearer and smoother playback.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A training method for a video code rate determination model, comprising:
acquiring information about video segments included in a training sample video, wherein the video segments of the training sample video are transcoded into a plurality of code rate levels and each code rate level has a corresponding quality score, and the information about the video segments comprises the quality score of each video segment at each code rate level;
acquiring network state information and player information during a download period in which video segments are downloaded through a pre-constructed playback environment, and inputting the acquired network state information, player information, and information about the video segments into the video code rate determination model to obtain a code rate level selected for the video segment to be downloaded; and
constructing a reward function based on the quality score corresponding to the code rate level selected for each video segment, playback stalling, and the number of code rate level switches, and adjusting parameters of the video code rate determination model using the reward function.
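The reward of claim 1 trades per-segment quality against penalties for playback stalling and for switching between code rate levels. A minimal per-segment sketch in Python; the penalty weights are hypothetical, as the patent does not specify any coefficients:

```python
def reward(quality_score, rebuffer_seconds, prev_level, cur_level,
           stall_weight=4.3, switch_weight=1.0):
    """Per-segment reward: quality minus penalties for stalling and switching.

    quality_score         -- quality score of the chosen code rate level
    rebuffer_seconds      -- playback stall time incurred while downloading
    prev_level, cur_level -- code rate level indices of consecutive segments
    stall_weight, switch_weight -- illustrative assumptions, not patent values
    """
    return (quality_score
            - stall_weight * rebuffer_seconds
            - switch_weight * abs(cur_level - prev_level))
```

During training, the reward of each downloaded segment would feed the reinforcement-learning update that adjusts the model's parameters.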
2. The method of claim 1, wherein the pre-constructed playback environment comprises a client player, a client buffer, and a content distribution server, wherein the client player downloads video segments from the content distribution server and stores the downloaded video segments in the client buffer, and wherein an interface for communicating the quality scores of the video segments is provided between the client player, the client buffer, and the content distribution server.
3. The method of claim 1, wherein the video code rate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output a code rate level selection for a video segment, and the value-based deep neural network is configured to score the code rate level selection action of the policy-based deep neural network.
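The actor-critic arrangement of claim 3 can be illustrated with a minimal sketch: the policy ("actor") network ends in a softmax over code rate levels, and the value ("critic") network scores the chosen action, for example via a one-step temporal-difference advantage. The functions below are an illustrative simplification of the claimed networks, and the discount factor gamma is an assumption:

```python
import math

def policy_probs(logits):
    """Actor head: softmax over per-level logits -> selection probabilities."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def td_advantage(reward, value_next, value_now, gamma=0.99):
    """Critic's score of the actor's choice: one-step TD advantage.

    A positive advantage means the selected code rate level did better
    than the critic's baseline estimate for the current state.
    """
    return reward + gamma * value_next - value_now
```

In a typical actor-critic update, the advantage weights the policy gradient while the critic is regressed toward the observed returns.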
4. The method of claim 1, wherein:
the network state information comprises bandwidth information;
the player information comprises a current buffer size of the player; and
the information about the video segments comprises the code rate of the downloaded video segments, the data size and quality score of the video segment to be downloaded, and the number of remaining video segments of the training sample video.
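The inputs enumerated in claim 4 can be assembled into a single observation vector for the model. A minimal sketch; the field order and the normalization constants are assumptions, not values stated in the patent:

```python
def build_observation(bandwidth_mbps, buffer_seconds,
                      last_segment_bitrate_kbps,
                      next_segment_sizes_mb, next_segment_quality_scores,
                      segments_remaining, total_segments):
    """Flatten the model inputs of claim 4 into one observation list.

    Scalars are roughly normalized to [0, 1]; the divisors are
    illustrative assumptions.
    """
    obs = [
        bandwidth_mbps / 100.0,               # network state: bandwidth
        buffer_seconds / 60.0,                # player: current buffer size
        last_segment_bitrate_kbps / 10000.0,  # code rate of downloaded segment
        segments_remaining / total_segments,  # remaining segments of the video
    ]
    # per-level data size and quality score of the segment to be downloaded
    obs += [size / 10.0 for size in next_segment_sizes_mb]
    obs += [q / 100.0 for q in next_segment_quality_scores]
    return obs
```

With three code rate levels this yields a fixed-length vector of ten features, which the policy network can consume directly.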
5. A training apparatus for a video code rate determination model, comprising:
a video segment information acquisition unit configured to acquire information about video segments included in a training sample video, wherein the video segments of the training sample video are transcoded into a plurality of code rate levels and each code rate level has a corresponding quality score, and the information about the video segments comprises the quality score of each video segment at each code rate level;
a code rate determination unit configured to acquire network state information and player information during a download period in which video segments are downloaded through a pre-constructed playback environment, and to input the acquired network state information, player information, and information about the video segments into a video code rate determination model to obtain a code rate level selected for the video segment to be downloaded; and
a training unit configured to construct a reward function based on the quality score corresponding to the code rate level selected for each video segment, playback stalling, and the number of code rate level switches, and to adjust parameters of the video code rate determination model using the reward function.
6. The apparatus of claim 5, wherein:
the network state information comprises bandwidth information;
the player information comprises a current buffer size of the player; and
the information about the video segments comprises the code rate of the downloaded video segments, the data size and quality score of the video segment to be downloaded, and the number of remaining video segments of the training sample video.
7. A method for determining a video code rate, comprising:
acquiring information about video segments of a video, wherein the video segments are transcoded into a plurality of code rate levels and each code rate level has a corresponding quality score, and the information about the video segments comprises the quality score of each video segment at each code rate level;
acquiring network state information and player information during a download period in which video segments are downloaded, and inputting the acquired network state information, player information, and information about the video segments into a video code rate determination model to obtain a code rate level selected for the video segment to be downloaded; and
requesting download of the video segment corresponding to the code rate level selected for the video segment to be downloaded,
wherein the video code rate determination model is trained based on the method of any one of claims 1 to 4.
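The per-segment client loop of claim 7 can be sketched as follows. All callables here (the model itself, `get_network_state`, `get_player_state`, and `download`) are hypothetical hooks standing in for the player's actual interfaces:

```python
def stream_video(model, segments, get_network_state, get_player_state, download):
    """Client-side loop of claim 7: for each segment to be downloaded,
    feed the network state, player state, and segment information to the
    trained model, then request the segment at the selected code rate level.
    """
    for seg in segments:
        net = get_network_state()        # e.g. measured bandwidth
        player = get_player_state()      # e.g. current buffer size
        level = model(net, player, seg)  # selected code rate level index
        download(seg, level)             # request segment at that level
```

A toy invocation with stub hooks shows the control flow; a real player would measure throughput between downloads and re-query the model per segment.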
8. An apparatus for determining a video code rate, comprising:
an information acquisition unit configured to acquire information about video segments of a video, wherein the video segments are transcoded into a plurality of code rate levels and each code rate level has a corresponding quality score, and the information about the video segments comprises the quality score of each video segment at each code rate level;
a code rate determination unit configured to acquire network state information and player information during a download period in which video segments are downloaded, and to input the acquired network state information, player information, and information about the video segments into a video code rate determination model to obtain a code rate level selected for the video segment to be downloaded; and
a video segment download unit configured to request download of the video segment corresponding to the code rate level selected for the video segment to be downloaded,
wherein the video code rate determination model is trained based on the method of any one of claims 1 to 4.
9. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 4 or claim 7.
10. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1 to 4 or claim 7.
CN202111315458.2A 2021-11-08 2021-11-08 Video code rate determining method and device, electronic equipment and storage medium thereof Active CN114040230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111315458.2A CN114040230B (en) 2021-11-08 2021-11-08 Video code rate determining method and device, electronic equipment and storage medium thereof

Publications (2)

Publication Number Publication Date
CN114040230A true CN114040230A (en) 2022-02-11
CN114040230B CN114040230B (en) 2024-03-29

Family

ID=80136732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111315458.2A Active CN114040230B (en) 2021-11-08 2021-11-08 Video code rate determining method and device, electronic equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN114040230B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150023404A1 (en) * 2013-07-16 2015-01-22 Cisco Technology, Inc. Quality Optimization with Buffer and Horizon Constraints in Adaptive Streaming
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN109729396A (en) * 2017-10-31 2019-05-07 华为技术有限公司 Video slicing data transmission method and device
CN110248247A (en) * 2019-06-12 2019-09-17 深圳市大数据研究院 Embedded dynamic video control method for playing back and device based on network throughput
CN110324621A (en) * 2019-07-04 2019-10-11 北京达佳互联信息技术有限公司 Method for video coding, device, electronic equipment and storage medium
CN110536175A (en) * 2018-05-24 2019-12-03 腾讯科技(深圳)有限公司 A kind of code rate switching method and apparatus
CN111107395A (en) * 2019-12-31 2020-05-05 广州市百果园网络科技有限公司 Video transcoding method, device, server and storage medium
CN111901642A (en) * 2020-07-31 2020-11-06 成都云格致力科技有限公司 Real-time video code rate self-adaptive control method and system based on reinforcement learning
CN112291620A (en) * 2020-09-22 2021-01-29 北京邮电大学 Video playing method and device, electronic equipment and storage medium
CN112543366A (en) * 2020-12-02 2021-03-23 北京五八信息技术有限公司 Video playing method and device
CN112672153A (en) * 2020-12-11 2021-04-16 北方信息控制研究院集团有限公司 Scalable coding video code rate self-adaptive selection method fused with video feature analysis
CN112788357A (en) * 2020-12-30 2021-05-11 北京达佳互联信息技术有限公司 Network live broadcast method, device, server and computer readable storage medium
CN112953922A (en) * 2021-02-03 2021-06-11 西安电子科技大学 Self-adaptive streaming media control method, system, computer equipment and application
CN113014968A (en) * 2021-02-24 2021-06-22 南京大学 Multi-user dynamic code rate video transmission method and system based on reinforcement learning
CN113194320A (en) * 2021-04-30 2021-07-30 北京达佳互联信息技术有限公司 Parameter prediction model training method and device and parameter prediction method and device
CN113411643A (en) * 2021-05-26 2021-09-17 中国人民解放军国防科技大学 Video quality optimization method, system, electronic equipment and storage medium
CN113596021A (en) * 2021-07-28 2021-11-02 中国人民解放军国防科技大学 Streaming media code rate self-adaption method, device and equipment supporting neural network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827668A (en) * 2022-03-23 2022-07-29 百果园技术(新加坡)有限公司 Video gear selection method, device and equipment based on decoding capability
CN114827668B (en) * 2022-03-23 2024-02-13 百果园技术(新加坡)有限公司 Video gear selection method, device and equipment based on decoding capability
CN114979799A (en) * 2022-05-20 2022-08-30 北京字节跳动网络技术有限公司 Panoramic video processing method, device, equipment and storage medium
CN115002557A (en) * 2022-05-23 2022-09-02 北京字跳网络技术有限公司 Network speed prediction method, device, equipment and storage medium
CN115002557B (en) * 2022-05-23 2024-01-30 北京字跳网络技术有限公司 Network speed prediction method, device, equipment and storage medium
WO2024007770A1 (en) * 2022-07-06 2024-01-11 北京字跳网络技术有限公司 Video resource management method and apparatus, and electronic device and storage medium
CN115914687A (en) * 2022-12-28 2023-04-04 北京奇艺世纪科技有限公司 Video playing and adaptive code rate playing model training method and device
CN116962757A (en) * 2023-09-20 2023-10-27 腾讯科技(深圳)有限公司 Video code rate grade determining method and device, electronic equipment and storage medium
CN116962757B (en) * 2023-09-20 2023-12-12 腾讯科技(深圳)有限公司 Video code rate grade determining method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114040230B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN114040230B (en) Video code rate determining method and device, electronic equipment and storage medium thereof
Liu et al. 360 innovations for panoramic video streaming
CN108829881B (en) Video title generation method and device
CN108259945B (en) Method and device for processing playing request for playing multimedia data
CN109600678B (en) Information display method, device and system, server, terminal and storage medium
CN111050203B (en) Video processing method and device, video processing equipment and storage medium
JP7085014B2 (en) Video coding methods and their devices, storage media, equipment, and computer programs
JP7267368B2 (en) Method and Apparatus for Determining VR Multimedia Experience Quality
CN113490010B (en) Interaction method, device and equipment based on live video and storage medium
US20120144312A1 (en) Information processing apparatus and information processing system
CN111836069A (en) Virtual gift presenting method, device, terminal, server and storage medium
CN111026992A (en) Multimedia resource preview method, device, terminal, server and storage medium
CN112714327A (en) Interaction method, device and equipment based on live application program and storage medium
CN112115282A (en) Question answering method, device, equipment and storage medium based on search
CN113141524A (en) Resource transmission method, device, terminal and storage medium
CN111836073B (en) Method, device and equipment for determining video definition and storage medium
CN111416996B (en) Multimedia file detection method, multimedia file playing device, multimedia file equipment and storage medium
CN110572710B (en) Video generation method, device, equipment and storage medium
CN114154068A (en) Media content recommendation method and device, electronic equipment and storage medium
WO2021143388A1 (en) Bitrate switching method and device
CN111428158B (en) Method and device for recommending position, electronic equipment and readable storage medium
CN112911337A (en) Method and device for configuring video cover pictures of terminal equipment
CN112533065A (en) Method and device for publishing video, electronic equipment and storage medium
CN113709524B (en) Method for selecting bit rate of audio/video stream and device thereof
CN111698262B (en) Bandwidth determination method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant