CN114040230B - Video code rate determining method and device, electronic equipment and storage medium thereof - Google Patents

Video code rate determining method and device, electronic equipment and storage medium thereof

Info

Publication number
CN114040230B
Authority
CN
China
Prior art keywords
video
code rate
information
downloaded
player
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111315458.2A
Other languages
Chinese (zh)
Other versions
CN114040230A (en)
Inventor
杨啖
周超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111315458.2A priority Critical patent/CN114040230B/en
Publication of CN114040230A publication Critical patent/CN114040230A/en
Application granted granted Critical
Publication of CN114040230B publication Critical patent/CN114040230B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2662Controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/24Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N21/2402Monitoring of the downstream path of the transmission network, e.g. bandwidth available

Abstract

The disclosure provides a video bitrate determination method and apparatus, an electronic device, and a storage medium. The video bitrate determination method includes: acquiring information about video segments of a video, wherein the video segments are transcoded into a plurality of bitrate levels and each bitrate level has a corresponding quality score, and the information about the video segments includes the quality score of each video segment at each bitrate level; acquiring network state information and player information during a download period in which video segments are downloaded, and inputting the acquired network state information, player information, and information about the video segments into a video bitrate determination model to obtain the bitrate level selected for the video segment to be downloaded; and requesting download of the video segment corresponding to the selected bitrate level. The method and apparatus can adaptively switch the video bitrate based on video quality perception information, thereby providing a smoother video viewing experience.

Description

Video bitrate determination method and apparatus, electronic device, and storage medium
Technical Field
The disclosure relates to the technical field of the internet, and in particular to a method, an apparatus, an electronic device, and a storage medium for determining the bitrate of a video to be downloaded.
Background
In recent years, with the continued development of the mobile internet and 4G/5G technologies, streaming media services have become increasingly popular, and a number of mature Video On Demand (VOD) platform companies have emerged in the multimedia industry. Research on how to improve the viewing experience of VOD users is therefore of great significance to this rapidly growing on-demand business.
In VOD services, multi-bitrate technology is typically used to ensure that users enjoy high-definition, low-stall viewing quality. Multi-bitrate technology provides the user with different resolution levels (such as ultra HD, HD, SD, and smooth), and the user can select a suitable definition according to the quality of the network environment. Because manual resolution selection requires user interaction, VOD service providers have developed automatic algorithms that adapt the resolution to the user, a mechanism known as the adaptive bitrate (ABR) algorithm. Research on ABR algorithms is therefore of great significance to the user experience.
However, ABR methods of the related art often cause playback stalls after severe network jitter and switch the bitrate level frequently; they lack an understanding of video quality and cannot cope with unpredictable bandwidth.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for determining a video bitrate and for training a video bitrate determination model, so as to at least solve the problem of adaptive video bitrate switching in the related art.
According to a first aspect of the present disclosure, there is provided a training method for a video bitrate determination model, comprising: acquiring information about video segments included in a training sample video, wherein the video segments of the training sample video are transcoded into a plurality of bitrate levels and each bitrate level has a corresponding quality score, and the information about the video segments includes the quality score of each video segment at each bitrate level; acquiring network state information and player information during a download period in which video segments are downloaded through a pre-constructed playback environment, and inputting the acquired network state information, player information, and information about the video segments into the video bitrate determination model to obtain the bitrate level selected for the video segment to be downloaded; and constructing a reward function based on the quality score corresponding to the bitrate level selected for each video segment, the playback stalling condition, and the number of bitrate level switches, and adjusting parameters of the video bitrate determination model using the reward function.
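A reward combining quality, stalling, and level-switching terms can be illustrated with a minimal sketch. The linear form and the weight values below are illustrative assumptions, not the actual reward function of the disclosure.

```python
def reward(quality, rebuffer_seconds, prev_quality,
           rebuffer_weight=4.3, switch_weight=1.0):
    """Illustrative ABR reward: reward perceived quality, penalize
    playback stalls and abrupt quality (bitrate-level) switches.
    The weights here are assumed values, not taken from the patent."""
    return (quality
            - rebuffer_weight * rebuffer_seconds
            - switch_weight * abs(quality - prev_quality))
```

With such a reward, a segment choice that keeps quality high while avoiding stalls and oscillation between levels accumulates the largest return during training.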
According to the first aspect of the present disclosure, the quality score of a video segment is determined by an objective video coding metric of the video segment and/or a subjective perception assessment by users, wherein the objective video coding metric comprises at least one of: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), or Video Multi-method Assessment Fusion (VMAF).
According to the first aspect of the present disclosure, the pre-constructed playback environment includes a client player, a client buffer, and a content distribution server, wherein the client player downloads video segments from the content distribution server and stores the downloaded video segments in the client buffer, and interfaces for delivering the quality scores of the video segments are provided between the client player, the client buffer, and the content distribution server.
According to a first aspect of the disclosure, the training sample video is obtained from a video dataset comprising on-demand video and short video.
According to the first aspect of the disclosure, the video bitrate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output the bitrate level selection for a video segment, and the value-based deep neural network is configured to score the bitrate level selection action of the policy-based deep neural network.
According to the first aspect of the present disclosure, the policy-based deep neural network combines fully connected and one-dimensional convolutional layers and employs a Softmax function as its activation function, wherein the policy-based deep neural network outputs selection probabilities corresponding to the plurality of bitrate levels.
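As a concrete illustration of the Softmax output described above, a numerically stable Softmax that turns per-level logits into selection probabilities can be sketched in plain Python (the four-level example values are assumptions):

```python
import math

def softmax(logits):
    """Convert per-bitrate-level logits into selection probabilities.
    Subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. four levels: smooth, SD, HD, ultra HD (illustrative logits)
probs = softmax([0.1, 0.5, 2.0, 0.3])
```

The probabilities sum to one, and the level with the largest logit (here the third, "HD") receives the highest selection probability.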
According to the first aspect of the present disclosure, the network state information comprises bandwidth information; the player information includes the current buffer size of the player; and the information about the video segments includes the bitrate of the downloaded video segments, the data amount and quality score of the video segment to be downloaded, and the number of remaining video segments of the training sample video.
According to the first aspect of the present disclosure, network state information, player information, and information about video segments during a predetermined number of download periods preceding the current download period are input into the video bitrate determination model to determine the bitrate level for the video segment to be downloaded.
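A fixed-length history of preceding download periods can be kept in a bounded queue and zero-padded until enough observations have accumulated. The window length k and the per-period observation layout below are illustrative assumptions:

```python
from collections import deque

class ObservationHistory:
    """Keep the last k per-download-period observations, zero-padded
    until k real observations exist (k=8, dim=3 are assumed values)."""
    def __init__(self, k=8, dim=3):
        self.k, self.dim = k, dim
        self.buf = deque(maxlen=k)  # oldest entries drop off automatically

    def push(self, obs):
        self.buf.append(list(obs))

    def window(self):
        """Return exactly k rows, newest last, zeros first when short."""
        pad = [[0.0] * self.dim] * (self.k - len(self.buf))
        return pad + list(self.buf)
```

Each downloaded segment pushes one observation (e.g., throughput, download time, chosen level), and `window()` always yields a fixed-shape input for the model.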
According to a second aspect of the present disclosure, there is provided a training apparatus for a video bitrate determination model, comprising: a video segment information acquisition unit configured to acquire information about video segments included in a training sample video, wherein the video segments of the training sample video are transcoded into a plurality of bitrate levels and each bitrate level has a corresponding quality score, and the information about the video segments includes the quality score of each video segment at each bitrate level; a bitrate determination unit configured to acquire network state information and player information during a download period in which video segments are downloaded through a pre-constructed playback environment, and to input the acquired network state information, player information, and information about the video segments into the video bitrate determination model to obtain the bitrate level selected for the video segment to be downloaded; and a training unit configured to construct a reward function based on the quality score corresponding to the bitrate level selected for each video segment, the playback stalling condition, and the number of bitrate level switches, and to adjust parameters of the video bitrate determination model using the reward function.
According to the second aspect of the present disclosure, the quality score of a video segment is determined by an objective video coding metric of the video segment and/or a subjective perception assessment by users, wherein the objective video coding metric comprises at least one of: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), or Video Multi-method Assessment Fusion (VMAF).
According to the second aspect of the present disclosure, the pre-constructed playback environment includes a client player, a client buffer, and a content distribution server, wherein the client player downloads video segments from the content distribution server and stores the downloaded video segments in the client buffer, and interfaces for delivering the quality scores of the video segments are provided between the client player, the client buffer, and the content distribution server.
According to the second aspect of the disclosure, the training sample video is obtained from a video dataset comprising on-demand videos and short videos.
According to the second aspect of the disclosure, the video bitrate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output the bitrate level selection for a video segment, and the value-based deep neural network is configured to score the bitrate level selection action of the policy-based deep neural network.
According to the second aspect of the present disclosure, the policy-based deep neural network combines fully connected and one-dimensional convolutional layers and employs a Softmax function as its activation function, wherein the policy-based deep neural network outputs selection probabilities corresponding to the plurality of bitrate levels.
According to the second aspect of the present disclosure, the network state information includes bandwidth information; the player information includes the current buffer size of the player; and the information about the video segments includes the bitrate of the downloaded video segments, the data amount and quality score of the video segment to be downloaded, and the number of remaining video segments of the training sample video.
According to the second aspect of the present disclosure, network state information, player information, and information about video segments during a predetermined number of download periods preceding the current download period are input into the video bitrate determination model to determine the bitrate level for the video segment to be downloaded.
According to a third aspect of the present disclosure, there is provided a video bitrate determination method, comprising: acquiring information about video segments of a video, wherein the video segments are transcoded into a plurality of bitrate levels and each bitrate level has a corresponding quality score, and the information about the video segments includes the quality score of each video segment at each bitrate level; acquiring network state information and player information during a download period in which video segments are downloaded, and inputting the acquired network state information, player information, and information about the video segments into a video bitrate determination model to obtain the bitrate level selected for the video segment to be downloaded; and requesting download of the video segment corresponding to the selected bitrate level, wherein the video bitrate determination model is trained by the method described above.
According to the third aspect of the present disclosure, the network state information comprises bandwidth information; the player information includes the current buffer size of the player; and the information about the video segments includes the bitrate of the downloaded video segments, the data amount and quality score of the video segment to be downloaded, and the number of remaining video segments of the video.
According to a fourth aspect of the present disclosure, there is provided a video bitrate determination apparatus, comprising: an information acquisition unit configured to acquire information about video segments of a video, wherein the video segments are transcoded into a plurality of bitrate levels and each bitrate level has a corresponding quality score, the information about the video segments including the quality score of each video segment at each bitrate level; a bitrate determination unit configured to acquire network state information and player information during a download period in which video segments are downloaded, and to input the acquired network state information, player information, and information about the video segments into a video bitrate determination model to obtain the bitrate level selected for the video segment to be downloaded; and a video segment download unit configured to request download of the video segment corresponding to the selected bitrate level, wherein the video bitrate determination model is trained by the method described above.
According to the fourth aspect of the present disclosure, the network state information includes bandwidth information; the player information includes the current buffer size of the player; and the information about the video segments includes the bitrate of the downloaded video segments, the data amount and quality score of the video segment to be downloaded, and the number of remaining video segments of the video.
According to a fifth aspect of the present disclosure, there is provided an electronic device, comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the training method and the bitrate determination method described above.
According to a sixth aspect of the present disclosure, there is provided a storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the training method and the bitrate determination method described above.
According to a seventh aspect of the present disclosure, there is provided a computer program product whose instructions are executed by at least one processor in an electronic device to perform the training method and the bitrate determination method described above.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects: the training method and/or the bitrate determination method fully consider scene conditions such as the network and the player, and perform multi-bitrate switching based on video quality perception, improving the viewing experience so that the user enjoys a clearer and smoother viewing effect.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a system environment illustrating a method of determining a video bitrate according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method of training a video bitrate determination model according to an example embodiment.
Fig. 3 is a schematic diagram illustrating an on-demand environment for training a video bitrate determination model according to an example embodiment.
Fig. 4 is a schematic diagram illustrating a reinforcement learning process of a video bitrate determination model according to an example embodiment.
Fig. 5 is a block diagram illustrating an apparatus for training a video bitrate determination model according to an example embodiment.
Fig. 6 is a flowchart illustrating a method of determining a video bitrate according to an example embodiment.
Fig. 7 is a block diagram illustrating an apparatus for determining a video bitrate according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating an electronic device according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The embodiments described in the examples below are not representative of all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in this disclosure, "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "at least one of step one and step two is executed" covers three parallel cases: (1) executing step one; (2) executing step two; (3) executing step one and step two.
Fig. 1 illustrates a system environment implementing a method of determining a video bitrate according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment of the present disclosure, a scenario in which an on-demand video service is provided (e.g., a short video application service) is described as an example.
As shown in fig. 1, the system environment may include a video server 200 and a video client 100. The video server 200 transmits the video requested by the video client 100 to the video client 100. The video may consist of video segments of different sizes, each of which may have a different bitrate (resolution). The video client 100 may be implemented on a variety of terminal devices. Here, the terminal device may be a device having communication and video playback functions; for example, a terminal device in the embodiments of the present disclosure may be a mobile phone, a tablet computer, a desktop, a laptop, a handheld computer, a notebook, a netbook, a personal digital assistant (PDA), or an augmented reality (AR)/virtual reality (VR) device. A video-on-demand application, such as a short video application, a live broadcast application, or an online education application, may run on the terminal device, and the user may use it to play video downloaded from the server.
It should be appreciated that the server 200 may be implemented in various ways, for example, as a server cluster implemented in a distributed manner, and that the method of determining video bitrates according to exemplary embodiments of the present disclosure may be implemented on distributed devices rather than locally on the server storing the video.
A method of training a video bitrate determination model according to an exemplary embodiment of the present disclosure will be described below with reference to fig. 2. The training method according to exemplary embodiments of the present disclosure may be implemented, for example, in a video service providing server, or in other electronic devices that connect to and communicate with the server. The trained video bitrate determination model can then be deployed on a client device playing video, so that adaptive bitrate determination can be performed on the client device.
The adaptive bitrate determination method according to the present disclosure is realized through the design of a reinforcement-learning-based video bitrate determination model. The design may include two parts: video quality perception and a reinforcement learning framework. The training process will be described in detail with reference to figs. 2 to 5.
First, in step S201, information about the video segments included in a training sample video is acquired, wherein the video segments of the training sample video are transcoded into a plurality of bitrate levels and each bitrate level has a corresponding quality score, and the information about the video segments includes the quality score of each video segment at each bitrate level. According to exemplary embodiments of the present disclosure, training sample videos may be selected from a set of short videos and on-demand videos in actual use. For example, the training sample video may be obtained from a short video website or a streaming video website.
According to an exemplary embodiment of the present disclosure, video segments at multiple bitrate (resolution) levels may be obtained by transcoding the videos in the training sample set and scoring their quality. For example, one video may have 500 video segments, and each video segment may have bitrate levels corresponding to four resolutions: smooth, SD, HD, and ultra HD. It should be appreciated that the bitrate levels are not limited thereto, and fewer or more bitrate levels may be employed.
According to an exemplary embodiment of the present disclosure, the quality score of a video segment is determined by an objective video coding metric of the video segment and/or a subjective perception assessment by users, wherein the objective video coding metric comprises at least one of the following: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), or Netflix's Video Multi-method Assessment Fusion (VMAF). For the subjective assessment, users may manually annotate a quality score for the look and feel of the video. The quality score reflects the relative difference in quality between video segments at different bitrate levels.
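Among the listed metrics, PSNR has the simplest closed form; a minimal sketch computing it from the mean squared error between two frames, assuming 8-bit samples:

```python
import math

def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error.
    Assumes 8-bit samples (peak value 255); higher means better quality."""
    if mse <= 0:
        return float("inf")  # identical frames: infinite PSNR
    return 10.0 * math.log10(peak * peak / mse)
```

A lower MSE (less distortion versus the source) yields a higher PSNR, which is why higher bitrate levels of the same segment generally receive higher quality scores.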
For example, for video A, the quality scores at the three bitrate levels SD, HD, and ultra HD may be determined from the PSNR values to be 60, 90, and 105, respectively; for video B, the quality scores at the same three levels may be determined from the PSNR values and subjective quality scoring to be 40, 90, and 105, respectively. Such differences in quality score arise from differences in video content and video coding. That is, the quality score is consistent within the same level of the same video, but the data amounts of same-level segments can differ greatly across videos, their specific sizes being related to the video content.
It should be understood that the embodiments of the present disclosure are not limited in the manner of determining the quality score, as long as the video quality of different code rate steps can be reflected objectively and subjectively.
By adding quality perception information to the information of the video segments of the training sample video, the subsequent reinforcement learning can understand the coding characteristics of the video based on its content, thereby delivering a viewing experience more consistent with the expectations of the viewing user.
Next, in step S202, network state information and player information are acquired during a download period in which video segments are downloaded through a pre-constructed playback environment, and the acquired network state information, player information, and information about the video segments are input into the video bitrate determination model to obtain the bitrate level selected for the video segment to be downloaded.
According to an exemplary embodiment of the present disclosure, as shown in fig. 3, the pre-constructed playback environment may include a client player, a client buffer, and a content delivery network (CDN) server, wherein the client player downloads video segments from the CDN server and stores them in the client buffer, and interfaces for delivering the quality scores of the video segments are provided between the client player, the client buffer, and the CDN server.
In a video-on-demand service scenario, the video source is captured, encoded, and uploaded to a server; the server transcodes the video into HD, SD, smooth, and other definitions and forwards the corresponding videos to a CDN server. The client player downloads the video content into the local client buffer by periodically sending download requests to the CDN server until the whole video has been played; during playback, if a change in network speed is observed, the player switches automatically through a multi-bitrate algorithm model (for example, the ABR algorithm shown in fig. 3), thereby realizing adaptive bitrate decisions. The on-demand scenario is accordingly simplified and integrated with the multi-bitrate algorithm interface through code development; by simulating the download and playback behavior of different network environments and video sources, it provides the basis for training the multi-bitrate algorithm model.
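The core accounting of such a simulated playback environment reduces to simple per-segment buffer bookkeeping; the following sketch is illustrative, and the fixed 4-second segment duration is an assumed value:

```python
def simulate_download(buffer_s, segment_bits, bandwidth_bps,
                      segment_duration_s=4.0):
    """One step of a simplified playback simulator: while a segment
    downloads, the buffer drains at real time; if it empties before the
    download completes, playback stalls for the remaining time.
    The 4-second segment duration is an assumed value."""
    download_s = segment_bits / bandwidth_bps
    stall_s = max(0.0, download_s - buffer_s)            # rebuffering time
    buffer_s = max(0.0, buffer_s - download_s) + segment_duration_s
    return buffer_s, stall_s
```

Running this step over a recorded bandwidth trace and a segment-size/quality table reproduces the download, playback, and stalling behavior needed to compute rewards during training.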
When the playback environment shown in fig. 3 is constructed, interfaces related to video quality scores are added in addition to mechanisms such as playback, download, and stalling, so that the video quality information can be used conveniently.
Whereas the network and video datasets used for reinforcement learning in the related art are public open-source datasets, the datasets according to the exemplary embodiments of the present disclosure mostly come from actual video-on-demand and short video services, which better ensures a match with video-on-demand and short video service scenarios.
After the training sample video is obtained and the playback environment is constructed as described above, reinforcement learning of the video rate determination model according to an exemplary embodiment of the present disclosure will be described below with reference to fig. 4.
Fig. 4 is a schematic diagram illustrating a reinforcement learning process of a video bitrate determination model according to an example embodiment. As shown in fig. 4, reinforcement learning can generally be divided into five parts: the agent (Agent), the environment (Environment), actions (Action), the reward function (Reward), and the observed state (Observe State). In reinforcement learning, the agent takes actions through interactions with the environment, and the environment gives corresponding feedback, e.g., through a reward function, telling the agent whether an action was beneficial or detrimental, thereby steering the agent's training in the correct direction. In the exemplary embodiment according to the present disclosure, the playback environment described above is the environment in fig. 4, and the agent is the adaptive bitrate determination model residing in the player, which makes a corresponding bitrate level selection action by observing the state of the network, the player, and so on; the reward function then gives feedback on that selection action based on a predetermined principle.
According to an example embodiment of the present disclosure, the video bitrate determination model may have a deep reinforcement learning structure including a policy-based deep neural network configured to output a bitrate gear selection for a video slice and a value-based deep neural network configured to score the bitrate gear selection action of the policy-based deep neural network.
In recent years, the development of reinforcement learning has incorporated deep neural networks; the combination is collectively referred to as deep reinforcement learning. Deep reinforcement learning with neural networks has two main branches: value-based and policy-based. Value-based methods are suitable for discrete, small decision spaces, while policy-based methods are suitable for continuous decision spaces. The exemplary embodiments of the present disclosure employ a Proximal Policy Optimization (PPO) reinforcement learning structure of the Actor-Critic type. According to the exemplary embodiment of the present disclosure, the Actor network combines fully connected layers with one-dimensional convolution and outputs the code rate gear to be selected (the decision). According to an exemplary embodiment of the present disclosure, the activation function of the Actor network may be a Softmax function. The Critic network likewise combines fully connected layers with one-dimensional convolution and outputs a score for the Actor's action, so that the Actor's actions tend to converge. In an exemplary embodiment of the present disclosure, state information about the playback environment, information about downloaded video clips, and information about the video clip to be downloaded may be input to the Actor and Critic networks, thereby producing a code rate gear selection action for the video clip to be downloaded.
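As a rough illustration of the Actor and Critic heads just described, the sketch below uses random untrained weights; the layer sizes, kernel, and feature construction are assumptions for shape-checking only, not the patented network:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))    # shift by the max for numerical stability
    return z / z.sum()

rng = np.random.default_rng(0)

# 5 past download periods x 7 features (dimensions illustrative).
state = rng.normal(size=(5, 7))

# "Fully connected + 1-D convolution" stand-ins with random, untrained weights.
kernel = rng.normal(size=3)
features = np.convolve(state.mean(axis=1), kernel, mode="valid")  # shape (3,)

W_actor = rng.normal(size=(features.size, 4))    # 4 code rate gears
probs = softmax(features @ W_actor)              # Actor: one probability per gear

w_critic = rng.normal(size=features.size)
value = float(features @ w_critic)               # Critic: scalar score of the action
```

The Softmax output always forms a valid probability distribution over the gears, which is what makes it a natural Actor activation here.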
According to an exemplary embodiment of the present disclosure, the network status information may include bandwidth information, and the bandwidth may be determined from the data amount and download duration of the last video clip download. The player information may include the current buffer size of the player. The information about the video clips may include the code rate of the downloaded video clips, the data amount and quality score of the video clip to be downloaded, and the number of remaining video clips of the training sample video. It should be appreciated that the above are merely examples, and other suitable metrics may be employed to reflect the network status, the player status, and the related information of the video clips.
For example, the data amount of the downloaded video clips, the download time of the downloaded video clips, the size of each code rate gear of the video clip to be downloaded, the current buffer size, the number of remaining video clips of the video, and the code rate of the previous video clip may be combined into one 1×7 vector as shown in Table 1 below, and then this information for a past predetermined number (e.g., 5) of download periods may be input into the code rate determination model as a 5×7 input matrix. The code rate determination model may output an action matrix of selection probability values for each code rate gear, where the probability values of each action sum to 1. The code rate gear with the highest probability is selected in each action, i.e., the code rate gear at which the player downloads the video clip.
It should be understood that the above information is only an example; those skilled in the art may add other information or reduce the vector dimension of the model input according to actual needs.
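A sketch of assembling the per-period feature row and picking the highest-probability gear might look like the following; the field names, ordering, and values are hypothetical:

```python
import numpy as np

def build_state_row(dl_bytes, dl_time_s, next_sizes_mean, buffer_s,
                    clips_left, last_gear, last_quality):
    """One 1x7 feature row for a single download period (field choice hypothetical)."""
    return np.array([dl_bytes, dl_time_s, next_sizes_mean, buffer_s,
                     clips_left, last_gear, last_quality], dtype=float)

# Stack the rows of the past 5 download periods into the 5x7 model input.
history = np.stack([build_state_row(2e6, 0.8, 1.5e6, 12.0, 30 - i, 2, 80.0)
                    for i in range(5)])

# The model returns one selection probability per gear; the largest wins.
action_probs = np.array([0.1, 0.2, 0.6, 0.1])
chosen_gear = int(np.argmax(action_probs))   # -> 2
```

The argmax step mirrors the rule stated above: the gear with the highest probability in each action becomes the download gear.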
Finally, in step S203, a reward function is constructed based on the quality score, the playback stall condition, and the number of code rate gear switches corresponding to the code rate gear selected for each video clip, and parameters of the video code rate determination model are adjusted using the reward function.
The reward function is the most important design element in reinforcement learning; it determines whether the agent can be guided to select correct actions. For on-demand scenes, the reward function should correctly reflect the user viewing experience; e.g., users prefer higher-definition video and smoother video. Thus, in an exemplary embodiment according to the present disclosure, definition and stalling are important indicators of the reward function. In addition, users generally dislike the experience of switching between high-definition and blurry pictures during viewing, so the reward function according to exemplary embodiments of the present disclosure also factors in the number of code rate gear switches. Accordingly, a reward function according to an exemplary embodiment of the present disclosure may be designed as a linear function, for example, as follows:
Where n represents the index of a video slice, N represents the total number of video slices of the video, Q(Score_n) represents the quality score value of the gear of the current video slice, Q(Score_{n-1}) represents the quality score value of the gear of the previous video slice, T_n represents the stall length of the current video slice recorded by the player, β represents the penalty factor for stalls, and γ represents the penalty factor for switching frequency.
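The formula itself does not survive extraction here. A per-slice reconstruction consistent with the terms defined above (quality score reward, stall penalty β, quality-fluctuation penalty γ) could plausibly take a form such as this sketch; the exact form and coefficient values are assumptions, not the patented function:

```python
def slice_reward(q_n, q_prev, stall_s, beta=4.0, gamma=1.0):
    """Hypothetical per-slice reward: quality score minus a stall penalty (beta)
    and a quality-fluctuation penalty (gamma); coefficients are illustrative."""
    return q_n - beta * stall_s - gamma * abs(q_n - q_prev)

def total_reward(quality_scores, stall_lengths, beta=4.0, gamma=1.0):
    """Sum the per-slice rewards over all N slices of the video."""
    total, q_prev = 0.0, quality_scores[0]
    for q_n, stall in zip(quality_scores, stall_lengths):
        total += slice_reward(q_n, q_prev, stall, beta, gamma)
        q_prev = q_n
    return total

# Steady quality with no stalls is rewarded more than a fluctuating session.
steady = total_reward([80.0, 80.0, 80.0], [0.0, 0.0, 0.0])    # 240.0
jumpy = total_reward([60.0, 100.0, 60.0], [0.0, 0.0, 0.0])    # 140.0
```

The fluctuation term is what distinguishes this design from reward functions that only penalize stalls: two sessions with the same average quality score differ in reward if one of them oscillates between gears.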
The main difference between this reward function and related-art reinforcement learning multi-code-rate algorithms is the following: traditional reinforcement learning pursues high definition, low stalling, and overall code rate stability, but in practice the same gear can correspond to different code rates. The reward function according to the exemplary embodiment of the present disclosure therefore instead pursues a high quality score, low stalling, and quality score stability, thereby ensuring viewing consistency during video playback.
The method of training a code rate determination model according to an exemplary embodiment of the present disclosure has the following advantages:
1. The content-understanding-based reinforcement learning algorithm fully exploits the coding characteristics of the video content and uses the video quality score to deliver a viewing experience that better matches audience expectations.
2. Compared with traditional modeling methods: by adopting reinforcement learning, the algorithm trained with a neural network has stronger expressive capacity than traditional hand-built models, so the rules it can handle are finer and more accurate. Deep reinforcement learning successfully captures the characteristics of the service scenario from massive data, and the data-driven approach makes bandwidth prediction and code rate selection more accurate and stable.
3. Compared with traditional reinforcement learning: at the input, the video quality score is added as an input feature; at the agent, training uses the PPO network structure, increasing the weight given to important features; in the environment, an actual on-demand service data set is used instead of a public data set; and in the reward function, a high video quality score and quality score smoothness are pursued as a whole. These improvements make the multi-code-rate algorithm model more accurate and stable.
Fig. 5 is a block diagram illustrating a code rate determination model training apparatus according to an exemplary embodiment of the present disclosure. The code rate determination model training apparatus according to the exemplary embodiments of the present disclosure may be implemented in a computer device in hardware, software, or a combination of hardware and software.
As shown in fig. 5, a rate determination model training apparatus 500 according to an exemplary embodiment of the present disclosure may include a video clip information acquisition unit 510, a rate determination unit 520, and a training unit 530.
According to an exemplary embodiment of the present disclosure, the video clip information obtaining unit 510 may be configured to obtain information about a video clip included in a training sample video, wherein the video clip of the training sample video is transcoded into a plurality of code rate steps and each code rate step has a corresponding quality score, and the information about the video clip includes the quality score of the video clip at each code rate step.
According to an exemplary embodiment of the present disclosure, the code rate determining unit 520 may be configured to acquire network state information and player information during a download period of downloading video clips through a pre-built playback environment, and input the acquired network state information, player information, and information about the video clips into a video code rate determining model to obtain a code rate gear selected for the video clips to be downloaded.
According to an exemplary embodiment of the present disclosure, the training unit 530 may be configured to construct a bonus function based on a quality score, a play-stuck condition, and a number of rate shift times corresponding to a rate shift selected for each video clip, and adjust parameters of the video rate determination model using the bonus function.
According to an exemplary embodiment of the present disclosure, the quality score of a video slice may be determined by a video coding objective index of the video slice and/or a subjective perception assessment by users, wherein the video coding objective index comprises at least one of the following indexes: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), or Video Multi-method Assessment Fusion (VMAF).
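Of the listed objective indexes, PSNR is the simplest to state concretely. A minimal sketch for 8-bit samples follows (frames flattened to plain lists purely for illustration):

```python
import math

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized 8-bit sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")    # identical frames: no distortion
    return 10.0 * math.log10(max_val ** 2 / mse)

# A flat frame versus the same frame with every sample off by one level.
ref = [128] * 64
deg = [129] * 64
score = psnr(ref, deg)        # MSE = 1, so 10*log10(255^2) ~ 48.13 dB
```

SSIM and VMAF are considerably more involved (structural and perceptual models respectively) and are typically computed with dedicated tools rather than hand-rolled.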
According to an exemplary embodiment of the present disclosure, the pre-built playback environment may include a client player, a client buffer, and a content distribution server, wherein the client player downloads video clips from the content distribution server and stores the downloaded video clips into the client buffer, and an interface for delivering quality scores of the video clips is provided between the client player, the client buffer, and the content distribution server.
According to an exemplary embodiment of the present disclosure, the training sample video may be obtained from a video data set including on-demand video and short video.
According to an example embodiment of the present disclosure, the video bitrate determination model may have a deep reinforcement learning structure including a policy-based deep neural network configured to output a bitrate gear selection for a video slice and a value-based deep neural network configured to score the bitrate gear selection action of the policy-based deep neural network.
According to an exemplary embodiment of the present disclosure, the policy-based deep neural network may combine fully connected layers with one-dimensional convolution and employ a Softmax function as the activation function, wherein the policy-based deep neural network outputs selection probabilities corresponding to the plurality of code rate gears.
According to an exemplary embodiment of the present disclosure, the network status information may include bandwidth information.
According to an exemplary embodiment of the present disclosure, the player information may include a current buffer size of the player.
According to an exemplary embodiment of the present disclosure, the information about the video clips may include a code rate of the downloaded video clips, a data amount and a quality score of the video clips to be downloaded, and the number of remaining video clips of the training sample video.
According to an exemplary embodiment of the present disclosure, network state information, player information, and information about video clips during a predetermined number of download periods prior to a current download period may be input into a video bitrate determination model to determine a bitrate gear for a video clip to be downloaded.
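Keeping the states of the last few download periods, as described above, can be done with a fixed-length queue; the history length and stored fields below are illustrative:

```python
from collections import deque

HISTORY_LEN = 5   # number of past download periods fed to the model (illustrative)
history = deque(maxlen=HISTORY_LEN)

def record_period(bandwidth_mbps, buffer_s, last_gear):
    """Append one download period's state; the oldest entry falls off automatically."""
    history.append((bandwidth_mbps, buffer_s, last_gear))

for i in range(8):                # after 8 downloads only the last 5 remain
    record_period(1.0 + i, 10.0, i % 4)
```

A `deque` with `maxlen` keeps the window bounded without manual trimming, so each model invocation simply reads the current contents.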
The procedure for training the code rate determination model has been described above with reference to fig. 2 to 4, and will not be repeated here.
Fig. 6 shows a flowchart of a method of determining a video bitrate according to an exemplary embodiment of the present disclosure. The method may be performed on a client device, such as for playing video-on-demand, to enable adaptive video rate switching.
As shown in fig. 6, first, in step S601, information about video clips of a video is acquired, wherein the video clips are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, and the information about the video clips includes the quality score of the video clips at each code rate gear. When playing a VOD video, the client device may request and acquire the related information of the video clips of the VOD video from the CDN server. The related information of the video clips according to the exemplary embodiments of the present disclosure may further include the number of video clips, the data amount at each code rate gear, and the like.
Next, in step S603, network state information and player information are acquired during a download period of downloading video clips through a pre-built playback environment, and the acquired network state information, player information, and information about the video clips are input into a video rate determination model to obtain a rate shift selected for the video clips to be downloaded. Here, the video rate determination model is trained by the training method described above with reference to fig. 2 to 4.
According to an exemplary embodiment of the present disclosure, the network status information may include bandwidth information, and the network bandwidth may be determined by the data amount of the downloaded video clips and the download time.
The player information may include the current buffer size of the player.
The information about the video clips may include a code rate of the downloaded video clips, a data amount and quality score of the video clips to be downloaded, and the number of remaining video clips of the VOD video.
Then, in step S605, it may be requested to download a video clip corresponding to the code rate shift of the selected video clip to be downloaded. That is, after determining the code rate level of the video clip to be downloaded, the client device may request the CDN server to download the video clip of the code rate level.
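Steps S601 through S605 can be summarized as a client-side loop; the model and CDN download calls below are stubs, not the actual player interfaces:

```python
def choose_gear(state, model):
    """Run the rate-determination model and pick the top-probability gear."""
    probs = model(state)
    return max(range(len(probs)), key=probs.__getitem__)

def play_on_demand(chunks, model, download):
    """Per-chunk loop: observe state, choose a gear, request that rendition."""
    state = {"bandwidth": None, "buffer": 0.0}
    played = []
    for chunk in chunks:
        gear = choose_gear(state, model)
        data = download(chunk, gear)          # e.g. an HTTP request to the CDN (stub)
        state["bandwidth"] = len(data)        # toy bandwidth update from the download
        played.append((chunk, gear))
    return played

# Stubs: a constant model preferring gear 1 and a fake downloader.
result = play_on_demand(
    chunks=[0, 1, 2],
    model=lambda s: [0.2, 0.7, 0.1],
    download=lambda c, g: b"x" * (g + 1) * 100,
)
```

In a real player the state would carry the full feature history described earlier, and the download call would fetch the chunk at the chosen gear from the CDN server.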
With the method for determining the code rate gear described above, various conditions such as the network and the player are fully considered in video-on-demand scenarios, multi-code-rate switching is performed based on video quality perception, the viewing experience of the user is improved, and the user enjoys a clearer and smoother viewing effect.
Fig. 7 is a block diagram illustrating a code rate determining apparatus according to an exemplary embodiment.
As shown in fig. 7, the code rate determining apparatus 700 may include an information acquisition unit 710, a code rate determining unit 720, and a video slice downloading unit 730.
The information obtaining unit 710 is configured to obtain information about a video slice of a video, wherein the video slice is transcoded into a plurality of code rate steps and each code rate step has a corresponding quality score, the information about the video slice comprising the quality score of the video slice at each code rate step.
The code rate determination unit 720 is configured to acquire network state information and player information during a download period of a downloaded video clip, and input the acquired network state information, player information, and information about the video clip into a video code rate determination model to obtain a code rate gear selected for the video clip to be downloaded. Here, the video rate determination model is trained by the training method described above with reference to fig. 2 to 4.
The video clip downloading unit 730 is configured to request downloading of a video clip corresponding to the code rate range of the selected video clip to be downloaded.
Fig. 8 is a block diagram illustrating a structure of an electronic device 800 for determining a video bitrate according to an example embodiment of the disclosure. The electronic device 800 may be, for example: a smart phone, a tablet computer, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The electronic device 800 may also be referred to by other names such as user device, portable terminal, laptop terminal, or desktop terminal.
Generally, the electronic device 800 includes: a processor 801 and a memory 802.
Processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed by the display screen. In an exemplary embodiment of the present disclosure, the processor 801 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 802 is used to store at least one instruction for execution by processor 801 to implement a training method and/or a code rate determination method of a code rate determination model of an exemplary embodiment of the present disclosure.
In some embodiments, the electronic device 800 may further optionally include: a peripheral interface 803, and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by a bus or signal line. Individual peripheral devices may be connected to the peripheral device interface 803 by buses, signal lines, or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 804, a touch display 805, a camera 806, audio circuitry 807, a positioning component 808, and a power supply 809.
Peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to processor 801 and memory 802. In some embodiments, processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 804 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuitry 804 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 804 may also include NFC (Near Field Communication, short range wireless communication) related circuitry, which is not limited by the present disclosure.
The display 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to collect touch signals at or above the surface of the display 805. The touch signal may be input as a control signal to the processor 801 for processing. At this time, the display 805 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 805 may be one and disposed on a front panel of the electronic device 800; in other embodiments, the display 805 may be at least two, respectively disposed on different surfaces of the terminal 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display disposed on a curved surface or a folded surface of the terminal 800. Even more, the display 805 may be arranged in an irregular pattern other than rectangular, i.e., a shaped screen. The display 805 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, the at least two rear cameras are any one of a main camera, a depth camera, a wide-angle camera and a tele camera, so as to realize that the main camera and the depth camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize a panoramic shooting and Virtual Reality (VR) shooting function or other fusion shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash lamp can be a single-color temperature flash lamp or a double-color temperature flash lamp. The dual-color temperature flash lamp refers to a combination of a warm light flash lamp and a cold light flash lamp, and can be used for light compensation under different color temperatures.
Audio circuitry 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, inputting the electric signals to the processor 801 for processing, or inputting the electric signals to the radio frequency circuit 804 for voice communication. For stereo acquisition or noise reduction purposes, a plurality of microphones may be respectively disposed at different portions of the terminal 800. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic location of the electronic device 800 for navigation or LBS (Location Based Service). The positioning component 808 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 809 is used to power the various components in the electronic device 800. The power supply 809 may be an alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyroscope sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815, and proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of acceleration on the three coordinate axes of the coordinate system established with the terminal 800. For example, the acceleration sensor 811 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 801 may control the touch display screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 811. The acceleration sensor 811 may also be used to acquire motion data of a game or a user.
The gyro sensor 812 may detect a body direction and a rotation angle of the terminal 800, and the gyro sensor 812 may collect a 3D motion of the user to the terminal 800 in cooperation with the acceleration sensor 811. The processor 801 may implement the following functions based on the data collected by the gyro sensor 812: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed at a side frame of the terminal 800 and/or at a lower layer of the touch display 805. When the pressure sensor 813 is disposed on a side frame of the terminal 800, a grip signal of the terminal 800 by a user may be detected, and the processor 801 performs left-right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the touch display screen 805, the processor 801 performs control of the operability control on the UI according to the pressure operation of the user on the touch display screen 805. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect a fingerprint of a user, and the processor 801 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 814 may be provided on the front, back, or side of the electronic device 800. When a physical key or vendor Logo is provided on the electronic device 800, the fingerprint sensor 814 may be integrated with the physical key or vendor Logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch display screen 805 based on the intensity of ambient light collected by the optical sensor 815. Specifically, when the intensity of the ambient light is high, the display brightness of the touch display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera module 806 based on the ambient light intensity collected by the optical sensor 815.
A proximity sensor 816, also referred to as a distance sensor, is typically provided on the front panel of the electronic device 800. The proximity sensor 816 is used to collect the distance between the user and the front of the electronic device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the terminal 800 gradually decreases, the processor 801 controls the touch display 805 to switch from the bright screen state to the off screen state; when the proximity sensor 816 detects that the distance between the user and the front surface of the electronic device 800 gradually increases, the processor 801 controls the touch display 805 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 8 is not limiting and that more or fewer components than shown may be included or certain components may be combined or a different arrangement of components may be employed.
According to an embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the training method and/or the code rate determination method of the code rate determination model according to the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drives (HDD), solid state drives (SSD), card memory (such as multimedia cards, Secure Digital (SD) cards, or eXtreme Digital (XD) cards), magnetic tape, floppy disks, magneto-optical data storage devices, hard disks, solid state disks, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide the computer program and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the program. The computer program in the computer-readable storage medium described above can run in an environment deployed on a computer device, such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems such that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
In accordance with embodiments of the present disclosure, a computer program product may also be provided, instructions in which are executable by a processor of a computer device to implement a training method and/or a code rate determination method of a code rate determination model in accordance with exemplary embodiments of the present disclosure.
According to the training method of the code rate determination model and/or the code rate determination method of the present disclosure, various scenario conditions such as the network state and the player state are fully considered, multi-code-rate switching is performed based on video quality perception, and the viewing experience of the user is improved, so that the user enjoys a clearer and smoother viewing effect.
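The training objective described above combines per-segment quality, playback stalls, and gear-switch frequency into a single reward. The sketch below is illustrative only: the penalty weights `STALL_W` and `SWITCH_W` and the exact functional form are assumptions for demonstration, not values taken from the disclosure.

```python
# Illustrative per-segment reward combining the three terms named in the
# disclosure: the quality score of the chosen code rate gear, a playback
# stall (rebuffering) penalty, and a penalty for switching between gears.
# STALL_W and SWITCH_W are hypothetical tuning constants.

STALL_W = 4.0    # penalty per second of rebuffering (assumed)
SWITCH_W = 1.0   # penalty per unit quality change across a gear switch (assumed)

def segment_reward(quality, stall_seconds, prev_quality):
    """Reward for one downloaded segment, given the quality score of the
    selected gear, seconds stalled while waiting for it, and the quality
    score of the previously played segment."""
    return quality - STALL_W * stall_seconds - SWITCH_W * abs(quality - prev_quality)
```

During training, the cumulative sum of such rewards over all segments of a sample video would be what the reinforcement-learning update maximizes.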
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any adaptations, uses, or variations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (19)

1. A training method for a video code rate determination model, comprising:
acquiring information about video slices included in a training sample video, wherein the video slices of the training sample video are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, and the information about the video slices comprises the quality scores of the video slices under each code rate gear;
acquiring network state information and player information during a download period of downloading video slices through a pre-constructed playing environment, and inputting the acquired network state information, player information, and information about the video slices into a video code rate determination model to obtain a code rate gear selected for the video slice to be downloaded;
constructing a reward function based on a quality score corresponding to the code rate gear selected for each video slice, a playback stall condition, and a number of code rate gear switches, and adjusting parameters of the video code rate determination model using the reward function,
wherein the pre-built playing environment comprises a client player, a client buffer, and a content distribution server, wherein the client player downloads video slices from the content distribution server and stores the downloaded video slices into the client buffer, and interfaces for transmitting quality scores of the video slices are arranged among the client player, the client buffer, and the content distribution server.
2. The method of claim 1, wherein the quality score of a video slice is determined by a video coding objective index of the video slice and/or a subjective perception assessment of a user, wherein the video coding objective index comprises at least one of: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), or video multi-method assessment fusion (VMAF).
3. The method of claim 1, wherein the training sample video is obtained from a video dataset comprising video on demand and short videos.
4. The method of claim 1, wherein the video code rate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output a code rate gear selection for a video slice, and the value-based deep neural network is configured to score the code rate gear selection action of the policy-based deep neural network.
5. The method of claim 4, wherein the policy-based deep neural network has fully connected layers and one-dimensional convolutional layers and employs a Softmax function as the activation function,
wherein the policy-based deep neural network outputs selection probabilities corresponding to the plurality of code rate gears.
6. The method of claim 1, wherein,
the network state information includes bandwidth information;
the player information includes the current buffer size of the player;
the information about the video slices further includes a code rate of the downloaded video slice, a data amount and a quality score of the video slice to be downloaded, and a number of remaining video slices of the training sample video.
7. The method of claim 6, wherein the obtaining a code rate gear selected for the video slice to be downloaded comprises: inputting network state information, player information, and information about video slices during a predetermined number of download periods prior to a current download period into the video code rate determination model to determine the code rate gear for the video slice to be downloaded.
8. A training apparatus for a video code rate determination model, comprising:
a video slice information acquisition unit configured to acquire information about video slices included in a training sample video, wherein the video slices of the training sample video are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, and the information about the video slices includes the quality scores of the video slices under each code rate gear;
a code rate determination unit configured to acquire network state information and player information during a download period of downloading video slices through a pre-constructed playing environment, and input the acquired network state information, player information, and information about the video slices into a video code rate determination model to obtain a code rate gear selected for the video slice to be downloaded;
a training unit configured to construct a reward function based on a quality score corresponding to the code rate gear selected for each video slice, a playback stall condition, and a number of code rate gear switches, and to adjust parameters of the video code rate determination model using the reward function,
wherein the pre-built playing environment comprises a client player, a client buffer, and a content distribution server, wherein the client player downloads video slices from the content distribution server and stores the downloaded video slices into the client buffer, and interfaces for transmitting quality scores of the video slices are arranged among the client player, the client buffer, and the content distribution server.
9. The apparatus of claim 8, wherein the quality score of a video slice is determined by a video coding objective index of the video slice and/or a subjective perception assessment of a user, wherein the video coding objective index comprises at least one of: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), or video multi-method assessment fusion (VMAF).
10. The apparatus of claim 8, wherein the training sample video is obtained from a video data set comprising video on demand and short videos.
11. The apparatus of claim 8, wherein the video code rate determination model has a deep reinforcement learning structure comprising a value-based deep neural network and a policy-based deep neural network, wherein the policy-based deep neural network is configured to output a code rate gear selection for a video slice, and the value-based deep neural network is configured to score the code rate gear selection action of the policy-based deep neural network.
12. The apparatus of claim 11, wherein the policy-based deep neural network has fully connected layers and one-dimensional convolutional layers and employs a Softmax function as the activation function,
wherein the policy-based deep neural network outputs selection probabilities corresponding to the plurality of code rate gears.
13. The apparatus of claim 8, wherein,
the network state information includes bandwidth information;
the player information includes the current buffer size of the player;
the information about the video slices further includes a code rate of the downloaded video slice, a data amount and a quality score of the video slice to be downloaded, and a number of remaining video slices of the training sample video.
14. The apparatus of claim 13, wherein the code rate determination unit is configured to input network state information, player information, and information about video slices during a predetermined number of download periods prior to a current download period into the video code rate determination model to determine the code rate gear for the video slice to be downloaded.
15. A method for determining a video code rate, comprising:
acquiring information about video slices of a video, wherein the video slices are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, and the information about the video slices comprises the quality score of the video slices under each code rate gear;
acquiring network state information and player information during a download period of downloading video slices, and inputting the acquired network state information, player information, and information about the video slices into a video code rate determination model to obtain a code rate gear selected for the video slice to be downloaded;
requesting download of the video slice corresponding to the code rate gear selected for the video slice to be downloaded,
wherein the video code rate determination model is trained based on the method of any one of claims 1-7.
16. A video code rate determining apparatus, comprising:
an information acquisition unit configured to acquire information about video slices of a video, wherein the video slices are transcoded into a plurality of code rate gears and each code rate gear has a corresponding quality score, the information about the video slices including the quality score of the video slices under each code rate gear;
a code rate determination unit configured to acquire network state information and player information during a download period of downloading video slices, and input the acquired network state information, player information, and information about the video slices into a video code rate determination model to obtain a code rate gear selected for the video slice to be downloaded;
a video slice download unit configured to request download of the video slice corresponding to the code rate gear selected for the video slice to be downloaded,
wherein the video code rate determination model is trained based on the method of any one of claims 1-7.
17. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer executable instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1 to 7 or claim 15.
18. A storage medium storing instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 7 or claim 15.
19. A computer program product, characterized in that instructions in the computer program product are executed by at least one processor in an electronic device to perform the method of any of claims 1 to 7 or 15.
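At inference time (claims 15-16), the trained policy network maps the current state (bandwidth history, buffer size, last downloaded gear, segment sizes, quality scores, segments remaining) to selection probabilities over the code rate gears, and the player requests the gear it selects. The sketch below is a hedged stand-in: a linear scorer with a Softmax output replaces the one-dimensional convolutional network of claims 5 and 12, and the gear list, feature layout, and random weights are illustrative, not taken from the patent.

```python
import math
import random

BITRATE_GEARS = [350, 800, 1850, 4300]  # kbps per gear, illustrative only

def softmax(logits):
    """Numerically stable Softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class PolicyStub:
    """Linear stand-in for the policy-based network of claims 4-5:
    maps a state feature vector to selection probabilities over gears."""

    def __init__(self, n_features, n_gears, seed=0):
        rng = random.Random(seed)
        # One weight row per gear; real models would learn these via the
        # reward function rather than sample them at random.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_gears)]

    def probs(self, state):
        logits = [sum(wi * si for wi, si in zip(row, state)) for row in self.w]
        return softmax(logits)

    def select_gear(self, state):
        # Greedy selection at inference; training would sample from probs.
        p = self.probs(state)
        return max(range(len(p)), key=lambda i: p[i])
```

A caller would build `state` from the most recent download periods (per claims 7 and 14), call `select_gear`, and request the segment encoded at `BITRATE_GEARS[gear]`.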
CN202111315458.2A 2021-11-08 2021-11-08 Video code rate determining method and device, electronic equipment and storage medium thereof Active CN114040230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111315458.2A CN114040230B (en) 2021-11-08 2021-11-08 Video code rate determining method and device, electronic equipment and storage medium thereof

Publications (2)

Publication Number Publication Date
CN114040230A CN114040230A (en) 2022-02-11
CN114040230B true CN114040230B (en) 2024-03-29

Family

ID=80136732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111315458.2A Active CN114040230B (en) 2021-11-08 2021-11-08 Video code rate determining method and device, electronic equipment and storage medium thereof

Country Status (1)

Country Link
CN (1) CN114040230B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827668B (en) * 2022-03-23 2024-02-13 百果园技术(新加坡)有限公司 Video gear selection method, device and equipment based on decoding capability
CN114979799A (en) * 2022-05-20 2022-08-30 北京字节跳动网络技术有限公司 Panoramic video processing method, device, equipment and storage medium
CN115002557B (en) * 2022-05-23 2024-01-30 北京字跳网络技术有限公司 Network speed prediction method, device, equipment and storage medium
CN117412072A (en) * 2022-07-06 2024-01-16 北京字跳网络技术有限公司 Video resource management method and device, electronic equipment and storage medium
CN116962757B (en) * 2023-09-20 2023-12-12 腾讯科技(深圳)有限公司 Video code rate grade determining method and device, electronic equipment and storage medium

Citations (16)

Publication number Priority date Publication date Assignee Title
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN109729396A (en) * 2017-10-31 2019-05-07 华为技术有限公司 Video slicing data transmission method and device
CN110248247A (en) * 2019-06-12 2019-09-17 深圳市大数据研究院 Embedded dynamic video control method for playing back and device based on network throughput
CN110324621A (en) * 2019-07-04 2019-10-11 北京达佳互联信息技术有限公司 Method for video coding, device, electronic equipment and storage medium
CN110536175A (en) * 2018-05-24 2019-12-03 腾讯科技(深圳)有限公司 A kind of code rate switching method and apparatus
CN111107395A (en) * 2019-12-31 2020-05-05 广州市百果园网络科技有限公司 Video transcoding method, device, server and storage medium
CN111901642A (en) * 2020-07-31 2020-11-06 成都云格致力科技有限公司 Real-time video code rate self-adaptive control method and system based on reinforcement learning
CN112291620A (en) * 2020-09-22 2021-01-29 北京邮电大学 Video playing method and device, electronic equipment and storage medium
CN112543366A (en) * 2020-12-02 2021-03-23 北京五八信息技术有限公司 Video playing method and device
CN112672153A (en) * 2020-12-11 2021-04-16 北方信息控制研究院集团有限公司 Scalable coding video code rate self-adaptive selection method fused with video feature analysis
CN112788357A (en) * 2020-12-30 2021-05-11 北京达佳互联信息技术有限公司 Network live broadcast method, device, server and computer readable storage medium
CN112953922A (en) * 2021-02-03 2021-06-11 西安电子科技大学 Self-adaptive streaming media control method, system, computer equipment and application
CN113014968A (en) * 2021-02-24 2021-06-22 南京大学 Multi-user dynamic code rate video transmission method and system based on reinforcement learning
CN113194320A (en) * 2021-04-30 2021-07-30 北京达佳互联信息技术有限公司 Parameter prediction model training method and device and parameter prediction method and device
CN113411643A (en) * 2021-05-26 2021-09-17 中国人民解放军国防科技大学 Video quality optimization method, system, electronic equipment and storage medium
CN113596021A (en) * 2021-07-28 2021-11-02 中国人民解放军国防科技大学 Streaming media code rate self-adaption method, device and equipment supporting neural network

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9386308B2 (en) * 2013-07-16 2016-07-05 Cisco Technology, Inc. Quality optimization with buffer and horizon constraints in adaptive streaming


Also Published As

Publication number Publication date
CN114040230A (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN114040230B (en) Video code rate determining method and device, electronic equipment and storage medium thereof
CN108829881B (en) Video title generation method and device
CN110213616B (en) Video providing method, video obtaining method, video providing device, video obtaining device and video providing equipment
CN109600678B (en) Information display method, device and system, server, terminal and storage medium
CN108259945B (en) Method and device for processing playing request for playing multimedia data
RU2506635C2 (en) Server for providing content, device for reproducing content, method of providing content, method of reproducing content, program and system for providing content
CN110263213B (en) Video pushing method, device, computer equipment and storage medium
JP7085014B2 (en) Video coding methods and their devices, storage media, equipment, and computer programs
CN110163066B (en) Multimedia data recommendation method, device and storage medium
CN112637631B (en) Code rate determining method and device, electronic equipment and storage medium
CN111050203A (en) Video processing method and device, video processing equipment and storage medium
CN110149557B (en) Video playing method, device, terminal and storage medium
CN113490010B (en) Interaction method, device and equipment based on live video and storage medium
WO2021143386A1 (en) Resource transmission method and terminal
WO2019170118A1 (en) Video playing method, device and apparatus
CN112115282A (en) Question answering method, device, equipment and storage medium based on search
CN113141524A (en) Resource transmission method, device, terminal and storage medium
CN111026992A (en) Multimedia resource preview method, device, terminal, server and storage medium
CN111836073B (en) Method, device and equipment for determining video definition and storage medium
CN111432245B (en) Multimedia information playing control method, device, equipment and storage medium
CN109618192B (en) Method, device, system and storage medium for playing video
WO2021143388A1 (en) Bitrate switching method and device
CN112616082A (en) Video preview method, device, terminal and storage medium
CN109714628B (en) Method, device, equipment, storage medium and system for playing audio and video
CN112911337B (en) Method and device for configuring video cover pictures of terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant