CN108989838B - DASH code rate self-adaption method based on video content complexity perception - Google Patents

DASH code rate self-adaption method based on video content complexity perception

Info

Publication number
CN108989838B
CN108989838B (granted publication of application CN201810894238.1A)
Authority
CN
China
Prior art keywords
video
fragments
content complexity
code rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810894238.1A
Other languages
Chinese (zh)
Other versions
CN108989838A (en)
Inventor
刘子和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201810894238.1A priority Critical patent/CN108989838B/en
Publication of CN108989838A publication Critical patent/CN108989838A/en
Application granted granted Critical
Publication of CN108989838B publication Critical patent/CN108989838B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    All classes fall under H (Electricity) → H04 (Electric communication technique) → H04N (Pictorial communication, e.g. television) → H04N 21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):
    • H04N 21/23418 — server-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/234309 — server-side reformatting of video signals by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/235 — processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/238 — interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth
    • H04N 21/24 — monitoring of processes or resources, e.g. server load, available bandwidth, upstream requests
    • H04N 21/26216 — content distribution scheduling under constraints involving the channel capacity, e.g. network bandwidth
    • H04N 21/26233 — content distribution scheduling under constraints involving content or additional data duration or size
    • H04N 21/44008 — client-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/440218 — client-side reformatting of video signals by transcoding between formats or standards
    • H04N 21/44209 — monitoring of the downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
    • H04N 21/8456 — structuring of content by decomposing it in the time domain, e.g. into time segments

Abstract

The invention discloses a DASH code rate self-adaption method based on video content complexity perception, which belongs to the field of adaptive video transmission in content distribution networks. The server side perceives the content complexity of a video to obtain the content complexity level of each fragment; it then marks these levels in the video's MPD file. The client receives the MPD file marked with content complexity levels and uses it to carry out DASH code rate self-adaption. The invention avoids, as far as possible, playing stagnation caused by buffer underflow when network conditions are poor, and avoids the waste of network resources caused by buffer overflow when conditions are good. Because fragment code rates are selected with the complexity of each fragment taken into account, a high-code-rate version is chosen for high-complexity fragments whenever possible, improving their subjective viewing quality and keeping the viewing quality of all fragments of the video high, thereby providing users with a better quality of experience.

Description

DASH code rate self-adaption method based on video content complexity perception
Technical Field
The invention relates to the field of adaptive video transmission in content distribution networks, and in particular to a DASH code rate self-adaption method based on video content complexity perception.
Background
The principle of DASH (Dynamic Adaptive Streaming over HTTP) is as follows: at the server, video source files encoded at different code rates are cut into fragments that are continuous on the time axis and have a fixed playing duration; the addresses and control information of the playable content are stored in a Media Presentation Description (MPD) file, which describes the media stream structure using node elements such as Period, AdaptationSet, Representation, SegmentList, and Segment. The client requests from the server, via HTTP GET, the video fragments that best match the real-time network bandwidth, achieving smooth playback while preventing the player buffer from overflowing or running empty.
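The basic request loop described above can be sketched in a few lines; the function name, code-rate ladder, and bandwidth figures below are illustrative assumptions (a common baseline DASH heuristic), not values or logic from the patent:

```python
def pick_representation(bitrates_bps, measured_bandwidth_bps):
    """Pick the highest code rate that fits the measured bandwidth,
    falling back to the lowest available version."""
    fitting = [b for b in bitrates_bps if b <= measured_bandwidth_bps]
    return max(fitting) if fitting else min(bitrates_bps)

# Hypothetical ladder of N = 4 encoded versions of one video (bits/s):
ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]
print(pick_representation(ladder, 3_000_000))  # -> 2500000
print(pick_representation(ladder, 300_000))    # -> 500000
```

A real client would re-measure bandwidth on every iteration and issue one HTTP GET per chosen fragment.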
Traditional DASH code rate adaptation methods dynamically request the fragment whose code rate level best suits the current network, based on the detected congestion or bandwidth, but they ignore the demands that the video's content characteristics (such as video content complexity) place on the coding rate. Video content complexity is a property that reflects the intensity of motion in the video. For a fixed viewing quality, fragments with high content complexity carry more information and therefore need a higher code rate, while fragments with low content complexity carry less information and make lower demands on the code rate. Hence, to achieve high viewing quality across all fragments of a video, a higher-code-rate version should be selected for high-complexity fragments and a relatively lower-code-rate version for low-complexity ones. Existing DASH code rate adaptation schemes, however, do not consider the influence of content complexity on viewing quality and do not treat fragments of different complexity differently; they select fragments uniformly by average code rate. As a result, the subjective viewing quality fluctuates between fragments of different complexity during continuous playback, and the average viewing quality of high-complexity fragments is markedly lower than that of low-complexity ones, greatly degrading the user's viewing experience.
Disclosure of Invention
The invention aims to: the DASH code rate self-adaptive method based on video content complexity perception is provided, and the technical problem that the user watching quality is greatly reduced due to the fact that only the average code rate is adopted to select the fragments in the existing DASH code rate self-adaptive method is solved.
The technical scheme adopted by the invention is as follows:
a DASH code rate self-adaptive method based on video content complexity perception comprises the following steps:
step 1: the server side senses the content complexity of the video to obtain the content complexity level of each fragment in the video;
step 2: the server side marks the content complexity level in an MPD file of a corresponding video;
and step 3: and the client receives and utilizes the MPD file marked with the content complexity level to realize DASH code rate self-adaptation.
Further, the step 1 specifically comprises:
step 11: the server side encodes the video to obtain N video versions with different code rates;
step 12: shot boundary detection is carried out on the N video versions;
step 13: the N video versions are subjected to content complexity perception, and content complexity levels of the same shot of different video versions are obtained;
step 14: for a fragment spanning more than one shot, the content complexity level of the shot occupying the largest share of the fragment's duration is taken as the content complexity level of the fragment.
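Step 14 can be sketched as follows; the shot durations and complexity levels are hypothetical:

```python
def fragment_complexity(shots_in_fragment):
    """shots_in_fragment: list of (duration_seconds, complexity_level)
    for the shots (or shot portions) covered by one fragment. The
    fragment inherits the level of the shot occupying the most time."""
    dominant = max(shots_in_fragment, key=lambda shot: shot[0])
    return dominant[1]

# A 4 s fragment covering the tail of one shot and the head of another:
print(fragment_complexity([(1.5, 2), (2.5, 4)]))  # -> 4
```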
Further, in step 13, the step of sensing the content complexity is as follows:
step 131: obtain, for each shot under each video version, the average number of coded bits per frame over its B frames and P frames, R_(B,P), and the average number of coded bits per frame over the whole video, R, where B frames and P frames are both inter-frame predictive coded frames;
step 132: use R_(B,P) and R to obtain the content complexity of the shot in that video version, the content complexity value of the shot being
C = R_(B,P) / R
Step 133: for the same video shot with different video versions, the mode of the content complexity value is taken as the content complexity value of the shot, namely the video content complexity of the same shot and different video versions is the same.
Step 134: and clustering the content complexity values of all the shots of the video into K levels by using a clustering method, and taking the levels as the content complexity levels of the corresponding shots.
Further, the step 3 specifically includes:
step 31: downloading and analyzing the MPD file marked with the content complexity level by the client;
step 32: the client downloads the fragments of the video with the minimum code rate in the N video versions, and the video is played after the number of the fragments in the buffer reaches a threshold value;
step 33: obtain the available buffer resources of the buffer area and judge whether fragment screening is needed; if so, skip to step 34, otherwise load the fragment according to the size of the available buffer resources;
step 34: detecting the network condition in real time to obtain a network bandwidth predicted value;
step 35: screening the next fragment needing to be loaded into the buffer area by utilizing the available buffer resources and the network bandwidth prediction value;
step 36: performing secondary screening on the fragments screened in the step 35 by using the content complexity level of the fragments in the MPD file;
step 37: after the screening in steps 35 and 36, if selectable fragments exist, obtain the subjective viewing quality of each and select the fragment with the highest quality as the next fragment loaded into the buffer; if no selectable fragment exists, select the fragment with the lowest code rate as the next fragment loaded into the buffer;
step 38: and repeating the steps 33-37 until the video fragments are loaded in sequence.
Further, in step 33, loading the fragment according to the size of the available buffer resources specifically means: when the total playing time of the fragments already loaded into the buffer exceeds the buffer's upper limit, the fragment with the highest code rate is selected as the next fragment to load; when it falls below the buffer's lower limit, the fragment with the lowest code rate is selected as the next fragment to load.
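A sketch of this buffer rule, with hypothetical occupancy limits in seconds:

```python
def load_without_screening(buffered_seconds, bitrates_bps,
                           lower=5.0, upper=30.0):
    """Return a code rate when buffer occupancy is outside the band
    [lower, upper]; return None to fall through to steps 34-37."""
    if buffered_seconds > upper:
        return max(bitrates_bps)  # ample buffer: spend it on quality
    if buffered_seconds < lower:
        return min(bitrates_bps)  # near underflow: protect playback
    return None

ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]
print(load_without_screening(32.0, ladder))  # -> 5000000
print(load_without_screening(2.0, ladder))   # -> 500000
print(load_without_screening(15.0, ladder))  # -> None
```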
Further, in step 34, the network bandwidth predicted value is the ratio of the number of bits in a fragment to its transmission time, where the transmission time is measured from the moment the client sends the HTTP GET request for the fragment until the fragment is completely received.
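This estimate amounts to the following; the timestamps and fragment size are hypothetical:

```python
def estimate_bandwidth_bps(fragment_bits, t_request, t_received):
    """Predicted bandwidth = bits in the fragment / transfer time,
    timed from sending the HTTP GET to full reception."""
    return fragment_bits / (t_received - t_request)

# A hypothetical 2 Mbit fragment that arrived 0.5 s after the request:
print(estimate_bandwidth_bps(2_000_000, 100.0, 100.5))  # -> 4000000.0
```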
Further, in step 35, the screening method is: select those fragments whose transmission time under the current network conditions is less than the playing time of the available buffer resources.
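A sketch of this first screen; the fragment versions and figures are hypothetical:

```python
def first_screen(versions, bandwidth_bps, buffered_seconds):
    """versions: (code_rate_bps, duration_s) pairs for one fragment.
    Keep versions whose estimated transmission time fits the playable
    buffer."""
    kept = []
    for code_rate, duration in versions:
        transmission_time = code_rate * duration / bandwidth_bps
        if transmission_time < buffered_seconds:
            kept.append((code_rate, duration))
    return kept

versions = [(500_000, 4.0), (2_500_000, 4.0), (5_000_000, 4.0)]
print(first_screen(versions, bandwidth_bps=2_000_000, buffered_seconds=8.0))
# -> [(500000, 4.0), (2500000, 4.0)]
```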
Further, in step 36, the secondary screening method includes:
step 361: judging whether the content complexity level of the fragments screened in the step 35 is lower than a threshold, if so, skipping to a step 362, otherwise, skipping to a step 363;
step 362: selecting the fragments with the code rate lower than the network bandwidth predicted value as optional fragments;
step 363: compute the sum of the differences between the network bandwidth predicted value and the code rate over the previous ω fragments, and select as optional those fragments whose code rate is lower than the current network bandwidth predicted value plus this sum.
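Steps 361 to 363 can be sketched as follows; the level threshold, the ω-fragment history, and all figures are hypothetical:

```python
def second_screen(candidate_rates, complexity_level, predicted_bw,
                  history, level_threshold=3):
    """history: (predicted_bw, chosen_rate) pairs for the previous
    omega fragments. High-complexity fragments may additionally spend
    the bandwidth 'saved' on those fragments."""
    if complexity_level < level_threshold:        # step 362
        budget = predicted_bw
    else:                                         # step 363
        budget = predicted_bw + sum(bw - r for bw, r in history)
    return [rate for rate in candidate_rates if rate < budget]

history = [(2_000_000, 1_000_000), (2_000_000, 1_500_000)]  # 1.5 Mb/s saved
print(second_screen([1_000_000, 2_500_000, 5_000_000], 4, 2_000_000, history))
# -> [1000000, 2500000]
```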
Further, in step 37, the subjective viewing quality is the difference between the fragment's MOS value and a loss value, where the MOS (mean opinion score) value is a human-rated score and the loss value quantifies the quality loss caused by switching between fragments.
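The step-37 choice can be sketched as below; the MOS table and the constant switching penalty are hypothetical stand-ins for the patent's quality model:

```python
def pick_best(selectable_rates, mos, previous_rate, switch_penalty=0.3):
    """Subjective quality = MOS of the fragment minus a switching loss;
    the highest-scoring selectable fragment wins."""
    def quality(rate):
        loss = switch_penalty if rate != previous_rate else 0.0
        return mos[rate] - loss
    return max(selectable_rates, key=quality)

mos = {1_000_000: 3.2, 2_500_000: 4.0, 5_000_000: 4.6}
print(pick_best([1_000_000, 2_500_000], mos, previous_rate=2_500_000))
# -> 2500000
```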
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the method of the invention can provide higher average code rate, and the code rate switching rate between the fragments is low, and the subjective quality loss caused by switching is lower. In addition, the invention can avoid playing stagnation caused by buffer underflow as much as possible when the network condition is not good, avoid waste of network resources caused by buffer overflow when the network condition is good, can also distinguish the complexity of the video fragments to select the fragment code rate, and select the version with high code rate for the fragments with high complexity as much as possible, improves the subjective watching quality of the fragments with high complexity, and ensures that the watching quality of all the fragments of the video is higher, thereby providing better experience quality for users.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a system schematic of the present invention;
FIG. 2 is a flow chart of video content complexity awareness in the present invention;
fig. 3 is a flow chart of client side code rate adaptation in the present invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
The present invention is described in detail below with reference to fig. 1-3.
Detailed description of the preferred embodiment 1
The invention discloses a DASH code rate self-adaption method based on video content complexity perception, which involves a server side and a client side.
The invention specifically comprises the following steps:
a DASH code rate self-adaptive method based on video content complexity perception comprises the following steps:
step 1: the server side senses the content complexity of the video to obtain the content complexity level of each fragment in the video;
step 11: the server encodes the video according to the MPEG standard to obtain N video versions with different code rates, where the video is the one that the user wishes to watch and downloads from the server;
step 12: shot boundary detection is carried out on the N video versions, and shot boundaries detected by videos of different versions are the same;
step 13: perform content complexity perception on the N video versions to obtain the content complexity level of each shot under each video version. Frame information of the video is counted shot by shot, including the number and the number of bits of its I, P, and B frames. I frames are intra-frame compression coded frames; they carry a large amount of data and reflect the texture characteristics of the video. P frames and B frames are inter-frame predictive coded frames; they compress the images by removing the temporally redundant information shared between frames of the image sequence, so they carry less data and reflect the motion characteristics of the video. Within a shot, the larger the share of data carried by the P frames and B frames, the more intense the shot's motion, i.e. the higher the content complexity of the video;
step 131: obtain the average number of coded bits per frame over the B frames and P frames of each shot under each video version, R_(B,P), and the average number of coded bits per frame over the whole video, R;
step 132: use R_(B,P) and R to obtain the content complexity of the shot, the content complexity value of the shot being
C = R_(B,P) / R
Step 133: the mode of the content complexity values of the different code rate versions of the same shot is taken as the content complexity value of that shot, i.e. the video content complexity of the same shot is the same across the different code rate versions.
Step 134: and clustering the content complexity values of all the shots of the video into K levels by using a clustering method, and taking the levels as the content complexity levels of the corresponding shots.
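Steps 131-134 can be sketched as follows. This is a minimal illustration under our own naming, not the patent's implementation: the clustering method is unspecified in the patent, so a simple deterministic 1-D k-means stands in for it here.

```python
# Sketch of steps 131-134: per-shot complexity R(B,P)/R, then clustering
# the complexity values of all shots into K levels (names are illustrative).

def shot_complexity(frame_sizes, frame_types):
    """Complexity value R(B,P)/R for one shot of one video version.

    frame_sizes: coded bit count per frame; frame_types: 'I', 'P' or 'B'.
    """
    bp = [s for s, t in zip(frame_sizes, frame_types) if t in ('P', 'B')]
    r_bp = sum(bp) / len(bp)                 # average bits per B/P frame
    r = sum(frame_sizes) / len(frame_sizes)  # average bits per frame
    return r_bp / r

def cluster_levels(values, k):
    """Deterministic 1-D k-means assigning each shot complexity a level 0..k-1."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(20):  # fixed iteration budget; converges quickly in 1-D
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # relabel so that level 0 = lowest-complexity cluster
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[min(range(len(centers)), key=lambda j: abs(v - centers[j]))]
            for v in values]
```

Because I frames dominate the bit budget of a static shot, its ratio falls well below 1, while motion-heavy shots push it upward; clustering then turns these raw ratios into the K discrete levels used by the client.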
Step 14: slice the N video versions into fragments of the same time length, obtaining fragments with different code rates; that is, the different video versions have the same number of fragments and the same fragment durations, and differ only in code rate. For a fragment spanning more than one shot, the content complexity level of the shot that occupies the largest share of the fragment duration is taken as the content complexity level of the fragment.
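The fragment-level assignment of step 14 can be sketched as follows; the function and its inputs (shot time spans and their levels) are our own illustration:

```python
def fragment_level(shot_spans, shot_levels, frag_start, frag_end):
    """Step 14 sketch: a fragment spanning several shots inherits the level
    of the shot that overlaps it for the longest time.

    shot_spans: list of (start, end) times per shot; shot_levels: their levels.
    """
    best, best_overlap = None, 0.0
    for (s, e), lvl in zip(shot_spans, shot_levels):
        overlap = max(0.0, min(e, frag_end) - max(s, frag_start))
        if overlap > best_overlap:
            best, best_overlap = lvl, overlap
    return best
```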
Step 2: the server side marks the content complexity levels in the MPD file of the corresponding video. The server generates an MPD file recording the video information, including the URL, code rate and content complexity level of each fragment.
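Step 2 can be illustrated with a hypothetical MPD fragment. The `contentComplexity` attribute below is purely illustrative: MPEG-DASH does not define such an attribute, so a real deployment would carry the level in a custom namespace or descriptor element.

```xml
<!-- Hypothetical sketch; "contentComplexity" is an invented attribute,
     not part of the ISO/IEC 23009-1 MPD schema. -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v90" bandwidth="90000">
        <SegmentList duration="2">
          <SegmentURL media="seg-90k-001.m4s" contentComplexity="2"/>
          <SegmentURL media="seg-90k-002.m4s" contentComplexity="1"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
```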
And step 3: and the client receives and utilizes the MPD file marked with the content complexity level to realize DASH code rate self-adaptation.
Step 31: the client downloads and parses the MPD file marked with the content complexity levels to obtain the URL (uniform resource locator), code rate and content complexity level of each video fragment;
step 32: the client downloads the fragments of the video with the minimum code rate in the N video versions, and the video is played after the number of the fragments in the buffer reaches a threshold value, so that the video can be quickly started by adopting the method, even if a user watches the video at the highest speed; and (3) for each fragment loaded into the buffer area subsequently, performing the operation in the steps 33 to 37 to select the fragment with the optimal code rate at the moment, so as to realize the code rate self-adaption of the fragment loaded into the buffer area and enable a user to view the video with the optimal quality.
Step 33: obtain the available buffer resource, i.e. the total playing time of the fragments already loaded into the buffer, and judge whether fragment screening is needed; if so, jump to step 34, otherwise load fragments according to the available buffer resource: when the total playing time of the buffered fragments exceeds the buffer upper limit, select the fragment with the highest code rate as the next fragment to load, avoiding the resource waste caused by buffer overflow; when it falls below the buffer lower limit, select the fragment with the lowest code rate as the next fragment to load, avoiding the playback stall caused by buffer underflow. The next fragment is the fragment that needs to be loaded into the buffer at the current time; it follows the last fragment already loaded.
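The hard buffer rules of step 33 can be sketched as below. The names, and the convention of returning `None` when the buffer sits between the limits and the screening of steps 34-36 should decide instead, are our own:

```python
def pick_by_buffer(buffer_seconds, upper, lower, bitrates):
    """Step 33 sketch: hard buffer-occupancy rules.

    Returns a code rate to load, or None when the buffer is between the
    limits and fragment screening (steps 34-36) applies instead.
    """
    if buffer_seconds > upper:
        return max(bitrates)   # near overflow: take the highest code rate
    if buffer_seconds < lower:
        return min(bitrates)   # near underflow: take the lowest code rate
    return None                # defer to screening
```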
Step 34: detect the network condition in real time to obtain a network bandwidth prediction: the prediction is approximated as the ratio of the fragment's bit count to its transmission time, measured from when the client sends the HTTP GET request for the fragment until the fragment is completely received. Steps 33 and 34 may be performed in either order.
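The throughput estimate of step 34 is a one-line computation; this sketch (names ours) makes the units explicit:

```python
def bandwidth_estimate(fragment_bits, request_time, received_time):
    """Step 34 sketch: bits of the last fragment divided by the seconds
    elapsed between sending the HTTP GET and completing reception,
    giving a bandwidth prediction in bits per second."""
    return fragment_bits / (received_time - request_time)
```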
Step 35: use the available buffer resource and the network bandwidth prediction to screen the next fragment to be loaded into the buffer, i.e. select fragments whose transmission time under the current network condition is less than the playing time of the available buffer resource, ensuring that the next fragment arrives before the buffer is drained and avoiding the playback interruption caused by buffer underflow under poor network conditions.
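The first screening of step 35 can be sketched as follows (illustrative names; a fragment is represented as a (code rate, bit count) pair):

```python
def first_screen(fragments, bandwidth_bps, buffer_seconds):
    """Step 35 sketch: keep only the fragments whose estimated transmission
    time (bits / predicted bandwidth) is shorter than the playable time
    already sitting in the buffer."""
    return [(rate, bits) for rate, bits in fragments
            if bits / bandwidth_bps < buffer_seconds]
```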
Step 36: performing secondary screening on the fragments screened in the step 35 by using the content complexity level of the fragments in the MPD file;
step 361: judging whether the content complexity level of the fragments screened in the step 35 is lower than a threshold, if so, skipping to a step 362, otherwise, skipping to a step 363;
step 362: selecting the fragments with the code rate lower than the network bandwidth predicted value as optional fragments;
step 363: calculate the sum of the differences between the network bandwidth prediction and the code rates of the previous ω fragments, and select as selectable fragments those whose code rate is lower than the sum of the current network bandwidth prediction and that difference sum.
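The secondary screening of steps 361-363 can be sketched as below (names ours; `past_bw` and `past_rates` are the bandwidth predictions and chosen code rates of the previous ω fragments):

```python
def second_screen(candidates, level, level_threshold, bw_pred, past_bw, past_rates):
    """Steps 361-363 sketch: low-complexity fragments must fit under the
    predicted bandwidth; high-complexity fragments may additionally spend
    the bandwidth surplus accumulated over the previous omega fragments."""
    if level < level_threshold:
        budget = bw_pred                        # step 362
    else:
        surplus = sum(b - r for b, r in zip(past_bw, past_rates))
        budget = bw_pred + surplus              # step 363
    return [rate for rate in candidates if rate < budget]
```

The intuition: complex (high-motion) fragments benefit most from extra bits, so the scheme lets them draw on bandwidth left unused by earlier, cheaper fragments.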
Step 37: after the screening of steps 35 and 36, if selectable fragments exist, obtain their subjective viewing quality and select the highest-quality one as the next fragment to load into the buffer for request, download and playback; if no selectable fragment exists, select the fragment with the lowest code rate as the next fragment to load for request, download and playback;
the subjective viewing quality is the difference between the fragment's MOS value and its loss value. The MOS value is a human rating: a group of non-expert viewers score the viewed fragment from 1 to 5 under controlled conditions (laboratory lighting, display brightness, background brightness, contrast, viewing distance, etc.), and their scores are averaged. The loss value is the quality loss incurred by switching between fragments.
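The final choice of step 37 can be sketched as follows; the MOS and loss tables keyed by code rate are assumed inputs, not defined by the patent:

```python
def pick_best(optional, mos, loss, min_rate):
    """Step 37 sketch: subjective quality = MOS - switching loss.
    Pick the selectable fragment with the highest quality, falling back to
    the lowest code rate when nothing survived the screening."""
    if not optional:
        return min_rate
    return max(optional, key=lambda r: mos[r] - loss[r])
```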
Step 38: and repeating the steps 33-37 until the video fragments are loaded in sequence.
Specific example 2
This example provides a specific implementation scheme based on example 1.
A DASH code rate self-adaptive method based on video content complexity perception comprises the following steps:
step 1: the server side encodes a 10-minute video at each code rate in the set [90, 180, 360, 540, 720, 1080, 1440, 1880] Kbps, obtaining 8 different video versions; shot boundary detection is performed on the 8 versions, the detected shot boundaries being the same across versions; content complexity perception is performed on the 8 versions, and a clustering method groups the content complexity values of all shots into 3 levels, the levels of the same shot being the same across versions; for a fragment spanning more than one shot, the content complexity level of the shot occupying the largest share of the fragment duration is taken as the content complexity level of the fragment.
Step 2: and the server generates an MPD file for recording the video information, wherein the MPD file comprises information such as the URL, the code rate, the content complexity level and the like of the fragments.
And step 3: the client downloads and parses the MPD file marked with the content complexity levels to obtain the URL, code rate and content complexity level of each video fragment. The client downloads fragments of the lowest-code-rate version among the 8 video versions and starts playback once the number of fragments in the buffer reaches 5, so that the video starts quickly. The available buffer resource, i.e. the total playing time of the fragments loaded into the buffer, is obtained, and whether fragment screening is needed is judged as follows: with a buffer size of 60 seconds, when the total playing time of the buffered fragments exceeds 60 seconds, the fragment with the highest code rate is selected as the next fragment to load, avoiding the resource waste caused by buffer overflow; when the total playing time of the buffered fragments is less than 30 seconds, the fragment with the lowest code rate, i.e. the 90 Kbps fragment, is selected as the next fragment to load, avoiding the playback stall caused by buffer underflow;
when the total playing time of the buffered fragments is between 30 and 60 seconds, the network condition is detected in real time to obtain a network bandwidth prediction, and the fragments whose transmission time under the current network condition is less than the playing time of the available buffer resource are selected. It is then judged whether the content complexity level of the screened fragments is lower than 2: if so, the fragments whose code rate is lower than the network bandwidth prediction are taken as selectable fragments; otherwise, the sum of the differences between the network bandwidth prediction and the code rates of the previous 30 fragments is calculated, and the fragments whose code rate is lower than the sum of the current network bandwidth prediction and that difference sum are taken as selectable fragments. If selectable fragments exist after screening, their subjective viewing quality is obtained and the highest-quality fragment is selected as the next fragment to load into the buffer; if no selectable fragment exists, the fragment with the lowest code rate is selected as the next fragment to load, for request, download and playback;
step 38: and repeating the steps 33-37 until the video fragments are loaded in sequence.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. A DASH code rate self-adaptive method based on video content complexity perception is characterized in that: the method comprises the following steps:
step 1: the server side senses the content complexity of the video to obtain the content complexity level of each fragment in the video;
step 2: the server side marks the content complexity level in an MPD file of a corresponding video;
and step 3: the client receives and utilizes the MPD file marked with the content complexity level to realize DASH code rate self-adaptation;
the step 1 specifically comprises the following steps:
step 11: the server side encodes the video to obtain N video versions with different code rates;
step 12: shot boundary detection is carried out on the N video versions;
step 13: the N video versions are subjected to content complexity perception, and content complexity levels of the same shot of different video versions are obtained;
step 14: slicing the N video versions according to the same time length unit to obtain fragments with different code rates, and, for a fragment spanning more than one shot, selecting the content complexity level of the shot that occupies the largest share of the fragment duration as the content complexity level of the fragment;
in step 13, the step of sensing the content complexity is as follows:
step 131: acquiring, for each shot in each video version, the average number of coded bits per B/P frame, R(B,P), and the average number of coded bits per frame of the video, R, wherein B frames and P frames are inter-frame predictive coding frames;
step 132: using said R(B,P) and the video's average number of coded bits per frame R to obtain the content complexity of the shot in that video version, the content complexity value of the shot being R(B,P)/R;
Step 133: for the same shot across different video versions, taking the mode of the content complexity values as the content complexity value of the shot, i.e. the video content complexity of the same shot is the same across the different video versions;
step 134: clustering the content complexity values of all shots of the video into K levels by using a clustering method, and taking the levels as the content complexity levels of the corresponding shots;
the step 3 specifically comprises the following steps:
step 31: downloading and analyzing the MPD file marked with the content complexity level by the client;
step 32: the client downloads the fragments of the video with the minimum code rate in the N video versions, and the video is played after the number of the fragments in the buffer reaches a threshold value;
step 33: obtaining available buffer resources in the buffer area, judging whether fragmentation screening is needed or not, if so, skipping to the step 34, otherwise, loading fragments according to the size of the available buffer resources;
step 34: detecting the network condition in real time to obtain a network bandwidth predicted value;
step 35: screening the next fragment needing to be loaded into the buffer area by utilizing the available buffer resources and the network bandwidth prediction value;
step 36: performing secondary screening on the fragments screened in the step 35 by using the content complexity level of the fragments in the MPD file;
step 37: after the screening in steps 35 and 36, if the selectable segments exist, obtaining the subjective viewing quality of the selectable segments, and selecting the segment with the highest quality as the segment of the buffer area loaded next time; if the optional fragments do not exist, selecting the fragments with the lowest code rate as the fragments of the buffer area loaded next time;
step 38: repeating the steps 33-37 until the video fragments are loaded in sequence;
in step 33, the loading the segment according to the size of the available cache resource specifically includes: and when the total playing time of the fragments loaded into the buffer area exceeds the upper limit of the buffer area, selecting the fragment with the highest code rate as the next fragment to be loaded.
2. The DASH bitrate adaptive method based on video content complexity awareness according to claim 1, wherein: in step 34, the network bandwidth prediction value is the ratio of the fragment's bit count to its transmission time, where the transmission time is measured from when the client sends the HTTP GET request for the fragment until the fragment is completely received.
3. The DASH bitrate adaptive method based on video content complexity awareness according to claim 2, wherein: in the step 35, the screening method comprises the following steps: and selecting the fragments with the transmission time less than the playing time of the available buffer resources under the current network condition for loading.
4. The DASH bitrate adaptive method based on video content complexity awareness according to claim 1, wherein: in the step 36, the secondary screening method includes:
step 361: judging whether the content complexity level of the fragments screened in the step 35 is lower than a threshold value, if so, skipping to a step 362, otherwise, skipping to a step 363;
step 362: selecting the fragments with the code rate lower than the predicted value of the network bandwidth as optional fragments;
step 363: and calculating the difference sum of the network bandwidth predicted value and the code rate of the previous omega fragments, and selecting the fragment with the code rate lower than the sum of the network bandwidth predicted value and the difference sum at the current moment as the optional fragment.
5. The DASH bitrate adaptive method based on video content complexity awareness according to claim 1, wherein: in step 37, the subjective viewing quality is a difference between an MOS value of the segment and a loss value, where the MOS value refers to an artificial score, and the loss value refers to a loss value of switching between the segments.
CN201810894238.1A 2018-08-07 2018-08-07 DASH code rate self-adaption method based on video content complexity perception Active CN108989838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810894238.1A CN108989838B (en) 2018-08-07 2018-08-07 DASH code rate self-adaption method based on video content complexity perception


Publications (2)

Publication Number Publication Date
CN108989838A CN108989838A (en) 2018-12-11
CN108989838B true CN108989838B (en) 2021-07-30






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant