CN108989838B - DASH code rate self-adaption method based on video content complexity perception - Google Patents

DASH code rate self-adaption method based on video content complexity perception

Info

Publication number
CN108989838B
CN108989838B (granted publication of application CN201810894238.1A)
Authority
CN
China
Prior art keywords
video
fragments
content complexity
code rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810894238.1A
Other languages
Chinese (zh)
Other versions
CN108989838A (en)
Inventor
刘子和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201810894238.1A priority Critical patent/CN108989838B/en
Publication of CN108989838A publication Critical patent/CN108989838A/en
Application granted granted Critical
Publication of CN108989838B publication Critical patent/CN108989838B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    All classes fall under H (Electricity) → H04 (Electric communication technique) → H04N (Pictorial communication, e.g. television) → H04N 21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):
    • H04N 21/23418 — server-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/234309 — server-side reformatting of video signals by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/235 — processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/238 — interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth
    • H04N 21/24 — monitoring of processes or resources, e.g. server load, available bandwidth, upstream requests
    • H04N 21/26216 — content distribution scheduling under constraints involving the channel capacity, e.g. network bandwidth
    • H04N 21/26233 — content distribution scheduling under constraints involving content or additional data duration or size
    • H04N 21/44008 — client-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/440218 — client-side reformatting of video signals by transcoding between formats or standards
    • H04N 21/44209 — monitoring of the downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
    • H04N 21/8456 — structuring of content by decomposing it in the time domain, e.g. into time segments

Abstract

The invention discloses a DASH code rate self-adaption method based on video content complexity perception, which belongs to the field of adaptive video transmission in content distribution networks. The server side perceives the content complexity of a video to obtain the content complexity level of each fragment; it then marks these levels in the video's MPD file. The client receives the MPD file marked with content complexity levels and uses it to carry out DASH code rate self-adaption. The invention avoids, as far as possible, playing stagnation caused by buffer underflow when network conditions are poor, and avoids the waste of network resources caused by buffer overflow when conditions are good. Because fragment code rates are selected with the complexity of each fragment taken into account, a high-code-rate version is chosen for high-complexity fragments whenever possible, improving their subjective viewing quality and keeping the viewing quality of all fragments of the video high, thereby providing users with a better quality of experience.

Description

DASH code rate self-adaption method based on video content complexity perception
Technical Field
The invention relates to the field of adaptive video transmission in content distribution networks, and in particular to a DASH code rate self-adaption method based on video content complexity perception.
Background
The principle of DASH (Dynamic Adaptive Streaming over HTTP) is as follows: at the server, video source files encoded at different code rates are cut into fragments that are continuous on the time axis and have a fixed playing duration; the addresses and control information of the playable content are stored in a Media Presentation Description (MPD) file, which describes the media stream structure using node elements such as Period, AdaptationSet, Representation, SegmentList, and Segment. The client requests from the server, via HTTP GET, the video fragments that best match the real-time network bandwidth, achieving smooth playback while preventing the player buffer from overflowing or running empty.
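The basic request loop described above can be sketched in a few lines; the function name, code-rate ladder, and bandwidth figures below are illustrative assumptions (a common baseline DASH heuristic), not values or logic from the patent:

```python
def pick_representation(bitrates_bps, measured_bandwidth_bps):
    """Pick the highest code rate that fits the measured bandwidth,
    falling back to the lowest available version."""
    fitting = [b for b in bitrates_bps if b <= measured_bandwidth_bps]
    return max(fitting) if fitting else min(bitrates_bps)

# Hypothetical ladder of N = 4 encoded versions of one video (bits/s):
ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]
print(pick_representation(ladder, 3_000_000))  # -> 2500000
print(pick_representation(ladder, 300_000))    # -> 500000
```

A real client would re-measure bandwidth on every iteration and issue one HTTP GET per chosen fragment.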
Traditional DASH code rate adaptation methods dynamically request the fragment whose code rate level best suits the current network, based on the detected congestion or bandwidth, but they ignore the demands that the video's content characteristics (such as video content complexity) place on the coding rate. Video content complexity is a property that reflects the intensity of motion in the video. For a fixed viewing quality, fragments with high content complexity carry more information and therefore need a higher code rate, while fragments with low content complexity carry less information and make lower demands on the code rate. Hence, to achieve high viewing quality across all fragments of a video, a higher-code-rate version should be selected for high-complexity fragments and a relatively lower-code-rate version for low-complexity ones. Existing DASH code rate adaptation schemes, however, do not consider the influence of content complexity on viewing quality and do not treat fragments of different complexity differently; they select fragments uniformly by average code rate. As a result, the subjective viewing quality fluctuates between fragments of different complexity during continuous playback, and the average viewing quality of high-complexity fragments is markedly lower than that of low-complexity ones, greatly degrading the user's viewing experience.
Disclosure of Invention
The invention aims to: the DASH code rate self-adaptive method based on video content complexity perception is provided, and the technical problem that the user watching quality is greatly reduced due to the fact that only the average code rate is adopted to select the fragments in the existing DASH code rate self-adaptive method is solved.
The technical scheme adopted by the invention is as follows:
a DASH code rate self-adaptive method based on video content complexity perception comprises the following steps:
step 1: the server side senses the content complexity of the video to obtain the content complexity level of each fragment in the video;
step 2: the server side marks the content complexity level in an MPD file of a corresponding video;
and step 3: and the client receives and utilizes the MPD file marked with the content complexity level to realize DASH code rate self-adaptation.
Further, the step 1 specifically comprises:
step 11: the server side encodes the video to obtain N video versions with different code rates;
step 12: shot boundary detection is carried out on the N video versions;
step 13: the N video versions are subjected to content complexity perception, and content complexity levels of the same shot of different video versions are obtained;
step 14: for a fragment spanning more than one shot, the content complexity level of the shot occupying the largest share of the fragment's duration is taken as the content complexity level of the fragment.
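Step 14 can be sketched as follows; the shot durations and complexity levels are hypothetical:

```python
def fragment_complexity(shots_in_fragment):
    """shots_in_fragment: list of (duration_seconds, complexity_level)
    for the shots (or shot portions) covered by one fragment. The
    fragment inherits the level of the shot occupying the most time."""
    dominant = max(shots_in_fragment, key=lambda shot: shot[0])
    return dominant[1]

# A 4 s fragment covering the tail of one shot and the head of another:
print(fragment_complexity([(1.5, 2), (2.5, 4)]))  # -> 4
```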
Further, in step 13, the step of sensing the content complexity is as follows:
step 131: obtain, for each shot under each video version, the average number of coded bits per frame over its B frames and P frames, R_(B,P), and the average number of coded bits per frame over the whole video, R, where B frames and P frames are both inter-frame predictive coded frames;
step 132: use R_(B,P) and R to obtain the content complexity of the shot in that video version, the content complexity value of the shot being
C = R_(B,P) / R
Step 133: for the same video shot with different video versions, the mode of the content complexity value is taken as the content complexity value of the shot, namely the video content complexity of the same shot and different video versions is the same.
Step 134: and clustering the content complexity values of all the shots of the video into K levels by using a clustering method, and taking the levels as the content complexity levels of the corresponding shots.
Further, the step 3 specifically includes:
step 31: downloading and analyzing the MPD file marked with the content complexity level by the client;
step 32: the client downloads the fragments of the video with the minimum code rate in the N video versions, and the video is played after the number of the fragments in the buffer reaches a threshold value;
step 33: obtain the available buffer resources of the buffer area and judge whether fragment screening is needed; if so, skip to step 34, otherwise load the fragment according to the size of the available buffer resources;
step 34: detecting the network condition in real time to obtain a network bandwidth predicted value;
step 35: screening the next fragment needing to be loaded into the buffer area by utilizing the available buffer resources and the network bandwidth prediction value;
step 36: performing secondary screening on the fragments screened in the step 35 by using the content complexity level of the fragments in the MPD file;
step 37: after the screening in steps 35 and 36, if selectable fragments exist, obtain the subjective viewing quality of each and select the fragment with the highest quality as the next fragment loaded into the buffer; if no selectable fragment exists, select the fragment with the lowest code rate as the next fragment loaded into the buffer;
step 38: and repeating the steps 33-37 until the video fragments are loaded in sequence.
Further, in step 33, loading the fragment according to the size of the available buffer resources specifically means: when the total playing time of the fragments already loaded into the buffer exceeds the buffer's upper limit, the fragment with the highest code rate is selected as the next fragment to load; when it falls below the buffer's lower limit, the fragment with the lowest code rate is selected as the next fragment to load.
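A sketch of this buffer rule, with hypothetical occupancy limits in seconds:

```python
def load_without_screening(buffered_seconds, bitrates_bps,
                           lower=5.0, upper=30.0):
    """Return a code rate when buffer occupancy is outside the band
    [lower, upper]; return None to fall through to steps 34-37."""
    if buffered_seconds > upper:
        return max(bitrates_bps)  # ample buffer: spend it on quality
    if buffered_seconds < lower:
        return min(bitrates_bps)  # near underflow: protect playback
    return None

ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]
print(load_without_screening(32.0, ladder))  # -> 5000000
print(load_without_screening(2.0, ladder))   # -> 500000
print(load_without_screening(15.0, ladder))  # -> None
```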
Further, in step 34, the network bandwidth predicted value is the ratio of the number of bits in a fragment to its transmission time, where the transmission time is measured from the moment the client sends the HTTP GET request for the fragment until the fragment is completely received.
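This estimate amounts to the following; the timestamps and fragment size are hypothetical:

```python
def estimate_bandwidth_bps(fragment_bits, t_request, t_received):
    """Predicted bandwidth = bits in the fragment / transfer time,
    timed from sending the HTTP GET to full reception."""
    return fragment_bits / (t_received - t_request)

# A hypothetical 2 Mbit fragment that arrived 0.5 s after the request:
print(estimate_bandwidth_bps(2_000_000, 100.0, 100.5))  # -> 4000000.0
```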
Further, in step 35, the screening method is: select those fragments whose transmission time under the current network conditions is less than the playing time of the available buffer resources.
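A sketch of this first screen; the fragment versions and figures are hypothetical:

```python
def first_screen(versions, bandwidth_bps, buffered_seconds):
    """versions: (code_rate_bps, duration_s) pairs for one fragment.
    Keep versions whose estimated transmission time fits the playable
    buffer."""
    kept = []
    for code_rate, duration in versions:
        transmission_time = code_rate * duration / bandwidth_bps
        if transmission_time < buffered_seconds:
            kept.append((code_rate, duration))
    return kept

versions = [(500_000, 4.0), (2_500_000, 4.0), (5_000_000, 4.0)]
print(first_screen(versions, bandwidth_bps=2_000_000, buffered_seconds=8.0))
# -> [(500000, 4.0), (2500000, 4.0)]
```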
Further, in step 36, the secondary screening method includes:
step 361: judging whether the content complexity level of the fragments screened in the step 35 is lower than a threshold, if so, skipping to a step 362, otherwise, skipping to a step 363;
step 362: selecting the fragments with the code rate lower than the network bandwidth predicted value as optional fragments;
step 363: compute the sum of the differences between the network bandwidth predicted value and the code rate over the previous ω fragments, and select as optional those fragments whose code rate is lower than the current network bandwidth predicted value plus this sum.
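Steps 361 to 363 can be sketched as follows; the level threshold, the ω-fragment history, and all figures are hypothetical:

```python
def second_screen(candidate_rates, complexity_level, predicted_bw,
                  history, level_threshold=3):
    """history: (predicted_bw, chosen_rate) pairs for the previous
    omega fragments. High-complexity fragments may additionally spend
    the bandwidth 'saved' on those fragments."""
    if complexity_level < level_threshold:        # step 362
        budget = predicted_bw
    else:                                         # step 363
        budget = predicted_bw + sum(bw - r for bw, r in history)
    return [rate for rate in candidate_rates if rate < budget]

history = [(2_000_000, 1_000_000), (2_000_000, 1_500_000)]  # 1.5 Mb/s saved
print(second_screen([1_000_000, 2_500_000, 5_000_000], 4, 2_000_000, history))
# -> [1000000, 2500000]
```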
Further, in step 37, the subjective viewing quality is the difference between the fragment's MOS value and a loss value, where the MOS (mean opinion score) value is a human-rated score and the loss value quantifies the quality loss caused by switching between fragments.
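The step-37 choice can be sketched as below; the MOS table and the constant switching penalty are hypothetical stand-ins for the patent's quality model:

```python
def pick_best(selectable_rates, mos, previous_rate, switch_penalty=0.3):
    """Subjective quality = MOS of the fragment minus a switching loss;
    the highest-scoring selectable fragment wins."""
    def quality(rate):
        loss = switch_penalty if rate != previous_rate else 0.0
        return mos[rate] - loss
    return max(selectable_rates, key=quality)

mos = {1_000_000: 3.2, 2_500_000: 4.0, 5_000_000: 4.6}
print(pick_best([1_000_000, 2_500_000], mos, previous_rate=2_500_000))
# -> 2500000
```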
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the method of the invention can provide higher average code rate, and the code rate switching rate between the fragments is low, and the subjective quality loss caused by switching is lower. In addition, the invention can avoid playing stagnation caused by buffer underflow as much as possible when the network condition is not good, avoid waste of network resources caused by buffer overflow when the network condition is good, can also distinguish the complexity of the video fragments to select the fragment code rate, and select the version with high code rate for the fragments with high complexity as much as possible, improves the subjective watching quality of the fragments with high complexity, and ensures that the watching quality of all the fragments of the video is higher, thereby providing better experience quality for users.
Drawings
The invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a system schematic of the present invention;
FIG. 2 is a flow chart of video content complexity awareness in the present invention;
fig. 3 is a flow chart of client side code rate adaptation in the present invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
The present invention is described in detail below with reference to fig. 1-3.
Detailed description of the preferred embodiment 1
The invention discloses a DASH code rate self-adaption method based on video content complexity perception, which involves a server side and a client side.
The invention specifically comprises the following steps:
a DASH code rate self-adaptive method based on video content complexity perception comprises the following steps:
step 1: the server side senses the content complexity of the video to obtain the content complexity level of each fragment in the video;
step 11: the server encodes the video according to the MPEG standard to obtain N video versions with different code rates, where the video is the one that the user wishes to watch and downloads from the server;
step 12: shot boundary detection is carried out on the N video versions, and shot boundaries detected by videos of different versions are the same;
step 13: perform content complexity perception on the N video versions to obtain the content complexity level of each shot under each video version. Frame information of the video is counted shot by shot, including the number and the number of bits of its I, P, and B frames. I frames are intra-frame compression coded frames; they carry a large amount of data and reflect the texture characteristics of the video. P frames and B frames are inter-frame predictive coded frames; they compress the images by removing the temporally redundant information shared between frames of the image sequence, so they carry less data and reflect the motion characteristics of the video. Within a shot, the larger the share of data carried by the P frames and B frames, the more intense the shot's motion, i.e. the higher the content complexity of the video;
step 131: obtain the average number of coded bits per frame over the B frames and P frames of each shot under each video version, R_(B,P), and the average number of coded bits per frame over the whole video, R;
step 132: use R_(B,P) and R to obtain the content complexity of the shot, the content complexity value of the shot being
C = R_(B,P) / R
Step 133: the mode of the content complexity values of the different code rate versions of the same shot is taken as the content complexity value of that shot, i.e. the video content complexity of the same shot is the same across the different code rate versions.
Step 134: and clustering the content complexity values of all the shots of the video into K levels by using a clustering method, and taking the levels as the content complexity levels of the corresponding shots.
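Steps 131-134 can be sketched as follows. This is a minimal illustration under our own naming, not the patent's implementation: the clustering method is unspecified in the patent, so a simple deterministic 1-D k-means stands in for it here.

```python
# Sketch of steps 131-134: per-shot complexity R(B,P)/R, then clustering
# the complexity values of all shots into K levels (names are illustrative).

def shot_complexity(frame_sizes, frame_types):
    """Complexity value R(B,P)/R for one shot of one video version.

    frame_sizes: coded bit count per frame; frame_types: 'I', 'P' or 'B'.
    """
    bp = [s for s, t in zip(frame_sizes, frame_types) if t in ('P', 'B')]
    r_bp = sum(bp) / len(bp)                 # average bits per B/P frame
    r = sum(frame_sizes) / len(frame_sizes)  # average bits per frame
    return r_bp / r

def cluster_levels(values, k):
    """Deterministic 1-D k-means assigning each shot complexity a level 0..k-1."""
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(20):  # fixed iteration budget; converges quickly in 1-D
        groups = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    # relabel so that level 0 = lowest-complexity cluster
    order = sorted(range(len(centers)), key=lambda j: centers[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[min(range(len(centers)), key=lambda j: abs(v - centers[j]))]
            for v in values]
```

Because I frames dominate the bit budget of a static shot, its ratio falls well below 1, while motion-heavy shots push it upward; clustering then turns these raw ratios into the K discrete levels used by the client.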
Step 14: slice the N video versions into fragments of the same time length, obtaining fragments with different code rates; that is, the different video versions have the same number of fragments and the same fragment durations, and differ only in code rate. For a fragment spanning more than one shot, the content complexity level of the shot that occupies the largest share of the fragment duration is taken as the content complexity level of the fragment.
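The fragment-level assignment of step 14 can be sketched as follows; the function and its inputs (shot time spans and their levels) are our own illustration:

```python
def fragment_level(shot_spans, shot_levels, frag_start, frag_end):
    """Step 14 sketch: a fragment spanning several shots inherits the level
    of the shot that overlaps it for the longest time.

    shot_spans: list of (start, end) times per shot; shot_levels: their levels.
    """
    best, best_overlap = None, 0.0
    for (s, e), lvl in zip(shot_spans, shot_levels):
        overlap = max(0.0, min(e, frag_end) - max(s, frag_start))
        if overlap > best_overlap:
            best, best_overlap = lvl, overlap
    return best
```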
Step 2: the server side marks the content complexity levels in the MPD file of the corresponding video. The server generates an MPD file recording the video information, including the URL, code rate and content complexity level of each fragment.
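Step 2 can be illustrated with a hypothetical MPD fragment. The `contentComplexity` attribute below is purely illustrative: MPEG-DASH does not define such an attribute, so a real deployment would carry the level in a custom namespace or descriptor element.

```xml
<!-- Hypothetical sketch; "contentComplexity" is an invented attribute,
     not part of the ISO/IEC 23009-1 MPD schema. -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v90" bandwidth="90000">
        <SegmentList duration="2">
          <SegmentURL media="seg-90k-001.m4s" contentComplexity="2"/>
          <SegmentURL media="seg-90k-002.m4s" contentComplexity="1"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
```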
And step 3: and the client receives and utilizes the MPD file marked with the content complexity level to realize DASH code rate self-adaptation.
Step 31: the client downloads and parses the MPD file marked with the content complexity levels to obtain the URL (uniform resource locator), code rate and content complexity level of each video fragment;
step 32: the client downloads the fragments of the video with the minimum code rate in the N video versions, and the video is played after the number of the fragments in the buffer reaches a threshold value, so that the video can be quickly started by adopting the method, even if a user watches the video at the highest speed; and (3) for each fragment loaded into the buffer area subsequently, performing the operation in the steps 33 to 37 to select the fragment with the optimal code rate at the moment, so as to realize the code rate self-adaption of the fragment loaded into the buffer area and enable a user to view the video with the optimal quality.
Step 33: obtain the available buffer resource, i.e. the total playing time of the fragments already loaded into the buffer, and judge whether fragment screening is needed; if so, jump to step 34, otherwise load fragments according to the available buffer resource: when the total playing time of the buffered fragments exceeds the buffer upper limit, select the fragment with the highest code rate as the next fragment to load, avoiding the resource waste caused by buffer overflow; when it falls below the buffer lower limit, select the fragment with the lowest code rate as the next fragment to load, avoiding the playback stall caused by buffer underflow. The next fragment is the fragment that needs to be loaded into the buffer at the current time; it follows the last fragment already loaded.
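The hard buffer rules of step 33 can be sketched as below. The names, and the convention of returning `None` when the buffer sits between the limits and the screening of steps 34-36 should decide instead, are our own:

```python
def pick_by_buffer(buffer_seconds, upper, lower, bitrates):
    """Step 33 sketch: hard buffer-occupancy rules.

    Returns a code rate to load, or None when the buffer is between the
    limits and fragment screening (steps 34-36) applies instead.
    """
    if buffer_seconds > upper:
        return max(bitrates)   # near overflow: take the highest code rate
    if buffer_seconds < lower:
        return min(bitrates)   # near underflow: take the lowest code rate
    return None                # defer to screening
```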
Step 34: detect the network condition in real time to obtain a network bandwidth prediction: the prediction is approximated as the ratio of the fragment's bit count to its transmission time, measured from when the client sends the HTTP GET request for the fragment until the fragment is completely received. Steps 33 and 34 may be performed in either order.
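The throughput estimate of step 34 is a one-line computation; this sketch (names ours) makes the units explicit:

```python
def bandwidth_estimate(fragment_bits, request_time, received_time):
    """Step 34 sketch: bits of the last fragment divided by the seconds
    elapsed between sending the HTTP GET and completing reception,
    giving a bandwidth prediction in bits per second."""
    return fragment_bits / (received_time - request_time)
```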
Step 35: use the available buffer resource and the network bandwidth prediction to screen the next fragment to be loaded into the buffer, i.e. select fragments whose transmission time under the current network condition is less than the playing time of the available buffer resource, ensuring that the next fragment arrives before the buffer is drained and avoiding the playback interruption caused by buffer underflow under poor network conditions.
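The first screening of step 35 can be sketched as follows (illustrative names; a fragment is represented as a (code rate, bit count) pair):

```python
def first_screen(fragments, bandwidth_bps, buffer_seconds):
    """Step 35 sketch: keep only the fragments whose estimated transmission
    time (bits / predicted bandwidth) is shorter than the playable time
    already sitting in the buffer."""
    return [(rate, bits) for rate, bits in fragments
            if bits / bandwidth_bps < buffer_seconds]
```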
Step 36: performing secondary screening on the fragments screened in the step 35 by using the content complexity level of the fragments in the MPD file;
step 361: judging whether the content complexity level of the fragments screened in the step 35 is lower than a threshold, if so, skipping to a step 362, otherwise, skipping to a step 363;
step 362: selecting the fragments with the code rate lower than the network bandwidth predicted value as optional fragments;
step 363: calculate the sum of the differences between the network bandwidth prediction and the code rates of the previous ω fragments, and select as selectable fragments those whose code rate is lower than the sum of the current network bandwidth prediction and that difference sum.
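The secondary screening of steps 361-363 can be sketched as below (names ours; `past_bw` and `past_rates` are the bandwidth predictions and chosen code rates of the previous ω fragments):

```python
def second_screen(candidates, level, level_threshold, bw_pred, past_bw, past_rates):
    """Steps 361-363 sketch: low-complexity fragments must fit under the
    predicted bandwidth; high-complexity fragments may additionally spend
    the bandwidth surplus accumulated over the previous omega fragments."""
    if level < level_threshold:
        budget = bw_pred                        # step 362
    else:
        surplus = sum(b - r for b, r in zip(past_bw, past_rates))
        budget = bw_pred + surplus              # step 363
    return [rate for rate in candidates if rate < budget]
```

The intuition: complex (high-motion) fragments benefit most from extra bits, so the scheme lets them draw on bandwidth left unused by earlier, cheaper fragments.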
Step 37: after the screening of steps 35 and 36, if selectable fragments exist, obtain their subjective viewing quality and select the highest-quality one as the next fragment to load into the buffer for request, download and playback; if no selectable fragment exists, select the fragment with the lowest code rate as the next fragment to load for request, download and playback;
the subjective viewing quality is the difference between the fragment's MOS value and its loss value. The MOS value is a human rating: a group of non-expert viewers score the viewed fragment from 1 to 5 under controlled conditions (laboratory lighting, display brightness, background brightness, contrast, viewing distance, etc.), and their scores are averaged. The loss value is the quality loss incurred by switching between fragments.
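The final choice of step 37 can be sketched as follows; the MOS and loss tables keyed by code rate are assumed inputs, not defined by the patent:

```python
def pick_best(optional, mos, loss, min_rate):
    """Step 37 sketch: subjective quality = MOS - switching loss.
    Pick the selectable fragment with the highest quality, falling back to
    the lowest code rate when nothing survived the screening."""
    if not optional:
        return min_rate
    return max(optional, key=lambda r: mos[r] - loss[r])
```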
Step 38: and repeating the steps 33-37 until the video fragments are loaded in sequence.
Specific example 2
This example provides a specific implementation scheme based on example 1.
A DASH code rate self-adaptive method based on video content complexity perception comprises the following steps:
step 1: the server side encodes a 10-minute video at each code rate in the set [90, 180, 360, 540, 720, 1080, 1440, 1880] Kbps, obtaining 8 different video versions; shot boundary detection is performed on the 8 versions, the detected shot boundaries being the same across versions; content complexity perception is performed on the 8 versions, and a clustering method groups the content complexity values of all shots into 3 levels, the levels of the same shot being the same across versions; for a fragment spanning more than one shot, the content complexity level of the shot occupying the largest share of the fragment duration is taken as the content complexity level of the fragment.
Step 2: and the server generates an MPD file for recording the video information, wherein the MPD file comprises information such as the URL, the code rate, the content complexity level and the like of the fragments.
And step 3: the client downloads and parses the MPD file marked with the content complexity levels to obtain the URL, code rate and content complexity level of each video fragment. The client downloads fragments of the lowest-code-rate version among the 8 video versions and starts playback once the number of fragments in the buffer reaches 5, so that the video starts quickly. The available buffer resource, i.e. the total playing time of the fragments loaded into the buffer, is obtained, and whether fragment screening is needed is judged as follows: with a buffer size of 60 seconds, when the total playing time of the buffered fragments exceeds 60 seconds, the fragment with the highest code rate is selected as the next fragment to load, avoiding the resource waste caused by buffer overflow; when the total playing time of the buffered fragments is less than 30 seconds, the fragment with the lowest code rate, i.e. the 90 Kbps fragment, is selected as the next fragment to load, avoiding the playback stall caused by buffer underflow;
when the total playing time of the buffered fragments is between 30 and 60 seconds, the network condition is detected in real time to obtain a network bandwidth prediction, and the fragments whose transmission time under the current network condition is less than the playing time of the available buffer resource are selected. It is then judged whether the content complexity level of the screened fragments is lower than 2: if so, the fragments whose code rate is lower than the network bandwidth prediction are taken as selectable fragments; otherwise, the sum of the differences between the network bandwidth prediction and the code rates of the previous 30 fragments is calculated, and the fragments whose code rate is lower than the sum of the current network bandwidth prediction and that difference sum are taken as selectable fragments. If selectable fragments exist after screening, their subjective viewing quality is obtained and the highest-quality fragment is selected as the next fragment to load into the buffer; if no selectable fragment exists, the fragment with the lowest code rate is selected as the next fragment to load, for request, download and playback;
step 38: and repeating the steps 33-37 until the video fragments are loaded in sequence.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. A DASH code rate self-adaptive method based on video content complexity perception is characterized in that: the method comprises the following steps:
step 1: the server side senses the content complexity of the video to obtain the content complexity level of each fragment in the video;
step 2: the server side marks the content complexity level in an MPD file of a corresponding video;
and step 3: the client receives and utilizes the MPD file marked with the content complexity level to realize DASH code rate self-adaptation;
the step 1 specifically comprises the following steps:
step 11: the server side encodes the video to obtain N video versions with different code rates;
step 12: shot boundary detection is carried out on the N video versions;
step 13: the N video versions are subjected to content complexity perception, and content complexity levels of the same shot of different video versions are obtained;
step 14: slicing the N video versions according to the same time length unit to obtain fragments with different code rates, and, for a fragment spanning more than one shot, selecting the content complexity level of the shot that occupies the largest share of the fragment duration as the content complexity level of the fragment;
in step 13, the step of sensing the content complexity is as follows:
step 131: acquiring, for each shot in each video version, the average number of coded bits per B/P frame, R(B,P), and the average number of coded bits per frame of the video, R, wherein B frames and P frames are inter-frame predictive coding frames;
step 132: using said R(B,P) and the video's average number of coded bits per frame R to obtain the content complexity of the shot in that video version, the content complexity value of the shot being R(B,P)/R;
Step 133: for the same shot across different video versions, taking the mode of the content complexity values as the content complexity value of the shot, i.e. the video content complexity of the same shot is the same across the different video versions;
step 134: clustering the content complexity values of all shots of the video into K levels by using a clustering method, and taking the levels as the content complexity levels of the corresponding shots;
the step 3 specifically comprises the following steps:
step 31: downloading and analyzing the MPD file marked with the content complexity level by the client;
step 32: the client downloads the fragments of the video with the minimum code rate in the N video versions, and the video is played after the number of the fragments in the buffer reaches a threshold value;
step 33: obtaining available buffer resources in the buffer area, judging whether fragmentation screening is needed or not, if so, skipping to the step 34, otherwise, loading fragments according to the size of the available buffer resources;
step 34: detecting the network condition in real time to obtain a network bandwidth predicted value;
step 35: screening the next fragment needing to be loaded into the buffer area by utilizing the available buffer resources and the network bandwidth prediction value;
step 36: performing secondary screening on the fragments screened in the step 35 by using the content complexity level of the fragments in the MPD file;
step 37: after the screening in steps 35 and 36, if the selectable segments exist, obtaining the subjective viewing quality of the selectable segments, and selecting the segment with the highest quality as the segment of the buffer area loaded next time; if the optional fragments do not exist, selecting the fragments with the lowest code rate as the fragments of the buffer area loaded next time;
step 38: repeating the steps 33-37 until the video fragments are loaded in sequence;
in step 33, the loading the segment according to the size of the available cache resource specifically includes: and when the total playing time of the fragments loaded into the buffer area exceeds the upper limit of the buffer area, selecting the fragment with the highest code rate as the next fragment to be loaded.
2. The DASH bitrate adaptive method based on video content complexity awareness according to claim 1, wherein: in step 34, the network bandwidth prediction value is the ratio of the fragment's bit count to its transmission time, where the transmission time is measured from when the client sends the HTTP GET request for the fragment until the fragment is completely received.
3. The DASH bitrate adaptive method based on video content complexity awareness according to claim 2, wherein: in the step 35, the screening method comprises the following steps: and selecting the fragments with the transmission time less than the playing time of the available buffer resources under the current network condition for loading.
4. The DASH bitrate adaptive method based on video content complexity awareness according to claim 1, wherein: in the step 36, the secondary screening method includes:
step 361: judging whether the content complexity level of the fragments screened in the step 35 is lower than a threshold value, if so, skipping to a step 362, otherwise, skipping to a step 363;
step 362: selecting the fragments with the code rate lower than the predicted value of the network bandwidth as optional fragments;
step 363: and calculating the difference sum of the network bandwidth predicted value and the code rate of the previous omega fragments, and selecting the fragment with the code rate lower than the sum of the network bandwidth predicted value and the difference sum at the current moment as the optional fragment.
5. The DASH bitrate adaptive method based on video content complexity awareness according to claim 1, wherein: in step 37, the subjective viewing quality is a difference between an MOS value of the segment and a loss value, where the MOS value refers to an artificial score, and the loss value refers to a loss value of switching between the segments.
CN201810894238.1A 2018-08-07 2018-08-07 DASH code rate self-adaption method based on video content complexity perception Active CN108989838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810894238.1A CN108989838B (en) 2018-08-07 2018-08-07 DASH code rate self-adaption method based on video content complexity perception


Publications (2)

Publication Number Publication Date
CN108989838A CN108989838A (en) 2018-12-11
CN108989838B true CN108989838B (en) 2021-07-30






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant