CN114640851A - Self-adaptive omnidirectional video streaming method based on quality perception - Google Patents
- Publication number
- CN114640851A (application number CN202210272188.XA / CN202210272188A)
- Authority
- CN
- China
- Prior art keywords
- video
- bit rate
- quality
- viewpoint
- rate level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Algebra (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a quality-aware adaptive omnidirectional video streaming method. The method considers two quality-determining factors specific to omnidirectional video, viewpoint movement and brightness change, and quantifies their influence on subjective perceived quality through a user study. By introducing these two features into an improved SSIM metric, a user perceived quality model is established. On top of the MPC algorithm's bit rate decision for each video block, quality is allocated across all regions within the block so as to maximize the overall SSIM quality of the block. The invention achieves a more reasonable quality distribution over video regions, and thereby improves both streaming resource utilization and user quality of experience.
Description
Technical Field
The invention relates to the technical field of streaming media transmission, in particular to a quality-perception-based self-adaptive omnidirectional video streaming method.
Background
In recent years, omnidirectional video has become one of the emerging classes of internet traffic. At the same time, transmitting an omnidirectional video stream is more challenging than transmitting a conventional one. To create an immersive panoramic experience, omnidirectional video must be streamed at high resolution and without stalls, which consumes substantial bandwidth and resources. In the adaptive transmission process, the omnidirectional video is first projected onto a common two-dimensional planar video, which is then temporally sliced into video blocks. For each video block, the encoder transcodes it into multiple bit rate levels (representing different qualities). Finally, the video blocks at each bit rate level are further spatially cut into video regions. Similar to conventional adaptive methods, the video player can dynamically switch the quality level at the boundary of two consecutive video regions to cope with fluctuations in network bandwidth.
Most current adaptive algorithms adopt a viewpoint-driven approach, in which only the video content in the viewpoint area (the area facing the user) is streamed at high quality. However, such methods have the following limitations. First, the viewpoint area is usually much larger than an ordinary screen, so streaming the viewpoint content still requires at least twice the bandwidth of ordinary video at the same quality. Second, because the viewpoint content must be prefetched, the player has to predict user behavior (i.e., viewpoint movement), and any prediction error may degrade the user's quality of experience (QoE). Finally, to accommodate viewpoint movement, the omnidirectional video must be spatially sliced and transcoded into multiple quality levels, which greatly increases the total video size. The user's perception of omnidirectional video quality differs from that of ordinary video and is uniquely influenced by viewpoint movement; exploiting this characteristic helps further reduce bandwidth requirements and improve the quality of experience. The existing adaptive omnidirectional video streaming algorithms therefore still fall short in resource allocation and QoE maximization, and cannot meet the deployment and development needs of current high-quality omnidirectional streaming services. A more scientific and efficient adaptive omnidirectional video streaming method is thus urgently needed.
Disclosure of Invention
The invention aims to solve the problems of low resource utilization rate, low video service quality and the like of the conventional self-adaptive omnidirectional video streaming method, and provides a self-adaptive omnidirectional video streaming method based on quality perception.
In order to solve the problems, the invention is realized by the following technical scheme:
the self-adaptive omnidirectional video streaming method based on quality perception comprises the following steps:
step 1, projecting an omnidirectional video file to a two-dimensional plane by using a common symmetric projection method; cutting the projected video file into a plurality of video blocks with equal length, and transcoding each video block into different bit rate levels;
step 2, cutting each video frame of the video block of each bit rate level into a plurality of video areas from space, and further cutting each video area into a plurality of video windows from space;
step 3, calculating the relative viewpoint moving speed and viewpoint brightness change of each pixel point of each video frame of each bit rate level video block based on the viewpoint track preset by the omnidirectional video file, wherein:
v^b(p,q) = |v_u − v_o^b(p,q)|
l^b(p,q) = |l_u^b(p,q) − l_m^b|
step 4, calculating the relative viewpoint moving speed and viewpoint brightness change of each video area under each bit rate level, wherein:
v̄_i^b = (1/N) Σ_{(p,q)∈i} v^b(p,q)
l̄_i^b = (1/N) Σ_{(p,q)∈i} l^b(p,q)
step 5, calculating a viewpoint-based just-noticeable-difference (JND) threshold for each pixel point of the video frame under each bit rate level;
step 6, calculating the weight of each video window under each bit rate level;
step 7, calculating the user perception quality of each video window under each bit rate level, wherein:
q_ij^b = ((2·μ_ij^b·μ_ij^o + a_1)(2·σ_ij^{bo} + a_2)) / (((μ_ij^b)² + (μ_ij^o)² + a_1)(σ_ij^b + σ_ij^o + a_2))
step 8, calculating the user perception quality of each video area under each bit rate level, wherein:
Q_i^b = Σ_{j=1}^{N_i} w_ij^b · q_ij^b
step 9, establishing a user perception quality query table of the relative viewpoint moving speed and viewpoint brightness change of each video area under each bit rate level based on the relative viewpoint moving speed and viewpoint brightness change obtained in the step 4 and the user perception quality obtained in the step 8;
step 10, modeling the adaptive bit rate decision of the video blocks as an optimization problem based on model predictive control by using an ABR algorithm, and determining the bit rate of each video block by solving the optimization problem, namely correspondingly determining the bit rate of each video frame of the video blocks;
step 11, predicting the viewpoint position of a future time point by using the existing linear regression method based on the historical viewpoint position of the current user, using the lowest viewpoint moving speed of the current time point in the past m seconds as the predicted value of the viewpoint moving speed of the future time point, and finally calculating the relative viewpoint moving speed and viewpoint brightness change of all video areas under each bit rate level by adopting the same formula in the steps 3 and 4;
step 12, computing the Euclidean distance between the relative viewpoint moving speed and viewpoint brightness change of all video regions at each bit rate level obtained in step 11 and those of the corresponding video region at the corresponding bit rate level obtained in step 4, determining from the user perceived quality query table constructed in step 9 the user perceived quality corresponding to the entry with the minimum Euclidean distance, and finally obtaining the user perceived quality of all video regions at each bit rate level;
step 13, based on the bit rate of the video block determined in step 10 and the user perceived quality of all video regions at each bit rate level determined in step 12, further determining the quality of each video region in all video frames of the video block, with the goal of maximizing the overall user perceived quality of the video block under the constraint that the total size of all video regions of all video frames of the block does not exceed the budget set by the bit rate of the video block;
in the above formula, vb(p, q) represents the relative viewpoint movement speed, l, of the pixel point (p, q) at bit rate level bb(p, q) represents the viewpoint brightness variation of the pixel point (p, q) at the bit rate level b, vuIndicating the moving speed of the user's viewpoint, v0 b(p, q) represents the motion velocity of the object at pixel point (p, q) at bit rate level b, lu b(p, q) denotes the luminance at a pixel point (p, q) at bit rate level b, lm bThe average brightness of all pixel points in a video region where m-second foresight points are located under the bit rate level b is represented, m is a set value, and | represents an absolute value symbol;representing the relative viewpoint movement speed of the video area i at the bit rate level b,representing the viewpoint brightness change of a video area i under the bit rate level b, wherein N is the number of all pixel points of the video area i; JNDb(p, q) represents the view-based just visible disparity threshold for pixel point (p, q) at bit rate level b, CJNDb(p, q) represents the just visible difference threshold of the pixel point (p, q) at bit rate level b based on the video content characteristics, a represents a given non-zero constant;representing the weight of the j-th video window of video area i at bit rate level b, i representing the number of video areas, j representing the number of video windows, NijRepresenting the number of all pixel points of the jth video window of the video area i;representing the user perceived quality of the jth video window of video region i at bit rate level b;j-th video representing video area i at bit rate level bThe average value of the gray levels of all pixel points in the window,expressing the mean value of all pixel points of a jth video window of a video area i under a source video bit rate o;representing the variance of all pixel gray levels in the jth video window of video region i at bit rate level b,expressing the variance of all pixel points 
of a jth video window of a video area i under a source video bit rate o;representing the covariance of all pixel points of gray levels of a jth video window of a video area i under a bit rate level b and a source video bit rate o; a is1、a2Are all given non-zero constants;representing the user perceived quality of video region i at bit rate level b; n is a radical ofiRepresenting the number of all video windows of video region i.
The optimization problem constructed in the step 10 is as follows:
max ( α·Q(b_t) − β·Rebuf(b_t) − γ·Smooth(Q(b_t)) )
where b_t is the bit rate to be decided for the current t-th video block; Q(b_t) denotes the video quality of the t-th video block at bit rate b_t; Rebuf(b_t) denotes the stall (rebuffering) time of the t-th video block at bit rate b_t; Smooth(Q(b_t)) denotes the quality smoothness of the t-th video block at bit rate b_t; α is the video quality weight, β the stall time weight, and γ the quality smoothness weight.
The Euclidean distance D_i of step 12 is:
D_i = sqrt( (v̄_i^b − ṽ_i)² + (l̄_i^b − l̃_i)² )
where v̄_i^b and l̄_i^b denote the relative viewpoint moving speed and viewpoint brightness change of video area i at bit rate level b stored in the user perceived quality query table of step 9, and ṽ_i and l̃_i denote the relative viewpoint moving speed and viewpoint brightness change of video area i predicted in step 11; D_i denotes the Euclidean distance.
The objective function of step 13 is:
max Σ_{i=1}^{M} Q_i^{b_ti}, subject to the total size Σ_{i=1}^{M} S_i not exceeding the budget set by the video block bit rate b_t
where i is the index of the video area currently to be decided, M is the number of video areas in the video block, S_i denotes the size of the video area i to be decided, b_ti is the bit rate of video area i, Q_i^{b_ti} denotes the user perceived quality of video area i at bit rate level b_ti, and b_t is the bit rate of the video block.
Compared with the prior art, the invention builds, on top of the traditional viewpoint-driven approach, a user perceived quality model from two quality-determining factors specific to omnidirectional video, so that the video quality subjectively perceived by the user can be computed more accurately. An adaptive quality decision model is then established by jointly adjusting the quality of each video block and of all video areas within it. Based on these two models, the invention achieves a more reasonable video quality distribution and further improves streaming resource utilization and user quality of experience.
Drawings
Fig. 1 is a diagram of an application scenario of the present invention.
Fig. 2 is a general flow chart of a quality-aware based adaptive omni-directional video streaming method;
FIG. 3 is a flow diagram of a user perceived quality model;
fig. 4 is a flow chart of an adaptive quality decision model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to specific examples.
Fig. 1 is a diagram of an application scenario of the present invention, which mainly includes a video server, a Content Delivery Network (CDN) and a video player. At the video server side, the projected flat video file is first cut into multiple video blocks of equal length, which are then transcoded into different bit rate levels (representing different sharpness and quality) using an encoder. The content distribution network acquires video blocks of each bit rate level of the flat video file from the video server, spatially cuts the video blocks of each bit rate level into a plurality of video areas for storage, and calculates the user perception quality of all the areas of the video blocks of each bit rate level based on the historical viewpoint data and the user perception quality model. The video player outputs the bit rate level of each video block and the quality of each video region in the block based on a quality decider, and requests a content distribution network to download the corresponding video region through the internet.
An adaptive omni-directional video streaming method based on quality perception, as shown in fig. 2, includes the following steps:
step 1, projecting an omnidirectional video file to a two-dimensional plane by using a common symmetric projection method; and cutting the projected video file into a plurality of video blocks with equal length, and transcoding each video block into different bit rate levels.
The omnidirectional video file is preprocessed and projected onto a two-dimensional plane using equirectangular projection (ERP), a common symmetric projection technique. The FFmpeg tool is used to segment the projected video file into multiple video blocks of equal length (e.g., 1 second), and each video block is transcoded into different bit rate levels corresponding to different definitions and video qualities. In the present embodiment, 750 kbps, 1200 kbps and 1850 kbps correspond to low definition, standard definition and high definition, respectively.
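The patent names FFmpeg but gives no invocation; the following Python sketch shows one plausible way to build the segmentation and transcoding commands of step 1. The flags, file names, and the decision to merely construct (not run) the commands are illustrative assumptions.

```python
# Sketch of step 1: cut an ERP-projected video into 1-second blocks and
# transcode each block into several bit rate levels. Flags are illustrative.

BITRATE_LEVELS_KBPS = [750, 1200, 1850]  # low / standard / high definition

def segment_cmd(src: str, block_seconds: int = 1) -> list[str]:
    """Build an FFmpeg command that cuts the video into equal-length blocks."""
    return [
        "ffmpeg", "-i", src,
        "-c", "copy", "-map", "0",
        "-f", "segment", "-segment_time", str(block_seconds),
        "block_%04d.mp4",
    ]

def transcode_cmds(block: str) -> list[list[str]]:
    """Build one FFmpeg transcode command per bit rate level for one block."""
    cmds = []
    for kbps in BITRATE_LEVELS_KBPS:
        cmds.append([
            "ffmpeg", "-i", block,
            "-c:v", "libx264", "-b:v", f"{kbps}k",
            f"{block.rsplit('.', 1)[0]}_{kbps}k.mp4",
        ])
    return cmds
```

In a real pipeline these command lists would be handed to subprocess.run; here they are only constructed so the block/level structure is visible.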
Step 2, spatially cutting each video frame of video blocks of respective bit rate levels into a plurality of video regions (e.g., a 6 × 12 grid), and further spatially cutting each video region into a plurality of video windows (e.g., a 3 × 3 grid).
Step 3: based on the viewpoint trajectory preset for the omnidirectional video file, calculate the relative viewpoint moving speed and viewpoint brightness change of each pixel point of each video frame of each bit-rate-level video block.
According to the characteristics of omnidirectional video, the user's perception of omnidirectional video quality is influenced by the movement of the user's viewpoint. Therefore, the influence of two quality-determining factors specific to omnidirectional video, the relative viewpoint moving speed and the viewpoint brightness change, on user perceived quality is quantified through a user study, so that the omnidirectional video quality subjectively perceived by the user can be modeled more accurately.
The relative viewpoint moving speed is the absolute difference between the user viewpoint moving speed v_u (obtained from the head-mounted device while the user watches the omnidirectional video) and the motion speed v_o^b(p,q) of the object at pixel point (p,q) of the video frame at bit rate level b. The relative viewpoint moving speed v^b(p,q) of each pixel of the video frame at bit rate level b is:
v^b(p,q) = |v_u − v_o^b(p,q)|
where |·| denotes the absolute value.
The viewpoint brightness change is the absolute difference between the luminance l_u^b(p,q) at video frame pixel point (p,q) at bit rate level b and the average brightness l_m^b of all pixel points in the viewpoint area at the time point m seconds before the current one. The viewpoint brightness change l^b(p,q) of each pixel of the video frame at bit rate level b is:
l^b(p,q) = |l_u^b(p,q) − l_m^b|
where |·| denotes the absolute value and m is a set value.
Step 4: calculate the relative viewpoint moving speed and viewpoint brightness change of each video area at each bit rate level:
v̄_i^b = (1/N) Σ_{(p,q)∈i} v^b(p,q)
l̄_i^b = (1/N) Σ_{(p,q)∈i} l^b(p,q)
where N is the number of pixel points in video area i.
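Steps 3 and 4 reduce to an element-wise absolute difference followed by a per-region mean. A minimal NumPy sketch; the frame shape and the region grid passed in are assumptions for illustration:

```python
import numpy as np

def pixel_features(v_u, v_obj, lum, lum_prev_mean):
    """Step 3: per-pixel relative viewpoint speed and brightness change.
    v_obj and lum are HxW arrays for one frame at one bit rate level;
    v_u is the scalar user viewpoint speed; lum_prev_mean is the mean
    luminance of the viewpoint region m seconds earlier."""
    v = np.abs(v_u - v_obj)          # v^b(p,q) = |v_u - v_o^b(p,q)|
    l = np.abs(lum - lum_prev_mean)  # l^b(p,q) = |l_u^b(p,q) - l_m^b|
    return v, l

def region_features(v, l, rows, cols):
    """Step 4: average the pixel features over a rows x cols grid of
    video regions, returning two rows x cols arrays of region means."""
    H, W = v.shape
    rh, rw = H // rows, W // cols
    vbar = v[:rows * rh, :cols * rw].reshape(rows, rh, cols, rw).mean(axis=(1, 3))
    lbar = l[:rows * rh, :cols * rw].reshape(rows, rh, cols, rw).mean(axis=(1, 3))
    return vbar, lbar
```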
Step 5: calculate the viewpoint-based just-noticeable-difference (JND) threshold of each pixel point of the video frames at each bit rate level.
The conventional JND threshold depends only on video content characteristics and is referred to as the content-based JND threshold. According to the research results of the present invention, the JND threshold is related not only to the video content characteristics but also to the relative viewpoint moving speed and the viewpoint brightness change; this is referred to as the viewpoint-based JND threshold. At the same bit rate level b, the JND threshold increases with v^b(p,q), and first increases and then decreases with l^b(p,q), the two factors acting on the JND threshold independently. By fitting these rules to the user study data, the viewpoint-based JND threshold is obtained by modulating the content-based threshold accordingly.
Here JND^b(p,q) denotes the viewpoint-based JND threshold at pixel point (p,q) at bit rate level b, CJND^b(p,q) denotes the content-based JND threshold at pixel point (p,q) at bit rate level b, and a is a given non-zero constant.
Step 6: calculate the weight of each video window at each bit rate level.
The smaller the viewpoint-based JND threshold JND^b(p,q) at pixel point (p,q) at bit rate level b, the more easily the user perceives a quality difference, so the weight of pixel point (p,q) should be higher; conversely, the larger JND^b(p,q), the harder the quality difference is to perceive, and the lower the corresponding weight. Since pixel points in different video windows differ in importance, the invention defines the window weights from the new quality-determining factors (relative viewpoint moving speed and viewpoint brightness change): the weight w_ij^b of the j-th video window of video area i at bit rate level b is computed from the viewpoint-based JND thresholds of its pixel points, where i denotes the index of the video area, j the index of the video window, and N_ij the number of pixel points of the j-th video window of video area i.
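The exact weight formula is not reproduced in this text; it states only that the weight rises as the JND threshold falls. The sketch below is one plausible instantiation under that constraint, normalized inverse mean JND per window. It is an assumption for illustration, not the patented formula.

```python
import numpy as np

def window_weights(jnd_windows):
    """jnd_windows: list of 2-D arrays, one per video window of a region,
    each holding the viewpoint-based JND threshold of its pixel points.
    Returns one weight per window, normalized to sum to 1."""
    # Inverse mean JND: windows where quality differences are easier to
    # perceive (small JND) receive larger weights.
    inv = np.array([1.0 / np.mean(w) for w in jnd_windows])
    return inv / inv.sum()
```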
Step 7: calculate the user perceived quality of each video window at each bit rate level.
Referring to fig. 3, computed with the structural similarity index metric (SSIM), the user perceived quality q_ij^b of the j-th video window of video area i at bit rate level b is:
q_ij^b = ((2·μ_ij^b·μ_ij^o + a_1)(2·σ_ij^{bo} + a_2)) / (((μ_ij^b)² + (μ_ij^o)² + a_1)(σ_ij^b + σ_ij^o + a_2))
where μ_ij^b and μ_ij^o denote the mean gray level of all pixel points of the j-th video window of video area i at bit rate level b and at the source video bit rate o, respectively; σ_ij^b and σ_ij^o denote the corresponding gray-level variances; σ_ij^{bo} denotes the gray-level covariance of that window between bit rate level b and the source bit rate o; and a_1, a_2 are given non-zero constants.
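Step 7 is the standard single-window SSIM comparing a transcoded window against the source-bit-rate window. A minimal sketch; the default values of the stabilizing constants a_1, a_2 follow the common SSIM choice (0.01·255)² and (0.03·255)², which the source text leaves unspecified:

```python
import numpy as np

def window_ssim(win_b, win_o, a1=6.5025, a2=58.5225):
    """SSIM between a video window at bit rate level b (win_b) and the
    same window at the source bit rate o (win_o), both 2-D gray arrays."""
    mu_b, mu_o = win_b.mean(), win_o.mean()
    var_b, var_o = win_b.var(), win_o.var()            # gray-level variances
    cov = ((win_b - mu_b) * (win_o - mu_o)).mean()     # covariance sigma_bo
    return ((2 * mu_b * mu_o + a1) * (2 * cov + a2)) / \
           ((mu_b**2 + mu_o**2 + a1) * (var_b + var_o + a2))
```

An identical pair of windows yields an SSIM of 1; heavier compression drives the value toward 0.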
Step 8: calculate the user perceived quality of each video area at each bit rate level.
The user perceived quality of a video area is computed from the perceived quality and weight of all video windows inside it; the user perceived quality Q_i^b of video area i at bit rate level b is:
Q_i^b = Σ_{j=1}^{N_i} w_ij^b · q_ij^b
where N_i denotes the number of video windows of video area i.
Step 9: based on the relative viewpoint moving speed and viewpoint brightness change obtained in step 4 and the user perceived quality obtained in step 8, build a user perceived quality query table mapping the relative viewpoint moving speed and viewpoint brightness change of each video area at each bit rate level to its user perceived quality.
Each entry of the user perceived quality query table records, for a video area i and a bit rate level b, the relative viewpoint moving speed v̄_i^b, the viewpoint brightness change l̄_i^b, and the corresponding user perceived quality (SSIM).
An adaptive quality decision model is then built on the constructed user perceived quality query table; see fig. 4.
Step 10: using an ABR algorithm, model the adaptive bit rate decision for video blocks as an optimization problem based on model predictive control (MPC); solving this problem determines the bit rate of each video block and, correspondingly, of each video frame in the block.
The bit rate of a video block is the amount of data transmitted per second and is an average notion: for example, the bit rate of the video block covering seconds 3 to 5 is its total data volume divided by its duration. This total includes the data of every video frame in the block, and individual frames differ in size, so a per-frame bit rate cannot be stated exactly; it is simply not measured that way. What can be said is that once the bit rate of a video block is decided, the data volume of each frame it contains, and thus the frame-level allocation, is determined accordingly.
The present invention uses an existing ABR algorithm to decide the bit rate of each video block. The optimization goal of the ABR algorithm is to maximize user QoE; video quality, quality smoothness and stall time all have a significant impact on QoE, so the invention adopts a linear QoE model over these factors as the algorithm's objective. Specifically, the optimization problem P to be solved is defined as follows:
P: max ( QoE = α·Q(b_t) − β·Rebuf(b_t) − γ·Smooth(Q(b_t)) )
In this formula, problem P expresses the optimization goal of maximizing the quality of user experience (QoE), i.e., the adaptive bit rate decision. b_t is the bit rate to be decided for the current t-th video block; Q(b_t) denotes the video quality of the t-th video block at bit rate b_t; Rebuf(b_t) denotes its stall (rebuffering) time; Smooth(Q(b_t)) denotes its quality smoothness; α, β and γ are the video quality, stall time and quality smoothness weights, respectively.
The optimization problem P is solved as follows. When video block t is requested for download, the bit rate of the current block t is selected by enumerating all bit rate combinations of the next T video blocks and maximizing the overall QoE over that horizon. After the block is downloaded, the horizon slides forward to the next T video blocks. This process repeats until all video blocks have been transmitted.
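The enumeration at the core of this MPC step can be sketched in a few lines. The QoE scoring function passed in is a stand-in for α·Q − β·Rebuf − γ·Smooth; its concrete form, and the toy penalty used in the usage example, are assumptions for illustration:

```python
from itertools import product

def mpc_decide(levels, T, qoe):
    """Pick the bit rate of the current block: enumerate every bit rate
    combination over a T-block horizon, score each with qoe(combo), and
    return the first element of the best-scoring combination.
    levels: available bit rate levels; qoe: combo -> float."""
    best = max(product(levels, repeat=T), key=qoe)
    return best[0]
```

With, say, a toy QoE that rewards total bit rate and penalizes level switches, the decision for the current block falls out of the horizon-wide maximization rather than a per-block greedy choice.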
Step 11: based on the current user's historical viewpoint positions, predict the viewpoint position at a future time point with an existing linear regression method, and use the lowest viewpoint moving speed over the past m seconds before the current time point as the predicted viewpoint moving speed at the future time point. Finally, compute the predicted relative viewpoint moving speed and viewpoint brightness change of all video areas at each bit rate level with the same formulas as in steps 3 and 4.
Step 12: compute the Euclidean distance between the predicted relative viewpoint moving speed and viewpoint brightness change of each video area at each bit rate level obtained in step 11 and the stored values of the corresponding video area at the corresponding bit rate level obtained in step 4, and take from the user perceived quality query table built in step 9 the user perceived quality of the entry with the minimum Euclidean distance. This yields the user perceived quality (SSIM) of all video areas at every bit rate level.
That is, the SSIM value of each video area at each bit rate level is read from the entry of the query table of step 9 that exactly or most closely matches the two features, relative viewpoint moving speed and viewpoint brightness change. The client can thus roughly estimate the overall user perceived quality of a video block without fetching the video content itself: the computed feature values of all video areas at each bit rate level are compared with the corresponding entries in the table to obtain the user perceived quality (SSIM) of every video area at every bit rate level.
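The nearest-entry lookup of step 12 is a minimum-Euclidean-distance search over table entries. A minimal sketch, where the table layout (one (v̄, l̄, ssim) tuple per entry) is an assumed encoding of the query table:

```python
import math

def lookup_quality(table, v_pred, l_pred):
    """table: list of (v_bar, l_bar, ssim) entries for one video area and
    bit rate level. Return the SSIM of the entry nearest to the predicted
    features (v_pred, l_pred) in Euclidean distance."""
    v, l, ssim = min(table, key=lambda e: math.hypot(e[0] - v_pred, e[1] - l_pred))
    return ssim
```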
Step 13: based on the video block bit rate b_t determined in step 10 and the user perceived quality of all video areas at each bit rate level determined in step 12, further decide the quality of every video area in all video frames of the block, with the goal of maximizing the block's overall user perceived quality (SSIM) under the constraint that the total size of all video areas of all frames does not exceed the budget set by the video block bit rate.
Once the bit rate of a video block is determined, the bit rate of all video frames in the block is determined, which in turn bounds the total size of all video areas in each frame. Under this constraint, the quality of each video area in the block is output by defining and solving an objective function over the overall user perceived quality.
Based on the obtained SSIM values of the video areas at the different bit rate levels, the quality level of each video area in the block is decided by maximizing the overall SSIM of the block; the objective function is defined as:
max Σ_{i=1}^{M} Q_i^{b_ti}, subject to the total size Σ_{i=1}^{M} S_i not exceeding the budget set by the video block bit rate b_t
where i is the index of the video area currently to be decided, b_ti is the bit rate of the current video area i, M is the number of video areas in the video block, S_i is the size of video area i, Q_i^{b_ti} is the user perceived quality of video area i at bit rate level b_ti determined in step 12, and b_t is the bit rate of video block t determined in step 10. The quality decision combines enumeration and greedy pruning: for any pair of video areas (e.g., areas 1 and 2), if one quality assignment (b_t1, b_t2) is more rational than another (b′_t1, b′_t2), i.e., it attains a higher total quality Q_1 + Q_2 and a smaller total size S_1 + S_2, the latter assignment is excluded from the quality-allocation iterations of the remaining video areas (e.g., 3, 4, ..., M). The process repeats until the quality allocation of all video areas is finished.
The invention considers two quality factors specific to omnidirectional video, viewpoint moving speed and brightness change, and quantifies their influence on the user's subjective perceived quality through user studies. The SSIM metric is extended with these two features to build a user perceived quality model. On top of the per-block bit rate decision made by the MPC algorithm, quality is allocated to all regions within each video block so as to maximize the block's overall SSIM quality. The invention thus achieves a more reasonable quality allocation across video regions, further improving both streaming resource utilization and the quality of user experience.
It should be noted that although the above-described embodiments of the present invention are illustrative, the invention is not limited to them. Other embodiments devised by those skilled in the art in light of the teachings of the present invention, without departing from its principles, are likewise considered to fall within the scope of the present invention.
Claims (4)
1. A self-adaptive omnidirectional video streaming method based on quality perception, characterized by comprising the following steps:
step 1, projecting the omnidirectional video file onto a two-dimensional plane using a common symmetric projection method; cutting the projected video file temporally into a plurality of video blocks of equal length, and transcoding each video block into different bit rate levels;
step 2, spatially cutting each video frame of each bit-rate-level video block into a plurality of video regions, and further spatially cutting each video region into a plurality of video windows;
step 3, calculating the relative viewpoint moving speed and viewpoint brightness change of each pixel of each video frame of each bit-rate-level video block, based on the viewpoint trajectory preset for the omnidirectional video file, wherein:
v^b(p,q) = |v_u - v_0^b(p,q)|
l^b(p,q) = |l_u^b(p,q) - l_m^b|
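The two per-pixel formulas above can be evaluated directly on whole frames. A minimal sketch, assuming NumPy arrays for the per-pixel quantities and a scalar viewpoint speed; the function name and argument layout are hypothetical:

```python
import numpy as np

def pixel_speed_and_luminance_change(v_user, v_object, luminance, mean_lum_prev):
    """Per-pixel relative viewpoint speed and luminance change.
    v_user: scalar viewpoint moving speed (v_u);
    v_object: HxW object motion speeds at level b (v_0^b);
    luminance: HxW pixel luminance at level b (l_u^b);
    mean_lum_prev: mean luminance of the region the viewpoint occupied
    earlier (l_m^b)."""
    v_rel = np.abs(v_user - v_object)          # v^b(p,q) = |v_u - v_0^b(p,q)|
    l_chg = np.abs(luminance - mean_lum_prev)  # l^b(p,q) = |l_u^b(p,q) - l_m^b|
    return v_rel, l_chg
```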
step 4, calculating the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level, wherein:

v_i^b = (1/N) Σ_{(p,q) in region i} v^b(p,q),   l_i^b = (1/N) Σ_{(p,q) in region i} l^b(p,q)
step 5, calculating a just visible difference threshold value based on a viewpoint of each pixel point of the video frame under each bit rate level, wherein:
step 6, calculating the weight of each video window under each bit rate level, wherein:
step 7, calculating the user perceived quality of each video window at each bit rate level, wherein:

q_ij^b = ((2 μ_ij^b μ_ij^o + a_1)(2 σ_ij^{bo} + a_2)) / (((μ_ij^b)^2 + (μ_ij^o)^2 + a_1)((σ_ij^b)^2 + (σ_ij^o)^2 + a_2))
step 8, calculating the user perceived quality of each video region at each bit rate level, wherein:

q_i^b = Σ_{j=1}^{N_i} w_ij^b · q_ij^b
step 9, establishing a user perception quality query table of the relative viewpoint moving speed and viewpoint brightness change of each video area under each bit rate level based on the relative viewpoint moving speed and viewpoint brightness change obtained in the step 4 and the user perception quality obtained in the step 8;
step 10, modeling the adaptive bit rate (ABR) decision for the video blocks as an optimization problem based on model predictive control, and determining the bit rate of each video block by solving the optimization problem, which correspondingly determines the bit rate of each video frame of the block;
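Step 10's model-predictive formulation can be illustrated with a small horizon search: enumerate bitrate sequences for the next few blocks, simulate the playback buffer under a throughput estimate, and score each sequence with a quality/rebuffering/smoothness objective. The weights, horizon, quality proxy (bitrate itself), and the `mpc_bitrate` helper are illustrative assumptions, not the patent's parameters:

```python
from itertools import product

def mpc_bitrate(levels, throughput_est, buffer_s, last_level, chunk_s=1.0,
                alpha=1.0, beta=4.0, gamma=1.0, horizon=3):
    """Pick the next block's bitrate by enumerating all bitrate sequences over
    a short horizon and simulating buffer occupancy.  Quality is taken as the
    bitrate itself for simplicity."""
    best, best_first = float("-inf"), levels[0]
    for seq in product(levels, repeat=horizon):
        buf, prev, score = buffer_s, last_level, 0.0
        for b in seq:
            dl = b * chunk_s / throughput_est        # download time of the block
            rebuf = max(0.0, dl - buf)               # stall if buffer drains
            buf = max(0.0, buf - dl) + chunk_s       # refill with the new block
            score += alpha * b - beta * rebuf - gamma * abs(b - prev)
            prev = b
        if score > best:
            best, best_first = score, seq[0]
    return best_first
```

Only the first decision of the best sequence is committed; the horizon is re-solved for every block, which is the usual receding-horizon MPC pattern.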
step 11, predicting the viewpoint position at a future time point with an existing linear regression method over the current user's historical viewpoint positions, using the lowest viewpoint moving speed observed over the past m seconds as the predicted viewpoint moving speed at the future time point, and finally calculating the relative viewpoint moving speed and viewpoint brightness change of all video regions at each bit rate level with the formulas of steps 3 and 4;
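As an illustration of step 11's predictor, the sketch below fits a least-squares line over (time, position) samples and takes the minimum speed over the recent window; the function name and one-dimensional position are simplifying assumptions (a real viewpoint is a 2D/3D direction):

```python
def predict_viewpoint(history, t_future, speeds_past_m):
    """history: list of (t, pos) samples of one viewpoint coordinate.
    Returns (predicted position at t_future via least-squares line fit,
    predicted speed = minimum observed speed over the past m seconds)."""
    n = len(history)
    mt = sum(t for t, _ in history) / n
    mp = sum(p for _, p in history) / n
    denom = sum((t - mt) ** 2 for t, _ in history)
    slope = sum((t - mt) * (p - mp) for t, p in history) / denom if denom else 0.0
    intercept = mp - slope * mt
    return slope * t_future + intercept, min(speeds_past_m)
```

Using the minimum rather than the mean speed is a conservative choice: it biases the model toward the higher perceived quality a slowly moving viewpoint can resolve.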
step 12, computing the Euclidean distance between the relative viewpoint moving speed and viewpoint brightness change of all video regions at each bit rate level obtained in step 11 and those of the corresponding video region at the corresponding bit rate level obtained in step 4; using the user perceived quality look-up table built in step 9, taking the user perceived quality of the entry with the minimum Euclidean distance; thereby obtaining the user perceived quality of all video regions at each bit rate level;
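The nearest-neighbour look-up of step 12 reduces to a minimum-distance search over the table entries of one region and bit rate level. A minimal sketch with an assumed flat `(speed, luminance_change, quality)` table layout:

```python
def lookup_quality(table, v_pred, l_pred):
    """table: list of (v, l, quality) entries for one video region at one
    bit rate level.  Returns the quality of the entry whose (speed,
    luminance-change) pair has minimum Euclidean distance to the prediction."""
    best = min(table,
               key=lambda e: ((e[0] - v_pred) ** 2 + (e[1] - l_pred) ** 2) ** 0.5)
    return best[2]
```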
step 13, based on the bit rate of the video block determined in step 10 and the user perceived quality of all video regions at each bit rate level determined in step 12, further determining the quality of each video region in all video frames of the video block with the goal of maximizing the overall user perceived quality of the video block, subject to the constraint that the total size of all video regions of all video frames of the video block does not exceed the budget set by the video block bit rate;
in the above formula, vb(p, q) represents the relative viewpoint movement speed, l, of the pixel point (p, q) at the bit rate level bb(p, q) represents the viewpoint brightness variation of the pixel point (p, q) at the bit rate level b, vuIndicating the moving speed of the user's viewpoint, v0 b(p, q) represents the motion velocity of the object at pixel point (p, q) at bit rate level b, lu b(p, q) denotes the luminance at a pixel point (p, q) at bit rate level b, lm bThe average brightness of all pixel points in a video region where m-second foresight points are located under the bit rate level b is represented, m is a set value, and | represents an absolute value symbol;representing the relative viewpoint movement speed of the video area i at the bit rate level b,representing the viewpoint brightness change of a video area i under the bit rate level b, wherein N is the number of all pixel points of the video area i; JNDb(p, q) represents the view-based just visible disparity threshold for pixel point (p, q) at bit rate level b, CJNDb(p, q) represents the just visible difference threshold of the pixel point (p, q) at bit rate level b based on the video content characteristics, a represents a given non-zero constant;representing the weight of the j-th video window of video area i at bit rate level b, i representing the number of video areas, j representing the number of video windows, NijRepresenting the number of all pixel points of the jth video window of the video area i;representing the user perceived quality of the jth video window of video region i at bit rate level b;represents the average value of all pixel points in the jth video window of the video area i at the bit rate level b,expressing the mean value of the gray levels of all pixel points of the jth video window of the video area i under the source video bit rate o;representing the variance of all pixel gray levels in the jth video window of video region i at bit rate level b,expressing the variance 
of all pixel gray levels of a jth video window of a video area i under a source video bit rate o;representing the covariance of all pixel points of gray levels of a jth video window of a video area i under a bit rate level b and a source video bit rate o; a is1、a2Are all given non-zero constants;representing the user perceived quality of video region i at bit rate level b; ni represents the number of all video windows of video region i.
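The symbol legend above describes an SSIM-style window quality combined by per-window weights. A minimal sketch, assuming the standard SSIM form over the legend's means, variances, and covariance, with the usual 8-bit stabilizing constants for a_1 and a_2 (the patent only states they are non-zero), and with the step-6 weights supplied externally (the JND-based weighting itself is not reproduced here):

```python
import numpy as np

def window_quality(win_b, win_o, a1=6.5025, a2=58.5225):
    """SSIM-style quality of one window: pixels at bit rate level b versus the
    source bit rate o.  a1, a2 are assumed constant values."""
    mu_b, mu_o = win_b.mean(), win_o.mean()
    var_b, var_o = win_b.var(), win_o.var()
    cov = ((win_b - mu_b) * (win_o - mu_o)).mean()
    return ((2 * mu_b * mu_o + a1) * (2 * cov + a2)) / \
           ((mu_b ** 2 + mu_o ** 2 + a1) * (var_b + var_o + a2))

def region_perceived_quality(windows_b, windows_o, weights):
    """Region-level perceived quality: weighted sum of window qualities, with
    weights assumed to be the (normalized) step-6 window weights."""
    w = np.asarray(weights, dtype=float)
    scores = np.array([window_quality(b, o) for b, o in zip(windows_b, windows_o)])
    return float((w / w.sum()) @ scores)
```

An identical window pair scores exactly 1.0, and any distortion of the level-b window lowers the score, as with ordinary SSIM.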
2. The self-adaptive omnidirectional video streaming method based on quality perception according to claim 1, wherein the optimization problem constructed in step 10 is:
max(α*Q(bt)-β*Rebuf(bt)-γ*Smooth(Q(bt)))
in the formula, b_t is the bit rate of the current t-th video block to be decided; Q(b_t) denotes the video quality of the t-th video block at bit rate b_t; Rebuf(b_t) denotes the rebuffering (stall) time of the t-th video block at bit rate b_t; Smooth(Q(b_t)) denotes the quality smoothness of the t-th video block at bit rate b_t; α denotes the video quality weight, β the rebuffering time weight, and γ the quality smoothness weight.
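The per-block contribution of the objective above can be evaluated as below; the weight values are illustrative, and the smoothness term is taken as |Q_t - Q_{t-1}|, a common choice that the claim itself does not fix:

```python
def chunk_qoe(quality, rebuf_s, prev_quality, alpha=1.0, beta=4.0, gamma=1.0):
    """alpha*Q(b_t) - beta*Rebuf(b_t) - gamma*Smooth(Q(b_t)), with the
    smoothness penalty modeled as the absolute quality change between
    consecutive video blocks (an assumption, not the claim's definition)."""
    return alpha * quality - beta * rebuf_s - gamma * abs(quality - prev_quality)
```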
3. The self-adaptive omnidirectional video streaming method based on quality perception according to claim 1, wherein the Euclidean distance D_i of step 12 is:

D_i = sqrt((v_i^b - v̂_i)^2 + (l_i^b - l̂_i)^2), where v̂_i and l̂_i are the values predicted in step 11
in the formula (I), the compound is shown in the specification,representing the relative viewpoint movement speed of video area i at bit rate level b in the user perceived quality look-up table at step 9,representing the viewpoint brightness change of the video area i under the bit rate level b in the user perception quality query table in the step 9, wherein N is the number of pixel points of the video area i;representing the relative viewpoint moving speed of the video area i predicted at step 11,representing the viewpoint brightness change of the video area i predicted in step 11; diRepresenting the euclidean distance.
4. The self-adaptive omnidirectional video streaming method based on quality perception according to claim 1, wherein the objective function of step 13 is:

max Σ_{i=1..M} q_i^{b_ti}   subject to   Σ_{i=1..M} S_i ≤ b_t
wherein i is the number of the video region currently being decided, M is the number of video regions in the video block, S_i denotes the size of the region i currently being decided, b_ti is the bit rate of region i, q_i^{b_ti} denotes the user perceived quality of region i at bit rate level b_ti, and b_t is the bit rate of the video block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210272188.XA CN114640851B (en) | 2022-03-18 | 2022-03-18 | Self-adaptive omnidirectional video stream transmission method based on quality perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114640851A true CN114640851A (en) | 2022-06-17 |
CN114640851B CN114640851B (en) | 2023-06-23 |
Family
ID=81949829
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160227172A1 (en) * | 2013-08-29 | 2016-08-04 | Smart Services Crc Pty Ltd | Quality controller for video image |
US20170155903A1 (en) * | 2015-11-30 | 2017-06-01 | Canon Kabushiki Kaisha | Method, apparatus and system for encoding and decoding video data according to local luminance intensity |
US20180041788A1 (en) * | 2015-02-07 | 2018-02-08 | Zhou Wang | Method and system for smart adaptive video streaming driven by perceptual quality-of-experience estimations |
CN110248212A (en) * | 2019-05-27 | 2019-09-17 | 上海交通大学 | 360 degree of video stream server end code rate adaptive transmission methods of multi-user and system |
US20190289296A1 (en) * | 2017-01-30 | 2019-09-19 | Euclid Discoveries, Llc | Video Characterization For Smart Encoding Based On Perceptual Quality Optimization |
CN112825557A (en) * | 2019-11-20 | 2021-05-21 | 北京大学 | Self-adaptive sensing time-space domain quantization method aiming at video coding |
CN112929691A (en) * | 2021-01-29 | 2021-06-08 | 复旦大学 | Multi-user panoramic video transmission method |
WO2021236059A1 (en) * | 2020-05-19 | 2021-11-25 | Google Llc | Dynamic parameter selection for quality-normalized video transcoding |
US20210385502A1 (en) * | 2018-10-19 | 2021-12-09 | Samsung Electronics Co., Ltd. | Method and device for evaluating subjective quality of video |
Non-Patent Citations (3)
Title |
---|
DI YUAN et al.: "Visual JND: A Perceptual Measurement in Video Coding", IEEE Access
ZHAI Yuxuan; LIU Yisang; XU Yiwen; CHEN Zhonghui; FANG Ying; ZHAO Tiesong: "3D video quality assessment based on HTTP adaptive streaming", Journal of Beijing University of Aeronautics and Astronautics, no. 12
CHE Huili: "Objective video quality assessment based on visual perception", China Master's Theses Electronic Journal
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||