CN114640851A - Self-adaptive omnidirectional video streaming method based on quality perception - Google Patents

Self-adaptive omnidirectional video streaming method based on quality perception

Info

Publication number
CN114640851A
Authority
CN
China
Prior art keywords: video, bit rate, quality, viewpoint, rate level
Prior art date
Legal status
Granted
Application number
CN202210272188.XA
Other languages
Chinese (zh)
Other versions
CN114640851B (en)
Inventor
王传
吴霄汉
吴岚
梁晶
刘胜
黄寒梅
李靓平
刘鸿谋
莫冬花
李明星
黎菲
Current Assignee
Guangxi Haohua Technology Co ltd
Original Assignee
Guangxi Haohua Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangxi Haohua Technology Co ltd filed Critical Guangxi Haohua Technology Co ltd
Priority to CN202210272188.XA priority Critical patent/CN114640851B/en
Publication of CN114640851A publication Critical patent/CN114640851A/en
Application granted granted Critical
Publication of CN114640851B publication Critical patent/CN114640851B/en
Current legal status: Active

Classifications

    • H04N 19/149: Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical or statistical model (adaptive coding of digital video signals)
    • H04N 19/154: Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion (adaptive coding of digital video signals)
    • H04N 19/167: Position within a video image, e.g. region of interest [ROI] (adaptive coding of digital video signals)
    • H04L 65/80: Responding to QoS (network arrangements, protocols or services for supporting real-time applications in data packet communication)
    • Y02T 10/40: Engine management systems (Y02T: climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a quality-perception-based adaptive omnidirectional video streaming method. The method considers two quality-determining factors specific to omnidirectional video, viewpoint movement and brightness change, and quantifies their influence on subjective perceived quality through user studies. A user perceived quality model is established by extending the SSIM metric with these two features. On top of the MPC algorithm's bit rate decision for each video block, quality is then allocated to all regions within the block so as to maximize the block's overall SSIM quality. The invention thereby achieves a more reasonable quality allocation across video regions and further improves streaming media resource utilization and the quality of user experience.

Description

Self-adaptive omnidirectional video streaming method based on quality perception
Technical Field
The invention relates to the technical field of streaming media transmission, and in particular to an adaptive omnidirectional video streaming method based on quality perception.
Background
In recent years, omnidirectional video has become one of the major emerging classes of internet traffic. At the same time, transmitting an omnidirectional video stream is more challenging than transmitting a conventional one. To create a panoramic experience, omnidirectional video must stream panoramic content at high resolution and without stalls, which consumes substantial bandwidth and resources. In the adaptive transmission process, the omnidirectional video is first projected into an ordinary two-dimensional planar video, which is then temporally sliced into video blocks. Each video block is transcoded by the encoder into multiple bit rate levels (representing different qualities). Finally, the video blocks at the respective bit rate levels are further spatially cut into video regions. As in conventional adaptive methods, the video player can dynamically switch the quality level at the boundary between two consecutive video blocks to cope with fluctuations in network bandwidth.
Most current adaptive algorithms adopt a viewpoint-driven approach, in which only the video content in the viewpoint area (the area the user is facing) is streamed at high quality. Such methods have the following limitations. First, the viewpoint area is usually much larger than an ordinary screen, so streaming even just the viewpoint area at equal quality still requires at least twice the bandwidth of ordinary video. Second, because the content of the viewpoint area must be prefetched, the player has to predict user behavior (i.e., viewpoint movement), and any prediction error can degrade the user's quality of experience (QoE). Finally, to accommodate viewpoint movement, the omnidirectional video must be spatially sliced and transcoded into multiple quality levels, which greatly increases the total video size. Since users perceive the quality of omnidirectional video differently from ordinary video, and that perception is uniquely influenced by viewpoint movement, this characteristic can be exploited to further reduce bandwidth demand and improve the user experience. Existing adaptive omnidirectional video streaming algorithms therefore still fall short in resource allocation and QoE maximization, and cannot meet the deployment and development needs of today's high-quality omnidirectional streaming services. A more scientific and efficient adaptive omnidirectional video streaming method is thus urgently needed.
Disclosure of Invention
The invention aims to solve the problems of low resource utilization rate, low video service quality and the like of the conventional self-adaptive omnidirectional video streaming method, and provides a self-adaptive omnidirectional video streaming method based on quality perception.
In order to solve the problems, the invention is realized by the following technical scheme:
the self-adaptive omnidirectional video streaming method based on quality perception comprises the following steps:
step 1, projecting an omnidirectional video file to a two-dimensional plane by using a common symmetric projection method; cutting the projected video file into a plurality of video blocks with equal length, and transcoding each video block into different bit rate levels;
step 2, spatially cutting each video frame of the video block at each bit rate level into a plurality of video regions, and further spatially cutting each video region into a plurality of video windows;
step 3, calculating the relative viewpoint moving speed and viewpoint brightness change of each pixel of each video frame of the video blocks at each bit rate level, based on the viewpoint trajectory preset for the omnidirectional video file, wherein:

$v^b(p,q) = \lvert v_u - v_0^b(p,q) \rvert$

$l^b(p,q) = \lvert l_u^b(p,q) - l_m^b \rvert$
step 4, calculating the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level, wherein:

$\bar{v}_i^b = \frac{1}{N} \sum_{(p,q) \in i} v^b(p,q)$

$\bar{l}_i^b = \frac{1}{N} \sum_{(p,q) \in i} l^b(p,q)$
step 5, calculating the viewpoint-based just-noticeable-difference threshold of each pixel of the video frames at each bit rate level, wherein:

[fitted formula, given as an image in the original: $\mathrm{JND}^b(p,q)$ as a function of $\mathrm{CJND}^b(p,q)$, $v^b(p,q)$, $l^b(p,q)$ and the constant $a$]
step 6, calculating the weight of each video window at each bit rate level, wherein:

[weight formula, given as an image in the original: $w_{i,j}^b$ aggregates the viewpoint-based JND values of the $N_{ij}$ pixels of the $j$-th window of region $i$, with smaller JND yielding larger weight]
step 7, calculating the user perceived quality of each video window at each bit rate level, wherein:

$Q_{i,j}^b = \dfrac{\big(2\,\mu_{i,j}^b\,\mu_{i,j}^o + a_1\big)\big(2\,\sigma_{i,j}^{b,o} + a_2\big)}{\big((\mu_{i,j}^b)^2 + (\mu_{i,j}^o)^2 + a_1\big)\big((\sigma_{i,j}^b)^2 + (\sigma_{i,j}^o)^2 + a_2\big)}$
step 8, calculating the user perceived quality of each video region at each bit rate level, wherein:

$Q_i^b = \dfrac{\sum_{j=1}^{N_i} w_{i,j}^b\, Q_{i,j}^b}{\sum_{j=1}^{N_i} w_{i,j}^b}$
step 9, establishing a user perceived quality lookup table of the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level, based on the relative viewpoint moving speed and viewpoint brightness change obtained in step 4 and the user perceived quality obtained in step 8;
step 10, modeling the adaptive bit rate decision of the video blocks as an optimization problem based on model predictive control by using an ABR algorithm, and determining the bit rate of each video block by solving the optimization problem, namely correspondingly determining the bit rate of each video frame of the video blocks;
step 11, predicting the viewpoint position at a future time point with an existing linear regression method based on the current user's historical viewpoint positions, using the lowest viewpoint moving speed observed in the past m seconds before the current time point as the predicted viewpoint moving speed at the future time point, and finally calculating the relative viewpoint moving speed and viewpoint brightness change of all video regions at each bit rate level with the same formulas as in steps 3 and 4;
step 12, calculating the Euclidean distance between the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level obtained in step 11 and those of the corresponding video region at the corresponding bit rate level obtained in step 4, taking the user perceived quality of the lookup-table entry (from the table built in step 9) with the minimum Euclidean distance, and finally obtaining the user perceived quality of all video regions at each bit rate level;
step 13, based on the bit rate of the video block determined in step 10 and the user perceived quality of all video regions at each bit rate level determined in step 12, determining the quality of each video region in all video frames of the video block, with the goal of maximizing the overall user perceived quality of the block, subject to the total size of all video regions of all frames of the block not exceeding the block bit rate;
in the above formulas, $v^b(p,q)$ denotes the relative viewpoint moving speed of pixel $(p,q)$ at bit rate level $b$, and $l^b(p,q)$ the viewpoint brightness change of pixel $(p,q)$ at bit rate level $b$; $v_u$ denotes the moving speed of the user's viewpoint and $v_0^b(p,q)$ the motion speed of the object at pixel $(p,q)$ at bit rate level $b$; $l_u^b(p,q)$ denotes the luminance at pixel $(p,q)$ at bit rate level $b$, and $l_m^b$ the average luminance of all pixels of the video region containing the viewpoint $m$ seconds earlier at bit rate level $b$, where $m$ is a preset value and $\lvert\cdot\rvert$ denotes the absolute value; $\bar{v}_i^b$ denotes the relative viewpoint moving speed and $\bar{l}_i^b$ the viewpoint brightness change of video region $i$ at bit rate level $b$, with $N$ the number of pixels of region $i$; $\mathrm{JND}^b(p,q)$ denotes the viewpoint-based just-noticeable-difference threshold of pixel $(p,q)$ at bit rate level $b$, $\mathrm{CJND}^b(p,q)$ the just-noticeable-difference threshold of pixel $(p,q)$ at bit rate level $b$ based on video content characteristics, and $a$ a given non-zero constant; $w_{i,j}^b$ denotes the weight of the $j$-th video window of region $i$ at bit rate level $b$, where $i$ indexes video regions, $j$ indexes video windows, and $N_{ij}$ is the number of pixels of the $j$-th window of region $i$; $Q_{i,j}^b$ denotes the user perceived quality of the $j$-th window of region $i$ at bit rate level $b$; $\mu_{i,j}^b$ denotes the mean gray level of all pixels of the $j$-th window of region $i$ at bit rate level $b$, and $\mu_{i,j}^o$ the corresponding mean at the source video bit rate $o$; $(\sigma_{i,j}^b)^2$ denotes the gray-level variance of all pixels of the $j$-th window of region $i$ at bit rate level $b$, and $(\sigma_{i,j}^o)^2$ the corresponding variance at the source video bit rate $o$; $\sigma_{i,j}^{b,o}$ denotes the gray-level covariance of the $j$-th window of region $i$ between bit rate level $b$ and the source video bit rate $o$; $a_1$ and $a_2$ are given non-zero constants; $Q_i^b$ denotes the user perceived quality of region $i$ at bit rate level $b$; and $N_i$ is the number of video windows of region $i$.
The optimization problem constructed in step 10 is:

$\max\big(\alpha\, Q(b_t) - \beta\, \mathrm{Rebuf}(b_t) - \gamma\, \mathrm{Smooth}(Q(b_t))\big)$

where $b_t$ is the bit rate to be decided for the current $t$-th video block; $Q(b_t)$, $\mathrm{Rebuf}(b_t)$ and $\mathrm{Smooth}(Q(b_t))$ denote the video quality, stall (rebuffering) time, and quality smoothness of the $t$-th video block at bit rate $b_t$; and $\alpha$, $\beta$ and $\gamma$ denote the video quality weight, stall time weight, and quality smoothness weight, respectively.
The Euclidean distance $D_i$ of step 12 is:

$D_i = \sqrt{\left(\bar{v}_i^b - \hat{v}_i\right)^2 + \left(\bar{l}_i^b - \hat{l}_i\right)^2}$

where $\bar{v}_i^b$ and $\bar{l}_i^b$ denote the relative viewpoint moving speed and viewpoint brightness change of video region $i$ at bit rate level $b$ recorded in the step-9 user perceived quality lookup table, $N$ is the number of pixels of region $i$, and $\hat{v}_i$ and $\hat{l}_i$ denote the relative viewpoint moving speed and viewpoint brightness change of region $i$ predicted in step 11; $D_i$ denotes the Euclidean distance.
The objective function of step 13 is:

$\max \sum_{i=1}^{M} Q_i^{b_{t,i}} \quad \text{subject to} \quad \sum_{i=1}^{M} S_i \le b_t$

where $i$ is the index of the video region currently to be decided, $M$ is the number of video regions in the video block, $S_i$ denotes the size of the video region $i$ currently to be decided, $b_{t,i}$ is the bit rate of that region, $Q_i^{b_{t,i}}$ denotes the user perceived quality of region $i$ at bit rate level $b_{t,i}$, and $b_t$ is the bit rate of the video block.
Compared with the prior art, the invention builds, on top of the traditional viewpoint-driven approach, a user perceived quality model from two quality-determining factors specific to omnidirectional video, so that the video quality a user subjectively perceives can be computed more accurately. An adaptive quality decision model is then built by jointly adjusting the quality of each video block and of all video regions within it. Based on these two models, the invention achieves a more reasonable video quality allocation and further improves streaming media resource utilization and the quality of user experience.
Drawings
Fig. 1 is a diagram of an application scenario of the present invention.
Fig. 2 is a general flow chart of the quality-perception-based adaptive omnidirectional video streaming method.
Fig. 3 is a flow diagram of the user perceived quality model.
Fig. 4 is a flow chart of the adaptive quality decision model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to specific examples.
Fig. 1 is a diagram of an application scenario of the present invention, which mainly includes a video server, a Content Delivery Network (CDN) and a video player. At the video server side, the projected flat video file is first cut into multiple video blocks of equal length, which are then transcoded into different bit rate levels (representing different sharpness and quality) using an encoder. The content distribution network acquires video blocks of each bit rate level of the flat video file from the video server, spatially cuts the video blocks of each bit rate level into a plurality of video areas for storage, and calculates the user perception quality of all the areas of the video blocks of each bit rate level based on the historical viewpoint data and the user perception quality model. The video player outputs the bit rate level of each video block and the quality of each video region in the block based on a quality decider, and requests a content distribution network to download the corresponding video region through the internet.
An adaptive omni-directional video streaming method based on quality perception, as shown in fig. 2, includes the following steps:
step 1, projecting an omnidirectional video file to a two-dimensional plane by using a common symmetric projection method; and cutting the projected video file into a plurality of video blocks with equal length, and transcoding each video block into different bit rate levels.
The omnidirectional video file is preprocessed and projected onto a two-dimensional plane using a symmetric projection method, the equirectangular projection (ERP). The FFmpeg tool is used to segment the projected video file into multiple video blocks of equal length (e.g., 1 second), and each video block is transcoded into different bit rate levels corresponding to different sharpness and video quality. In the present embodiment, 750 kbps, 1200 kbps and 1850 kbps correspond to low definition, standard definition and high definition, respectively.
Step 2, spatially cutting each video frame of video blocks of respective bit rate levels into a plurality of video regions (e.g., a 6 × 12 grid), and further spatially cutting each video region into a plurality of video windows (e.g., a 3 × 3 grid).
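For illustration, a minimal Python sketch of this spatial hierarchy follows; the grid sizes match the example values above, while the function and variable names are assumptions for exposition only, not part of the patent:

    # Sketch of the step 1-2 spatial hierarchy: ERP frame -> regions -> windows.
    # Grid sizes follow the example values in the text (6x12 regions, 3x3 windows);
    # names are illustrative, not from the patent.
    def region_and_window(p, q, height, width,
                          region_rows=6, region_cols=12,
                          window_rows=3, window_cols=3):
        """Return (region_index, window_index) for pixel (p, q) of an ERP frame."""
        rh, rw = height // region_rows, width // region_cols   # region size in pixels
        r_row = min(p // rh, region_rows - 1)
        r_col = min(q // rw, region_cols - 1)
        region = r_row * region_cols + r_col
        wh, ww = rh // window_rows, rw // window_cols           # window size in pixels
        w_row = min((p % rh) // wh, window_rows - 1)            # clamp edge pixels
        w_col = min((q % rw) // ww, window_cols - 1)
        window = w_row * window_cols + w_col
        return region, window

    # Example: on a 1920x3840 ERP frame, pixel (500, 1000) falls in region 15, window 3.
    print(region_and_window(500, 1000, height=1920, width=3840))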
Step 3, calculating the relative viewpoint moving speed and viewpoint brightness change of each pixel of each video frame of the video blocks at each bit rate level, based on the viewpoint trajectory preset for the omnidirectional video file.
Owing to the characteristics of omnidirectional video, the user's perception of its quality is influenced by the movement of the user's viewpoint. The influence of the two quality-determining factors specific to omnidirectional video, relative viewpoint moving speed and viewpoint brightness change, on user perceived quality is therefore quantified through user studies, so that the omnidirectional video quality a user subjectively perceives is modeled more accurately.
The relative viewpoint moving speed is the difference between the user's viewpoint moving speed $v_u$ (obtained from the motion of the head-mounted glasses while the user watches the omnidirectional video) and the motion speed $v_0^b(p,q)$ of the object at pixel $(p,q)$ of the video frame at bit rate level $b$; the relative viewpoint moving speed $v^b(p,q)$ of each pixel of the video frame at bit rate level $b$ is:

$v^b(p,q) = \lvert v_u - v_0^b(p,q) \rvert$

where $\lvert\cdot\rvert$ denotes the absolute value.
The viewpoint brightness change is the difference between the luminance $l_u^b(p,q)$ at pixel $(p,q)$ of the video frame at bit rate level $b$ and the average luminance $l_m^b$ of all pixels in the viewpoint region $m$ seconds before the current time point; the viewpoint brightness change $l^b(p,q)$ of each pixel of the video frame at bit rate level $b$ is:

$l^b(p,q) = \lvert l_u^b(p,q) - l_m^b \rvert$

where $\lvert\cdot\rvert$ denotes the absolute value and $m$ is a preset value.
Step 4, calculating the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level.
The relative viewpoint moving speed $\bar{v}_i^b$ of video region $i$ at bit rate level $b$ is:

$\bar{v}_i^b = \frac{1}{N} \sum_{(p,q) \in i} v^b(p,q)$

and the viewpoint brightness change $\bar{l}_i^b$ of video region $i$ at bit rate level $b$ is:

$\bar{l}_i^b = \frac{1}{N} \sum_{(p,q) \in i} l^b(p,q)$

where $N$ is the number of pixels in video region $i$.
Step 5, calculating the viewpoint-based just-noticeable-difference (JND) threshold of each pixel of the video frames at each bit rate level.
The conventional JND threshold depends only on video content characteristics and is referred to here as the content-based JND threshold. According to the findings of the present invention, the JND threshold is related not only to the video content characteristics but also to the relative viewpoint moving speed and the viewpoint brightness change; this is referred to as the viewpoint-based JND threshold. At the same bit rate level $b$, the JND threshold increases as $v^b(p,q)$ increases, and first increases then decreases as $l^b(p,q)$ increases, the two factors acting independently. By fitting the user study data, the viewpoint-based JND threshold is obtained as:

[fitted formula, given as an image in the original: $\mathrm{JND}^b(p,q)$ as a function of $\mathrm{CJND}^b(p,q)$, $v^b(p,q)$, $l^b(p,q)$ and the constant $a$]

where $\mathrm{JND}^b(p,q)$ denotes the viewpoint-based JND threshold at pixel $(p,q)$ at bit rate level $b$, $\mathrm{CJND}^b(p,q)$ the content-based JND threshold at pixel $(p,q)$ at bit rate level $b$, and $a$ is a given non-zero constant.
Step 6, calculating the weight of each video window at each bit rate level.
The smaller the viewpoint-based JND threshold $\mathrm{JND}^b(p,q)$ at pixel $(p,q)$ at bit rate level $b$, the more easily the user perceives a quality difference there, so the pixel receives a higher weight; conversely, the larger $\mathrm{JND}^b(p,q)$, the less perceptible a quality difference, and the lower the weight. Since pixels in different video windows differ in importance, the invention defines the window weight from the new quality-determining factors (relative viewpoint moving speed and viewpoint brightness change); the weight $w_{i,j}^b$ of the $j$-th video window of region $i$ at bit rate level $b$ is:

[weight formula, given as an image in the original: $w_{i,j}^b$ aggregates the viewpoint-based JND values of the $N_{ij}$ pixels of the $j$-th window of region $i$, with smaller JND yielding larger weight]

where $i$ is the region index, $j$ is the window index, and $N_{ij}$ is the number of pixels of the $j$-th window of region $i$.
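A hedged Python sketch of steps 5 and 6 follows. The patent gives the fitted viewpoint-based JND formula and the window-weight formula only as images, so both functional forms below are assumed stand-ins that merely reproduce the stated trends (JND grows with relative viewpoint speed, first rises then falls with brightness change; smaller JND yields larger weight):

    # Assumed stand-ins for the fitted viewpoint-based JND of step 5 and the window
    # weight of step 6 (the exact fitted forms are images in the original patent).
    import numpy as np

    def viewpoint_jnd(cjnd, v_b, l_b, a=1.0, l_peak=50.0):
        """cjnd: content-based JND per pixel (assumed > 0); v_b, l_b: step-3 features;
        a: the patent's non-zero constant; l_peak: assumed location of the brightness
        peak. Monotonically increasing in v_b, rising then falling in l_b."""
        speed_gain = 1.0 + a * v_b
        brightness_gain = (l_b / l_peak) * np.exp(1.0 - l_b / l_peak)  # unimodal in l_b
        return cjnd * speed_gain * (1.0 + brightness_gain)

    def window_weight(jnd, window_mask):
        """Assumed weight: mean inverse JND, so more sensitive pixels weigh more."""
        return float(np.mean(1.0 / jnd[window_mask]))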
Step 7, calculating the user perceived quality of each video window at each bit rate level.
Referring to fig. 3, computed with the Structural Similarity Index Metric (SSIM), the user perceived quality $Q_{i,j}^b$ of the $j$-th video window of video region $i$ at bit rate level $b$ is:

$Q_{i,j}^b = \dfrac{\big(2\,\mu_{i,j}^b\,\mu_{i,j}^o + a_1\big)\big(2\,\sigma_{i,j}^{b,o} + a_2\big)}{\big((\mu_{i,j}^b)^2 + (\mu_{i,j}^o)^2 + a_1\big)\big((\sigma_{i,j}^b)^2 + (\sigma_{i,j}^o)^2 + a_2\big)}$

where $\mu_{i,j}^b$ denotes the mean gray level of all pixels of the $j$-th window of region $i$ at bit rate level $b$, and $\mu_{i,j}^o$ the corresponding mean at the source video bit rate $o$; $(\sigma_{i,j}^b)^2$ and $(\sigma_{i,j}^o)^2$ denote the corresponding gray-level variances at level $b$ and at the source bit rate $o$; $\sigma_{i,j}^{b,o}$ denotes the gray-level covariance of the window between level $b$ and the source bit rate $o$; and $a_1$, $a_2$ are given non-zero constants.
Step 8, calculating the user perceived quality of each video region at each bit rate level.
Computed from the user perceived quality and the weight of all video windows in the region, the user perceived quality $Q_i^b$ of video region $i$ at bit rate level $b$ is:

$Q_i^b = \dfrac{\sum_{j=1}^{N_i} w_{i,j}^b\, Q_{i,j}^b}{\sum_{j=1}^{N_i} w_{i,j}^b}$

where $N_i$ denotes the number of video windows of region $i$.
Step 9, establishing a user perceived quality lookup table of the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level, based on the relative viewpoint moving speed and viewpoint brightness change obtained in step 4 and the user perceived quality obtained in step 8.
The user perceived quality lookup table has the following format:

bit rate level $b$ | video region $i$ | relative viewpoint moving speed $\bar{v}_i^b$ | viewpoint brightness change $\bar{l}_i^b$ | user perceived quality $Q_i^b$ (SSIM)

[the populated table is given as an image in the original]
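A minimal sketch of such a table as a Python dictionary (the data structure is an assumption; the patent prescribes only the stored fields):

    # Sketch of the step-9 lookup table: per (region, bit rate level) pair, a list of
    # observed (speed, brightness change, SSIM) entries.
    quality_table = {}  # (region_i, level_b) -> [(v_mean, l_mean, ssim), ...]

    def add_entry(region_i, level_b, v_mean, l_mean, ssim):
        quality_table.setdefault((region_i, level_b), []).append((v_mean, l_mean, ssim))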
an adaptive quality decision model is then made based on the constructed user perceived quality look-up table, see fig. 4.
Step 10, modeling the adaptive bit rate decision of the video block as an optimization problem based on Model Predictive Control (MPC) by using an ABR algorithm, and determining the bit rate of each video block by solving the optimization problem, that is, determining the bit rate of each video frame of the video block correspondingly.
The bit rate of a video block is the amount of data transmitted per second, an average quantity: for example, the bit rate of a video block lasting from 3 s to 5 s is its total data volume divided by its duration. This total includes the data of every video frame in the block, and frames differ in size, so a single frame has no exact bit rate of its own; however, once the bit rate of a video block is decided, the data budget of each frame it contains is determined accordingly.
The present invention uses an existing ABR algorithm to decide the bit rate of each video block. The ABR algorithm's optimization goal is to maximize user QoE; since video quality, quality smoothness and stall time have a significant impact on QoE, the invention adopts a linear QoE model over these factors as the optimization target. Specifically, the optimization problem P to be solved is defined as:

$P:\ \max\ \mathrm{QoE} = \alpha\, Q(b_t) - \beta\, \mathrm{Rebuf}(b_t) - \gamma\, \mathrm{Smooth}(Q(b_t))$

where problem P expresses the goal of maximizing quality of experience (QoE), i.e., the adaptive bit rate algorithm; $b_t$ is the bit rate to be decided for the current $t$-th video block; $Q(b_t)$, $\mathrm{Rebuf}(b_t)$ and $\mathrm{Smooth}(Q(b_t))$ denote the video quality, stall time, and quality smoothness of the $t$-th block at bit rate $b_t$; and $\alpha$, $\beta$ and $\gamma$ are the corresponding weights.
The optimization problem P is solved as follows: when video block $t$ is requested for download, all bit rate combinations of the next $T$ video blocks are enumerated, and the bit rate of the current block $t$ is chosen to maximize the overall QoE of those $T$ blocks. After the block is downloaded, the optimization horizon slides forward by one block. This process repeats until all video blocks have been transmitted.
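A sketch of this horizon enumeration in Python; qoe_of_plan stands in for the linear QoE model above, and all names and the horizon length are illustrative:

    # Sketch of the step-10 MPC decision: enumerate every bit rate combination over a
    # horizon of T future blocks and commit only the first bit rate of the best plan.
    from itertools import product

    def mpc_decide(levels, T, qoe_of_plan):
        """levels: available bit rate levels; qoe_of_plan: tuple of T levels -> QoE."""
        best_plan = max(product(levels, repeat=T), key=qoe_of_plan)
        return best_plan[0]  # bit rate for the current block; the horizon then slides on

    # Example cost: 3 levels and a 5-block horizon enumerate 3**5 = 243 plans.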
Step 11, predicting the viewpoint position at a future time point with an existing linear regression method based on the current user's historical viewpoint positions, and using the lowest viewpoint moving speed observed over the past $m$ seconds as the predicted viewpoint moving speed $\hat{v}$ at the future time point; finally, the relative viewpoint moving speed $\hat{v}_i$ and viewpoint brightness change $\hat{l}_i$ of all video regions at each bit rate level are calculated with the same formulas as in steps 3 and 4.
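A sketch of this prediction step in Python, using NumPy's polyfit as a stand-in for the existing linear regression method and treating one viewpoint coordinate at a time (an assumption):

    # Sketch of step 11: least-squares extrapolation of one viewpoint coordinate from
    # its recent history, plus the conservative speed estimate (minimum speed observed
    # over the past m seconds).
    import numpy as np

    def predict_position(times, positions, t_future):
        slope, intercept = np.polyfit(times, positions, 1)  # degree-1 fit
        return slope * t_future + intercept

    def predict_speed(recent_speeds):
        return min(recent_speeds)  # lowest viewpoint speed in the past m seconds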
Step 12, calculating the Euclidean distance between the predicted relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level from step 11 and the values of the corresponding region at the corresponding level from step 4, and taking the user perceived quality of the lookup-table entry (from the table built in step 9) with the minimum Euclidean distance, yielding the user perceived quality (SSIM) of all video regions at each bit rate level.
The SSIM value of each region at each bit rate level is thus determined by the matching or closest entry, over the two features of relative viewpoint moving speed and viewpoint brightness change, stored in the step-9 lookup table. The client can thereby roughly estimate the overall user perceived quality of a video block without downloading its content: the computed features of all regions at each level are simply compared against the corresponding region entries at the corresponding levels in the table.
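A sketch of the table query in Python, reusing the illustrative quality_table above; math.hypot implements the two-dimensional Euclidean distance of step 12:

    # Sketch of the step-12 query: among the stored entries of the same region and bit
    # rate level, return the SSIM of the entry nearest to the predicted features.
    import math

    def lookup_ssim(quality_table, region_i, level_b, v_hat, l_hat):
        entries = quality_table[(region_i, level_b)]
        v, l, ssim = min(entries, key=lambda e: math.hypot(e[0] - v_hat, e[1] - l_hat))
        return ssim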
Step 13, based on the video block bit rate $b_t$ determined in step 10 and the user perceived quality of all video regions at each bit rate level determined in step 12, the quality of each video region is decided with the goal of maximizing the block's overall user perceived quality (SSIM), subject to the total size of all video regions of all frames of the block not exceeding the block bit rate.
Once the bit rate of a video block is decided, the bit rate of all video frames in the block is fixed, which bounds the total quality of all video regions in each frame. Under the constraint that the total size of all regions of all frames stays within the block bit rate, the quality of each region in the block is output by defining and solving an objective function over the overall user perceived quality.
Based on the obtained SSIM values of the video regions at the different bit rate levels, the quality level of each region in the block is chosen to maximize the block's overall SSIM value; the objective function is defined as:

$\max \sum_{i=1}^{M} Q_i^{b_{t,i}} \quad \text{subject to} \quad \sum_{i=1}^{M} S_i \le b_t$

where $i$ is the index of the region currently to be decided, $b_{t,i}$ is the bit rate of region $i$, $M$ is the number of regions in the block, $S_i$ is the size of region $i$, $Q_i^{b_{t,i}}$ is the user perceived quality of region $i$ at bit rate level $b_{t,i}$ determined in step 12, and $b_t$ is the bit rate of block $t$ determined in step 10. The quality decision combines enumeration with a greedy rule: for any pair of video regions (say regions 1 and 2), if one quality assignment $(b_{t,1}, b_{t,2})$ is found to dominate another $(b'_{t,1}, b'_{t,2})$, i.e., it yields a higher total quality and a smaller total size $S_1 + S_2$, the dominated assignment is excluded from the quality-allocation iterations over the remaining regions (3, 4, ..., M). The process repeats until the quality allocation of all regions is complete.
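The patent's allocator combines enumeration with the pairwise dominance pruning just described. The sketch below substitutes a simpler greedy scheme under the same budget constraint, as an illustrative stand-in rather than the patented procedure: start every region at the lowest level and repeatedly apply the upgrade with the best SSIM gain per extra byte:

    # Illustrative greedy stand-in for the step-13 allocator. Assumes region sizes
    # strictly increase with the bit rate level, so every upgrade has a positive cost.
    def allocate_quality(levels, size, ssim, budget):
        """levels: bit rate levels, low to high; size[i][b], ssim[i][b]: size and
        perceived quality of region i at level b; budget: the block's size budget."""
        choice = {i: 0 for i in size}                    # start at the lowest level
        total = sum(size[i][levels[0]] for i in size)
        while True:
            best, best_gain, best_extra = None, 0.0, 0
            for i in size:
                b = choice[i]
                if b + 1 < len(levels):
                    extra = size[i][levels[b + 1]] - size[i][levels[b]]
                    gain = ssim[i][levels[b + 1]] - ssim[i][levels[b]]
                    if extra > 0 and total + extra <= budget and gain / extra > best_gain:
                        best, best_gain, best_extra = i, gain / extra, extra
            if best is None:                             # no affordable, useful upgrade left
                return {i: levels[choice[i]] for i in choice}
            choice[best] += 1
            total += best_extra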
The invention considers two quality-determining factors specific to omnidirectional video, viewpoint moving speed and brightness change, and quantifies their influence on the user's subjective perceived quality through user studies. A user perceived quality model is established by extending the SSIM metric with these two features. On top of the MPC algorithm's bit rate decision for each video block, quality is allocated to all regions within the block to maximize the block's overall SSIM quality. The invention can thus achieve a more reasonable quality allocation across video regions and further improve streaming media resource utilization and the quality of user experience.
It should be noted that, although the above-described embodiments of the present invention are illustrative, the invention is not limited thereto. Other embodiments devised by those skilled in the art in light of the teachings of the present invention, without departing from its principles, are likewise within the scope of the invention.

Claims (4)

1. The self-adaptive omnidirectional video streaming method based on quality perception is characterized by comprising the following steps of:
step 1, projecting an omnidirectional video file to a two-dimensional plane by using a common symmetric projection method; cutting the projected video file into a plurality of video blocks with equal length, and transcoding each video block into different bit rate levels;
step 2, spatially cutting each video frame of the video block at each bit rate level into a plurality of video regions, and further spatially cutting each video region into a plurality of video windows;
step 3, calculating the relative viewpoint moving speed and viewpoint brightness change of each pixel of each video frame of the video blocks at each bit rate level, based on the viewpoint trajectory preset for the omnidirectional video file, wherein:

$v^b(p,q) = \lvert v_u - v_0^b(p,q) \rvert$

$l^b(p,q) = \lvert l_u^b(p,q) - l_m^b \rvert$
step 4, calculating the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level, wherein:

$\bar{v}_i^b = \frac{1}{N} \sum_{(p,q) \in i} v^b(p,q)$

$\bar{l}_i^b = \frac{1}{N} \sum_{(p,q) \in i} l^b(p,q)$
step 5, calculating the viewpoint-based just-noticeable-difference threshold of each pixel of the video frames at each bit rate level, wherein:

[fitted formula, given as an image in the original: $\mathrm{JND}^b(p,q)$ as a function of $\mathrm{CJND}^b(p,q)$, $v^b(p,q)$, $l^b(p,q)$ and the constant $a$]
step 6, calculating the weight of each video window at each bit rate level, wherein:

[weight formula, given as an image in the original: $w_{i,j}^b$ aggregates the viewpoint-based JND values of the $N_{ij}$ pixels of the $j$-th window of region $i$, with smaller JND yielding larger weight]
step 7, calculating the user perceived quality of each video window at each bit rate level, wherein:

$Q_{i,j}^b = \dfrac{\big(2\,\mu_{i,j}^b\,\mu_{i,j}^o + a_1\big)\big(2\,\sigma_{i,j}^{b,o} + a_2\big)}{\big((\mu_{i,j}^b)^2 + (\mu_{i,j}^o)^2 + a_1\big)\big((\sigma_{i,j}^b)^2 + (\sigma_{i,j}^o)^2 + a_2\big)}$
step 8, calculating the user perceived quality of each video region at each bit rate level, wherein:

$Q_i^b = \dfrac{\sum_{j=1}^{N_i} w_{i,j}^b\, Q_{i,j}^b}{\sum_{j=1}^{N_i} w_{i,j}^b}$
step 9, establishing a user perceived quality lookup table of the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level, based on the relative viewpoint moving speed and viewpoint brightness change obtained in step 4 and the user perceived quality obtained in step 8;
step 10, modeling adaptive bit rate decision of the video blocks by using an ABR algorithm as an optimization problem based on model predictive control, and determining the bit rate of each video block by solving the optimization problem, namely correspondingly determining the bit rate of each video frame of the video blocks;
step 11, predicting the viewpoint position at a future time point with an existing linear regression method based on the current user's historical viewpoint positions, using the lowest viewpoint moving speed observed in the past m seconds before the current time point as the predicted viewpoint moving speed at the future time point, and finally calculating the relative viewpoint moving speed and viewpoint brightness change of all video regions at each bit rate level with the same formulas as in steps 3 and 4;
step 12, calculating the Euclidean distance between the relative viewpoint moving speed and viewpoint brightness change of each video region at each bit rate level obtained in step 11 and those of the corresponding video region at the corresponding bit rate level obtained in step 4, taking the user perceived quality of the lookup-table entry (from the table built in step 9) with the minimum Euclidean distance, and finally obtaining the user perceived quality of all video regions at each bit rate level;
step 13, based on the bit rate of the video block determined in step 10 and the user perceived quality of all video regions at each bit rate level determined in step 12, determining the quality of each video region in all video frames of the video block, with the goal of maximizing the overall user perceived quality of the block, subject to the total size of all video regions of all frames of the block not exceeding the block bit rate;
in the above formulas, $v^b(p,q)$ denotes the relative viewpoint moving speed of pixel $(p,q)$ at bit rate level $b$, and $l^b(p,q)$ the viewpoint brightness change of pixel $(p,q)$ at bit rate level $b$; $v_u$ denotes the moving speed of the user's viewpoint and $v_0^b(p,q)$ the motion speed of the object at pixel $(p,q)$ at bit rate level $b$; $l_u^b(p,q)$ denotes the luminance at pixel $(p,q)$ at bit rate level $b$, and $l_m^b$ the average luminance of all pixels of the video region containing the viewpoint $m$ seconds earlier at bit rate level $b$, where $m$ is a preset value and $\lvert\cdot\rvert$ denotes the absolute value; $\bar{v}_i^b$ denotes the relative viewpoint moving speed and $\bar{l}_i^b$ the viewpoint brightness change of video region $i$ at bit rate level $b$, with $N$ the number of pixels of region $i$; $\mathrm{JND}^b(p,q)$ denotes the viewpoint-based just-noticeable-difference threshold of pixel $(p,q)$ at bit rate level $b$, $\mathrm{CJND}^b(p,q)$ the just-noticeable-difference threshold of pixel $(p,q)$ at bit rate level $b$ based on video content characteristics, and $a$ a given non-zero constant; $w_{i,j}^b$ denotes the weight of the $j$-th video window of region $i$ at bit rate level $b$, where $i$ indexes video regions, $j$ indexes video windows, and $N_{ij}$ is the number of pixels of the $j$-th window of region $i$; $Q_{i,j}^b$ denotes the user perceived quality of the $j$-th window of region $i$ at bit rate level $b$; $\mu_{i,j}^b$ denotes the mean gray level of all pixels of the $j$-th window of region $i$ at bit rate level $b$, and $\mu_{i,j}^o$ the corresponding mean at the source video bit rate $o$; $(\sigma_{i,j}^b)^2$ denotes the gray-level variance of all pixels of the $j$-th window of region $i$ at bit rate level $b$, and $(\sigma_{i,j}^o)^2$ the corresponding variance at the source video bit rate $o$; $\sigma_{i,j}^{b,o}$ denotes the gray-level covariance of the $j$-th window of region $i$ between bit rate level $b$ and the source video bit rate $o$; $a_1$ and $a_2$ are given non-zero constants; $Q_i^b$ denotes the user perceived quality of region $i$ at bit rate level $b$; and $N_i$ is the number of video windows of region $i$.
2. The adaptive omni-directional video streaming method based on quality perception according to claim 1, wherein the optimization problem constructed in step 10 is:
$\max\big(\alpha\, Q(b_t) - \beta\, \mathrm{Rebuf}(b_t) - \gamma\, \mathrm{Smooth}(Q(b_t))\big)$

where $b_t$ is the bit rate to be decided for the current $t$-th video block; $Q(b_t)$, $\mathrm{Rebuf}(b_t)$ and $\mathrm{Smooth}(Q(b_t))$ denote the video quality, stall (rebuffering) time, and quality smoothness of the $t$-th video block at bit rate $b_t$; and $\alpha$, $\beta$ and $\gamma$ denote the video quality weight, stall time weight, and quality smoothness weight, respectively.
3. The adaptive omnidirectional video streaming method based on quality perception according to claim 1, wherein the Euclidean distance $D_i$ of step 12 is:

$D_i = \sqrt{\left(\bar{v}_i^b - \hat{v}_i\right)^2 + \left(\bar{l}_i^b - \hat{l}_i\right)^2}$

where $\bar{v}_i^b$ and $\bar{l}_i^b$ denote the relative viewpoint moving speed and viewpoint brightness change of video region $i$ at bit rate level $b$ recorded in the step-9 user perceived quality lookup table, $N$ is the number of pixels of region $i$, and $\hat{v}_i$ and $\hat{l}_i$ denote the relative viewpoint moving speed and viewpoint brightness change of region $i$ predicted in step 11; $D_i$ denotes the Euclidean distance.
4. The adaptive omnidirectional video streaming method based on quality perception according to claim 1, wherein the objective function of step 13 is:

$\max \sum_{i=1}^{M} Q_i^{b_{t,i}} \quad \text{subject to} \quad \sum_{i=1}^{M} S_i \le b_t$

where $i$ is the index of the video region currently to be decided, $M$ is the number of video regions in the video block, $S_i$ denotes the size of the video region $i$ currently to be decided, $b_{t,i}$ is the bit rate of that region, $Q_i^{b_{t,i}}$ denotes its user perceived quality at bit rate level $b_{t,i}$, and $b_t$ is the bit rate of the video block.
CN202210272188.XA 2022-03-18 2022-03-18 Self-adaptive omnidirectional video stream transmission method based on quality perception Active CN114640851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210272188.XA CN114640851B (en) 2022-03-18 2022-03-18 Self-adaptive omnidirectional video stream transmission method based on quality perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210272188.XA CN114640851B (en) 2022-03-18 2022-03-18 Self-adaptive omnidirectional video stream transmission method based on quality perception

Publications (2)

Publication Number Publication Date
CN114640851A (en) 2022-06-17
CN114640851B CN114640851B (en) 2023-06-23

Family

ID=81949829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210272188.XA Active CN114640851B (en) 2022-03-18 2022-03-18 Self-adaptive omnidirectional video stream transmission method based on quality perception

Country Status (1)

Country Link
CN (1) CN114640851B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160227172A1 (en) * 2013-08-29 2016-08-04 Smart Services Crc Pty Ltd Quality controller for video image
US20180041788A1 (en) * 2015-02-07 2018-02-08 Zhou Wang Method and system for smart adaptive video streaming driven by perceptual quality-of-experience estimations
US20170155903A1 (en) * 2015-11-30 2017-06-01 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding video data according to local luminance intensity
US20190289296A1 (en) * 2017-01-30 2019-09-19 Euclid Discoveries, Llc Video Characterization For Smart Encoding Based On Perceptual Quality Optimization
US20210385502A1 (en) * 2018-10-19 2021-12-09 Samsung Electronics Co., Ltd. Method and device for evaluating subjective quality of video
CN110248212A (en) * 2019-05-27 2019-09-17 上海交通大学 360 degree of video stream server end code rate adaptive transmission methods of multi-user and system
CN112825557A (en) * 2019-11-20 2021-05-21 北京大学 Self-adaptive sensing time-space domain quantization method aiming at video coding
WO2021236059A1 (en) * 2020-05-19 2021-11-25 Google Llc Dynamic parameter selection for quality-normalized video transcoding
CN112929691A (en) * 2021-01-29 2021-06-08 复旦大学 Multi-user panoramic video transmission method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DI YUAN et al.: "Visual JND: A Perceptual Measurement in Video Coding", IEEE Access
翟宇轩; 刘怡桑; 徐艺文; 陈忠辉; 房颖; 赵铁松: "3D video quality assessment based on HTTP adaptive streaming" (基于HTTP自适应流媒体传输的3D视频质量评价), Journal of Beijing University of Aeronautics and Astronautics, no. 12
车慧丽: "Objective video quality assessment based on visual perception" (基于视觉感知的视频质量客观评价), master's thesis, China Master's Theses Electronic Journal

Also Published As

Publication number Publication date
CN114640851B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US20220030244A1 (en) Content adaptation for streaming
CN107211193B (en) Intelligent adaptive video streaming method and system driven by perception experience quality estimation
US20130304934A1 (en) Methods and systems for controlling quality of a media session
EP2612495B1 (en) Adaptive streaming of video at different quality levels
US8578436B2 (en) Method for two time-scales video stream transmission control
US20170347159A1 (en) Qoe analysis-based video frame management method and apparatus
JPH10257489A (en) Device and method for adjusting amount of bits to be generated for encoding image
CN112584119B (en) Self-adaptive panoramic video transmission method and system based on reinforcement learning
CN112272299A (en) Video coding method, device, equipment and storage medium
CN110099294B (en) Dynamic self-adaptive streaming media code rate allocation method for keeping space-time consistency of 360-degree video
WO2023134523A1 (en) Content adaptive video coding method and apparatus, device and storage medium
Hoang et al. Lexicographic bit allocation for MPEG video
CN112055263A (en) 360-degree video streaming transmission system based on significance detection
Hsu: MEC-assisted FoV-aware and QoE-driven adaptive 360 video streaming for virtual reality
WO2014066975A1 (en) Methods and systems for controlling quality of a media session
Zhang et al. A 360 video adaptive streaming scheme based on multiple video qualities
Chi et al. Region-of-interest video coding based on rate and distortion variations for H. 263+
CN114640851B (en) Self-adaptive omnidirectional video stream transmission method based on quality perception
KR20040062732A (en) Bit rate control system based on object
JP2016510567A (en) Method and apparatus for context-based video quality assessment
Tanjung et al. QoE Optimization in DASH-Based Multiview Video Streaming
Moon et al. An Uniformalized Quality Encoding in Cloud Transcoding System
CN114071121A (en) Image quality evaluation device and image quality evaluation method thereof
Takagi et al. Subjective video quality estimation to determine optimal spatio-temporal resolution
CN114666620B (en) Self-adaptive streaming media method based on visual sensitivity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant