Multi-view video bit allocation method based on a virtual view quality model
Technical Field
The invention relates to the field of multi-view video bit allocation, in particular to a multi-view video bit allocation method based on a virtual view quality model.
Background
6DoF (six-degrees-of-freedom) video is the development target of interactive media. In a 6DoF video, a user can experience a scene from any angle at any position, obtaining an immersive, on-the-scene experience. 6DoF is a generic term for the three degrees of freedom of rotation about the x, y and z axes and the three degrees of freedom of translation along the x, y and z axes. Currently, the International Organization for Standardization is actively advancing the formulation of standards related to 6DoF video applications, and multi-view color and depth video has been determined as one representation of a 6DoF application scene. In a 6DoF video system based on multi-view color and depth video, virtual views are obtained by Depth-Image-Based Rendering (DIBR). Unlike the Free Viewpoint Video (FVV) system, the 6DoF video system supports a scene experience with more degrees of freedom, which requires color and depth video from more viewpoints. The huge amount of data of multi-view color and depth video puts great pressure on transmission. Therefore, by allocating bits to the relevant viewpoints according to the viewing position of the user, high-quality 6DoF video applications can be realized under limited network bandwidth.
Bit allocation for single-view video adopts a hierarchical strategy that allocates bits to different coding objects: first, a target bit number is allocated to each group of pictures (GOP) at the GOP level according to the channel rate and the buffer state; then, frame-level bit allocation is carried out according to the weight of each frame in the GOP; finally, a target bit number is determined for each Coding Tree Unit (CTU) in the image according to the total target bit number of the current image. For multi-view video, bit allocation among views must be considered in addition to the bit allocation within each single-view video. Starting from the quality of the user's experience, researchers have proposed a viewpoint-level bit allocation method based on a quality-of-experience model, but its strong dependence on the quality model limits its applicability. In a multi-view video system, a user does not always watch the same view from a fixed position; the view being watched is switched as the video content changes. For such situations, some researchers have proposed a bit allocation scheme that depends on the viewpoint switching probability, in which the allocation ratio changes with the switching probability. However, the viewpoint switching model is only suitable for switching among existing viewpoints, not for a system with virtual viewpoints, so this bit allocation method has few application occasions.
The 6DoF video system is very complex, and the related standardization work is still advancing. MPEG has released multi-view color video and corresponding depth video sequences captured by a planar camera array for standardized testing, but there is as yet no bit allocation scheme for these sequences. Compared with multi-view color and depth video captured along one dimension, multi-view color and depth video captured over a plane includes both horizontal and vertical parallax. Therefore, the virtual viewpoint distortion models, user quality-of-experience models, viewpoint switching probabilities and the like of conventional multi-view color and depth video are no longer suitable for planar multi-view color and depth video.
The invention provides a multi-view video bit allocation method based on a virtual view quality model, aimed at multi-view videos collected by a camera array arranged in a plane.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-view video bit allocation method based on a virtual view quality model for multi-view videos collected by a planar camera array, which can effectively improve the subjective and objective quality of virtual views and improve the visual experience of users.
In order to solve the technical problems, the invention adopts the following technical scheme:
a multi-view video bit allocation method based on a virtual view quality model comprises the following steps:
s1, based on the current target bit number R, allocating bits between the texture video and the depth video according to a preset ratio, R_{T,t} denoting the number of bits allocated to the texture video and R_{T,d} denoting the number of bits allocated to the depth video;
s2, based on the position (X_v, Y_v, Z_v) of the camera at the virtual viewpoint in three-dimensional space and the positions (X_1, Y_1, Z_1), (X_2, Y_2, Z_2), (X_3, Y_3, Z_3) and (X_4, Y_4, Z_4) of the cameras at the reference viewpoints around the virtual viewpoint, calculating the baseline distance between the virtual viewpoint and each reference viewpoint, the baseline distance d_i between the virtual viewpoint and the ith reference viewpoint being calculated as:
d_i = sqrt((X_v - X_i)^2 + (Y_v - Y_i)^2 + (Z_v - Z_i)^2);
s3, calculating, based on the baseline distance between the virtual viewpoint and each reference viewpoint, the weight of the baseline distance between the virtual viewpoint and each reference viewpoint, w_i denoting the weight of the baseline distance between the virtual viewpoint and the ith reference viewpoint:
s4, calculating, based on the weights of the baseline distances between the virtual viewpoint and the reference viewpoints, the view-level bit allocation weight of the texture video of each reference view, W_{t,i} denoting the view-level bit allocation weight of the texture video of the ith reference view;
s5, calculating, based on the weights of the baseline distances between the virtual viewpoint and the reference viewpoints, the view-level bit allocation weight of the depth video of each reference view, W_{d,i} denoting the view-level bit allocation weight of the depth video of the ith reference view;
s6, calculating, based on the number of bits R_{T,t} allocated to the texture video and the view-level bit allocation weight of the texture video of each reference view, the number of bits allocated to the texture video of each reference view, R_{t,i} denoting the number of bits allocated to the texture video of the ith reference view,
R_{t,i} = W_{t,i} × R_{T,t};
s7, calculating, based on the number of bits R_{T,d} allocated to the depth video and the view-level bit allocation weight of the depth video of each reference view, the number of bits allocated to the depth video of each reference view, R_{d,i} denoting the number of bits allocated to the depth video of the ith reference view,
R_{d,i} = W_{d,i} × R_{T,d};
s8, independently encoding the texture and depth videos of each view using the HM platform according to the number of bits allocated to the texture video and the depth video of each view.
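The flow of steps S1-S7 above can be sketched in Python as follows. The Euclidean baseline distance, the inverse-distance normalization used for w_i, and the use of the plain baseline weights directly as the view-level weights W_{t,i} and W_{d,i} are illustrative assumptions made here for a self-contained example; the invention derives the actual view-level weights from the virtual view quality model in steps S4 and S5.

```python
import math

def allocate_bits(R, cam_virtual, cam_refs, ratio_texture=5, ratio_depth=1):
    """Sketch of steps S1-S7: split the target bit budget R between texture
    and depth video (S1), then distribute each share over the four reference
    viewpoints according to baseline-distance weights (S2-S7)."""
    # S1: split R between texture and depth by the preset ratio (5:1 here).
    R_T_t = R * ratio_texture / (ratio_texture + ratio_depth)
    R_T_d = R * ratio_depth / (ratio_texture + ratio_depth)

    # S2: Euclidean baseline distance between the virtual camera and each
    # reference camera (assumed form of the elided formula).
    d = [math.dist(cam_virtual, c) for c in cam_refs]

    # S3: baseline-distance weights (assumed inverse-distance, normalized to 1).
    inv = [1.0 / di for di in d]
    w = [v / sum(inv) for v in inv]

    # S4/S5: view-level weights. The plain baseline weights are used here as an
    # illustrative stand-in for the model-derived W_{t,i} and W_{d,i}.
    W_t = w
    W_d = w

    # S6/S7: per-view bit numbers R_{t,i} = W_{t,i}*R_{T,t}, R_{d,i} = W_{d,i}*R_{T,d}.
    R_t = [Wi * R_T_t for Wi in W_t]
    R_d = [Wi * R_T_d for Wi in W_d]
    return R_t, R_d
```

With the virtual camera at the exact center of four reference cameras, all baseline distances are equal, so both budgets are split evenly across the four views.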
Preferably, the preset ratio in step S1 is 5:1.
preferably, in step S4:
the quality Q_T of the virtual viewpoint without depth distortion and the coding quantization parameters of the texture videos satisfy the following formula:
in the formula, ξ_i is the linear coefficient in the texture-video-related virtual view quality model corresponding to the ith reference view, QP_{t,i} is the coding quantization parameter of the texture video of the ith reference view, and C_T is a variable independent of the compression distortion of the reference viewpoints;
ξ_i and w_i satisfy the relation ξ_i = m_t·w_i + n_t, where m_t and n_t are coefficients obtained by linear fitting, with values of -0.684 and 0.020, respectively;
in h.265/HEVC, the relationship between the lagrangian multiplier λ and the video coding distortion D satisfies:
D = α·λ^β
wherein α and β are model parameters related to characteristics of the video content;
the video coding quality Q and the video coding distortion D satisfy the following conditions:
Q = 10 × log10(255^2/D)
the relationship between the encoding quantization parameters QP and λ satisfies:
QP = 4.2005·ln λ + 13.7122
then the following is satisfied between QP and Q:
QP = a×Q + b
wherein a = -0.996/β and b = -9.9612·ln α/β + 47.9440/β + 14.1221;
let the coding quality of the color video of the ith reference viewpoint be Q_{t,i} and the corresponding model parameters related to the video content characteristics be α_{t,i} and β_{t,i}; then the virtual viewpoint quality prediction model related to the texture video is expressed as:
according to the principle that quality is best when distortion is minimized, the virtual viewpoint quality related to the texture video quality can be maximized, and hence the virtual viewpoint distortion related to the texture video minimized, by reasonably allocating the texture video bits among the reference viewpoints, namely:
converting the problem into a constrained optimization problem and introducing a Lagrange multiplier λ_T to construct a cost function:
the optimal solution needs to satisfy:
namely:
because:
Q = 10·log10(255^2/D)
then:
in h.265/HEVC, the following formula is satisfied:
D = C × R^K
where C and K are parameters related to the video content and coding characteristics;
in a planar multi-view color and depth video system, the video contents and coding characteristics of the four reference viewpoints around the virtual viewpoint are similar, so the parameters C, K and a_{t,i} corresponding to each reference viewpoint can be considered approximately equal; then:
is equivalent to:
then the bit allocation weight corresponding to the ith reference viewpoint texture video is:
preferably, in step S5:
the quality Q_D of the virtual viewpoint without texture distortion and the coding quantization parameters of the depth videos satisfy the following formula:
in the formula, ζ_i is the linear coefficient in the depth-video-related virtual view quality model corresponding to the ith reference view, QP_{d,i} is the coding quantization parameter of the depth video of the ith reference view, and C_D is a variable independent of the compression distortion of the reference viewpoints;
ζ_i and w_i satisfy the relation ζ_i = m_d·w_i + n_d, where m_d and n_d are coefficients obtained by linear fitting, with values of -0.194 and -0.105, respectively;
similarly to step S4, the following formula can be theoretically derived:
then, the bit allocation weight corresponding to the ith reference viewpoint depth video is:
Compared with the prior art, the invention has the following advantages. Considering that, in the process of rendering virtual viewpoints, the influence of each reference viewpoint on the virtual viewpoint quality is related to the baseline distance weights between the cameras, the viewpoint-level bit allocation weights of the texture videos and the depth videos, both related to the camera baseline distance weights, are theoretically derived on the basis of the constructed virtual viewpoint quality model, and texture distortion and depth distortion are minimized according to these weights, thereby minimizing the virtual viewpoint distortion. Compared with a method that allocates bits evenly across viewpoints, the method can effectively improve the rendering quality of the virtual viewpoints; the degree of quality improvement is related to how far the virtual viewpoint deviates from the center of the reference viewpoints, and the farther the deviation, the more obvious the improvement.
Drawings
FIG. 1 is a block diagram of an overall implementation of the method of the present invention;
FIG. 2 is a schematic view of virtual viewpoint rendering;
FIG. 3a is a graph of the effect of the sequence 'OrangeKitchen' texture video QP on the quality of a virtual view when there is no depth distortion for the reference view;
FIG. 3b is the effect of the sequence 'OrangeKitchen' depth video QP on the quality of the virtual view when there is no texture distortion for the reference view;
FIG. 4a is a relationship between texture video linear coefficients and baseline distance weights in a virtual viewpoint quality model;
FIG. 4b is a relationship between depth video linear coefficients and baseline distance weights in a virtual viewpoint quality model;
FIG. 5a is a schematic diagram of a diagonal path;
FIG. 5b is a schematic diagram of a free viewing path;
FIG. 6a is an effect diagram and a partial enlarged view of the 50th frame of the sequence 'TechnicolorPainter' rendered from distortion-free reference viewpoints;
FIG. 6b is a partial enlarged view of the four reference viewpoints corresponding to the enlarged area of FIG. 6a;
FIG. 6c is an effect diagram and a partial enlarged view of the 50th frame of the sequence 'TechnicolorPainter' rendered from reference viewpoints coded by the equipartition bit method at QP (40,45);
FIG. 6d is a partial enlarged view of the four coded reference views corresponding to the enlarged region of FIG. 6c;
FIG. 6e is an effect diagram and a partial enlarged view of the 50th frame of the sequence 'TechnicolorPainter' rendered from reference viewpoints coded by the method of the present invention at QP (40,45);
FIG. 6f is a partial enlarged view of the four coded reference views corresponding to the enlarged area of FIG. 6e.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the present invention discloses a multi-view video bit allocation method based on a virtual view quality model, which comprises the following steps:
s1, based on the current target bit number R, allocating bits between the texture video and the depth video according to a preset ratio, R_{T,t} denoting the number of bits allocated to the texture video and R_{T,d} denoting the number of bits allocated to the depth video;
In this embodiment, the reason why bits are first allocated between the texture video and the depth video is as follows. When only the virtual view distortion caused by coding distortion is considered, the virtual view rendered from the original reference viewpoints is recorded as S_v, the virtual view rendered from reference viewpoints with coding distortion is recorded as Ŝ_v, and the virtual view rendered from the original texture videos together with the coding-distorted depth videos is recorded as S̃_v. The virtual viewpoint distortion D_v is expanded accordingly; since the error terms involved are random and uncorrelated, the cross terms are negligible. In the planar multi-view color and depth video system, as shown in the virtual view rendering diagram of fig. 2, the virtual view pixel value S(x_v, y_v) is determined by the pixel values S_1(x_1, y_1), S_2(x_2, y_2), S_3(x_3, y_3) and S_4(x_4, y_4) of the four reference viewpoints:
S(x_v, y_v) = Σ_{i=1..4} w_i·S_i(x_i, y_i)
where w_i represents the weight of the baseline distance between the virtual viewpoint and the ith reference viewpoint. Expanding the virtual viewpoint distortion accordingly, and noting that when i ≠ j the error terms of the ith and jth reference viewpoints are random and uncorrelated, the following virtual viewpoint quality model can be constructed:
D_v = D_T + D_D
where D_T and D_D represent the virtual view distortion caused by color and depth video coding distortion, respectively.
Therefore, the virtual viewpoint distortion can be decomposed into two parts, one related to the texture video and one related to the depth video, and D_T and D_D can be minimized by adjusting the bit allocation of the texture and depth videos of each viewpoint, thereby minimizing the virtual viewpoint distortion. Accordingly, the invention adopts a bit allocation scheme that first allocates bits between the two videos (i.e., texture and depth video) and then among the reference viewpoints to implement the bit allocation of the multi-view video.
S2, based on the position (X_v, Y_v, Z_v) of the camera at the virtual viewpoint in three-dimensional space and the positions (X_1, Y_1, Z_1), (X_2, Y_2, Z_2), (X_3, Y_3, Z_3) and (X_4, Y_4, Z_4) of the cameras at the reference viewpoints around the virtual viewpoint, calculating the baseline distance between the virtual viewpoint and each reference viewpoint, the baseline distance d_i between the virtual viewpoint and the ith reference viewpoint being calculated as:
d_i = sqrt((X_v - X_i)^2 + (Y_v - Y_i)^2 + (Z_v - Z_i)^2);
s3, calculating, based on the baseline distance between the virtual viewpoint and each reference viewpoint, the weight of the baseline distance between the virtual viewpoint and each reference viewpoint, w_i denoting the weight of the baseline distance between the virtual viewpoint and the ith reference viewpoint:
s4, calculating, based on the weights of the baseline distances between the virtual viewpoint and the reference viewpoints, the view-level bit allocation weight of the texture video of each reference view, W_{t,i} denoting the view-level bit allocation weight of the texture video of the ith reference view;
s5, calculating, based on the weights of the baseline distances between the virtual viewpoint and the reference viewpoints, the view-level bit allocation weight of the depth video of each reference view, W_{d,i} denoting the view-level bit allocation weight of the depth video of the ith reference view;
s6, calculating, based on the number of bits R_{T,t} allocated to the texture video and the view-level bit allocation weight of the texture video of each reference view, the number of bits allocated to the texture video of each reference view, R_{t,i} denoting the number of bits allocated to the texture video of the ith reference view,
R_{t,i} = W_{t,i} × R_{T,t};
s7, calculating, based on the number of bits R_{T,d} allocated to the depth video and the view-level bit allocation weight of the depth video of each reference view, the number of bits allocated to the depth video of each reference view, R_{d,i} denoting the number of bits allocated to the depth video of the ith reference view,
R_{d,i} = W_{d,i} × R_{T,d};
s8, independently encoding the texture and depth videos of each view using the HM platform according to the number of bits allocated to the texture video and the depth video of each view.
In a specific implementation, the preset ratio in step S1 is 5:1.
in the specific implementation, in step S4:
in order to study the relationship between texture video distortion and virtual viewpoint distortion, the coding quantization parameter (QP) of the texture video of one viewpoint is varied while the QPs of the other three viewpoints are fixed, in the absence of depth distortion, and the quality of the virtual viewpoints rendered from reconstructed color videos with different reconstruction-quality combinations is studied. Experiments show that the quality Q_T of the virtual viewpoint without depth distortion and the coding quantization parameters of the texture videos satisfy the following formula:
in the formula, ξ_i is the linear coefficient in the texture-video-related virtual view quality model corresponding to the ith reference view, QP_{t,i} is the coding quantization parameter of the texture video of the ith reference view, and C_T is a variable unrelated to the compression distortion of the reference viewpoints. Fig. 3a shows the influence of the color video QP of the sequence 'OrangeKitchen' on the quality of the virtual viewpoint when the reference viewpoints have no depth distortion;
from the virtual viewpoint distortion model, the linear coefficient ξ_i associated with the texture video is related to the baseline distance weight w_i. By collecting statistics of the corresponding ξ_i and w_i for different virtual viewpoints in different sequences, it is found that ξ_i and w_i satisfy ξ_i = m_t·w_i + n_t, where the coefficients m_t and n_t obtained by linear fitting are -0.684 and 0.020, respectively; the relationship between the texture video linear coefficients and the baseline distance weights in the virtual viewpoint quality model is shown in fig. 4a;
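The linear fit ξ_i = m_t·w_i + n_t can be reproduced with an ordinary least-squares fit. The synthetic w_i values and noise level below are assumptions made for illustration; only the coefficients -0.684 and 0.020 come from the text:

```python
import numpy as np

# Document's fitted relation for the texture model: xi_i = m_t * w_i + n_t,
# with m_t = -0.684 and n_t = 0.020. Regenerate xi observations from that
# relation (plus small noise) and recover the coefficients by least squares,
# mirroring how m_t and n_t would be obtained from measured (w_i, xi_i) pairs.
m_t, n_t = -0.684, 0.020
rng = np.random.default_rng(1)
w = rng.uniform(0.05, 0.6, size=40)                 # baseline-distance weights
xi = m_t * w + n_t + rng.normal(0, 1e-3, size=40)   # synthetic observations

m_fit, n_fit = np.polyfit(w, xi, 1)                 # degree-1 polynomial fit
print(m_fit, n_fit)  # close to -0.684 and 0.020
```

The same procedure, applied to measured ζ_i versus w_i pairs, would yield the depth-video coefficients m_d and n_d used in step S5.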
in h.265/HEVC, the relationship between the lagrangian multiplier λ and the video coding distortion D satisfies:
D = α·λ^β
wherein α and β are model parameters related to characteristics of the video content;
the video coding quality Q and the video coding distortion D satisfy the following conditions:
Q = 10 × log10(255^2/D)
the relationship between the encoding quantization parameters QP and λ satisfies:
QP = 4.2005·ln λ + 13.7122
then the following is satisfied between QP and Q:
QP = a×Q + b
wherein a = -0.996/β and b = -9.9612·ln α/β + 47.9440/β + 14.1221;
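The chain of relations above can be collected into small helpers. This is a sketch using the λ-QP constant in its standard HM form (13.7122) and the expressions for a and b as stated in the text; α and β remain content-dependent model parameters supplied by the caller:

```python
import math

def qp_from_lambda(lam):
    # H.265/HEVC relation between QP and the Lagrange multiplier lambda.
    return 4.2005 * math.log(lam) + 13.7122

def qp_from_quality(Q, alpha, beta):
    # QP = a*Q + b with a = -0.996/beta and
    # b = -9.9612*ln(alpha)/beta + 47.9440/beta + 14.1221, as in the text.
    a = -0.996 / beta
    b = -9.9612 * math.log(alpha) / beta + 47.9440 / beta + 14.1221
    return a * Q + b
```

Such a mapping lets the encoder translate a target virtual-view quality Q into a per-view quantization parameter once α and β have been estimated for the content.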
let the coding quality of the color video of the ith reference viewpoint be Q_{t,i} and the corresponding model parameters related to the video content characteristics be α_{t,i} and β_{t,i}; then the virtual viewpoint quality prediction model related to the texture video is expressed as:
according to the principle that quality is best when distortion is minimized, the virtual viewpoint quality related to the texture video quality can be maximized, and hence the virtual viewpoint distortion related to the texture video minimized, by reasonably allocating the texture video bits among the reference viewpoints, namely:
converting the problem into a constrained optimization problem and introducing a Lagrange multiplier λ_T to construct a cost function:
the optimal solution needs to satisfy:
namely:
because:
Q = 10·log10(255^2/D)
then:
in h.265/HEVC, the following formula is satisfied:
D = C × R^K
where C and K are parameters related to the video content and coding characteristics;
since the video contents and coding characteristics of the four viewpoints around a virtual viewpoint are similar in a planar multi-view color and depth video system, the parameters C, K and a_{t,i} corresponding to each viewpoint can be considered approximately equal; then:
is equivalent to:
then the bit allocation weight corresponding to the ith reference viewpoint texture video is:
in the specific implementation, in step S5:
to study the relationship between depth video distortion and virtual viewpoint distortion, the coding quantization parameter (QP) of the depth video of one viewpoint is varied while the QPs of the other three viewpoints are fixed, in the absence of texture distortion, and the quality of the virtual viewpoints rendered from reconstructed depth videos with different reconstruction-quality combinations is studied. Experiments show that the quality Q_D of the virtual viewpoint without texture distortion and the coding quantization parameters of the depth videos satisfy the following formula:
in the formula, ζ_i is the linear coefficient in the depth-video-related virtual view quality model corresponding to the ith reference view, QP_{d,i} is the coding quantization parameter of the depth video of the ith reference view, and C_D is a variable independent of the compression distortion of the reference viewpoints. Fig. 3b shows the effect of the depth video QP of the sequence 'OrangeKitchen' on the virtual view quality when the reference viewpoints have no texture distortion.
From the virtual viewpoint distortion model, the linear coefficient associated with the depth video is related to the baseline distance weight. By collecting statistics of the corresponding ζ_i and w_i for different virtual viewpoints in different sequences, it is found that they satisfy ζ_i = m_d·w_i + n_d, where the coefficients m_d and n_d obtained by linear fitting are -0.194 and -0.105, respectively; the relationship between the depth video linear coefficients and the baseline distance weights in the virtual viewpoint quality model is shown in fig. 4b.
Similarly to step S4, the following formula can be theoretically derived:
then, the bit allocation weight corresponding to the ith reference viewpoint depth video is:
In summary, compared with the prior art, the invention has the following advantages. Considering that, in the process of rendering virtual viewpoints, the influence of each reference viewpoint on the virtual viewpoint quality is related to the baseline distance weights between the cameras, the viewpoint-level bit allocation weights of the texture videos and the depth videos, both related to the camera baseline distance weights, are theoretically derived on the basis of the constructed virtual viewpoint quality model, and texture distortion and depth distortion are minimized according to these weights, thereby minimizing the virtual viewpoint distortion. Compared with a method that allocates bits evenly across viewpoints, the method can effectively improve the rendering quality of the virtual viewpoints; the degree of quality improvement is related to how far the virtual viewpoint deviates from the center of the reference viewpoints, and the farther the deviation, the more obvious the improvement.
To further illustrate the feasibility and effectiveness of the method of the present invention, the following experiments were conducted.
In the present embodiment, the HEVC reference software HM16.20 is used to independently encode the color video and the depth video of each view, and the rendering platform VSRS4.3 provided by MPEG-I is used to perform virtual view rendering. The test sequences are listed in Table 1, where x_i y_j indicates the view in the ith column and the jth row of the corresponding sequence. The virtual viewpoint distribution is shown in FIGS. 5a and 5b, where V_i represents the ith reference viewpoint and v_i represents the ith virtual viewpoint. To measure the influence of coding distortion on virtual viewpoint quality, the Peak Signal-to-Noise Ratio (PSNR) is computed using the virtual viewpoint rendered from color and depth videos without coding distortion as the reference.
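The PSNR measurement described above, comparing a rendered virtual view against the view rendered from distortion-free color and depth video, can be sketched as:

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """PSNR (dB) of a rendered virtual view against the reference view
    rendered from distortion-free color and depth video."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

For multi-frame sequences the per-frame PSNR values would typically be averaged, as in the per-view quality figures reported below.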
TABLE 1 relevant parameters of the experimental sequences
In this embodiment, 4 code rates are selected to test the virtual viewpoints on the diagonal path in fig. 5a; the corresponding color video QPs are {25,30,35,40} and the depth video QPs are {34,39,42,45}. The test results are shown in Table 2 and compared with the performance of the viewpoint-level even bit allocation method (i.e., the equipartition bit method). In the table, Q_pro and Q_ave respectively denote the virtual viewpoint quality of the method of the invention and of the equipartition bit method, measured as PSNR; the quality improvement is represented by ΔQ:
ΔQ = Q_pro - Q_ave
As can be seen from Table 2, compared with the equipartition bit method, the bit allocation method based on the virtual viewpoint quality model provided by the invention obtains virtual viewpoints of higher quality, and the degree of quality improvement is related to how far the virtual viewpoint position deviates from the center of the four reference viewpoints. The bit allocation weights in the method of the invention are related to the fusion weights used when rendering the virtual viewpoint: a reference viewpoint closer to the virtual viewpoint has a larger fusion weight, and thus a larger influence on the virtual viewpoint quality, so it is allocated more bits; conversely, a farther viewpoint is allocated fewer bits. In FIG. 5a, if the user experiences the scene at v_1, the method of the invention allocates more bits to the reference viewpoint V_1, which has the largest fusion proportion during rendering, thereby raising the quality of the virtual viewpoint v_1. The experimental results show that the farther the virtual viewpoint deviates from the center of the reference viewpoints, the larger the difference between the bit allocation weights of the method and those of the equipartition bit method, and the more remarkable the quality improvement of the virtual viewpoint. As shown in Table 2, the quality improvement of virtual viewpoints v_1 and v_5 is significantly higher than that of v_2, v_3 and v_4; since v_1 and v_5 are symmetric about the center, their quality improvements are comparable. When the virtual viewpoint is located at the center of the reference viewpoints, the allocation weights of the method and the equipartition bit method are equal, so the quality of v_3 is unchanged and ΔQ equals 0.
In this embodiment, the virtual viewpoints on the free viewing path in fig. 5b are tested at the same code rates, and the results are shown in Table 3. The quality of virtual viewpoints v_1-v_7 is improved, and, similar to the diagonal-path results in Table 2, the quality improvement is more obvious the farther the virtual viewpoint deviates from the center of the reference viewpoints.
To further illustrate the effectiveness of the method of the invention, subjective experimental results are given below. FIG. 6 shows virtual viewpoint v_1 of the sequence 'TechnicolorPainter' on the diagonal path: FIGS. 6a, 6c and 6e are effect diagrams and partial enlarged views rendered, respectively, without distortion, with the equipartition bit method, and with the method of the invention; FIG. 6b is a partial enlarged view of the original reference viewpoints; FIGS. 6d and 6f are partial enlarged views of the four coded reference viewpoints for the equipartition bit method and for the method of the invention at QP (40,45), respectively, corresponding from left to right in FIGS. 6b, 6d and 6f to views x_2y_2, x_3y_2, x_2y_3 and x_3y_3. The experimental results show that, comparing FIG. 6e with FIG. 6c, after bit allocation by the method of the invention the quality of the virtual viewpoint is higher and the texture details in the view are better retained, because the method determines the viewpoint-level texture and depth video bit allocation according to the baseline distance weights. In this experiment, the viewpoint x_2y_2 closest to the virtual viewpoint is allocated the most bits; after each view is coded with its allocated bits, the quality of views x_2y_2, x_3y_2, x_2y_3 and x_3y_3 is 37.24 dB, 31.71 dB, 31.46 dB and 30.17 dB, respectively. Subjectively, the image texture details of x_2y_2 are well preserved, as shown in FIG. 6f, so that when the virtual viewpoint is rendered, the method of the invention provides richer texture information and the final virtual viewpoint quality is relatively high.
TABLE 2 Objective quality comparison results (dB) along diagonal paths
TABLE 3 Objective quality comparison results (dB) for free look paths
The above is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several changes and modifications can be made without departing from the technical solution, and the technical solution of the changes and modifications should be considered as falling within the scope of the claims of the present application.