Disclosure of Invention
The technical problem to be solved by the invention is to provide a fast coding method for stereoscopic panoramic video based on panoramic saliency, with low encoding time complexity.
The technical scheme adopted by the invention to solve the above technical problem is as follows: a fast coding method for stereoscopic panoramic video based on panoramic saliency, characterized by comprising the following steps:
Step 1: defining the right-viewpoint video frame currently to be processed, other than the 1st frame, in the stereoscopic panoramic video in the ERP projection format as the current frame; wherein the width of the current frame is W and its height is H;
Step 2: performing saliency calculation on the current frame to obtain the 3D-Sobel saliency map of the current frame;
Step 3: defining the maximum coding unit currently to be processed in the current frame as the current maximum coding unit; wherein the size of the current maximum coding unit is 64 × 64;
Step 4: judging whether the current maximum coding unit is an uppermost or leftmost maximum coding unit in the current frame; if so, encoding the current maximum coding unit with the 3D-HEVC video encoder and then executing step 11; otherwise, executing step 5;
Step 5: calculating the saliency strength of the 64 × 64 region corresponding to the current maximum coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_LCU; and calculating the panoramic saliency threshold of that 64 × 64 region, denoted TH_S; then judging whether SI_LCU ≥ TH_S holds; if so, the current maximum coding unit is judged to be a salient block and is redefined as the current coding unit, and step 9 is then executed; if not, the current maximum coding unit is judged to be a non-salient block, and step 6 is then executed;
Step 6: let D_LCU(View) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the left-view video frame corresponding to the current frame; let D_LCU(Col) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the right-view video frame of the previous frame of the current frame; let D_LCU(LT) denote the optimal recursion depth mean of the coded upper-left maximum coding unit of the current maximum coding unit; let D_LCU(L) denote the optimal recursion depth mean of the coded left maximum coding unit of the current maximum coding unit; let D_LCU(T) denote the optimal recursion depth mean of the coded upper maximum coding unit of the current maximum coding unit; then the recursion depth interval of the current maximum coding unit is predicted, denoted [D_min, D_max]; wherein D_min denotes the minimum partition depth of the current maximum coding unit, D_max denotes the maximum partition depth of the current maximum coding unit, min() is the minimum-value function, max() is the maximum-value function, ⌊·⌋ is the round-down (floor) symbol, and ⌈·⌉ is the round-up (ceiling) symbol;
Step 7: jumping to the CU layer, of the quadtree rooted at the current maximum coding unit, whose partition depth is D_min; encoding all coding units in that CU layer with the 3D-HEVC video encoder in depth-first traversal order, taking any coding unit in the CU layer as the current coding unit; after the current coding unit has been coded, first judging whether its maximum partition depth has reached D_max or has reached 3; if so, continuing to code the uncoded sibling nodes of the current coding unit in depth-first traversal order until all sibling nodes of the current coding unit are coded, and then executing step 11; if not, executing step 8;
Step 8: calculating the saliency strength of the region corresponding to the current coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_CU; then comparing SI_CU and SI_LCU; if SI_CU > SI_LCU, calculating the recursion depth interval of the current coding unit, denoted [D_min, D′_max], then letting D_max = D′_max and returning to step 7 to continue; if SI_CU ≤ SI_LCU, calculating the recursion depth interval of the current coding unit, denoted [D_min, D″_max], then letting D_max = D″_max and returning to step 7 to continue; wherein D_CU(View) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the left-view video frame corresponding to the current frame, D_CU(Col) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the right-view video frame of the previous frame of the current frame, D_CU(LT) denotes the optimal recursion depth mean of the coded upper-left coding unit of the current coding unit, D_CU(L) denotes the optimal recursion depth mean of the coded left coding unit of the current coding unit, and D_CU(T) denotes the optimal recursion depth mean of the coded upper coding unit of the current coding unit;
Step 9: calculating, with the BJND model, the perceptual-distortion root mean square error of the current coding unit, denoted MSE_Bjnd; and calculating the statistical root mean square error of the current coding unit, denoted MSE_S; then calculating the coding-unit partition threshold based on panoramic perceptual distortion, denoted TH_split, as TH_split = η1·MSE_S + η2·MSE_Bjnd; wherein e denotes the natural base, k is a slope with value -2.3334, Q_step denotes the quantization step of the current coding unit, QP denotes the quantization parameter of the current coding unit, MSE_Col denotes the root mean square error of the coded coding unit corresponding to the current coding unit in the previous frame of the current frame, Q_step,Col denotes the quantization step of that coded coding unit, QP_Col denotes the quantization parameter of that coded coding unit, b denotes an intercept with value 6.3751, N × N denotes the size of the current coding unit with N being 64, 32, 16 or 8, and η1 and η2 are adjustment factors with η1 + η2 = 1;
Step 10: calculating the root mean square error of the current coding unit, and recording as MSECur(ii) a Then compares the MSECurAnd THsplitSize of (1), if MSECur≤THsplitIf the current coding unit reaches the optimal division depth, no further division is needed, a 3D-HEVC video encoder is adopted to encode the current coding unit, and then the step 11 is executed; if MSECur>THsplitJumping to a quadtree structure with the current maximum coding unit as a root node and dividing the depth into DminOf a 3D-HEVC video encoder, to depth all coding units in the CU layerCoding in a mode of degree-first traversal, regarding any coding unit in the CU layer as a current coding unit, then returning to the step 9 to continue execution until all sibling nodes of the current coding unit are coded completely, and then executing the step 11;
Step 11: taking the next maximum coding unit to be processed in the current frame as the current maximum coding unit, then returning to step 4 to continue until all maximum coding units in the current frame have been processed, and then executing step 12;
Step 12: taking the next right-viewpoint video frame to be processed in the stereoscopic panoramic video in the ERP projection format as the current frame, and then returning to step 2 to continue until all video frames in the stereoscopic panoramic video in the ERP projection format have been processed.
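The per-LCU dispatch of steps 4 and 5 above can be sketched as follows. `dispatch_lcu` is a hypothetical helper name, and the row/column test stands in for the patent's "uppermost or leftmost" check:

```python
def dispatch_lcu(lcu_row: int, lcu_col: int, si_lcu: float, th_s: float) -> str:
    """Route one 64x64 LCU through steps 4-5 of the method.

    Step 4: LCUs in the first row or first column have no coded neighbors
    to predict from, so they are encoded directly (then step 11).
    Step 5: otherwise, the LCU's saliency strength SI_LCU is compared
    against the panoramic saliency threshold TH_S to pick the path.
    """
    if lcu_row == 0 or lcu_col == 0:
        return "encode-directly"   # step 4 -> encode, then step 11
    if si_lcu >= th_s:
        return "salient"           # step 5 -> perceptual-distortion path (step 9)
    return "non-salient"           # step 5 -> depth-interval path (step 6)
```

The two return labels correspond to the two fast-termination branches the method develops in steps 6-8 and steps 9-10 respectively.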
In step 2, the saliency of the current frame is calculated with a 3D-Sobel model.
In step 5, the process of calculating SI_LCU is the same as the process of calculating SI_CU in step 8; the specific process is as follows: the region whose saliency strength is to be calculated is defined as the region to be processed, and its saliency strength is denoted SI; wherein N × N denotes the size of the region to be processed, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position, in the 3D-Sobel saliency map of the current frame, of the upper-left pixel of the region to be processed; and SI is computed from the pixel values of the pixels of that region in the 3D-Sobel saliency map of the current frame.
in the step 5, THSThe calculation process of (2) is as follows:
step 5_ 1: calculating the ERP dimension weight of each pixel point in the current frame, and recording the ERP dimension weight of the pixel point with the coordinate position (x, y) in the current frame as w
ERP(x,y),
Wherein x is more than or equal to 0 and less than or equal to W-1, and y is more than or equal to 0 and less than or equal to H-1;
step 5_ 2: calculating the ERP dimension weight of the current maximum coding unit, and recording as w
LCU,
Wherein, N ' x N ' represents the size of the current maximum coding unit, i.e. the value of N ' is 64, (i ', j ') represents the coordinate position of the top left pixel point of the current maximum coding unit in the current frame, w
ERP(m ', n') represents ERP dimension weight of pixel point with coordinate position (m ', n') in the current frame, wherein m 'is more than or equal to 0 and less than or equal to W-1, and n' is more than or equal to 0 and less than or equal to H-1;
step 5_ 3: calculate THS,THS=THE+β×(1-wLCU) (ii) a Wherein TH isERepresents the significance threshold at half the height of the current frame, and β represents wLCUThe scaling factor of (2).
In step 9, the calculation process of MSE_Bjnd is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; BJND(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the binocular just-noticeable-distortion map of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
In step 9, the calculation process of MSE_Col is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; I(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the previous frame of the current frame; I′(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the coded and reconstructed image of the previous frame of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
In step 10, the calculation process of MSE_Cur is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; the error is taken between the pixel value of the pixel at coordinate position (m, n) in the current frame and the pixel value of the pixel at coordinate position (m, n) in the coding-prediction image of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
Compared with the prior art, the invention has the following advantages:
The method analyzes the saliency of the right-viewpoint video frames in the stereoscopic panoramic video and provides a corresponding fast-termination scheme for each of the non-salient and salient regions of a right-viewpoint video frame. For a non-salient region, the recursion depth interval of the current coding unit is predicted and corrected using the optimal partition depths of its spatio-temporal neighboring blocks; for a salient region, whether the current coding unit has reached its optimal partition depth is judged by computing and comparing the root mean square error of the current coding unit against a coding-unit partition threshold based on panoramic perceptual distortion. Experimental tests show that the method effectively reduces the recursion complexity of coding units and saves coding time.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and embodiments.
The overall implementation block diagram of the method for rapidly encoding the stereoscopic panoramic video based on the panoramic saliency, which is provided by the invention, is shown in fig. 1, and the method comprises the following steps:
Step 1: defining the right-viewpoint video frame currently to be processed, other than the 1st frame, in the stereoscopic panoramic video in the ERP (equirectangular projection) format as the current frame; wherein the width of the current frame is W and its height is H. Here, all left-view video frames and the 1st right-viewpoint video frame in the stereoscopic panoramic video are encoded with the existing 3D-HEVC video encoder.
Step 2: performing saliency calculation on the current frame to obtain the 3D-Sobel saliency map of the current frame.
In this embodiment, a 3D-Sobel model is used in step 2 to perform the saliency calculation on the current frame.
Step 3: defining the maximum coding unit currently to be processed in the current frame as the current maximum coding unit; wherein the size of the current maximum coding unit is 64 × 64.
Step 4: judging whether the current maximum coding unit is an uppermost or leftmost maximum coding unit in the current frame; if so, encoding the current maximum coding unit with the 3D-HEVC video encoder and then executing step 11; otherwise, executing step 5. Here, the uppermost maximum coding units in the current frame are those of the first row of the current frame, and the leftmost maximum coding units are those of the first column.
Step 5: calculating the saliency strength of the 64 × 64 region corresponding to the current maximum coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_LCU; and calculating the panoramic saliency threshold of that 64 × 64 region, denoted TH_S; then judging whether SI_LCU ≥ TH_S holds; if so, the current maximum coding unit is judged to be a salient block and is redefined as the current coding unit, and step 9 is then executed; if not, the current maximum coding unit is judged to be a non-salient block, and step 6 is then executed.
In this embodiment, the calculation process of TH_S in step 5 is as follows:
Step 5_1: calculating the ERP latitude weight of each pixel in the current frame, the ERP latitude weight of the pixel at coordinate position (x, y) in the current frame being denoted w_ERP(x, y); wherein 0 ≤ x ≤ W-1 and 0 ≤ y ≤ H-1.
Step 5_2: since the 3D-HEVC video encoder uses the CU blocks of a video frame as its basic coding units, for convenience of application in coding, the ERP latitude weight of the current maximum coding unit is calculated and denoted w_LCU; wherein N′ × N′ denotes the size of the current maximum coding unit, i.e. N′ = 64; (i′, j′) denotes the coordinate position of the upper-left pixel of the current maximum coding unit in the current frame; and w_ERP(m′, n′) denotes the ERP latitude weight of the pixel at coordinate position (m′, n′) in the current frame, with 0 ≤ m′ ≤ W-1 and 0 ≤ n′ ≤ H-1.
Step 5_3: calculating TH_S as TH_S = TH_E + β × (1 - w_LCU); wherein TH_E denotes the saliency threshold at half the height of the current frame, obtained through a large number of experiments as TH_E = 0.3, and β denotes the scaling factor of w_LCU, obtained through a large number of experiments as β = 0.2.
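A minimal sketch of steps 5_1 to 5_3 with the embodiment's constants TH_E = 0.3 and β = 0.2. The text above elides the closed form of w_ERP, so the cosine-of-latitude weight below (1.0 at the equator, approaching 0 at the poles) is an assumption in the spirit of common ERP latitude weighting, not the patent's own formula:

```python
import math

def erp_weight(y: int, height: int) -> float:
    """Assumed ERP latitude weight for pixel row y: cosine of latitude,
    1.0 at mid-height (the equator) and tending to 0 at the poles."""
    return math.cos((y + 0.5 - height / 2.0) * math.pi / height)

def lcu_weight(j: int, height: int, n: int = 64) -> float:
    """w_LCU: mean ERP latitude weight over the N'xN' LCU whose
    top-left pixel row is j (weights are constant along a row)."""
    return sum(erp_weight(r, height) for r in range(j, j + n)) / n

def panoramic_threshold(j: int, height: int,
                        th_e: float = 0.3, beta: float = 0.2) -> float:
    """Step 5_3: TH_S = TH_E + beta * (1 - w_LCU)."""
    return th_e + beta * (1.0 - lcu_weight(j, height))
```

Under this assumption the threshold stays near TH_E for LCUs at the equator and rises toward the poles, so polar LCUs are less likely to be judged salient.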
FIG. 2 presents a flow diagram of the fast termination for non-salient blocks; FIG. 3 presents a flow diagram of the fast termination for salient blocks.
Step 6: let D_LCU(View) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the left-view video frame corresponding to the current frame; let D_LCU(Col) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the right-view video frame of the previous frame of the current frame; let D_LCU(LT) denote the optimal recursion depth mean of the coded upper-left maximum coding unit of the current maximum coding unit; let D_LCU(L) denote the optimal recursion depth mean of the coded left maximum coding unit of the current maximum coding unit; let D_LCU(T) denote the optimal recursion depth mean of the coded upper maximum coding unit of the current maximum coding unit; then the recursion depth interval of the current maximum coding unit is predicted, denoted [D_min, D_max]; wherein D_min denotes the minimum partition depth of the current maximum coding unit, D_max denotes the maximum partition depth of the current maximum coding unit, min() is the minimum-value function, max() is the maximum-value function, ⌊·⌋ is the round-down (floor) symbol, and ⌈·⌉ is the round-up (ceiling) symbol. Here, the coded upper-left maximum coding unit of the current maximum coding unit is the coded nearest-neighbor maximum coding unit located to the upper left of the current maximum coding unit, the coded left maximum coding unit of the current maximum coding unit is the coded nearest-neighbor maximum coding unit located to its left, and the coded upper maximum coding unit of the current maximum coding unit is the coded nearest-neighbor maximum coding unit located above it.
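The step-6 prediction can be sketched as follows. The patent text above does not reproduce the exact closed form of [D_min, D_max], so this sketch assumes the natural reading of its min/max and floor/ceiling symbols: the interval spans from the floor of the smallest neighbor depth mean to the ceiling of the largest, clamped to HEVC's depth range [0, 3]:

```python
import math

def predict_depth_interval(d_view: float, d_col: float,
                           d_lt: float, d_l: float, d_t: float):
    """Assumed step-6 rule: [D_min, D_max] from the optimal recursion depth
    means of the five inter-view / temporal / spatial neighbor LCUs
    (View, Col, LT, L, T), clamped to the HEVC depth range [0, 3]."""
    means = [d_view, d_col, d_lt, d_l, d_t]
    d_min = max(0, math.floor(min(means)))   # floor symbol in the text
    d_max = min(3, math.ceil(max(means)))    # ceiling symbol in the text
    return d_min, d_max
```

Step 7 then starts the quadtree traversal at depth D_min instead of 0 and stops at D_max instead of 3, which is where the time saving for non-salient blocks comes from.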
Step 7: jumping to the CU layer, of the quadtree rooted at the current maximum coding unit, whose partition depth is D_min; encoding all coding units in that CU layer with the 3D-HEVC video encoder in depth-first traversal order, taking any coding unit in the CU layer as the current coding unit; after the current coding unit has been coded, first judging whether its maximum partition depth has reached D_max or has reached 3; if so, continuing to code the uncoded sibling nodes of the current coding unit in depth-first traversal order until all sibling nodes of the current coding unit are coded, and then executing step 11; if not, executing step 8.
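The bounded depth-first traversal of step 7 can be sketched as the recursion below. `should_split` is a hypothetical callback standing in for the step-8/step-10 decision of whether to partition further:

```python
def encode_cu_layer(depth: int, d_max: int, should_split) -> list:
    """Step 7 sketch: depth-first traversal of the CU quadtree starting
    from the layer at depth D_min. A CU stops recursing when its depth
    reaches min(D_max, 3) (3 is HEVC's deepest CU level for a 64x64 LCU),
    or when the split decision says not to partition further.
    Returns the list of depths at which CUs were finally coded."""
    if depth >= min(d_max, 3) or not should_split(depth):
        return [depth]
    coded = []
    for _ in range(4):                 # the four sibling child CUs
        coded += encode_cu_layer(depth + 1, d_max, should_split)
    return coded
```

Calling it with `depth=D_min` reproduces the "jump to the D_min layer" behavior: depths below D_min are never evaluated at all.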
Step 8: calculating the saliency strength of the region corresponding to the current coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_CU; then comparing SI_CU and SI_LCU. If SI_CU > SI_LCU, then although the current maximum coding unit was judged to be a non-salient block, the current maximum coding unit lies on a border between a salient region and a non-salient region of the current frame, and the current coding unit lies in the more salient part; therefore, to avoid the quality degradation caused by a too-small coding depth, D_max must be updated according to the optimal partition depths of the spatio-temporal neighboring blocks of the current coding unit. Thus the recursion depth interval of the current coding unit is calculated, denoted [D_min, D′_max]; then D_max = D′_max is set, and the process returns to step 7 to continue. If SI_CU ≤ SI_LCU, the saliency strength of the current coding unit is consistent with, or less than, that of the current maximum coding unit; the current coding unit lies inside the current maximum coding unit, in a region whose saliency is consistent with or lower than that of the whole; therefore the recursion depth interval of the current coding unit is calculated, denoted [D_min, D″_max]; then D_max = D″_max is set, and the process returns to step 7 to continue. Wherein D_CU(View) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the left-view video frame corresponding to the current frame, D_CU(Col) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the right-view video frame of the previous frame of the current frame, D_CU(LT) denotes the optimal recursion depth mean of the coded upper-left coding unit of the current coding unit, D_CU(L) denotes the optimal recursion depth mean of the coded left coding unit of the current coding unit, and D_CU(T) denotes the optimal recursion depth mean of the coded upper coding unit of the current coding unit. Here, the coded upper-left coding unit of the current coding unit is the coded nearest-neighbor coding unit located to its upper left, the coded left coding unit of the current coding unit is the coded nearest-neighbor coding unit located to its left, and the coded upper coding unit of the current coding unit is the coded nearest-neighbor coding unit located above it.
Step 9: the current maximum coding unit is a salient block, which indicates that it lies in a texture-dense area; the perceptual-distortion root mean square error of the current coding unit is calculated using the existing classical BJND (Binocular Just Noticeable Distortion) model and denoted MSE_Bjnd; and the statistical root mean square error of the current coding unit is calculated and denoted MSE_S. Then the coding-unit partition threshold based on panoramic perceptual distortion is calculated as TH_split = η1·MSE_S + η2·MSE_Bjnd; wherein e denotes the natural base, k is a slope with value -2.3334, Q_step denotes the quantization step of the current coding unit, QP denotes the quantization parameter of the current coding unit, MSE_Col denotes the root mean square error of the coded coding unit corresponding to the current coding unit in the previous frame of the current frame, Q_step,Col denotes the quantization step of that coded coding unit, QP_Col denotes the quantization parameter of that coded coding unit, b denotes an intercept with value 6.3751, N × N denotes the size of the current coding unit with N being 64, 32, 16 or 8, and η1 and η2 are adjustment factors with η1 + η2 = 1; in this embodiment, η1 = η2 = 0.5.
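The threshold combination of step 9 can be sketched directly from its stated formula; only the guard on η1 + η2 = 1 is added here:

```python
def split_threshold(mse_s: float, mse_bjnd: float,
                    eta1: float = 0.5, eta2: float = 0.5) -> float:
    """Step 9: TH_split = eta1 * MSE_S + eta2 * MSE_Bjnd, with the
    constraint eta1 + eta2 = 1 (this embodiment uses 0.5 / 0.5)."""
    if abs(eta1 + eta2 - 1.0) > 1e-9:
        raise ValueError("eta1 + eta2 must equal 1")
    return eta1 * mse_s + eta2 * mse_bjnd
```

The convex combination keeps TH_split on the same scale as its two inputs, so it can be compared directly against MSE_Cur in step 10.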
In this embodiment, the calculation process of MSE_Bjnd in step 9 is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; BJND(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the binocular just-noticeable-distortion map (i.e. BJND map) of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
In this embodiment, the calculation process of MSE_Col in step 9 is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; I(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the previous frame of the current frame; I′(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the coded and reconstructed image of the previous frame of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
Step 10: calculating the root mean square error of the current coding unit, denoted MSE_Cur; then comparing MSE_Cur with TH_split; if MSE_Cur ≤ TH_split, the current coding unit has reached its optimal partition depth and need not be further partitioned, so it is encoded with the 3D-HEVC video encoder and step 11 is then executed; if MSE_Cur > TH_split, jumping to the CU layer, of the quadtree rooted at the current maximum coding unit, whose partition depth is D_min, encoding all coding units in that CU layer with the 3D-HEVC video encoder in depth-first traversal order, taking any coding unit in the CU layer as the current coding unit, and then returning to step 9 to continue until all sibling nodes of the current coding unit are coded, after which step 11 is executed.
In this embodiment, the calculation process of MSE_Cur in step 10 is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; the error is taken between the pixel value of the pixel at coordinate position (m, n) in the current frame and the pixel value of the pixel at coordinate position (m, n) in the coding-prediction image of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
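The MSE_Col and MSE_Cur computations above share one pattern: a root-mean-square error over the N × N region of a coding unit between two images (previous frame vs. its reconstruction, or current frame vs. its coding prediction). The closed-form formulas are elided from this text, so the 1/N² normalization below is an assumption; the step-10 comparison rule, however, is as stated:

```python
def region_rmse(img_a, img_b, x0: int, y0: int, n: int) -> float:
    """RMSE between two images (lists of pixel rows) over the n x n
    region whose upper-left pixel is at (x0, y0). Assumed 1/n^2
    normalization before the square root."""
    total = 0.0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            d = img_a[y][x] - img_b[y][x]
            total += d * d
    return (total / (n * n)) ** 0.5

def reached_optimal_depth(mse_cur: float, th_split: float) -> bool:
    """Step 10: the CU is taken as optimally partitioned (no further
    split) when MSE_Cur <= TH_split."""
    return mse_cur <= th_split
```

With this helper, MSE_Col is `region_rmse(prev_frame, prev_recon, ...)` and MSE_Cur is `region_rmse(cur_frame, cur_pred, ...)` under the stated assumption.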
Step 11: taking the next maximum coding unit to be processed in the current frame as the current maximum coding unit, then returning to step 4 to continue until all maximum coding units in the current frame have been processed, and then executing step 12.
Step 12: taking the next right-viewpoint video frame to be processed in the stereoscopic panoramic video in the ERP projection format as the current frame, and then returning to step 2 to continue until all video frames in the stereoscopic panoramic video in the ERP projection format have been processed.
In this embodiment, the process of calculating SI_LCU in step 5 is the same as the process of calculating SI_CU in step 8; the specific process is as follows: the region whose saliency strength is to be calculated is defined as the region to be processed, and its saliency strength is denoted SI; wherein N × N denotes the size of the region to be processed, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position, in the 3D-Sobel saliency map of the current frame, of the upper-left pixel of the region to be processed; and SI is computed from the pixel values of the pixels of that region in the 3D-Sobel saliency map of the current frame.
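The closed form of SI is elided from this text; a natural reading of the definitions above is the mean 3D-Sobel saliency value over the region's N × N pixels, which is what the sketch below assumes:

```python
def saliency_strength(sal_map, x0: int, y0: int, n: int) -> float:
    """Assumed SI for steps 5/8: mean saliency-map value over the n x n
    region of the 3D-Sobel saliency map whose upper-left pixel is
    (x0, y0). `sal_map` is a list of pixel rows."""
    total = 0.0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            total += sal_map[y][x]
    return total / (n * n)
```

Because the same routine serves both the 64 × 64 LCU (step 5) and its sub-CUs (step 8), SI_CU and SI_LCU are directly comparable, as the step-8 comparison requires.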
to further illustrate the performance of the method of the present invention, the method of the present invention was tested.
In order to evaluate the effectiveness of the method, an HTM14.1 is selected as a test model, a 64-bit WIN7 operating system with a CPU Intel (R) core (TM) i3, a main frequency of 2.4GHz and a memory of 4G is configured in hardware, and a development tool selects VS 2013. Selecting stereo panoramic video sequences of 'chat', 'experience', 'photomraph', 'riverside', 'scientific _ spot', 'sign _ in', 'tourrist' and 'traffic' as standard test sequences, wherein the test frames are 100 frames, the coding structure is in an HBP random access mode, the GOP length of a group of pictures is 8, and the period of an I frame is 24. The initial quantization parameters QP of the independent viewpoints are 22, 27, 32 and 37 respectively, and the method is tested on the dependent viewpoints.
Table 1 lists specific information of "chat", "experience", "photograph", "riverside", "scientific _ spot", "sign _ in", "tourrist", and "traffic" stereoscopic panoramic video sequences.
TABLE 1 associated parameter information for stereoscopic panoramic video sequences
Table 2 shows the saving in coding time when the method of the invention is used to encode the stereoscopic panoramic video sequences listed in Table 1, compared with the HTM original platform method. The time-saving rate of coding with the method of the invention relative to the HTM original platform method is defined as ΔT_PRO-CU = (T_Org - T_PRO-CU) / T_Org × 100 [%], where T_PRO-CU denotes the coding time when coding with the method of the invention and T_Org denotes the coding time when coding with the HTM original platform method.
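The time-saving rate defined above is a one-liner:

```python
def time_saving_rate(t_org: float, t_pro_cu: float) -> float:
    """Delta T_PRO-CU = (T_Org - T_PRO-CU) / T_Org * 100 [%]: the
    percentage of coding time saved relative to the HTM original
    platform method."""
    return (t_org - t_pro_cu) / t_org * 100.0
```

For example, an original coding time of 100 units reduced to 46.5 units gives the paper's reported 53.5% average saving.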
TABLE 2 comparison of time savings for encoding using the method of the present invention versus HTM native platform method
As can be seen from Table 2, coding with the method of the invention saves 53.5% of coding time on average. The saving is relatively even across the 8 stereoscopic panoramic video sequences with different scenes and different motion conditions; the effect is particularly good for outdoor sequences such as "riverside" and "scenic_spot", whose coding times are reduced by 57.8% and 56.5% respectively.
Table 3 lists the rate-distortion performance, under different quality evaluation methods, of coding the stereoscopic panoramic video sequences listed in Table 1 with the method of the invention. For quality evaluation, PSNR (Peak Signal-to-Noise Ratio), WS-PSNR (Weighted-to-Spherically-uniform PSNR) and WS-SSIM (Weighted-to-Spherically-uniform Structural SIMilarity) are used as quality indexes, and the rate-distortion performance index under each quality evaluation method is calculated and denoted BDBR_PSNR (%), BDBR_WS-PSNR (%) and BDBR_WS-SSIM (%) respectively, to evaluate the performance of the method of the invention.
TABLE 3 comparison of rate-distortion performance for different quality evaluation methods for encoding using the method of the present invention
As can be seen from Table 3, the bit-rate increase of the method of the invention is below 1% under all 3 quality indexes PSNR, WS-PSNR and WS-SSIM, averaging 0.4%, 0.2% and 0.0% respectively. WS-PSNR and BDBR_WS-PSNR (%) are the quality evaluation indexes recommended by the panoramic video proposals; compared with the traditional PSNR and BDBR_PSNR (%), these indexes better reflect the characteristics of stereoscopic panoramic video. WS-SSIM is proposed on the basis of SSIM and takes both structural similarity and the panoramic latitude factor into account, so it better reflects the subjective-quality performance of a coding method. Specifically, because each stereoscopic panoramic video sequence differs in scene and motion conditions, the BDBR (the percentage of bit rate saved by the better coding method at the same objective quality) varies slightly from sequence to sequence: the bit-rate increase is slightly more evident for indoor sequences with complex texture and severe motion, while the effect is good for outdoor sequences, whose north- and south-polar regions are mostly sky and ground with a high proportion of the frame, relatively simple texture and slow motion, and whose bit rate hardly increases under the BDBR_WS-PSNR (%) and BDBR_WS-SSIM (%) indexes.