Disclosure of Invention
The technical problem to be solved by the invention is to provide a fast coding method for stereoscopic panoramic video based on panoramic saliency, with low encoding time complexity.
The technical scheme adopted by the invention to solve the above technical problem is as follows: a fast coding method for stereoscopic panoramic video based on panoramic saliency, characterized by comprising the following steps:
Step 1: defining the right-viewpoint video frame currently to be processed, other than the 1st frame, in the stereoscopic panoramic video in the ERP projection format as the current frame; wherein the width of the current frame is W and its height is H;
Step 2: performing saliency calculation on the current frame to obtain the 3D-Sobel saliency map of the current frame;
Step 3: defining the maximum coding unit currently to be processed in the current frame as the current maximum coding unit; wherein the size of the current maximum coding unit is 64 × 64;
Step 4: judging whether the current maximum coding unit is an uppermost or leftmost maximum coding unit in the current frame; if so, encoding the current maximum coding unit with the 3D-HEVC video encoder and then executing step 11; otherwise, executing step 5;
Step 5: calculating the saliency strength of the 64 × 64 region corresponding to the current maximum coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_LCU; and calculating the panoramic saliency threshold of that 64 × 64 region, denoted TH_S; then judging whether SI_LCU ≥ TH_S holds; if so, the current maximum coding unit is judged to be a salient block and is redefined as the current coding unit, and step 9 is then executed; if not, the current maximum coding unit is judged to be a non-salient block, and step 6 is then executed;
Step 6: let D_LCU(View) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the left-view video frame corresponding to the current frame; let D_LCU(Col) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the right-view video frame of the previous frame of the current frame; let D_LCU(LT) denote the optimal recursion depth mean of the coded upper-left maximum coding unit of the current maximum coding unit; let D_LCU(L) denote the optimal recursion depth mean of the coded left maximum coding unit of the current maximum coding unit; let D_LCU(T) denote the optimal recursion depth mean of the coded upper maximum coding unit of the current maximum coding unit; then the recursion depth interval of the current maximum coding unit is predicted, denoted [D_min, D_max]; wherein D_min denotes the minimum partition depth of the current maximum coding unit, D_max denotes the maximum partition depth of the current maximum coding unit, min() is the minimum-value function, max() is the maximum-value function, ⌊·⌋ is the round-down (floor) symbol, and ⌈·⌉ is the round-up (ceiling) symbol;
Step 7: jumping to the CU layer, of the quadtree rooted at the current maximum coding unit, whose partition depth is D_min; encoding all coding units in that CU layer with the 3D-HEVC video encoder in depth-first traversal order, taking any coding unit in the CU layer as the current coding unit; after the current coding unit has been coded, first judging whether its maximum partition depth has reached D_max or has reached 3; if so, continuing to code the uncoded sibling nodes of the current coding unit in depth-first traversal order until all sibling nodes of the current coding unit are coded, and then executing step 11; if not, executing step 8;
Step 8: calculating the saliency strength of the region corresponding to the current coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_CU; then comparing SI_CU and SI_LCU; if SI_CU > SI_LCU, calculating the recursion depth interval of the current coding unit, denoted [D_min, D′_max], then letting D_max = D′_max and returning to step 7 to continue; if SI_CU ≤ SI_LCU, calculating the recursion depth interval of the current coding unit, denoted [D_min, D″_max], then letting D_max = D″_max and returning to step 7 to continue; wherein D_CU(View) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the left-view video frame corresponding to the current frame, D_CU(Col) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the right-view video frame of the previous frame of the current frame, D_CU(LT) denotes the optimal recursion depth mean of the coded upper-left coding unit of the current coding unit, D_CU(L) denotes the optimal recursion depth mean of the coded left coding unit of the current coding unit, and D_CU(T) denotes the optimal recursion depth mean of the coded upper coding unit of the current coding unit;
Step 9: calculating, with the BJND model, the perceptual-distortion root mean square error of the current coding unit, denoted MSE_Bjnd; and calculating the statistical root mean square error of the current coding unit, denoted MSE_S; then calculating the coding-unit partition threshold based on panoramic perceptual distortion, denoted TH_split, as TH_split = η1·MSE_S + η2·MSE_Bjnd; wherein e denotes the natural base, k is a slope with value -2.3334, Q_step denotes the quantization step of the current coding unit, QP denotes the quantization parameter of the current coding unit, MSE_Col denotes the root mean square error of the coded coding unit corresponding to the current coding unit in the previous frame of the current frame, Q_step,Col denotes the quantization step of that coded coding unit, QP_Col denotes the quantization parameter of that coded coding unit, b denotes an intercept with value 6.3751, N × N denotes the size of the current coding unit with N being 64, 32, 16 or 8, and η1 and η2 are adjustment factors with η1 + η2 = 1;
Step 10: calculating the root mean square error of the current coding unit, and recording as MSECur(ii) a Then compares the MSECurAnd THsplitSize of (1), if MSECur≤THsplitIf the current coding unit reaches the optimal division depth, no further division is needed, a 3D-HEVC video encoder is adopted to encode the current coding unit, and then the step 11 is executed; if MSECur>THsplitJumping to a quadtree structure with the current maximum coding unit as a root node and dividing the depth into DminOf a 3D-HEVC video encoder, to depth all coding units in the CU layerCoding in a mode of degree-first traversal, regarding any coding unit in the CU layer as a current coding unit, then returning to the step 9 to continue execution until all sibling nodes of the current coding unit are coded completely, and then executing the step 11;
Step 11: taking the next maximum coding unit to be processed in the current frame as the current maximum coding unit, then returning to step 4 to continue until all maximum coding units in the current frame have been processed, and then executing step 12;
Step 12: taking the next right-viewpoint video frame to be processed in the stereoscopic panoramic video in the ERP projection format as the current frame, and then returning to step 2 to continue until all video frames in the stereoscopic panoramic video in the ERP projection format have been processed.
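The per-LCU dispatch of steps 4 and 5 above can be sketched as follows. `dispatch_lcu` is a hypothetical helper name, and the row/column test stands in for the patent's "uppermost or leftmost" check:

```python
def dispatch_lcu(lcu_row: int, lcu_col: int, si_lcu: float, th_s: float) -> str:
    """Route one 64x64 LCU through steps 4-5 of the method.

    Step 4: LCUs in the first row or first column have no coded neighbors
    to predict from, so they are encoded directly (then step 11).
    Step 5: otherwise, the LCU's saliency strength SI_LCU is compared
    against the panoramic saliency threshold TH_S to pick the path.
    """
    if lcu_row == 0 or lcu_col == 0:
        return "encode-directly"   # step 4 -> encode, then step 11
    if si_lcu >= th_s:
        return "salient"           # step 5 -> perceptual-distortion path (step 9)
    return "non-salient"           # step 5 -> depth-interval path (step 6)
```

The two return labels correspond to the two fast-termination branches the method develops in steps 6-8 and steps 9-10 respectively.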
In step 2, the saliency of the current frame is calculated with a 3D-Sobel model.
In step 5, the process of calculating SI_LCU is the same as the process of calculating SI_CU in step 8; the specific process is as follows: the region whose saliency strength is to be calculated is defined as the region to be processed, and its saliency strength is denoted SI; wherein N × N denotes the size of the region to be processed, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position, in the 3D-Sobel saliency map of the current frame, of the upper-left pixel of the region to be processed; and SI is computed from the pixel values of the pixels of that region in the 3D-Sobel saliency map of the current frame.
in the step 5, THSThe calculation process of (2) is as follows:
step 5_ 1: calculating the ERP dimension weight of each pixel point in the current frame, and recording the ERP dimension weight of the pixel point with the coordinate position (x, y) in the current frame as w
ERP(x,y),
Wherein x is more than or equal to 0 and less than or equal to W-1, and y is more than or equal to 0 and less than or equal to H-1;
step 5_ 2: calculating the ERP dimension weight of the current maximum coding unit, and recording as w
LCU,
Wherein, N ' x N ' represents the size of the current maximum coding unit, i.e. the value of N ' is 64, (i ', j ') represents the coordinate position of the top left pixel point of the current maximum coding unit in the current frame, w
ERP(m ', n') represents ERP dimension weight of pixel point with coordinate position (m ', n') in the current frame, wherein m 'is more than or equal to 0 and less than or equal to W-1, and n' is more than or equal to 0 and less than or equal to H-1;
step 5_ 3: calculate THS,THS=THE+β×(1-wLCU) (ii) a Wherein TH isERepresents the significance threshold at half the height of the current frame, and β represents wLCUThe scaling factor of (2).
In step 9, the calculation process of MSE_Bjnd is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; BJND(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the binocular just-noticeable-distortion map of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
In step 9, the calculation process of MSE_Col is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; I(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the previous frame of the current frame; I′(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the coded and reconstructed image of the previous frame of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
In step 10, the calculation process of MSE_Cur is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; the error is taken between the pixel value of the pixel at coordinate position (m, n) in the current frame and the pixel value of the pixel at coordinate position (m, n) in the coding-prediction image of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
Compared with the prior art, the invention has the following advantages:
The method analyzes the saliency of the right-viewpoint video frames in the stereoscopic panoramic video and provides a corresponding fast-termination scheme for each of the non-salient and salient regions of a right-viewpoint video frame. For a non-salient region, the recursion depth interval of the current coding unit is predicted and corrected using the optimal partition depths of its spatio-temporal neighboring blocks; for a salient region, whether the current coding unit has reached its optimal partition depth is judged by computing and comparing the root mean square error of the current coding unit against a coding-unit partition threshold based on panoramic perceptual distortion. Experimental tests show that the method effectively reduces the recursion complexity of coding units and saves coding time.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and embodiments.
The overall implementation block diagram of the method for rapidly encoding the stereoscopic panoramic video based on the panoramic saliency, which is provided by the invention, is shown in fig. 1, and the method comprises the following steps:
Step 1: defining the right-viewpoint video frame currently to be processed, other than the 1st frame, in the stereoscopic panoramic video in the ERP (equirectangular projection) format as the current frame; wherein the width of the current frame is W and its height is H. Here, all left-view video frames and the 1st right-viewpoint video frame in the stereoscopic panoramic video are encoded with the existing 3D-HEVC video encoder.
Step 2: performing saliency calculation on the current frame to obtain the 3D-Sobel saliency map of the current frame.
In this embodiment, a 3D-Sobel model is used in step 2 to perform the saliency calculation on the current frame.
Step 3: defining the maximum coding unit currently to be processed in the current frame as the current maximum coding unit; wherein the size of the current maximum coding unit is 64 × 64.
Step 4: judging whether the current maximum coding unit is an uppermost or leftmost maximum coding unit in the current frame; if so, encoding the current maximum coding unit with the 3D-HEVC video encoder and then executing step 11; otherwise, executing step 5. Here, the uppermost maximum coding units in the current frame are those of the first row of the current frame, and the leftmost maximum coding units are those of the first column.
Step 5: calculating the saliency strength of the 64 × 64 region corresponding to the current maximum coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_LCU; and calculating the panoramic saliency threshold of that 64 × 64 region, denoted TH_S; then judging whether SI_LCU ≥ TH_S holds; if so, the current maximum coding unit is judged to be a salient block and is redefined as the current coding unit, and step 9 is then executed; if not, the current maximum coding unit is judged to be a non-salient block, and step 6 is then executed.
In this embodiment, the calculation process of TH_S in step 5 is as follows:
Step 5_1: calculating the ERP latitude weight of each pixel in the current frame, the ERP latitude weight of the pixel at coordinate position (x, y) in the current frame being denoted w_ERP(x, y); wherein 0 ≤ x ≤ W-1 and 0 ≤ y ≤ H-1.
Step 5_2: since the 3D-HEVC video encoder uses the CU blocks of a video frame as its basic coding units, for convenience of application in coding, the ERP latitude weight of the current maximum coding unit is calculated and denoted w_LCU; wherein N′ × N′ denotes the size of the current maximum coding unit, i.e. N′ = 64; (i′, j′) denotes the coordinate position of the upper-left pixel of the current maximum coding unit in the current frame; and w_ERP(m′, n′) denotes the ERP latitude weight of the pixel at coordinate position (m′, n′) in the current frame, with 0 ≤ m′ ≤ W-1 and 0 ≤ n′ ≤ H-1.
Step 5_3: calculating TH_S as TH_S = TH_E + β × (1 - w_LCU); wherein TH_E denotes the saliency threshold at half the height of the current frame, obtained through a large number of experiments as TH_E = 0.3, and β denotes the scaling factor of w_LCU, obtained through a large number of experiments as β = 0.2.
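A minimal sketch of steps 5_1 to 5_3 with the embodiment's constants TH_E = 0.3 and β = 0.2. The text above elides the closed form of w_ERP, so the cosine-of-latitude weight below (1.0 at the equator, approaching 0 at the poles) is an assumption in the spirit of common ERP latitude weighting, not the patent's own formula:

```python
import math

def erp_weight(y: int, height: int) -> float:
    """Assumed ERP latitude weight for pixel row y: cosine of latitude,
    1.0 at mid-height (the equator) and tending to 0 at the poles."""
    return math.cos((y + 0.5 - height / 2.0) * math.pi / height)

def lcu_weight(j: int, height: int, n: int = 64) -> float:
    """w_LCU: mean ERP latitude weight over the N'xN' LCU whose
    top-left pixel row is j (weights are constant along a row)."""
    return sum(erp_weight(r, height) for r in range(j, j + n)) / n

def panoramic_threshold(j: int, height: int,
                        th_e: float = 0.3, beta: float = 0.2) -> float:
    """Step 5_3: TH_S = TH_E + beta * (1 - w_LCU)."""
    return th_e + beta * (1.0 - lcu_weight(j, height))
```

Under this assumption the threshold stays near TH_E for LCUs at the equator and rises toward the poles, so polar LCUs are less likely to be judged salient.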
FIG. 2 presents a flow diagram of the fast termination for non-salient blocks; FIG. 3 presents a flow diagram of the fast termination for salient blocks.
Step 6: let D_LCU(View) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the left-view video frame corresponding to the current frame; let D_LCU(Col) denote the optimal recursion depth mean of the coded maximum coding unit corresponding to the current maximum coding unit in the right-view video frame of the previous frame of the current frame; let D_LCU(LT) denote the optimal recursion depth mean of the coded upper-left maximum coding unit of the current maximum coding unit; let D_LCU(L) denote the optimal recursion depth mean of the coded left maximum coding unit of the current maximum coding unit; let D_LCU(T) denote the optimal recursion depth mean of the coded upper maximum coding unit of the current maximum coding unit; then the recursion depth interval of the current maximum coding unit is predicted, denoted [D_min, D_max]; wherein D_min denotes the minimum partition depth of the current maximum coding unit, D_max denotes the maximum partition depth of the current maximum coding unit, min() is the minimum-value function, max() is the maximum-value function, ⌊·⌋ is the round-down (floor) symbol, and ⌈·⌉ is the round-up (ceiling) symbol. Here, the coded upper-left maximum coding unit of the current maximum coding unit is the coded nearest-neighbor maximum coding unit located to the upper left of the current maximum coding unit, the coded left maximum coding unit of the current maximum coding unit is the coded nearest-neighbor maximum coding unit located to its left, and the coded upper maximum coding unit of the current maximum coding unit is the coded nearest-neighbor maximum coding unit located above it.
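The step-6 prediction can be sketched as follows. The patent text above does not reproduce the exact closed form of [D_min, D_max], so this sketch assumes the natural reading of its min/max and floor/ceiling symbols: the interval spans from the floor of the smallest neighbor depth mean to the ceiling of the largest, clamped to HEVC's depth range [0, 3]:

```python
import math

def predict_depth_interval(d_view: float, d_col: float,
                           d_lt: float, d_l: float, d_t: float):
    """Assumed step-6 rule: [D_min, D_max] from the optimal recursion depth
    means of the five inter-view / temporal / spatial neighbor LCUs
    (View, Col, LT, L, T), clamped to the HEVC depth range [0, 3]."""
    means = [d_view, d_col, d_lt, d_l, d_t]
    d_min = max(0, math.floor(min(means)))   # floor symbol in the text
    d_max = min(3, math.ceil(max(means)))    # ceiling symbol in the text
    return d_min, d_max
```

Step 7 then starts the quadtree traversal at depth D_min instead of 0 and stops at D_max instead of 3, which is where the time saving for non-salient blocks comes from.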
Step 7: jumping to the CU layer, of the quadtree rooted at the current maximum coding unit, whose partition depth is D_min; encoding all coding units in that CU layer with the 3D-HEVC video encoder in depth-first traversal order, taking any coding unit in the CU layer as the current coding unit; after the current coding unit has been coded, first judging whether its maximum partition depth has reached D_max or has reached 3; if so, continuing to code the uncoded sibling nodes of the current coding unit in depth-first traversal order until all sibling nodes of the current coding unit are coded, and then executing step 11; if not, executing step 8.
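The bounded depth-first traversal of step 7 can be sketched as the recursion below. `should_split` is a hypothetical callback standing in for the step-8/step-10 decision of whether to partition further:

```python
def encode_cu_layer(depth: int, d_max: int, should_split) -> list:
    """Step 7 sketch: depth-first traversal of the CU quadtree starting
    from the layer at depth D_min. A CU stops recursing when its depth
    reaches min(D_max, 3) (3 is HEVC's deepest CU level for a 64x64 LCU),
    or when the split decision says not to partition further.
    Returns the list of depths at which CUs were finally coded."""
    if depth >= min(d_max, 3) or not should_split(depth):
        return [depth]
    coded = []
    for _ in range(4):                 # the four sibling child CUs
        coded += encode_cu_layer(depth + 1, d_max, should_split)
    return coded
```

Calling it with `depth=D_min` reproduces the "jump to the D_min layer" behavior: depths below D_min are never evaluated at all.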
Step 8: calculating the saliency strength of the region corresponding to the current coding unit in the 3D-Sobel saliency map of the current frame, denoted SI_CU; then comparing SI_CU and SI_LCU. If SI_CU > SI_LCU, then although the current maximum coding unit was judged to be a non-salient block, the current maximum coding unit lies on a border between a salient region and a non-salient region of the current frame, and the current coding unit lies in the more salient part; therefore, to avoid the quality degradation caused by a too-small coding depth, D_max must be updated according to the optimal partition depths of the spatio-temporal neighboring blocks of the current coding unit. Thus the recursion depth interval of the current coding unit is calculated, denoted [D_min, D′_max]; then D_max = D′_max is set, and the process returns to step 7 to continue. If SI_CU ≤ SI_LCU, the saliency strength of the current coding unit is consistent with, or less than, that of the current maximum coding unit; the current coding unit lies inside the current maximum coding unit, in a region whose saliency is consistent with or lower than that of the whole; therefore the recursion depth interval of the current coding unit is calculated, denoted [D_min, D″_max]; then D_max = D″_max is set, and the process returns to step 7 to continue. Wherein D_CU(View) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the left-view video frame corresponding to the current frame, D_CU(Col) denotes the optimal recursion depth mean of the coded coding unit corresponding to the current coding unit in the right-view video frame of the previous frame of the current frame, D_CU(LT) denotes the optimal recursion depth mean of the coded upper-left coding unit of the current coding unit, D_CU(L) denotes the optimal recursion depth mean of the coded left coding unit of the current coding unit, and D_CU(T) denotes the optimal recursion depth mean of the coded upper coding unit of the current coding unit. Here, the coded upper-left coding unit of the current coding unit is the coded nearest-neighbor coding unit located to its upper left, the coded left coding unit of the current coding unit is the coded nearest-neighbor coding unit located to its left, and the coded upper coding unit of the current coding unit is the coded nearest-neighbor coding unit located above it.
Step 9: the current maximum coding unit is a salient block, which indicates that it lies in a texture-dense area; the perceptual-distortion root mean square error of the current coding unit is calculated using the existing classical BJND (Binocular Just Noticeable Distortion) model and denoted MSE_Bjnd; and the statistical root mean square error of the current coding unit is calculated and denoted MSE_S. Then the coding-unit partition threshold based on panoramic perceptual distortion is calculated as TH_split = η1·MSE_S + η2·MSE_Bjnd; wherein e denotes the natural base, k is a slope with value -2.3334, Q_step denotes the quantization step of the current coding unit, QP denotes the quantization parameter of the current coding unit, MSE_Col denotes the root mean square error of the coded coding unit corresponding to the current coding unit in the previous frame of the current frame, Q_step,Col denotes the quantization step of that coded coding unit, QP_Col denotes the quantization parameter of that coded coding unit, b denotes an intercept with value 6.3751, N × N denotes the size of the current coding unit with N being 64, 32, 16 or 8, and η1 and η2 are adjustment factors with η1 + η2 = 1; in this embodiment, η1 = η2 = 0.5.
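The threshold combination of step 9 can be sketched directly from its stated formula; only the guard on η1 + η2 = 1 is added here:

```python
def split_threshold(mse_s: float, mse_bjnd: float,
                    eta1: float = 0.5, eta2: float = 0.5) -> float:
    """Step 9: TH_split = eta1 * MSE_S + eta2 * MSE_Bjnd, with the
    constraint eta1 + eta2 = 1 (this embodiment uses 0.5 / 0.5)."""
    if abs(eta1 + eta2 - 1.0) > 1e-9:
        raise ValueError("eta1 + eta2 must equal 1")
    return eta1 * mse_s + eta2 * mse_bjnd
```

The convex combination keeps TH_split on the same scale as its two inputs, so it can be compared directly against MSE_Cur in step 10.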
In this embodiment, the calculation process of MSE_Bjnd in step 9 is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; BJND(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the binocular just-noticeable-distortion map (i.e. BJND map) of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
In this embodiment, the calculation process of MSE_Col in step 9 is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; I(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the previous frame of the current frame; I′(m, n) denotes the pixel value of the pixel at coordinate position (m, n) in the coded and reconstructed image of the previous frame of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
Step 10: calculating the root mean square error of the current coding unit, denoted MSE_Cur; then comparing MSE_Cur with TH_split; if MSE_Cur ≤ TH_split, the current coding unit has reached its optimal partition depth and need not be further partitioned, so it is encoded with the 3D-HEVC video encoder and step 11 is then executed; if MSE_Cur > TH_split, jumping to the CU layer, of the quadtree rooted at the current maximum coding unit, whose partition depth is D_min, encoding all coding units in that CU layer with the 3D-HEVC video encoder in depth-first traversal order, taking any coding unit in the CU layer as the current coding unit, and then returning to step 9 to continue until all sibling nodes of the current coding unit are coded, after which step 11 is executed.
In this embodiment, the calculation process of MSE_Cur in step 10 is as follows: N × N denotes the size of the current coding unit, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position of the upper-left pixel of the current coding unit in the current frame; the error is taken between the pixel value of the pixel at coordinate position (m, n) in the current frame and the pixel value of the pixel at coordinate position (m, n) in the coding-prediction image of the current frame; and 0 ≤ m ≤ W-1, 0 ≤ n ≤ H-1.
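The MSE_Col and MSE_Cur computations above share one pattern: a root-mean-square error over the N × N region of a coding unit between two images (previous frame vs. its reconstruction, or current frame vs. its coding prediction). The closed-form formulas are elided from this text, so the 1/N² normalization below is an assumption; the step-10 comparison rule, however, is as stated:

```python
def region_rmse(img_a, img_b, x0: int, y0: int, n: int) -> float:
    """RMSE between two images (lists of pixel rows) over the n x n
    region whose upper-left pixel is at (x0, y0). Assumed 1/n^2
    normalization before the square root."""
    total = 0.0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            d = img_a[y][x] - img_b[y][x]
            total += d * d
    return (total / (n * n)) ** 0.5

def reached_optimal_depth(mse_cur: float, th_split: float) -> bool:
    """Step 10: the CU is taken as optimally partitioned (no further
    split) when MSE_Cur <= TH_split."""
    return mse_cur <= th_split
```

With this helper, MSE_Col is `region_rmse(prev_frame, prev_recon, ...)` and MSE_Cur is `region_rmse(cur_frame, cur_pred, ...)` under the stated assumption.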
Step 11: taking the next maximum coding unit to be processed in the current frame as the current maximum coding unit, then returning to step 4 to continue until all maximum coding units in the current frame have been processed, and then executing step 12.
Step 12: taking the next right-viewpoint video frame to be processed in the stereoscopic panoramic video in the ERP projection format as the current frame, and then returning to step 2 to continue until all video frames in the stereoscopic panoramic video in the ERP projection format have been processed.
In this embodiment, the process of calculating SI_LCU in step 5 is the same as the process of calculating SI_CU in step 8; the specific process is as follows: the region whose saliency strength is to be calculated is defined as the region to be processed, and its saliency strength is denoted SI; wherein N × N denotes the size of the region to be processed, N being 64, 32, 16 or 8; (i, j) denotes the coordinate position, in the 3D-Sobel saliency map of the current frame, of the upper-left pixel of the region to be processed; and SI is computed from the pixel values of the pixels of that region in the 3D-Sobel saliency map of the current frame.
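The closed form of SI is elided from this text; a natural reading of the definitions above is the mean 3D-Sobel saliency value over the region's N × N pixels, which is what the sketch below assumes:

```python
def saliency_strength(sal_map, x0: int, y0: int, n: int) -> float:
    """Assumed SI for steps 5/8: mean saliency-map value over the n x n
    region of the 3D-Sobel saliency map whose upper-left pixel is
    (x0, y0). `sal_map` is a list of pixel rows."""
    total = 0.0
    for y in range(y0, y0 + n):
        for x in range(x0, x0 + n):
            total += sal_map[y][x]
    return total / (n * n)
```

Because the same routine serves both the 64 × 64 LCU (step 5) and its sub-CUs (step 8), SI_CU and SI_LCU are directly comparable, as the step-8 comparison requires.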
to further illustrate the performance of the method of the present invention, the method of the present invention was tested.
In order to evaluate the effectiveness of the method, an HTM14.1 is selected as a test model, a 64-bit WIN7 operating system with a CPU Intel (R) core (TM) i3, a main frequency of 2.4GHz and a memory of 4G is configured in hardware, and a development tool selects VS 2013. Selecting stereo panoramic video sequences of 'chat', 'experience', 'photomraph', 'riverside', 'scientific _ spot', 'sign _ in', 'tourrist' and 'traffic' as standard test sequences, wherein the test frames are 100 frames, the coding structure is in an HBP random access mode, the GOP length of a group of pictures is 8, and the period of an I frame is 24. The initial quantization parameters QP of the independent viewpoints are 22, 27, 32 and 37 respectively, and the method is tested on the dependent viewpoints.
Table 1 lists specific information of "chat", "experience", "photograph", "riverside", "scientific _ spot", "sign _ in", "tourrist", and "traffic" stereoscopic panoramic video sequences.
TABLE 1 associated parameter information for stereoscopic panoramic video sequences
Table 2 shows the saving in coding time when the method of the invention is used to encode the stereoscopic panoramic video sequences listed in Table 1, compared with the HTM original platform method. The time-saving rate of coding with the method of the invention relative to the HTM original platform method is defined as ΔT_PRO-CU = (T_Org - T_PRO-CU) / T_Org × 100 [%], where T_PRO-CU denotes the coding time when coding with the method of the invention and T_Org denotes the coding time when coding with the HTM original platform method.
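The time-saving rate defined above is a one-liner:

```python
def time_saving_rate(t_org: float, t_pro_cu: float) -> float:
    """Delta T_PRO-CU = (T_Org - T_PRO-CU) / T_Org * 100 [%]: the
    percentage of coding time saved relative to the HTM original
    platform method."""
    return (t_org - t_pro_cu) / t_org * 100.0
```

For example, an original coding time of 100 units reduced to 46.5 units gives the paper's reported 53.5% average saving.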
TABLE 2 comparison of time savings for encoding using the method of the present invention versus HTM native platform method
As can be seen from Table 2, coding with the method of the invention saves 53.5% of coding time on average. The saving is relatively even across the 8 stereoscopic panoramic video sequences with different scenes and different motion conditions; the effect is particularly good for outdoor sequences such as "riverside" and "scenic_spot", whose coding times are reduced by 57.8% and 56.5% respectively.
Table 3 lists the rate-distortion performance, under different quality evaluation methods, of coding the stereoscopic panoramic video sequences listed in Table 1 with the method of the invention. For quality evaluation, PSNR (Peak Signal-to-Noise Ratio), WS-PSNR (Weighted-to-Spherically-uniform PSNR) and WS-SSIM (Weighted-to-Spherically-uniform Structural SIMilarity) are used as quality indexes, and the rate-distortion performance index under each quality evaluation method is calculated and denoted BDBR_PSNR (%), BDBR_WS-PSNR (%) and BDBR_WS-SSIM (%) respectively, to evaluate the performance of the method of the invention.
TABLE 3 comparison of rate-distortion performance for different quality evaluation methods for encoding using the method of the present invention
As can be seen from Table 3, the bit-rate increase of the method of the invention is below 1% under all 3 quality indexes PSNR, WS-PSNR and WS-SSIM, averaging 0.4%, 0.2% and 0.0% respectively. WS-PSNR and BDBR_WS-PSNR (%) are the quality evaluation indexes recommended by the panoramic video proposals; compared with the traditional PSNR and BDBR_PSNR (%), these indexes better reflect the characteristics of stereoscopic panoramic video. WS-SSIM is proposed on the basis of SSIM and takes both structural similarity and the panoramic latitude factor into account, so it better reflects the subjective-quality performance of a coding method. Specifically, because each stereoscopic panoramic video sequence differs in scene and motion conditions, the BDBR (the percentage of bit rate saved by the better coding method at the same objective quality) varies slightly from sequence to sequence: the bit-rate increase is slightly more evident for indoor sequences with complex texture and severe motion, while the effect is good for outdoor sequences, whose north- and south-polar regions are mostly sky and ground with a high proportion of the frame, relatively simple texture and slow motion, and whose bit rate hardly increases under the BDBR_WS-PSNR (%) and BDBR_WS-SSIM (%) indexes.