CN106507116B - 3D-HEVC coding method based on 3D saliency information and view synthesis prediction - Google Patents

Info

Publication number: CN106507116B (application CN201610889330.XA; published as CN106507116A)
Authority: CN (China)
Legal status: Active (granted)
Inventors: 安平, 余芳, 严徐乐
Applicant and assignee: University of Shanghai for Science and Technology
Original language: Chinese (zh)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/109Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness


Abstract

The present invention relates to a 3D-HEVC coding method based on 3D saliency information and view synthesis prediction. The concrete steps are: 1) establish a 3D saliency model and obtain saliency information; 2) determine the region assignment of the current coding block; 3) determine the coding strategy for each region type; 4) improve the original view synthesis prediction algorithm; 5) encode the multi-view video sequence. By exploiting the characteristics of human vision, the method effectively reduces blocking artifacts and, while maintaining objective coding quality, saves bit rate and improves coding efficiency.

Description

3D-HEVC coding method based on 3D saliency information and view synthesis prediction
Technical field
The present invention relates to a three-dimensional high efficiency video coding (3D-HEVC) method, and in particular to a 3D-HEVC coding method based on 3D saliency information and view synthesis prediction (VSP).
Background technique
Video compression technology is currently developing rapidly, which has also driven research on 3D video coding. HEVC is the state-of-the-art coding standard, and 3D-HEVC builds on HEVC to compress multi-view video and the corresponding depth data more efficiently.
To further improve 3D video coding performance, view synthesis prediction (VSP) was added to the 3D-HEVC coding framework: depth information is used for better predictive coding of texture. VSP is an inter-view predictive coding method whose idea is to synthesize, in a reference viewpoint, a prediction block for the current coding block using depth information. VSP has two variants: forward view synthesis prediction (FVSP) and backward view synthesis prediction (BVSP). FVSP suits the texture-first coding order: all pixels of the reference picture are warped into a virtual image using the depth map of the reference view. BVSP suits the depth-first coding order: the depth map of the current view is used to locate, in a reference picture, the pixels corresponding to the current coding block, which form the prediction block of the current block. Because the decoding complexity of FVSP is very high, 3D-HEVC adopted the improved BVSP. The texture referenced by the current coding block is an already-coded picture of another viewpoint, and the referenced depth is the depth map corresponding to that coded picture; the goal is to find, for the current prediction unit (PU) block, a prediction block that reduces inter-view redundancy. The current HTM-15.0 platform uses a VSP algorithm based on the maximum depth value: the depth values at the top-left and bottom-right, and at the top-right and bottom-left, of the whole PU block are compared. If the top-left is greater than the bottom-right and the top-right is greater than the bottom-left, every 8×8 block is split into two 8×4 blocks; otherwise every 8×8 block is split into two 4×8 blocks. The depth values at the four corners of each 8×4 (or 4×8) sub-block are then compared, the maximum is taken as the depth value of that sub-block, and the disparity converted from this maximum depth value is stored. Since 3D-HEVC does not apply deblocking filtering to depth blocks, the generated depth maps show obvious blocking artifacts. The original VSP algorithm takes the maximum of the four corner depth values of a sub-PU block, so if the sub-block happens to lie on a blocking-artifact boundary, the maximum depth value obtained may be inaccurate, affecting the coding result. The algorithm therefore needs to be improved to avoid the case where a sub-block lies on a blocking-artifact boundary.
In recent years, interest in research on the characteristics of human vision has grown, and perceptual video coding has received increasing attention. The human visual system (HVS) perceives video scenes and images selectively, with different sensitivity to different regions and different objects. In general, the human eye has higher perceptual sensitivity to objects with intense motion and to regions with rich texture; these regions are called salient regions. Traditional video coding standards focus on reducing redundancy to improve rate-distortion performance, ignoring the influence of the HVS's perceptual diversity on video coding. The present invention therefore proposes a coding method based on 3D saliency information.
Summary of the invention
The purpose of the present invention is to propose, on the basis of current video coding technology, a 3D-HEVC coding method based on 3D saliency information and view synthesis prediction. Built on 3D saliency information and an improved view synthesis prediction algorithm, the method can effectively improve coding efficiency and coding quality.
To achieve the above purpose, the idea of the invention is as follows:

First, a 3D saliency model is established according to the characteristics of human vision, the 3D saliency information of the sequence to be coded is obtained, and this saliency information is imported into the original coding mechanism. Next, according to the saliency information of the coded sequence, each coding block is classified into one of three region types — salient, non-salient, or intermediate — and different quantization parameters (QP) and coding modes are assigned to the different region types. Then, to prevent a VSP coding sub-block from lying on a blocking-artifact boundary, the positions of the four corners of each sub-block are each translated inward by one pixel unit, and the maximum of the four shrunk-in corner depths is taken as the depth value to be stored. Finally, the saliency information and the improved VSP algorithm are combined to encode the video, improving coding efficiency.
According to above-mentioned design, the technical scheme is that
A 3D-HEVC coding method based on 3D saliency information and view synthesis prediction, with the following specific steps:

1) Establish a 3D saliency model and obtain saliency information: on the basis of an existing 2D saliency model, add the depth information of the video image to build a 3D saliency model with depth perception, and obtain the 3D saliency information of the content to be coded.

2) Determine the region assignment of the current coding block: according to the 3D saliency information of the video, classify the content of the current coding block as a salient region, a non-salient region, or an intermediate region.

3) Determine the coding strategy for each region type: after the region division of the current coding block, assign different QP values and set different coding modes for the different saliency levels.

4) Improve the original view synthesis prediction algorithm: to avoid the boundary effect of depth blocks, shrink each of the four corners of the sub-PU block in the original VSP inward by one pixel — i.e. translate the x and y coordinates of the four corners inward by one pixel each — and store the maximum depth value of the four new corners.

5) Encode the multi-view video sequence using the 3D saliency information and the improved VSP algorithm.
In step 1), the depth map D of the original sequence is first normalized to obtain the depth weight ω. Since depth pixel values range over 0-255 and larger values represent objects closer to the camera, ω = D/255. An existing 2D saliency model is then used to obtain the saliency map S_C of the texture video, and the texture saliency map is multiplied by the depth weight to obtain the 3D saliency map S_DC, i.e. S_DC = ω · S_C. S_DC adds depth information on top of the 2D texture saliency map, conforms to the attention mechanism of human vision, and achieves the goal that objects closer to the camera have higher saliency.
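The computation of step 1) can be sketched as follows. This is a minimal sketch, not the patent's implementation: it assumes the 2D saliency map S_C is already computed by some external 2D saliency model and normalized to [0, 1], and the function name and array layout are illustrative.

```python
import numpy as np

def saliency_3d(depth_map: np.ndarray, texture_saliency: np.ndarray) -> np.ndarray:
    """Combine a 2D texture saliency map S_C with a depth weight omega.

    depth_map: 8-bit depth map D (values 0-255, larger = closer to camera).
    texture_saliency: 2D saliency map S_C, assumed normalized to [0, 1].
    Returns the 3D saliency map S_DC = omega * S_C.
    """
    # Normalize depth to a weight in [0, 1]; closer objects get a larger weight,
    # so nearer content ends up with higher 3D saliency.
    omega = depth_map.astype(np.float64) / 255.0
    return omega * texture_saliency
```

Per-pixel multiplication is all the combination step requires, so the whole model reduces to one vectorized expression once S_C is available.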
Step 2) includes the following sub-steps:

a) Binarize the 3D saliency map S_DC obtained in step 1) to get S_b, i.e. S_b = 1 where S_DC ≥ T, and S_b = 0 otherwise, where T is an adaptive threshold whose value changes with the video content. After binarization, the saliency map S_b has a much clearer outline: pixels of value 1 (the highlighted areas) form the region of interest of the human eye, and pixels of value 0 form the background.

b) Compute the average saliency of the current coding block, S_Ave = S_T / (H·W), where S_T is the sum of the binarized saliency pixel values inside the current coding block and H, W are the block's height and width. S_Ave is the average saliency of the current coding block, and the region division of the current block is made according to this average saliency; the intermediate region is further split into two sub-cases. In this way the current coding block is assigned, according to its average saliency, to one of four region types.
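The block classification of step 2) can be sketched as follows. Note the two threshold values below are illustrative assumptions: the patent text classifies blocks by average saliency (and splits the intermediate region into two sub-cases) but does not state the exact thresholds, so this sketch uses a simple three-way split.

```python
import numpy as np

def classify_block(s_b_block: np.ndarray) -> str:
    """Classify a coding block by its average saliency S_Ave = S_T / (H*W).

    s_b_block: the binarized saliency map S_b restricted to the block
    (values 0 or 1). Threshold values 0.75 / 0.25 are assumptions for
    illustration, not taken from the patent.
    """
    h, w = s_b_block.shape
    s_ave = s_b_block.sum() / (h * w)   # average saliency of the block
    if s_ave >= 0.75:                   # assumed threshold
        return "salient"
    if s_ave <= 0.25:                   # assumed threshold
        return "non-salient"
    return "intermediate"
```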
In step 3), the temporal and spatial coding correlation between the current coding block and its neighboring blocks is used to terminate the current block early with skipping, i.e. SKIP-mode processing:

a) If the current coding block is in a non-salient region and three or more of its five neighboring blocks selected SKIP as their optimal mode, the optimal prediction mode of the current block is taken to be SKIP; otherwise SKIP and Inter 2N×2N are selected as the inter candidate modes.

b) If the current coding block is in a salient region and all five neighboring blocks selected SKIP, the current block selects SKIP as its optimal prediction mode; otherwise all inter modes are traversed for the current block.

c) If the current coding block is in an intermediate region and four or more neighboring blocks selected SKIP, the optimal mode of the current block is taken to be SKIP; otherwise all inter prediction modes are traversed.

d) Different quantization parameters (QP) are assigned to blocks in different regions to improve coding efficiency:

During encoding, the coding units (Coding Units, CU) in regions of complex texture are usually small, while simple flat regions (such as the background) use larger CUs; choosing a suitable QP for a CU therefore directly affects coding quality and bit count. The regions the human eye attends to are exactly the salient regions, so the perceptual quality of the coding is improved by adjusting the QP of the CUs in salient regions: based on the saliency information, salient regions are given a smaller QP and non-salient regions a larger QP, where QP_0 denotes the initial quantization parameter in the platform.
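The early-termination rules of step 3) can be sketched as a small decision function. The rules themselves (3-of-5, 5-of-5, 4-of-5 SKIP neighbors per region type) follow the text above; the mode names are illustrative strings, not 3D-HEVC identifiers.

```python
def candidate_modes(region: str, neighbor_modes: list) -> list:
    """Early SKIP decision from the five temporal/spatial neighboring blocks.

    region: "salient", "non-salient", or "intermediate".
    neighbor_modes: the optimal modes chosen by the five neighbors.
    Returns the candidate mode set for the current block.
    """
    skip_count = sum(1 for m in neighbor_modes if m == "SKIP")
    if region == "non-salient":
        # Three or more SKIP neighbors: decide SKIP early;
        # otherwise keep a reduced candidate set.
        return ["SKIP"] if skip_count >= 3 else ["SKIP", "Inter_2Nx2N"]
    if region == "salient":
        # Only terminate early if all five neighbors chose SKIP.
        return ["SKIP"] if skip_count == 5 else ["ALL_INTER_MODES"]
    # Intermediate region: four or more SKIP neighbors terminate early.
    return ["SKIP"] if skip_count >= 4 else ["ALL_INTER_MODES"]
```

The asymmetry is deliberate: the less salient the block, the looser the condition for skipping mode traversal, which is where the rate and time savings come from.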
In step 4):

a) The depth values at the top-left and bottom-right, and at the top-right and bottom-left, of the whole PU block are compared. In the original coding platform the flag vspSize indicates the split size by default: when the top-left depth is greater than the bottom-right and the top-right depth is greater than the bottom-left, vspSize is 1, otherwise it is 0. If vspSize = 1, every 8×8 block is split into two identical 8×4 blocks; if vspSize = 0, every 8×8 block is split into two identical 4×8 blocks.

b) To avoid blocking artifacts on the boundary, instead of comparing the depth values at the exact four corners (top-left, bottom-right, top-right, bottom-left) of each 8×4 or 4×8 block as in the original platform, the four corner positions are each shrunk inward by one pixel, i.e. both the horizontal and vertical coordinates are translated inward by one pixel unit.
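The corner-shrinking of step 4) can be sketched as follows, assuming a row-major depth[row, col] array; the function name and layout are illustrative, not HTM code.

```python
import numpy as np

def subblock_depth(depth: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    """Representative depth of a w x h VSP sub-block at top-left (x, y).

    Instead of sampling the four exact corners, each corner position is
    pulled inward by one pixel in both x and y, so that a sub-block lying
    on a blocking-artifact boundary does not pick up a corrupted corner
    value. The maximum of the four shrunk-in corners is returned; in the
    encoder this value would then be converted to a disparity and stored.
    """
    corners = [
        depth[y + 1,     x + 1],      # top-left, shrunk inward
        depth[y + 1,     x + w - 2],  # top-right
        depth[y + h - 2, x + 1],      # bottom-left
        depth[y + h - 2, x + w - 2],  # bottom-right
    ]
    return int(max(corners))
```

For the 8×4 and 4×8 sub-blocks used by VSP, shrinking by one pixel still leaves four distinct sample positions, so the maximum remains well defined.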
In step 5), the 3D-HEVC coding test platform is used. The experiments use the 3view+3depth configuration file; the input is the textures of the left, center, and right viewpoints and the corresponding depth maps. A total of 8 video sequences are tested, with different coding parameters for different sequences.
Compared with the prior art, the present invention has the following obvious substantive features and remarkable advantages:

(1) It avoids the case in view synthesis prediction where the maximum depth value obtained for a sub-PU block lying on a blocking-artifact boundary is inaccurate. By adjusting the positions at which the depth values are compared, the influence of blocking artifacts is effectively reduced, the disparity vector (DV) converted from the selected maximum depth value is more accurate, and the current block is predicted better.

(2) It fully takes the characteristics of human vision into account. By using the saliency information in the HVS to guide coding and assigning code rate and QP values per region, it saves bit rate while maintaining objective coding quality and improves coding efficiency.

(3) It incorporates depth-map information into the saliency model, better reflecting the characteristics of 3D video coding.
Detailed description of the invention
Fig. 1 is the flowchart of the coding method of the invention.
Fig. 2 is a block diagram of the method for establishing the 3D saliency model.
Fig. 3 shows the images extracted by the 3D saliency model for different sequences.
Fig. 4 is a schematic diagram of the temporally and spatially neighboring blocks of the current coding block.
Fig. 5 is a schematic diagram of the improved view synthesis prediction coding.
Specific embodiment
An embodiment of the present invention is described in detail below with reference to the drawings and tables:

The embodiment of the invention uses version HTM-15.0 of the 3D-HEVC coding test platform. The experiments use the 3view+3depth configuration file; the input is the textures of the left, center, and right viewpoints and the corresponding depth maps. A total of 8 video sequences are tested, with different coding parameters for different sequences.
Referring to Fig. 1, in the 3D-HEVC coding method based on 3D saliency information and view synthesis prediction, a 3D saliency model is first established according to the characteristics of human vision, the 3D saliency information of the sequence to be coded is obtained, and this saliency information is imported into the original coding mechanism. Next, according to the saliency information of the coded sequence, each coding block is classified into one of three region types — salient, non-salient, or intermediate — and different QPs and coding modes are assigned to the different region types. Then, to prevent a VSP coding sub-block from lying on a blocking-artifact boundary, the positions of the four corners of each sub-block are each translated inward by one pixel unit, and the maximum of the four shrunk-in corner depths is taken as the depth value to be stored. Finally, the saliency information and the improved VSP algorithm are combined to encode the video, improving coding efficiency. The concrete steps are:
1) Establish a 3D saliency model and obtain saliency information: on the basis of an existing 2D saliency model, add the depth information of the video image to build a 3D saliency model with depth perception, and obtain the 3D saliency information of the content to be coded.

2) Determine the region assignment of the current coding block: according to the 3D saliency information of the video, classify the content of the current coding block as a salient region, a non-salient region, or an intermediate region.

3) Determine the coding strategy for each region type: after the region division of the current coding block, assign different QP values and set different coding modes for the different saliency levels.

4) Improve the original view synthesis prediction algorithm: to avoid the boundary effect of depth blocks, shrink each of the four corners of the sub-PU block in the original VSP inward by one pixel — i.e. translate the x and y coordinates of the four corners inward by one pixel each — and store the maximum depth value of the four new corners.

5) Encode the multi-view video sequence using the 3D saliency information and the improved VSP algorithm.
Referring to Fig. 2 and Fig. 3, the 3D saliency model of step 1) is established and the saliency information obtained as follows. The depth map D of the original sequence is first normalized to obtain the depth weight ω; since depth pixel values range over 0-255 and larger values represent objects closer to the camera, ω = D/255. An existing 2D saliency model is then used to obtain the saliency map S_C of the texture video, and the texture saliency map is multiplied by the depth weight to obtain the 3D saliency map S_DC, i.e. S_DC = ω · S_C. S_DC adds depth information on top of the 2D texture saliency map, conforms to the attention mechanism of human vision, and achieves the goal that objects closer to the camera have higher saliency.
The region assignment of the current coding block in step 2) is determined as follows:

a) Binarize the 3D saliency map S_DC obtained in step 1) to get S_b, i.e. S_b = 1 where S_DC ≥ T, and S_b = 0 otherwise, where T is an adaptive threshold whose value changes with the video content. After binarization, the saliency map S_b has a much clearer outline: pixels of value 1 (the highlighted areas) form the region of interest of the human eye, and pixels of value 0 form the background.

b) Compute the average saliency of the current coding block, S_Ave = S_T / (H·W), where S_T is the sum of the binarized saliency pixel values inside the current coding block and H, W are the block's height and width. S_Ave is the average saliency of the current coding block, and the region division of the current block is made according to this average saliency; the intermediate region is further split into two sub-cases. In this way the current coding block is assigned, according to its average saliency, to one of four region types.
Referring to Fig. 4, the coding strategy of the different region types in step 3) is determined as follows: the temporal and spatial coding correlation between the current coding block and its neighboring blocks is used to terminate the current block early with skipping, i.e. SKIP-mode processing:

a) If the current coding block is in a non-salient region and three or more of its five neighboring blocks selected SKIP as their optimal mode, the optimal prediction mode of the current block is taken to be SKIP; otherwise SKIP and Inter 2N×2N are selected as the inter candidate modes.

b) If the current coding block is in a salient region and all five neighboring blocks selected SKIP, the current block selects SKIP as its optimal prediction mode; otherwise all inter modes are traversed for the current block.

c) If the current coding block is in an intermediate region and four or more neighboring blocks selected SKIP, the optimal mode of the current block is taken to be SKIP; otherwise all inter prediction modes are traversed.

d) Different quantization parameters (QP) are assigned to blocks in different regions to improve coding efficiency:

During encoding, the coding units (Coding Units, CU) in regions of complex texture are usually small, while simple flat regions generally use larger CUs; choosing a suitable QP for a CU therefore directly affects coding quality and bit count. The regions the human eye attends to are exactly the salient regions, so the perceptual quality of the coding is improved by adjusting the QP of the CUs in salient regions: based on the saliency information, salient regions are given a smaller QP and non-salient regions a larger QP, where QP_0 denotes the initial quantization parameter in the platform.
Referring to Fig. 5, the original view synthesis prediction algorithm is improved in step 4) as follows:

a) The depth values at the top-left and bottom-right, and at the top-right and bottom-left, of the whole PU block are compared. In the original coding platform the flag vspSize indicates the split size by default: when the top-left depth is greater than the bottom-right and the top-right depth is greater than the bottom-left, vspSize is 1, otherwise it is 0. If vspSize = 1, every 8×8 block is split into two identical 8×4 blocks; if vspSize = 0, every 8×8 block is split into two identical 4×8 blocks.

b) To avoid blocking artifacts on the boundary, instead of comparing the depth values at the exact four corners (top-left, bottom-right, top-right, bottom-left) of each 8×4 or 4×8 block as in the original platform, the four corner positions are each shrunk inward by one pixel, i.e. both the horizontal and vertical coordinates are translated inward by one pixel unit.
The coding parameters of the test sequences in this embodiment are shown in Table 1. The experiments use the 3view+3depth configuration file; the input is the textures of the left, center, and right viewpoints and the corresponding depth maps.
Table 1
Sequence name   Resolution  Frames coded  Coded view order  Texture QP_0    Depth QP_0
Kendo           1024×768    9             3-1-5             25, 30, 35, 40  30, 35, 40, 45
Balloons        1024×768    9             3-1-5             25, 30, 35, 40  30, 35, 40, 45
Newspaper       1024×768    9             4-2-6             25, 30, 35, 40  30, 35, 40, 45
Lovebird        1024×768    9             6-4-8             25, 30, 35, 40  30, 35, 40, 45
UndoDancer      1920×1088   9             5-1-9             25, 30, 35, 40  30, 35, 40, 45
PoznanStreet    1920×1088   9             4-5-3             25, 30, 35, 40  30, 35, 40, 45
Shark           1920×1088   9             5-1-9             25, 30, 35, 40  30, 35, 40, 45
GTFly           1920×1088   9             5-1-9             25, 30, 35, 40  30, 35, 40, 45
The saliency maps finally obtained for the test sequences of this embodiment when establishing the 3D saliency model are shown in Fig. 3. Table 2 compares the experimental data of the improved algorithm against the original platform during coding.
Table 2
Sequence name   Video0  Video1  Video2
Kendo           0.00%   -0.8%   -0.6%
Balloons        0.00%   -0.8%   -1.3%
Newspaper       0.00%   -1.2%   -0.7%
Lovebird        0.00%   -0.6%   -1.8%
UndoDancer      0.00%   -1.7%   -1.6%
PoznanStreet    0.00%   -1.8%   -1.7%
Shark           0.00%   -3.8%   -3.6%
GTFly           0.00%   -2.2%   -2.3%
It can be seen that the present invention can reasonably take the characteristics of human vision into account during encoding, saving bit rate and improving coding efficiency while maintaining objective quality.

In summary, the present invention considers the characteristics of human vision comprehensively: it adjusts the quantization parameter and coding mode of each coding block according to the saliency levels attended to by the human eye, while avoiding the boundary blocking artifacts in view synthesis prediction, thereby maintaining coding quality and improving coding efficiency.

Claims (4)

1. A 3D-HEVC coding method based on 3D saliency information and view synthesis prediction, with the following specific steps:

1) Establish a 3D saliency model and obtain saliency information: on the basis of a 2D saliency model, add the depth information of the video image to build a 3D saliency model with depth perception, and obtain the 3D saliency information of the content to be coded;

2) Determine the region assignment of the current coding block: according to the 3D saliency information of the video, classify the content of the current coding block as a salient region, a non-salient region, or an intermediate region;

3) Determine the coding strategy for each region type: after the region division of the current coding block, assign different QP values and set different coding modes for the different saliency levels;

4) Improve the original view synthesis prediction (VSP) algorithm: to avoid blocking artifacts on the boundary, shrink each of the four corners of the sub-PU block in the original VSP inward by one pixel — i.e. translate the x and y coordinates of the four corners inward by one pixel each — and store the maximum depth value of the four new corners;

5) Encode the multi-view video sequence using the 3D saliency information and the improved VSP algorithm;

in step 1), the depth map D of the original sequence is first normalized to obtain the depth weight ω; since depth pixel values range over 0-255 and nearer objects have higher saliency, ω = D/255; a 2D saliency model is then used to obtain the saliency map S_C of the texture video, and the texture saliency map is multiplied by the depth weight to obtain the 3D saliency map S_DC, i.e. S_DC = ω · S_C; S_DC adds depth information on top of the 2D texture saliency map, conforms to the attention mechanism of human vision, and achieves the goal that nearer objects have higher saliency;

in step 3), the temporal and spatial coding correlation between the current coding block and its neighboring blocks is used to terminate the current block early with skipping, i.e. SKIP-mode processing:

a) if the current coding block is in a non-salient region and three or more of its five neighboring blocks selected SKIP as their optimal mode, the optimal prediction mode of the current block is taken to be SKIP; otherwise SKIP and Inter 2N×2N are selected as the inter candidate modes;

b) if the current coding block is in a salient region and all five neighboring blocks selected SKIP, the current block selects SKIP as its optimal prediction mode; otherwise all inter modes are traversed for the current block;

c) if the current coding block is in an intermediate region and four or more neighboring blocks selected SKIP, the optimal mode of the current block is taken to be SKIP; otherwise all inter prediction modes are traversed;

d) different quantization parameters QP are assigned to blocks in different regions to improve coding efficiency: based on the saliency information, salient regions are given a smaller QP and non-salient regions a larger QP, where QP_0 denotes the initial quantization parameter in the platform.
2. the 3D-HEVC coding method according to claim 1 predicted based on 3D conspicuousness information and View Synthesis, special Sign is that the step 2) includes the following steps:
A) the 3D notable figure S for first obtaining step 1)DCIt carries out binary conversion treatment and obtains Sb, i.e.,
Wherein, T is adaptive threshold, is also changed correspondingly according to the value of the different T of video content, significant after such binaryzation Scheme SbPossess and be more clear clearly demarcated profile, the highlighted place that pixel value is 1 in image is the area-of-interest of human eye, and pixel is 0 is background area;
B) The mean saliency of the current coding block is computed as SAve = ST/(H*W), where ST denotes the sum of the binarized saliency-map pixel values within the current coding block, and H and W are the height and width of the current coding block. SAve is thus the average saliency of the current coding block, and region division is performed on the current block according to this average saliency:
The intermediate region is further divided into two cases.
In this way, according to the average saliency, coding blocks are divided into four different region types.
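The binarization and per-block averaging in step B) can be sketched as below. The adaptive threshold T and the SAve cutoffs that separate the four region types are content-dependent and not given numerically in this excerpt, so the threshold is left as a parameter:

```python
import numpy as np

def block_mean_saliency(sal_map_3d, threshold, y, x, h, w):
    """Binarize S_DC with threshold T, then compute S_Ave = S_T / (H * W)
    for the coding block at (y, x) of size h x w."""
    s_b = (sal_map_3d >= threshold).astype(np.uint8)  # binary map S_b
    s_t = int(s_b[y:y + h, x:x + w].sum())            # S_T over the block
    return s_t / (h * w)                              # S_Ave in [0, 1]
```

Comparing the returned SAve against the (unspecified) cutoffs then yields one of the four region types used by the mode-decision rules of step 3).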
3. The 3D-HEVC coding method based on 3D saliency information and view synthesis prediction according to claim 1, characterized in that, in the above step 4):
A) The depth values of the upper-left versus lower-right and upper-right versus lower-left corners of the whole PU block to be encoded are compared. In the original coding platform, a flag bit vspSize indicates the partition size by default: when the upper-left depth is greater than the lower-right depth and the upper-right depth is greater than the lower-left depth, vspSize is 1, otherwise it is 0. If vspSize = 1, every 8×8 block is divided into two identical 8×4 blocks; if vspSize = 0, every 8×8 block is divided into two identical 4×8 blocks;
B) To avoid blocking artifacts at the boundaries, the depth values at the four corners (upper-left, lower-right, upper-right, lower-left) of each 8×4 or 4×8 block in the original platform are replaced by the samples one pixel in from the original four corner positions, i.e. the horizontal and vertical coordinates are each shifted inward by one pixel unit.
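Steps A) and B) together can be sketched for a single 8×8 depth block. Applying the one-pixel inset of step B), the corner samples are read at offsets (1,1), (1,6), (6,1), (6,6) rather than at the true corners:

```python
import numpy as np

def vsp_split_size(depth_block):
    """Decide the view-synthesis-prediction sub-block split for one 8x8
    depth block. Corner samples are taken one pixel in from the true
    corners to avoid boundary blocking artifacts.
    Returns vspSize: 1 -> two 8x4 blocks, 0 -> two 4x8 blocks.
    """
    tl, br = depth_block[1, 1], depth_block[6, 6]  # upper-left, lower-right (inset)
    tr, bl = depth_block[1, 6], depth_block[6, 1]  # upper-right, lower-left (inset)
    return 1 if (tl > br and tr > bl) else 0
```

This is a sketch of the comparison rule only; in the platform the flag is evaluated per PU and drives the sub-block synthesis prediction.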
4. The 3D-HEVC coding method based on 3D saliency information and view synthesis prediction according to claim 1, characterized in that, in the above step 5), the 3D-HEVC encoding test platform is used with a 3-view + 3-depth configuration file; the input textures of the left, middle and right viewpoints and the corresponding depth maps are encoded for testing. Eight video sequences are used in the experiments, with different coding parameters for different sequences.
CN201610889330.XA 2016-10-12 2016-10-12 A kind of 3D-HEVC coding method predicted based on 3D conspicuousness information and View Synthesis Active CN106507116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610889330.XA CN106507116B (en) 2016-10-12 2016-10-12 A kind of 3D-HEVC coding method predicted based on 3D conspicuousness information and View Synthesis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610889330.XA CN106507116B (en) 2016-10-12 2016-10-12 A kind of 3D-HEVC coding method predicted based on 3D conspicuousness information and View Synthesis

Publications (2)

Publication Number Publication Date
CN106507116A CN106507116A (en) 2017-03-15
CN106507116B true CN106507116B (en) 2019-08-06

Family

ID=58294802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610889330.XA Active CN106507116B (en) 2016-10-12 2016-10-12 A kind of 3D-HEVC coding method predicted based on 3D conspicuousness information and View Synthesis

Country Status (1)

Country Link
CN (1) CN106507116B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151479A (en) * 2018-08-29 2019-01-04 南京邮电大学 Saliency extraction method based on an H.264 compressed-domain model with spatio-temporal features
CN110213566B (en) * 2019-05-20 2021-06-01 歌尔光学科技有限公司 Image matching method, device, equipment and computer readable storage medium
CN110460832B (en) * 2019-07-31 2021-09-07 南方医科大学南方医院 Processing method, system and storage medium of double-viewpoint video
WO2021248349A1 (en) * 2020-06-10 2021-12-16 Plantronics, Inc. Combining high-quality foreground with enhanced low-quality background
CN113194312B (en) * 2021-04-27 2021-12-07 中国科学院国家空间科学中心 Planetary science exploration image adaptive quantization coding system combined with visual saliency
CN113709464A (en) * 2021-09-01 2021-11-26 展讯通信(天津)有限公司 Video coding method and related device
CN114979652A (en) * 2022-05-20 2022-08-30 北京字节跳动网络技术有限公司 Video processing method and device, electronic equipment and storage medium
CN115019151B (en) * 2022-08-05 2022-10-21 成都图影视讯科技有限公司 Non-salient feature region accelerated neural network architecture, method and apparatus
CN115866251B (en) * 2023-02-22 2023-06-02 浙江鼎立实业有限公司 Image information rapid transmission method based on semantic segmentation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103873876A (en) * 2014-03-17 2014-06-18 天津大学 Conspicuousness-based multi-viewpoint color plus depth video coding method
CN105230022A (en) * 2013-05-31 2016-01-06 高通股份有限公司 Use based on the disparity vector of neighbor derive for 3D video coding and derivation disparity vector of passing through

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013173282A1 (en) * 2012-05-17 2013-11-21 The Regents Of The University Of Califorina Video disparity estimate space-time refinement method and codec
WO2015062002A1 (en) * 2013-10-31 2015-05-07 Mediatek Singapore Pte. Ltd. Methods for sub-pu level prediction

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105230022A (en) * 2013-05-31 2016-01-06 高通股份有限公司 Use based on the disparity vector of neighbor derive for 3D video coding and derivation disparity vector of passing through
CN103873876A (en) * 2014-03-17 2014-06-18 天津大学 Conspicuousness-based multi-viewpoint color plus depth video coding method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Novel Saliency Model for Stereoscopic Images; Hao Cheng et al.; 2015 International Conference on Digital Image Computing; 20151125; entire document
Backward view synthesis prediction using virtual depth map for multiview video plus depth map coding; Shinya Shimizu et al.; 2013 Visual Communications and Image Processing; 20131120; entire document
Edge-based depth coding algorithm oriented to virtual view rendering; Liu Chao et al.; Journal of Shanghai University; 20140831; entire document

Also Published As

Publication number Publication date
CN106507116A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN106507116B (en) A kind of 3D-HEVC coding method predicted based on 3D conspicuousness information and View Synthesis
CN106105191B (en) Method and apparatus for handling multiview video signal
US9521413B2 (en) Optimized filter selection for reference picture processing
CN101710993B (en) Block-based self-adaptive super-resolution video processing method and system
CN102648631B (en) For the method and apparatus of coding/decoding high-definition picture
CN104378643B (en) A kind of 3D video depths image method for choosing frame inner forecast mode and system
CN105430415A (en) Fast intraframe coding method of 3D-HEVC depth videos
CN108781284A (en) The method and device of coding and decoding video with affine motion compensation
CN105120290B (en) A kind of deep video fast encoding method
CN110933426B (en) Decoding and encoding method and device thereof
CN104798375A (en) Method and apparatus of constrained disparity vector derivation in 3d video coding
CN102801996B (en) Rapid depth map coding mode selection method based on JNDD (Just Noticeable Depth Difference) model
CN105812797B (en) A kind of coding unit selection method and device
CN105208387A (en) HEVC intra-frame prediction mode fast selection method
CN103024381B (en) A kind of macro block mode fast selecting method based on proper discernable distortion
CN103338370A (en) Multi-view depth video fast coding method
CN109587503A (en) A kind of 3D-HEVC depth map intra-frame encoding mode high-speed decision method based on edge detection
CN104539970A (en) 3D-HEVC interframe coding merge mode fast decision making method
CN109756719A (en) The 3D-HEVC interframe fast method of Bayesian decision is divided based on CU
CN103391439B (en) A kind of H.264/AVC bit rate control method hidden based on active macro block
CN101557519B (en) Multi-view video coding method
CN101783956A (en) Back-prediction forecast method based on spatio-temporal neighbor information
CN106878754A (en) A kind of 3D video depths image method for choosing frame inner forecast mode
CN104618725A (en) Multi-view video coding algorithm combining quick search and mode optimization
CN105704497B (en) Coding unit size fast selection algorithm towards 3D-HEVC

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant