CN101729891A

CN101729891A - Method for encoding multi-view depth video

Info

Publication number: CN101729891A
Application number: CN 200910154138
Authority: CN
Inventors: 彭宗举; 蒋刚毅; 郁梅
Original assignee: Ningbo University
Current assignee: Ningbo University
Priority date: 2009-11-05
Filing date: 2009-11-05
Publication date: 2010-06-09
Anticipated expiration: 2029-11-05
Also published as: CN101729891B

Abstract

The invention discloses a method for encoding multi-view depth video. A deviation factor is computed for each macro block of every B frame in the multi-view depth video, and each B frame image is divided into a depth continuous area and a depth discontinuous area according to these deviation factors. For each macro block in the depth continuous area, the encoder traverses fewer macro block encoding modes, so that the search range of the macro block encoding modes in the depth continuous area is greatly reduced, which lowers the computational complexity of multi-view depth video encoding and effectively improves the encoding speed. For each macro block in the depth discontinuous area, which has a great influence on rendering virtual view images, the encoder traverses all macro block encoding modes, so that these macro blocks are protected by the full mode search. The method of the invention therefore improves the encoding speed while ensuring the quality of the rendered virtual view images.

Description

A method for encoding multi-view depth video

Technical field

The present invention relates to a video signal processing method, and in particular to a method for encoding multi-view depth video.

Background technology

With the rapid development of information acquisition, communication and information processing technologies, free viewpoint video systems, in which viewers freely choose the viewpoint from which to watch a scene, are gradually becoming a reality. Free viewpoint video systems have broad application prospects in fields such as film and television entertainment, education, product display, medical treatment and security monitoring. Free viewpoint video is captured by a multi-camera system: each viewpoint independently receives video of the same scene from a specific camera position, and video at an arbitrary viewpoint is generated by virtual view rendering, so that viewers can enjoy the view from any position.

Virtual view rendering is one of the key technologies of free viewpoint video. Traditional virtual view images are rendered from two color video signals, so that approach must transmit two color video signals over the network. To reduce the amount of transmitted data, the currently dominant depth-image-based rendering method generates the virtual view image from the color image of a reference viewpoint and the depth image corresponding to that color image. This method only needs to transmit the color video signal of one viewpoint and the corresponding depth video signal over the network. Because the depth video signal is monochrome and comparatively smooth, roughly 20% of the data volume of the color video signal is enough to obtain good coding quality, which relieves the pressure on network transmission. However, a single color video signal with its depth video signal only allows interactive viewpoint switching within a small range, so the Joint Video Team (JVT) of the international standards organizations proposed using multiple color video signals and the corresponding multiple depth video signals (Multi-view Video plus Depth, MVD) to extend the viewpoint switching range of free viewpoint video.

Fig. 1 shows the basic block diagram of a free viewpoint video system built on multiple color video signals and the corresponding depth video signals. It comprises acquisition, preprocessing, encoding, transmission, decoding, virtual view rendering and display. In Fig. 1, the color videos of three viewpoints and the corresponding depth videos are transmitted over the network and decoded, after which the video image of any viewpoint can be rendered. Because depth-image-based rendering brings the depth information of the scene into virtual view rendering, it significantly reduces the number of reference viewpoints required, which makes it better suited to representing and implementing a free viewpoint video system.

In multi-view color video with corresponding depth video signals, the depth video represents the distance from the scene in the corresponding color video to the imaging plane of the camera, with the actual distance values quantized to [0, 255]. Depth video is very important auxiliary information in a multi-view video system; it can be captured by a depth camera or computed by a depth estimation program. Because depth cameras are expensive, most depth videos currently used for testing are obtained with depth estimation software. For the sake of wide adoption and low cost, the depth information used for virtual view rendering is not suited to being produced by depth estimation at the receiver; it needs to be captured or estimated at the sender, then encoded and transmitted to the receiver. Compression coding of depth video is therefore critical to a free viewpoint video system.

At present, multi-view depth video coding in the free viewpoint video framework uses the hierarchical B pictures (HBP) prediction structure. Besides motion estimation to remove temporal redundancy, HBP also uses disparity estimation to remove inter-view redundancy, and it compresses multi-view depth video signals well. The HBP prediction structure is shown in Fig. 2. In Fig. 2 the GOP (group of pictures) length is 8, S₀ to S₇ denote the 8 viewpoints, T₀ to T₇ are the 8 time instants of one GOP, and the arrows indicate inter-frame reference relationships. For example, frame S₁T₄ references frames S₁T₀, S₁T₈, S₀T₄ and S₂T₄, where S₁T₀ and S₁T₈ are temporal reference frames and S₀T₄ and S₂T₄ are inter-view reference frames.
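The temporal part of the reference relationship can be illustrated with a short Python sketch. It assumes the usual dyadic hierarchy, in which each B picture references the two nearest pictures of the next coarser level; the text only confirms the T₄ case (references T₀ and T₈), so the remaining entries, the function name `hierarchical_b_references` and the power-of-two GOP length are assumptions made for illustration. Inter-view references are not modeled.

```python
def hierarchical_b_references(gop_length: int = 8) -> dict:
    """Map each B-picture time index in one GOP to its (past, future)
    temporal references, assuming a dyadic hierarchy (illustrative only)."""
    refs = {}
    step = gop_length              # distance between already available pictures
    while step > 1:
        half = step // 2
        for start in range(0, gop_length, step):
            # The picture midway between two available pictures references both.
            refs[start + half] = (start, start + step)
        step = half
    return refs


if __name__ == "__main__":
    for t, (past, future) in sorted(hierarchical_b_references(8).items()):
        print(f"T{t} references T{past} and T{future}")
```

With a GOP length of 8, the entry for T₄ is (T₀, T₈), matching the example of frame S₁T₄ given above.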

To study multi-view video coding, the JVT proposed the multi-view video coding verification model JMVM (Joint Multi-view Video Model), which is based on the H.264 coding standard. The reference software of this verification model can be used to encode both multi-view color video and multi-view depth video. In JMVM, rate-distortion optimization is applied to each macroblock: the macroblock coding mode with the minimum rate-distortion cost is taken as the coding mode of the macroblock currently being processed, in order to balance a low bit rate against good image quality. If the current macroblock belongs to an I frame, the encoder searches the Intra16×16, Intra8×8 and Intra4×4 modes and takes the one with the minimum rate-distortion cost as the optimal coding mode of the macroblock. If the current macroblock belongs to a P frame or a B frame, the encoder, searching over multiple reference frames, tests the SKIP, Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×8Frext, Intra16×16, Intra8×8 and Intra4×4 modes in turn and takes the one with the minimum rate-distortion cost as the optimal coding mode of the macroblock. The rate-distortion cost is computed as

$$J(s, c, \mathrm{MODE} \mid \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid QP) + \lambda_{\mathrm{MODE}} \, R(s, c, \mathrm{MODE})$$

where MODE denotes one coding mode of the current macroblock, J(s, c, MODE | λ_MODE) is the rate-distortion cost under mode MODE, s is the original video signal, c is the video signal reconstructed after encoding with mode MODE, λ_MODE is the Lagrange multiplier, and R(s, c, MODE) is the total number of bits used under mode MODE to encode the macroblock header, the disparity vector information and all DCT (Discrete Cosine Transform) coefficients. SSD(s, c, MODE | QP) is the sum of squared differences (SSD) between the original video signal and the reconstructed video signal, computed as

$$\mathrm{SSD}(s, c, \mathrm{MODE} \mid QP) = \sum_{i=1}^{B_1} \sum_{j=1}^{B_2} \bigl| s[i, j] - c[i - l_x,\, j - l_y] \bigr|^2$$

where B1 and B2 are the numbers of pixels of the current block in the horizontal and vertical directions and can take the values 16, 8 or 4, l = (l_x, l_y)^T is the disparity vector, QP is the quantization parameter, [i, j] is a pixel coordinate, s[i, j] is the value of the pixel at [i, j] in the original video signal, and c[i, j] is the value of the pixel at [i, j] in the reconstructed video signal. During encoding, JMVM searches all macroblock coding modes to find the optimal one. This yields high reconstructed image quality, but the full search over coding modes makes the computational complexity very high.
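The mode decision described above can be sketched in a few lines of Python. This is a minimal illustration rather than JMVM code: the helper names `ssd`, `rd_cost` and `best_mode` are hypothetical, the blocks are assumed to be already aligned (the disparity shift is not modeled), and the distortion and rate figures in the usage example are made-up numbers.

```python
from typing import Dict, Sequence, Tuple


def ssd(original: Sequence[Sequence[int]],
        reconstructed: Sequence[Sequence[int]]) -> int:
    """Sum of squared differences between two equally sized pixel blocks."""
    return sum(
        (s - c) ** 2
        for row_s, row_c in zip(original, reconstructed)
        for s, c in zip(row_s, row_c)
    )


def rd_cost(distortion: float, rate_bits: float, lagrange_multiplier: float) -> float:
    """Rate-distortion cost J = SSD + lambda_MODE * R."""
    return distortion + lagrange_multiplier * rate_bits


def best_mode(candidates: Dict[str, Tuple[float, float]],
              lagrange_multiplier: float) -> str:
    """Return the mode whose (distortion, rate) pair gives the smallest J."""
    return min(
        candidates,
        key=lambda mode: rd_cost(*candidates[mode], lagrange_multiplier),
    )


if __name__ == "__main__":
    # Hypothetical (SSD, bits) measurements for three candidate modes.
    measured = {"SKIP": (900, 1), "Inter16x16": (400, 40), "Intra4x4": (250, 120)}
    print(best_mode(measured, lagrange_multiplier=5))
    print(best_mode(measured, lagrange_multiplier=20))
```

A larger Lagrange multiplier penalizes rate more heavily, so in this toy example the choice moves from Inter16x16 to SKIP as the multiplier grows from 5 to 20.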

Summary of the invention

The technical problem to be solved by the invention is to provide a multi-view depth video coding method that reduces the computational complexity of multi-view depth video coding and improves its encoding speed, while preserving the quality of the virtual view video rendered from the multi-view depth video.

The technical solution adopted by the invention to solve the above problem is a method for encoding multi-view depth video. The method is based on JMVM, the multi-view video coding verification model built on the H.264 coding standard, and encodes the multi-view depth video using the hierarchical B pictures prediction structure recommended by the JVT. During encoding, all B frames in the multi-view depth video are encoded by the following steps:

1) Define the B frame currently being processed as the current B frame, with resolution M×N. The current B frame is processed in units of 16×16 macroblocks, from left to right and from top to bottom;

2) Take the 16×16 macroblock in the upper left corner of the current B frame as the macroblock currently being processed;

3) Define the macroblock currently being processed as the current macroblock and compute its discrepancy factor, denoted δ:

[equation not reproduced in this text]

where (x, y) is the coordinate position of the current macroblock in the current B frame in units of 16×16 macroblocks, (i, j) is the coordinate position of a pixel in the current macroblock, r(i, j) is the luminance value of the pixel at (i, j) in the current macroblock, and g(x, y) is the mean luminance value of all pixels in the current macroblock;

4) Determine whether the discrepancy factor δ of the current macroblock is less than the set threshold. If it is, mark the current macroblock as a macroblock of the depth-continuous region; the encoder uses the existing H.264 rate-distortion optimization technique to search the SKIP, Intra8×8, Intra4×4 and Inter16×16 macroblock coding modes, selects the one with the minimum rate-distortion cost as the optimal coding mode of the current macroblock, and then goes to step 6). Otherwise, continue with step 5);

5) Mark the current macroblock as a macroblock of the depth-discontinuous region; the encoder uses the existing H.264 rate-distortion optimization technique to search the SKIP, Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×8Frext, Intra16×16, Intra8×8 and Intra4×4 macroblock coding modes and selects the one with the minimum rate-distortion cost as the optimal coding mode of the current macroblock;

6) The encoder encodes the current macroblock using its optimal macroblock coding mode;

7) Take the next 16×16 macroblock to be processed as the current macroblock and repeat steps 3) to 7) until all 16×16 macroblocks in the current B frame have been encoded. All macroblocks marked as depth-continuous form the depth-continuous region of the current B frame, and all macroblocks marked as depth-discontinuous form its depth-discontinuous region.

The set threshold is 30.
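The classification in steps 3) and 4) can be sketched in Python as follows. The expression for δ is not reproduced in this text, so the sketch ASSUMES that δ is the mean squared deviation of the luminance values r(i, j) from the macroblock mean g(x, y); the actual expression may differ (for example, it may use absolute deviations or a different normalization). The names `discrepancy_factor` and `classify_frame` are hypothetical, and the frame is assumed to be a list of luminance rows whose dimensions are multiples of 16.

```python
from typing import List, Sequence

MB_SIZE = 16      # macroblock edge length in pixels
THRESHOLD = 30    # set threshold stated in the text


def discrepancy_factor(frame: Sequence[Sequence[int]], x: int, y: int) -> float:
    """ASSUMED form of delta: mean squared deviation of luminance from the
    mean g(x, y) of macroblock (x, y); x is the column index, y the row index."""
    rows = frame[y * MB_SIZE:(y + 1) * MB_SIZE]
    pixels = [p for row in rows for p in row[x * MB_SIZE:(x + 1) * MB_SIZE]]
    mean = sum(pixels) / len(pixels)                 # g(x, y)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def classify_frame(frame: Sequence[Sequence[int]]) -> List[List[str]]:
    """Label every macroblock, scanning left to right and top to bottom."""
    mb_rows = len(frame) // MB_SIZE
    mb_cols = len(frame[0]) // MB_SIZE
    return [
        ["continuous" if discrepancy_factor(frame, x, y) < THRESHOLD
         else "discontinuous"
         for x in range(mb_cols)]
        for y in range(mb_rows)
    ]


if __name__ == "__main__":
    # One flat macroblock next to one that contains a sharp depth edge.
    frame = [[100] * 16 + [0] * 8 + [200] * 8 for _ in range(16)]
    print(classify_frame(frame))
```

In the usage example the flat macroblock falls below the threshold and the macroblock containing the edge exceeds it, which mirrors the intended split into depth-continuous and depth-discontinuous regions.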

Compared with the prior art, the advantage of the invention is as follows. The discrepancy factor of each macroblock of every B frame in the multi-view depth video is computed, and each B frame image is divided into a depth-continuous region and a depth-discontinuous region according to the discrepancy factor. For each macroblock in the depth-continuous region, the encoder traverses fewer macroblock coding modes, which greatly narrows the coding mode search range in the depth-continuous region, reduces the computational complexity of multi-view depth video coding and effectively improves its encoding speed. For each macroblock in the depth-discontinuous region, which has a larger influence on virtual view rendering, the encoder traverses all macroblock coding modes, so these macroblocks are protected by the full mode search. The method of the invention therefore preserves the quality of the rendered virtual view images while improving the encoding speed.

Description of drawings

Fig. 1 is the basic block diagram of a typical free viewpoint video system;

Fig. 2 is a schematic diagram of the hierarchical B pictures prediction structure recommended by the JVT;

Fig. 3 is a schematic diagram of the coordinate position of a macroblock in the current B frame and of the coordinate positions of the pixels in that macroblock;

Fig. 4 is a flow chart of the encoding of the current B frame;

Fig. 5 is a schematic diagram of the coding mode search process of conventional multi-view depth video coding;

Fig. 6 is a schematic diagram of the distribution of optimal macroblock coding modes in frame S₀T₈ of the "Ballet" multi-view depth video;

Fig. 7a is a schematic diagram of the proportions of the macroblock coding modes in the depth-continuous region of frame S₀T₈ of "Ballet";

Fig. 7b is a schematic diagram of the proportions of the macroblock coding modes in the depth-discontinuous region of frame S₀T₈ of "Ballet";

Fig. 8 is a schematic diagram of the effect of depth value distortion in the depth-discontinuous region on virtual view rendering;

Fig. 9 shows the discrepancy factor δ of each macroblock in frame S₇T₈ of the "Leave Laptop" multi-view depth video;

Fig. 10a is a schematic diagram of the depth-continuous region and the depth-discontinuous region in frame S₇T₈ of the "Leave Laptop" multi-view depth video;

Fig. 10b is a schematic diagram of the depth-continuous region and the depth-discontinuous region in frame S₀T₈ of the "Ballet" multi-view depth video;

Fig. 11a is the original image S₈T₈ of the "Leave Laptop" multi-view depth video;

Fig. 11b is the virtual view image S₈T₈ of "Leave Laptop" rendered after the multi-view depth video was encoded with the JMVM method;

Fig. 11c is the virtual view image S₈T₈ of "Leave Laptop" rendered after the multi-view depth video was encoded with the coding method of the invention.

Embodiment

The invention is described in further detail below with reference to the accompanying drawings and an embodiment.

A method for encoding multi-view depth video. The method is based on JMVM (Joint Multi-view Video Model), the multi-view video coding verification model built on the H.264 coding standard, and encodes the multi-view depth video using the hierarchical B pictures (HBP) prediction structure recommended by the JVT (Joint Video Team). The hierarchical B pictures prediction structure is shown in Fig. 2.

During encoding, every I frame in the multi-view depth video is processed in units of 16×16 macroblocks, from left to right and from top to bottom. For each 16×16 macroblock, the encoder searches the Intra16×16, Intra8×8 and Intra4×4 modes and takes the one with the minimum rate-distortion cost as the optimal coding mode of that macroblock. Every P frame in the multi-view depth video is likewise processed in units of 16×16 macroblocks, from left to right and from top to bottom. For each 16×16 macroblock, the encoder, searching over multiple reference frames, tests the SKIP, Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×8Frext, Intra16×16, Intra8×8 and Intra4×4 modes in turn and takes the one with the minimum rate-distortion cost as the optimal coding mode of that macroblock. All B frames in the multi-view depth video are encoded by the following steps; the encoding process for the B frame currently being processed is shown in Fig. 4.

1) Define the B frame currently being processed as the current B frame, with resolution M×N. The current B frame is processed in units of 16×16 macroblocks, from left to right and from top to bottom.

2) Take the 16×16 macroblock in the upper left corner of the current B frame as the macroblock currently being processed.

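3) Define the macroblock currently being processed as the current macroblock and compute its discrepancy factor, denoted δ:

[equation not reproduced in this text]

where (x, y) is the coordinate position of the current macroblock in the current B frame in units of 16×16 macroblocks, and (i, j) is the coordinate position of a pixel in the current macroblock. Fig. 3 shows the coordinate position of a macroblock in the current B frame and the coordinate positions of the pixels in that macroblock. r(i, j) is the luminance value of the pixel at (i, j) in the current macroblock, and g(x, y) is the mean luminance value of all pixels in the current macroblock.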

4) Determine whether the discrepancy factor δ of the current macroblock is less than the set threshold. If it is, mark the current macroblock as a macroblock of the depth-continuous region; the encoder uses the existing H.264 rate-distortion optimization technique to search the SKIP, Intra8×8, Intra4×4 and Inter16×16 macroblock coding modes, selects the one with the minimum rate-distortion cost as the optimal coding mode of the current macroblock, and then goes to step 6). Otherwise, continue with step 5).

5) Mark the current macroblock as a macroblock of the depth-discontinuous region; the encoder uses the existing H.264 rate-distortion optimization technique to search the SKIP, Inter16×16, Inter16×8, Inter8×16, Inter8×8, Inter8×8Frext, Intra16×16, Intra8×8 and Intra4×4 macroblock coding modes and selects the one with the minimum rate-distortion cost as the optimal coding mode of the current macroblock.

6) The encoder encodes the current macroblock using its optimal macroblock coding mode.
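The per-macroblock decision in steps 4) to 6) amounts to choosing between two candidate mode lists before running the rate-distortion search. A minimal Python sketch follows, assuming the discrepancy factors have already been computed in raster order and that a callback `rd_cost(mb_index, mode)` returns the rate-distortion cost of a mode for a macroblock. The callback, the function name `select_modes` and the mode strings are illustrative and do not correspond to JMVM identifiers.

```python
from typing import Callable, List, Sequence, Tuple

# Candidate modes for the depth-continuous region (step 4).
REDUCED_MODES = ["SKIP", "Intra8x8", "Intra4x4", "Inter16x16"]
# Candidate modes for the depth-discontinuous region (step 5).
ALL_MODES = ["SKIP", "Inter16x16", "Inter16x8", "Inter8x16", "Inter8x8",
             "Inter8x8Frext", "Intra16x16", "Intra8x8", "Intra4x4"]


def select_modes(deltas: Sequence[float],
                 rd_cost: Callable[[int, str], float],
                 threshold: float = 30) -> List[Tuple[str, str]]:
    """Return (best mode, region label) for each macroblock in raster order."""
    decisions = []
    for mb_index, delta in enumerate(deltas):
        continuous = delta < threshold
        candidates = REDUCED_MODES if continuous else ALL_MODES
        # Only the candidate list differs; the search itself is unchanged.
        best = min(candidates, key=lambda mode: rd_cost(mb_index, mode))
        decisions.append((best, "continuous" if continuous else "discontinuous"))
    return decisions


if __name__ == "__main__":
    toy_costs = {"SKIP": 50, "Inter16x16": 40, "Inter16x8": 10}
    print(select_modes([5.0, 80.0], lambda mb, mode: toy_costs.get(mode, 100)))
```

In this sketch a depth-continuous macroblock costs four rate-distortion evaluations instead of nine, which is where the saving in encoding time comes from.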

Fig. 5 shows the coding mode search process of conventional multi-view depth video coding. Each vertical line formed by a column of pixels in Fig. 5 is the coding mode search process of one macroblock, in which the black part corresponds to the search of the optimal macroblock coding mode. Searching in this way leaves a large amount of redundant computation in depth video coding. Fig. 6 shows the distribution of optimal macroblock coding modes in frame S₀T₈ of the test sequence "Ballet": macroblocks in solid white boxes have SKIP as their optimal coding mode, macroblocks in dashed boxes have an Inter mode, and macroblocks in solid black boxes have an Intra mode. The proportions of the macroblock coding modes in the depth-continuous region of frame S₀T₈ of "Ballet" are shown in Fig. 7a, and the proportions in the depth-discontinuous region are shown in Fig. 7b. Figs. 7a and 7b show that in the depth-discontinuous region the coding modes are fairly evenly distributed, whereas in the depth-continuous region the SKIP, Intra and Inter16×16 modes account for the vast majority. In addition, the depth-discontinuous regions of multi-view depth video are usually object boundaries, and their influence on virtual view rendering is greater than that of the depth-continuous regions. Fig. 8 shows the effect of depth value distortion in the depth-discontinuous region on virtual view rendering: the quality of the rendered virtual view image drops quickly as the distortion grows. To address this, the method of the invention uses different coding mode search ranges for the 16×16 macroblocks in the depth-continuous region and in the depth-discontinuous region. This greatly reduces the computational complexity of multi-view depth video coding and effectively improves the encoding speed while preserving the quality of virtual view rendering.

In this embodiment the set threshold is 30. The threshold was determined by statistical analysis of the discrepancy factor δ of each macroblock in a number of multi-view depth video test sequences. Fig. 9 shows the discrepancy factor δ of each macroblock in frame S₇T₈ of the test sequence "Leave Laptop"; it can be seen from Fig. 9 that the δ values reflect the contours of the objects in the depth image. The macroblocks marked with squares in Fig. 10a are the macroblocks with δ > 30 in frame S₇T₈ of the test sequence "Leave Laptop", that is, the macroblocks of the depth-discontinuous region; the remaining macroblocks have δ < 30 and belong to the depth-continuous region. The macroblocks marked with squares in Fig. 10b are the macroblocks with δ > 30 in frame S₇T₈ of the test sequence "Ballet", that is, the macroblocks of the depth-discontinuous region; the remaining macroblocks have δ < 30 and belong to the depth-continuous region. The regions formed by macroblocks with δ > 30 reflect the areas of the depth video where the depth changes substantially. These areas usually have a more complex distribution of macroblock coding modes, and they have the greatest influence on virtual view rendering. The analysis determined that a threshold of 30 divides the B frame images of multi-view depth video well into two regions, the depth-continuous region and the depth-discontinuous region. Macroblocks in the two regions are encoded with different sets of macroblock coding modes, which greatly improves the encoding speed of multi-view depth video while maintaining the quality of virtual view rendering.

To test the performance of the coding method of the invention, the test environment listed in Table 1 was used, with the threshold set to 30. The JMVM coding method and the coding method of the invention were tested on a computer with an Intel Core2 Duo 3.0 GHz processor and 3.25 GB of memory, following the common test conditions for multi-view video proposed by the JVT. The test material consisted of the depth video test sequences "Breakdancers" and "Ballet" provided by Microsoft, together with depth videos estimated with the depth estimation software DERS 3.0 for HHI's "Book Arrival", "Door Flowers", "Leave Laptop" and "Alt Moabit" and for Nagoya University's "Champagne", "Dog" and "Pantomime".

Table 2 gives, for each test sequence, the encoding speed-up of the coding method of the invention relative to the JMVM method. Compared with the existing JMVM method, the coding method of the invention increases the encoding speed by a factor of 3.91 to 11.33.

Virtual view images were then rendered with the view synthesis software VSRS 3.0, using the color videos of the left and right viewpoints and the corresponding depth videos to render the virtual view video. The left viewpoint, right viewpoint and virtual viewpoint of each test sequence are listed in Table 3. Fig. 11a is the original image S₈T₈ of "Leave Laptop", Fig. 11b is the virtual view image S₈T₈ of "Leave Laptop" rendered from depth video encoded with JMVM, and Fig. 11c is the virtual view image S₈T₈ of "Leave Laptop" rendered from depth video encoded with the coding method of the invention. Comparing Fig. 11b with Fig. 11c shows that the subjective quality of the virtual view image rendered from depth video encoded with the method of the invention is essentially the same as that obtained with the JMVM coding method. Taking the original image in Fig. 11a as the reference, the PSNR (Peak Signal to Noise Ratio) of both rendered virtual view images can be computed. Table 4 gives the PSNR of the virtual view image rendered from depth video encoded with the method of the invention minus the PSNR of the virtual view image rendered from depth video encoded with the JMVM coding method; ΔPSNR_Y, ΔPSNR_U and ΔPSNR_V in Table 4 are the PSNR differences of the Y, U and V components. Table 4 shows that the two PSNR values are essentially the same.
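The PSNR difference reported in Table 4 can be computed per component as in the following Python sketch. It assumes 8-bit samples (peak value 255) and images given as lists of rows of a single component; the function names `psnr` and `delta_psnr` are illustrative, and the pixel values in the usage example are made up rather than taken from the experiments.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[int]]


def psnr(reference: Image, rendered: Image, peak: int = 255) -> float:
    """PSNR in dB of one image component against the reference component."""
    squared_error = 0
    count = 0
    for ref_row, ren_row in zip(reference, rendered):
        for a, b in zip(ref_row, ren_row):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf                      # identical images
    mse = squared_error / count
    return 10 * math.log10(peak ** 2 / mse)


def delta_psnr(reference: Image, rendered_proposed: Image,
               rendered_jmvm: Image) -> float:
    """PSNR of the proposed method's rendering minus that of the JMVM rendering."""
    return psnr(reference, rendered_proposed) - psnr(reference, rendered_jmvm)


if __name__ == "__main__":
    original = [[100, 100], [100, 100]]
    from_proposed = [[101, 99], [100, 100]]
    from_jmvm = [[101, 100], [100, 100]]
    print(f"{delta_psnr(original, from_proposed, from_jmvm):.2f} dB")
```

A value close to 0 dB indicates that the rendering quality obtained with the faster encoder is essentially unchanged; a negative value indicates a loss relative to JMVM.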

Table 1: Test environment

Table 2: Encoding speed-up factor of the coding method of the invention relative to the JMVM method for each depth video sequence under different quantization parameters

Table 3: Left viewpoint, right viewpoint and virtual viewpoint selected for each test sequence

Table 4: PSNR difference of the rendered virtual view images (dB)

Claims

1. A method for encoding multi-view depth video, characterized in that the method is based on JMVM, the multi-view video coding verification model built on the H.264 coding standard, and encodes the multi-view depth video using the hierarchical B pictures prediction structure recommended by the JVT, wherein during encoding all B frames in the multi-view depth video are encoded by the following steps:

2. The method for encoding multi-view depth video according to claim 1, characterized in that the set threshold is 30.