CN100563339C

CN100563339C - Multichannel video stream encoding method using depth information

Info

Publication number: CN100563339C
Application number: CN 200810062864
Authority: CN
Inventors: 骆凯; 李东晓; 张明; 何赛军; 石冰; 冯雅美; 谢贤海; 朱梦尧
Original assignee: Zhejiang University ZJU
Current assignee: Wan D display technology (Shenzhen) Co., Ltd.
Priority date: 2008-07-07
Filing date: 2008-07-07
Publication date: 2009-11-25
Anticipated expiration: 2028-07-07
Also published as: CN101309411A

Abstract

The invention discloses a multichannel video stream encoding method that uses depth information. The encoder is organized into four functional units: a channel selection unit, a depth generation unit, an image rendering unit and a predictive coding unit. Depth-image-based rendering is used to exploit the spatial correlation between a main channel and multiple secondary channels. According to the strength of the spatial correlation between the main channel and the secondary channels, the main-channel depth map period is decoupled from the main-channel group-of-pictures (GOP) length and coupled to the secondary-channel GOP length, so that both spatial and temporal correlation are fully exploited. Multiple depth images are used to render the same image frame, which reduces the holes that occlusion causes in the rendered image. The occlusion information of the image frame is encoded and transmitted to obtain a high-quality reconstructed frame. The invention preserves reconstructed image quality and improves compression performance.

Description

A multichannel video stream encoding method using depth information

Technical field

The present invention relates to moving image processing, and in particular to a method for encoding a multichannel video stream using depth information.

Background art

Television systems have evolved from black-and-white to colour and from analog to digital. Two-dimensional television, as developed so far, still offers viewers a flat image from a single viewing angle: users cannot watch from an angle of their own choosing, and the flat image runs counter to their natural three-dimensional visual experience.

As a direction of development for audio and video technology, interactive free-viewpoint television provides the user with video streams from multiple channels. The user can select one or more channels as the viewing angle as needed, or generate additional virtual viewpoint (channel) streams by interpolating between viewpoints (channels), and thus experience smooth changes of viewing angle. Three-dimensional (stereoscopic) television likewise provides multiple channel streams and, with the support of display devices such as stereoscopic glasses or autostereoscopic displays, lets the user perceive scene depth and experience a 3D effect close to natural vision. The combination of interactive free-viewpoint television and three-dimensional television has broad application prospects in many fields, such as entertainment, education and training, and virtual reality.

Interactive free-viewpoint television and three-dimensional television systems can be divided into five main functional stages: content acquisition, encoding, transmission, decoding and display. Compared with present digital television systems, the resource consumption and functional requirements of every stage increase greatly. A key characteristic is that both systems need to compress the video streams of multiple channels efficiently.

Multichannel video streams involve a huge amount of data, which makes the encoding and decoding stages computationally very complex. However, because all channels capture the same scene, their contents differ only by the translation and rotation between channels and are strongly correlated in the spatial domain, which makes efficient compression of multichannel video possible.

Two classes of multichannel video coding methods are currently the most widely studied internationally: the first is based on the MPEG video coding standards, and the second is based on depth-image-based rendering (DIBR). Each class has its own characteristics.

The first class of methods is based on the MPEG video coding standards.

In May 2003, the Joint Video Team (JVT), formed by experts from ITU-T and ISO/IEC, finalized the international video coding standard H.264/AVC. H.264 adopts a hybrid coding framework together with advanced techniques such as variable-block-size motion prediction down to 4×4 blocks, multiple reference frames and context-adaptive binary arithmetic coding, and achieves very high efficiency when compressing a single-channel video stream.

JVT is currently developing the Multi-view Video Coding (MVC) international standard. MVC exploits the correlation between frames within a viewpoint and across different viewpoints and performs compression with H.264/AVC. Because it uses joint temporal and spatial prediction, experiments show that, depending on the video content, joint space-time coding achieves a gain of 0.5 dB to 3 dB over simulcast, in which each viewpoint is coded independently. See: P. Merkle, A. Smolic and K. Muller, Efficient prediction structures for multiview video coding, IEEE Trans. CSVT, vol. 17, no. 11, pp. 1461-1473, 2007.

MVC represents the displacement between channels by disparity vectors obtained through disparity prediction. The cost function that determines a disparity vector aims at minimum bit rate, which yields only a coarse representation of displacement: such a vector can describe the displacement between two channels only, and cannot be converted into object depth and then used to describe the displacement among multiple channels. MVC therefore needs disparity vectors between the key frames of every pair of adjacent channels.

In the temporal domain, MVC divides the frames of a single channel into groups of pictures (GOP) of fixed length, and the GOP length reflects the temporal correlation of the frames of a single video stream. MVC performs disparity prediction on the key frames of two adjacent channels, and the disparity vectors reflect the spatial correlation between the frames of those channels. However, the GOP length determines the period of disparity prediction, which limits how well the spatial correlation between frames can be exploited.

MVC must encode every viewpoint channel. Its coding structure is complex and requires a large amount of computation, a long encoding delay and a large reference frame store. The bit rate grows as the number of channels to be encoded increases. Because MVC encodes and transmits all viewpoints, the size of the captured images and the camera spacing are tied to the picture size and viewing distance at the display, which limits the flexibility of the viewing position.

MVC uses disparity prediction to exploit the correlation between viewpoints. However, because cameras differ in mounting position, shooting position and illumination conditions, the same region captured in the frames of different viewpoints is inconsistent in luminance and chrominance. This inconsistency degrades the accuracy of disparity prediction and the coding efficiency. One solution is to add luminance and chrominance compensation terms to the matching cost function. See: J. H. Hur, S. Cho and Y. L. Lee, Adaptive local illumination change compensation method for H.264/AVC-based multiview video coding, IEEE Trans. CSVT, vol. 17, no. 11, pp. 1496-1505, 2007.

In 2006, AVS (Audio Video coding Standard) was adopted as China's national video coding standard. AVS also adopts a hybrid coding framework, with advanced techniques such as variable block sizes, multiple reference frames, pre-scaled integer transform and arithmetic coding. An MVC-based coding structure can also use AVS as its coding method.

The second class of methods is based on depth-image-based rendering (DIBR).

The Advanced Three-Dimensional Television System Technology (ATTEST) project of the European Information Society Technologies (IST) programme adopted the DIBR method. See: C. Fehn, Depth-Image-Based Rendering (DIBR), compression and transmission for a new approach on 3D-TV, in Proceedings of SPIE, Stereoscopic Displays and Virtual Reality Systems XI, USA, pp. 93-104, 2004.

At the encoder, the ATTEST system encodes only the two-dimensional video of one channel (the main channel) and the depth map of that channel. At the decoder, the DIBR method uses the depth information and the camera parameters to project the decoded main-channel frame into three-dimensional space and then onto the imaging plane of a virtual camera, thereby reconstructing multiple virtual two-dimensional video channels. DIBR renders multiple video channels from the depth information of a single channel. Compared with MVC, it achieves a higher compression ratio and does not suffer from luminance and chrominance mismatches caused by differing camera positions and parameters.

ATTEST encodes a depth map for every main-channel frame and renders the frames of the other channels from the main-channel frame and its depth map, which exploits the spatial correlation between the main channel and the other channels. Depending on the distance between channels, however, the temporal correlation of the frames of another channel may be stronger than their spatial correlation with the main-channel frames. ATTEST ignores the temporal correlation of the frames of the other channels.

The DIBR method relies on depth maps. A depth map is produced by stereo matching of two images, and occlusion causes the depth information of part of the scene to be missing from the depth map. During DIBR, this missing scene information produces holes inside the rendered virtual-viewpoint frame. Because the quality of the virtual-viewpoint image declines, the stereoscopic impression of viewers at off-centre viewing positions is impaired.

There are currently four approaches to mitigating holes inside rendered frames. The first fills the holes with the texture surrounding them. The second smooths the depth map by filtering. The third encodes and transmits the depth maps of multiple channels and renders the image of a given virtual viewpoint from multiple channel frames and depth maps. The fourth uses the more complex layered depth image (LDI) technique. See: S. U. Yoon and Y. S. Ho, Multiple color and depth video coding using a hierarchical representation, IEEE Trans. CSVT, vol. 17, no. 11, 2007.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art and to provide a method for encoding a multichannel video stream using depth information.

In the multichannel video stream encoding method using depth information, the encoder is organized into four functional units, namely a channel selection unit, a depth generation unit, an image rendering unit and a predictive coding unit, and the input multichannel video stream is encoded in the following steps:

1) the channel selection unit selects the main channels and the secondary channels;

2) the depth generation unit produces the depth maps of the main-channel frames;

3) the image rendering unit produces the prediction images of the secondary-channel key frames;

4) the predictive coding unit compresses the main-channel frames and depth maps, together with the occlusion maps of the secondary-channel key frames and the secondary-channel non-key frames, according to a video coding standard method, and outputs the compressed bitstream; the video coding standard methods include the international video coding standards MPEG-X and H.26X and the national video coding standard AVS.

The cameras corresponding to the channels to be encoded that are input to the channel selection unit are arranged in one or two dimensions, with their optical axes either perpendicular to a common plane or converging on the scene being shot.

The channel selection unit measures the spatial correlation between frames of different channels at the same instant according to the camera parameters and the distances between channels, selects one or more main channels, and treats the remaining channels as secondary channels. One main channel and several secondary channels form a channel group; according to the spatial correlation between frames of different channels, the input channels are divided into one or more channel groups.

For a main-channel video stream, the channel selection unit forms a group of pictures (GOP) from several frames. In each GOP, one frame serves as the key frame and the remaining frames as non-key frames, and the GOP is coded with a hierarchical B-frame prediction structure. The depth map period of the main-channel video stream is variable and satisfies $1 \le P_D \le L_{MG}$, where $P_D$ is a positive integer that divides $L_{MG}$; here $P_D$ is the depth map period of the main channel and $L_{MG}$ is the GOP length of the main channel.

For a secondary-channel video stream, the channel selection unit forms a GOP from several frames. In each GOP, one frame serves as the key frame and the remaining frames as non-key frames. The GOP length satisfies $P_D \le L_{AG} \le L_{MG}$, where $L_{AG}$ is a positive integer, $P_D$ divides $L_{AG}$ and $L_{AG}$ divides $L_{MG}$; here $L_{AG}$ is the GOP length of the secondary channel, $L_{MG}$ is the GOP length of the main channel and $P_D$ is the depth map period of the main channel.

Within a single channel group, the channel selection unit places the key frames of a secondary-channel video stream only at instants for which the main channel has a depth map. If the main-channel frame at a given instant has no depth map, the frames of all secondary channels at that instant are non-key frames.
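
The period constraints above can be checked mechanically. The following Python sketch is illustrative only: the function names and the 0-based frame index are assumptions, and the divisibility conditions are read as in the embodiments (the secondary-channel GOP length is a multiple of the depth map period, and the main-channel GOP length is a multiple of the secondary-channel GOP length).

```python
def valid_periods(p_d: int, l_ag: int, l_mg: int) -> bool:
    """Check 1 <= P_D <= L_AG <= L_MG with P_D | L_AG and L_AG | L_MG."""
    if not (1 <= p_d <= l_ag <= l_mg):
        return False
    return l_ag % p_d == 0 and l_mg % l_ag == 0


def has_depth_map(t: int, p_d: int) -> bool:
    """Main-channel frame t (0-based, an assumption) has a depth map every P_D frames."""
    return t % p_d == 0


def is_secondary_key_frame(t: int, p_d: int, l_ag: int) -> bool:
    """A secondary-channel key frame requires a main-channel depth map at the same instant."""
    return t % l_ag == 0 and has_depth_map(t, p_d)
```

With the values used in the first embodiment, `valid_periods(4, 4, 8)` and `valid_periods(4, 8, 8)` both hold, whereas a combination such as `(3, 4, 8)` is rejected because 3 does not divide 4.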

Within a single channel group, for a key frame of a secondary-channel video stream, the image rendering unit uses the reconstructed frame and depth map of the main channel at that instant and applies depth-image-based rendering to produce the prediction image of that key frame.
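
As a rough illustration of depth-image-based rendering, the sketch below forward-warps a main-channel frame into a secondary view. It is not the patent's renderer: it assumes rectified parallel cameras, so that a pixel at depth Z shifts horizontally by a disparity of f·B/Z, and the function name, the parameters `focal` and `baseline`, and the shift direction are all hypothetical.

```python
def render_prediction(main, depth, focal, baseline):
    """Forward-warp a main-channel frame into a secondary view.

    main, depth: 2-D lists (rows of pixel values / positive depths).
    Returns (prediction, hole_mask); hole_mask[y][x] is True where no
    main-channel pixel landed, i.e. an occlusion hole.
    """
    height, width = len(main), len(main[0])
    pred = [[0] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = depth[y][x]
            # Assumed parallel-camera geometry: disparity = f * B / Z.
            xt = x - round(focal * baseline / z)
            if 0 <= xt < width and z < zbuf[y][xt]:
                zbuf[y][xt] = z          # keep the nearest surface
                pred[y][xt] = main[y][x]
    holes = [[zbuf[y][x] == float("inf") for x in range(width)]
             for y in range(height)]
    return pred, holes
```

The returned hole mask marks exactly the positions for which the main channel supplies no information; these are the regions whose content must be conveyed by other means.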

If the input multichannel video stream is divided into several channel groups, then for a key frame of a secondary-channel video stream the image rendering unit selects the main channels of the channel groups adjacent to this secondary channel, provided that the depth map periods of those main channels divide the GOP length of this secondary channel. Using the reconstructed frames and depth maps of those main channels at that instant, it applies depth-image-based rendering to produce the prediction image of the key frame of this secondary channel.

The predictive coding unit handles the different kinds of data as follows. A main-channel video stream is compressed according to the video coding standard method. For a main-channel depth map, the motion information of the corresponding main-channel frame is used to predict the motion information of the depth map, which is then compressed according to the video coding standard method. For a secondary-channel video stream, the difference between a key frame and its prediction image is taken to produce the occlusion map of that key frame, and the occlusion map is compressed according to the video coding standard method. The non-key frames in a secondary-channel GOP are coded with a hierarchical B-frame prediction structure, using the reconstructed key frames as reference frames, and compressed according to the video coding standard method.
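
The occlusion map of a secondary-channel key frame is a per-pixel difference, which can be sketched as follows. The function names are hypothetical, frames are plain 2-D lists, and the transform, quantization and entropy coding that a real codec would apply to the residual are deliberately left out.

```python
def occlusion_map(key_frame, prediction):
    """Signed per-pixel difference between a key frame and its rendered prediction."""
    return [[k - p for k, p in zip(key_row, pred_row)]
            for key_row, pred_row in zip(key_frame, prediction)]


def reconstruct(prediction, occlusion):
    """Decoder side: add the (here unquantized) occlusion map back to the prediction."""
    return [[p + r for p, r in zip(pred_row, occ_row)]
            for pred_row, occ_row in zip(prediction, occlusion)]
```

Without quantization the round trip is exact; in a lossy codec the reconstructed key frame would only approximate the original.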

The invention addresses the content correlation of multichannel video streams in the spatial domain by proposing a coding method that uses depth information. From a main-channel frame and its depth map, prediction images of the frames of several secondary channels adjacent to the main channel can be rendered quickly. Whereas a depth map can predict several secondary channels, the disparity information in MVC applies only to its own pair of channels, so multichannel video coding based on depth information achieves higher compression efficiency than MVC.

The depth map period of the main channel reflects the content correlation between the main channel and the secondary channels in the spatial domain. This spatial correlation is closely related to the camera parameters and, in particular, to the distances between channels: the shorter the distance between two channels, the greater the correlation between their frames at the same instant, and vice versa. The period of the main-channel depth map should therefore be variable. When spatial correlation exceeds temporal correlation, the depth map period is small and more secondary-channel frames are obtained by rendering from depth information; when spatial correlation is lower than temporal correlation, the depth map period is large and more secondary-channel frames are obtained by temporal prediction.

In the invention the depth map period of the main channel is variable. The depth map period $P_D$ ranges over $1 \le P_D \le L_{MG}$, where $P_D$ divides $L_{MG}$ and $L_{MG}$ is the GOP length of the main channel. When the spatial correlation between channels is greatest, $P_D$ equals 1: every main-channel frame has a depth map, and the prediction image of every secondary-channel frame is rendered from the main channel. When the spatial correlation between channels is smallest, $P_D$ equals $L_{MG}$: the main channel produces depth maps only at key frames, only the key frames of the secondary channels obtain prediction images by rendering from the main channel, and the non-key frames of the secondary channels are predictively coded using temporal correlation.

In the invention the GOP length of the secondary channels is variable, and different secondary channels may have different GOP lengths. The GOP length $L_{AG}$ of a secondary channel satisfies $P_D \le L_{AG} \le L_{MG}$, where $L_{AG}$ is a positive integer, $P_D$ divides $L_{AG}$ and $L_{AG}$ divides $L_{MG}$; $L_{MG}$ is the GOP length of the main-channel video stream and $P_D$ is its depth map period. When the spatial correlation between a secondary channel and the main channel is large, $L_{AG}$ is small and more secondary-channel frames obtain their prediction images from the main channel through DIBR. When that spatial correlation is smallest, $L_{AG}$ equals $L_{MG}$ and the secondary channel obtains prediction images only at the positions of the main-channel key frames.

The invention encodes and transmits the occlusion information of the secondary-channel key-frame prediction images obtained by DIBR. A secondary-channel frame rendered from a depth map contains holes. The holes are caused by occlusion: part of the image content is visible in the secondary channel but not in the main channel, so the main-channel depth map has no points corresponding to that part of the secondary-channel frame at the same instant. One way to solve the occlusion problem is to encode and transmit the hole information. In addition, because the main and secondary channels differ in camera parameters and illumination conditions, the rendered prediction image of a secondary-channel key frame differs from the captured image. Both the hole information and this difference information therefore need to be encoded and transmitted to obtain a high-quality reconstructed frame.

When there are several channel groups (GOV, Group of View), the invention compresses the main channels independently of one another with the video coding standard method, and obtains the prediction image of a secondary-channel key frame by rendering from several depth maps. Holes in the prediction image rendered from one depth map can be filled from the prediction images rendered from the other depth maps. Rendering the same secondary-channel key frame from several depth maps therefore yields a better prediction image and reduces the occlusion information that has to be encoded.
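
The text does not state how the renderings from several depth maps are combined. One simple possibility, shown here purely as an assumption, is to take the first rendering (for example, from the nearest main channel) as the base and fill its holes from the other renderings. The function name and the `(prediction, hole_mask)` pair layout are hypothetical.

```python
def merge_predictions(renderings):
    """Merge renderings of the same key frame obtained from several depth maps.

    renderings: list of (prediction, hole_mask) pairs in order of preference.
    Holes of earlier renderings are filled from later ones where possible.
    """
    first_pred, first_holes = renderings[0]
    height, width = len(first_pred), len(first_pred[0])
    merged = [row[:] for row in first_pred]
    holes = [row[:] for row in first_holes]
    for pred, mask in renderings[1:]:
        for y in range(height):
            for x in range(width):
                if holes[y][x] and not mask[y][x]:
                    merged[y][x] = pred[y][x]
                    holes[y][x] = False
    return merged, holes
```

Any position still flagged in the merged hole mask is visible from none of the main channels and remains to be carried by the occlusion map.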

The invention uses depth-image-based rendering to exploit the spatial correlation between the main channel and multiple secondary channels. According to the strength of that spatial correlation, it decouples the main-channel depth map period from the main-channel GOP length and couples it to the secondary-channel GOP length, so that both spatial and temporal correlation are fully exploited. It uses multiple depth images to render the same frame, reducing the holes that occlusion causes in the rendered image, and it encodes and transmits the occlusion information of the frame to obtain a high-quality reconstructed frame. Compared with ATTEST, the invention preserves reconstructed image quality; compared with MVC, it achieves higher compression performance.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of the multichannel video stream encoder using depth information according to the invention;

Fig. 2 shows the coding structure of the 5-channel video stream of the first example in the spatial domain;

Fig. 3 shows the coding structure of the 5-channel video stream of the first example in the temporal and spatial domains;

Fig. 4 shows the coding structure of the 8-channel video stream of the second example in the spatial domain;

Fig. 5 shows the coding structure of the 8-channel video stream of the second example in the temporal and spatial domains;

Fig. 6 shows the coding structure, in the spatial domain, of the two-dimensionally arranged 9-channel video stream of the third example;

Fig. 7 shows the coding structure, in the spatial domain, of the two-dimensionally arranged 36-channel video stream of the fourth example.

Detailed description of the embodiments

Embodiments of the invention are described below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of the multichannel video stream encoder using depth information according to the invention. The encoder compresses the input multichannel video stream and outputs the compressed bitstream. It comprises a channel selection unit 11, a depth generation unit 12, an image rendering unit 13 and a predictive coding unit 14.

The invention is described in detail below through specific embodiments with reference to the drawings.

Embodiment 1

Taking a 5-channel video stream to be encoded, shot by cameras in a one-dimensional linear or arc arrangement, as an example, the four functional units of the multichannel video stream encoder and its coding method are described in detail.

(1) Channel selection unit 11

Referring to Fig. 2, the 5 input channels are labeled channels 1, 2, 3, 4 and 5 in order of their relative positions. Channel 3 is selected as the main channel and the remaining 4 channels as secondary channels; the 5 channels form one GOV.

Referring to Fig. 3, the GOP length of the main channel (channel 3) is set to 8, so that 7 non-key frames separate successive key frames; the depth map period of the main channel is set to 4, so that 3 frames without a depth map separate successive depth maps. The GOP length of channels 2 and 4, which are closer to the main channel, is set to 4, and the GOP length of channels 1 and 5 is set to 8. These values are chosen only to make the following description of the coding method clear; in practice the main-channel GOP length, the depth map period and the secondary-channel GOP lengths may be any positive integers that satisfy claim 4 and claim 5.
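
The frame roles implied by these settings can be tabulated with a few lines of Python. This is only a sketch of the schedule for the 9 instants T1 to T9 of the first embodiment; the constant and function names, and the "K"/"D" string encoding, are assumptions made for illustration.

```python
L_MG, P_D = 8, 4                      # main channel 3: GOP length, depth map period
AUX_GOP = {1: 8, 2: 4, 4: 4, 5: 8}    # secondary channel -> GOP length


def frame_roles(num_frames=9):
    """Map (channel, T) to a role string: 'K' key frame, 'D' depth map present."""
    roles = {}
    for t in range(1, num_frames + 1):
        offset = t - 1                # T1 is the first frame of a GOP
        key = "K" if offset % L_MG == 0 else ""
        dep = "D" if offset % P_D == 0 else ""
        roles[(3, t)] = key + dep
        for channel, l_ag in AUX_GOP.items():
            roles[(channel, t)] = "K" if offset % l_ag == 0 else ""
    return roles
```

Under these settings the main channel carries depth maps at T1, T5 and T9, channels 2 and 4 have key frames at the same three instants, and channels 1 and 5 have key frames only at T1 and T9.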

(2) Depth generation unit 12

This unit produces the depth map of a main-channel frame from that frame and the frame, at the same instant, of a secondary channel adjacent to the main channel.

Referring to Fig. 3, the horizontal axis is the channel direction V, and the 5 channels from left to right are V1, V2, V3, V4 and V5; the vertical axis is the time direction T. For convenience of description, Fig. 3 shows 9 frames along the time direction. Labelling both axes uniquely determines the position of a frame in the space-time domain. For example, V1T1 identifies the frame in the upper-left corner of Fig. 3; the positions of the other frames are determined in the same way by combining V and T, although their labels are not drawn in Fig. 3.

Referring to Fig. 3, the depth map period of the main channel (channel 3) is 4. A frame marked D is a main-channel frame position that has a depth map, and a frame marked K is a key frame of its channel (V1T1 is not marked K but is also a key frame).

Referring to Fig. 3, of the 9 main-channel frames, V3T1, V3T5 and V3T9 need depth maps. Taking V3T1 as an example, the depth generation unit 12 performs stereo matching between V3T1 and V4T1 (or V2T1) to produce the depth map of V3T1, denoted $D_{V3T1}$.

(3) Image rendering unit 13

From a main-channel frame and its depth map, this unit uses the DIBR method to produce the prediction image of the key frame of a secondary channel at the same instant.

Referring to Fig. 3, the horizontal arcs represent the DIBR method, and the two channel frames joined by an arc are the input and output of DIBR: the end of the arc without an arrow connects to the main-channel frame and depth map, and the end with an arrow points to the secondary-channel key frame.

Referring to Fig. 3 and taking key frame V4T1 of channel 4 as an example, the image rendering unit 13 applies the DIBR method to the main-channel frame V3T1 and its depth map $D_{V3T1}$ to produce the prediction image of V4T1, denoted $P_{V4T1}$.

(4) Predictive coding unit 14

This unit compresses the main-channel frames and depth maps, together with the occlusion maps of the secondary-channel key frames and the secondary-channel non-key frames, according to the video coding standard method.

Referring to Fig. 3, the GOP length of the main channel is 8; each GOP comprises 1 key frame and 7 non-key frames and is coded with a hierarchical B-frame structure. The vertical arcs represent motion prediction in the temporal domain, and two vertical arcs pointing to a non-key frame indicate that this frame may reference two frames. Taking one main-channel GOP (frames V3T1 to V3T8) as an example, the key frames V3T1 and V3T9 are intra-prediction coded, and the non-key frames (V3T2 to V3T8) are coded as B frames, each predicted from one past and one future frame in the time direction as reference frames. For example, the non-key frame V3T5 is inter-prediction coded with V3T1 and V3T9 as reference frames.
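
The text states the reference frames only for the middle frame of the GOP. The sketch below extends this to the other non-key frames by dyadic midpoint splitting, a common way of building hierarchical B-frame structures; this extension and the function name are assumptions, not something the description specifies.

```python
def hierarchical_b_refs(first_key, next_key):
    """Return {frame: (past_ref, future_ref)} for the frames strictly between
    two key frames, assigning references by repeated midpoint splitting."""
    refs = {}

    def split(lo, hi):
        if hi - lo < 2:
            return                     # no frame lies between lo and hi
        mid = (lo + hi) // 2
        refs[mid] = (lo, hi)           # B frame predicted from both neighbours
        split(lo, mid)
        split(mid, hi)

    split(first_key, next_key)
    return refs
```

Calling `hierarchical_b_refs(1, 9)` assigns T5 the references T1 and T9, in agreement with the example of V3T5 above, and then T3 and T7 the pairs (T1, T5) and (T5, T9), and so on down the hierarchy.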

Referring to Fig. 3, the depth map period of the main channel is 4, so one depth map has to be encoded for every 4 frames. One main-channel GOP contains 2 depth maps, which form one GOD (Group of Depth). A GOD is coded in a similar way to a GOP. Taking one main-channel GOD (depth maps $D_{V3T1}$ and $D_{V3T5}$) as an example, $D_{V3T1}$ and $D_{V3T9}$ are intra-prediction coded and $D_{V3T5}$ is coded as a B frame: the motion vectors of frame V3T5 are used to predict the motion vectors of $D_{V3T5}$, and $D_{V3T5}$ is inter-prediction coded with $D_{V3T1}$ and $D_{V3T9}$ as reference frames.

Referring to Fig. 3, the prediction image of a secondary-channel key frame is produced by the image rendering unit 13. The difference between the key frame and its prediction image gives the occlusion map of that key frame, and the occlusion maps and non-key frames of the secondary channel are then compressed. Taking one GOP of secondary channel 4 (frames V4T1 to V4T4) as an example, the occlusion maps of key frames V4T1 and V4T5 are denoted $R_{V4T1}$ and $R_{V4T5}$ respectively; they undergo DCT, quantization and entropy coding according to the video coding standard method. The non-key frames (V4T2 to V4T4) are coded as B frames, each predicted from one past and one future frame in the time direction as reference frames. For example, the non-key frame V4T3 is inter-prediction coded with V4T1 and V4T5 as reference frames.

Embodiment 2

Taking an 8-channel video stream to be encoded, shot by cameras in a one-dimensional linear or arc arrangement, as an example, the aspects in which the encoder of the invention operates differently from Embodiment 1 are described.

(1) Channel selection unit 11

Referring to Fig. 4, the 8 input channels are labeled channels 1 to 8 in order of their relative positions. Channels 3 and 6 are selected as main channels and the remaining 6 channels as secondary channels. The 8 channels are divided into two GOVs: channels 1, 2, 3 and 4 form one GOV, and channels 5, 6, 7 and 8 form the other.

Referring to Fig. 5, the GOP length of the main channels (channels 3 and 6) is set to 8, so that 7 non-key frames separate successive key frames; the depth map period of the main channels is set to 4, so that 3 frames without a depth map separate successive depth maps. The GOP length of channels 2, 4, 5 and 7 is set to 4, and the GOP length of channels 1 and 8 is set to 8. These values are chosen only to make the following description of the coding method clear; in practice the main-channel GOP length, the depth map period and the secondary-channel GOP lengths may be any positive integers that satisfy claim 4 and claim 5.

(2) Depth generation unit 12

This unit produces the depth map of a main-channel image frame from that image frame and the image frames, at the same instant, of the secondary channels adjacent to the main channel. The unit works in the same way as in embodiment 1.

(3) Image rendering unit 13

This unit takes the image frame and depth map, at the same instant, of the main channel adjacent to a secondary-channel key frame and applies the DIBR method to produce the prediction image of that key frame. When a secondary channel has several adjacent main channels, DIBR is performed from each of these main channels to predict the same key frame of the secondary channel, and the prediction image of the key frame is obtained from the results.

Referring to Fig. 5, the horizontal arcs represent the DIBR method, and the two channel image frames joined by an arc are the input and output of DIBR. The end of an arc without an arrow is attached to the image frame and depth map of a main channel, and the arrowed end points to the key frame of a secondary channel.

Referring to Fig. 5, taking key frame V4T1 of channel 4 as an example, image rendering unit 13 applies the DIBR method to image frame V3T1 of main channel 3 and its depth map D_V3T1 to produce one prediction image of V4T1, and applies the DIBR method to image frame V6T1 of main channel 6 and its depth map D_V6T1 to produce a second prediction image of V4T1. The prediction image of key frame V4T1 of secondary channel 4 is then obtained from these two prediction images.
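The text does not state how the two prediction images are combined. The following is only a sketch of one plausible rule, under the assumptions that each rendered image marks unfilled (occluded) pixels with a hole marker, that a pixel seen from both main channels is averaged, and that a pixel seen from only one is copied. The marker, function name and sample values are hypothetical.

```python
HOLE = None  # assumed marker for a pixel that DIBR could not fill


def merge_predictions(predictions):
    """Combine several same-sized rendered predictions of one key frame."""
    height, width = len(predictions[0]), len(predictions[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = [p[y][x] for p in predictions if p[y][x] is not HOLE]
            # Average the views that see this pixel; keep a hole if none does.
            row.append(sum(samples) // len(samples) if samples else HOLE)
        merged.append(row)
    return merged


# Toy predictions of V4T1 rendered from main channels 3 and 6 (made-up values).
from_v3 = [[100, HOLE, 104], [HOLE, HOLE, 90]]
from_v6 = [[102, 50, HOLE], [HOLE, 70, 90]]
merged_v4t1 = merge_predictions([from_v3, from_v6])
```

In this toy case three of the four holes are filled by the other view; the pixel that neither main channel sees remains a hole and would have to be carried by the occlusion map.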

(4) Predictive coding unit 14

This unit compresses, according to the video coding standard method, the image frames and depth maps of the main channels and the key-frame occlusion maps and non-key frames of the secondary channels. The unit works in the same way as in embodiment 1.

Embodiment 3, taking as an example a 9-channel video stream to be encoded, captured by cameras in a two-dimensional arrangement, describes only the points at which encoding with the encoder of the present invention differs from embodiment 1.

(1) Channel selection unit 11

Referring to Fig. 6, the 9 input channels are numbered channel 1 to channel 9 in order of their relative positions. Channel 5 is selected as the main channel, and the remaining 8 channels serve as secondary channels. The 9 channels form one GOV.

As in embodiment 1, the GOP length and depth map period of the main channel and the GOP length of the secondary channels are determined.

(2) Depth generation unit 12

(3) Image rendering unit 13

This unit takes the image frame and depth map, at the same instant, of the main channel adjacent to a secondary-channel key frame and applies the DIBR method to produce the prediction image of that key frame. The unit works in the same way as in embodiment 1.

(4) Predictive coding unit 14

Embodiment 4, taking as an example a 36-channel video stream to be encoded, captured by cameras in a two-dimensional arrangement, describes only the points at which encoding with the encoder of the present invention differs from embodiment 1.

(1) Channel selection unit 11

Referring to Fig. 7, the 36 input channels are numbered channel 1 to channel 36 in order of their relative positions. Channels 8, 11, 26 and 29 are selected as main channels, and the remaining 32 channels serve as secondary channels. Every 9 channels form one GOV, so the 36 channels are divided into 4 GOVs.

(2) Depth generation unit 12

(3) Image rendering unit 13

This unit takes the image frame and depth map, at the same instant, of the main channel adjacent to a secondary-channel key frame and applies the DIBR method to produce the prediction image of that key frame. When a secondary channel has several adjacent main channels, DIBR is performed from each of these main channels to predict the same key frame of the secondary channel, and the prediction image of the key frame is obtained from the results.

Referring to Fig. 7, an arc joining two channels represents the DIBR method; the end of the arc without an arrow is attached to a main channel, and the arrowed end points to a secondary channel. Taking the 8 secondary channels of GOV1 as an example: secondary channels 1, 2 and 7 are adjacent to main channel 8, so these 3 secondary channels select main channel 8 for DIBR rendering; channels 3 and 9 are adjacent to main channels 8 and 11, so these 2 secondary channels select main channels 8 and 11 for DIBR rendering; channels 13 and 14 are adjacent to main channels 8 and 26, so these 2 secondary channels select main channels 8 and 26 for DIBR rendering; channel 15 is adjacent to main channels 8, 11, 26 and 29, so this secondary channel selects all 4 main channels for DIBR rendering.
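The GOV1 example above can be reproduced with a small sketch. Fig. 7 is not available here, so the sketch assumes that the 36 channels lie on a 6 x 6 grid numbered row by row, and that "adjacent" means a Chebyshev (chessboard) distance of at most 2 grid steps. Both the layout and the threshold are assumptions chosen because they match the listed selections; they are not stated in the text.

```python
COLUMNS = 6                       # assumed 6 x 6 row-major layout of channels 1..36
MAIN_CHANNELS = (8, 11, 26, 29)   # main channels of embodiment 4
MAX_DISTANCE = 2                  # assumed adjacency threshold (Chebyshev distance)


def grid_position(channel):
    """(row, column) of a 1-based channel number on the assumed grid."""
    return divmod(channel - 1, COLUMNS)


def adjacent_main_channels(channel):
    """Main channels a secondary channel would select for DIBR rendering."""
    row, col = grid_position(channel)
    selected = []
    for main in MAIN_CHANNELS:
        m_row, m_col = grid_position(main)
        if max(abs(row - m_row), abs(col - m_col)) <= MAX_DISTANCE:
            selected.append(main)
    return selected


# Secondary channels of GOV1 under the assumed layout.
gov1_selection = {ch: adjacent_main_channels(ch)
                  for ch in (1, 2, 3, 7, 9, 13, 14, 15)}
```

Under these assumptions, channels at the border between GOVs draw on two or four main channels, while channels away from any border use only the main channel of their own GOV.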

(4) Predictive coding unit 14

Claims

1. A multichannel video stream encoding method using depth information, characterized in that the encoder is arranged as four functional units, namely a channel selection unit, a depth generation unit, an image rendering unit and a predictive coding unit, and the input multichannel video stream is encoded according to the following steps:

2) the depth generation unit produces the depth map of a main-channel image frame from the main-channel image frame and the image frames, at the same instant, of the secondary channels adjacent to the main channel;

3) the image rendering unit, from the image frame and depth map of the adjacent main channel, uses the depth-image-based rendering method to produce the prediction image of the secondary-channel key frame at the same instant;

4) the predictive coding unit compresses, according to the video coding standard method, the image frames and depth maps of the main-channel video stream and the key-frame occlusion maps and non-key frames of the secondary channels, and outputs the encoded compressed bitstream; the video coding standard method includes the international video coding standards MPEG-X and H.26X and the national video coding standard AVS;

in the channel selection unit, within a single channel group, for a secondary-channel video stream, a depth map exists at the main-channel position corresponding to each of its key frame positions; if the main-channel image frame at a given instant has no depth map, the image frames of all secondary channels at that instant are non-key frames.

2. The multichannel video stream encoding method using depth information as claimed in claim 1, characterized in that the cameras corresponding to the channels to be encoded by the channel selection unit are arranged in one dimension or two dimensions, with their optical axes perpendicular to a common plane or converging on the photographed scene.

3. The multichannel video stream encoding method using depth information as claimed in claim 1, characterized in that the channel selection unit measures, from the camera parameters and the distances between channels, the spatial-domain correlation of the image frames of different channels at the same instant, selects one or more main channels, and treats the remaining channels as secondary channels; one main channel and several secondary channels form a channel group, and the input channels are divided into one or more channel groups according to the spatial-domain correlation of the image frames of different channels.

4. The multichannel video stream encoding method using depth information as claimed in claim 1, characterized in that, in the channel selection unit, for a main-channel video stream, multiple image frames form an image group; in each image group, one image frame serves as the key frame and the remaining image frames serve as non-key frames, and the image group is coded with a hierarchical B-frame prediction structure; for the main-channel video stream, the depth map period is variable and satisfies 1 ≤ P_D ≤ L_MG, where P_D is a positive integer that divides L_MG exactly, P_D is the depth map period of the main channel, and L_MG is the image group length of the main channel.

5. The multichannel video stream encoding method using depth information as claimed in claim 1, characterized in that, in the channel selection unit, for a secondary-channel video stream, multiple image frames form an image group; in each image group, one image frame serves as the key frame and the remaining image frames serve as non-key frames; the lengths satisfy P_D ≤ L_AG ≤ L_MG, where P_D divides L_AG exactly, L_AG divides L_MG exactly, and L_AG is a positive integer; L_AG is the image group length of the secondary channel, L_MG is the image group length of the main channel, and P_D is the depth map period of the main channel.

6. The multichannel video stream encoding method using depth information as claimed in claim 1, characterized in that the image rendering unit, within a single channel group, for a key frame of a secondary-channel video stream, uses the depth-image-based rendering method on the reconstructed image frame and depth map of the main channel at that instant to produce the prediction image of the secondary-channel key frame at that instant.

7. The multichannel video stream encoding method using depth information as claimed in claim 1, characterized in that the image rendering unit, for a key frame of a secondary-channel video stream, if the input multichannel video stream is divided into multiple channel groups, selects the main channels of the channel groups adjacent to this secondary channel whose depth map period divides the image group length of this secondary channel exactly, and uses the depth-image-based rendering method on the reconstructed image frames and depth maps of these main channels at that instant to produce the prediction image of the key frame of this secondary channel.