CN100563339C - Multichannel video stream encoding method using depth information - Google Patents

Multichannel video stream encoding method using depth information

Info

Publication number
CN100563339C
CN100563339C, CN200810062864A
Authority
CN
China
Prior art keywords
main channel
secondary channels
frame
depth map
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200810062864
Other languages
Chinese (zh)
Other versions
CN101309411A (en)
Inventor
骆凯
李东晓
张明
何赛军
石冰
冯雅美
谢贤海
朱梦尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wan D display technology (Shenzhen) Co., Ltd.
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN 200810062864 priority Critical patent/CN100563339C/en
Publication of CN101309411A publication Critical patent/CN101309411A/en
Application granted granted Critical
Publication of CN100563339C publication Critical patent/CN100563339C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a multichannel video stream coding method using depth information, in which the encoder is organized into four functional units: a channel selection unit, a depth generation unit, an image rendering unit and a prediction coding unit. Depth-image-based rendering is used to exploit the spatial correlation between the main channel and several secondary channels. According to the strength of this spatial correlation, the main-channel depth-map period is decoupled from the main-channel GOP length and coupled with the secondary-channel GOP length, so that both spatial and temporal correlation are fully exploited. The same image frame is rendered from several depth images to reduce the holes caused by occlusion in the rendered images, and the occlusion information of the image frames is coded and transmitted to obtain high-quality reconstructed frames. The invention guarantees reconstructed image quality while improving compression performance.

Description

Multichannel video stream encoding method using depth information
Technical field
The present invention relates to moving-image processing technology, and in particular to a method for coding a multichannel video stream using depth information.
Background art
Television systems have evolved from black-and-white to colour and from analog to digital. The two-dimensional television systems developed so far, however, offer viewers only a flat image from a single viewpoint; the user cannot watch the scene from an angle of interest, and the flat image conflicts with the viewer's natural three-dimensional visual experience.
As directions of development for audio-visual technology, interactive free-viewpoint television offers the user video streams of several channels; the user can select the video streams of one or more channels as viewing angles as required, or generate video streams of additional virtual viewpoints (channels) and, by interpolating between viewpoints (channels), experience smooth changes of viewing angle within the scene. Three-dimensional (stereoscopic) television, on the basis of offering the user several channels of video streams and with the support of display devices such as anaglyph glasses and autostereoscopic displays, lets the user perceive the depth of the scene while watching and experience a three-dimensional effect close to natural vision. The combination of interactive free-viewpoint television and three-dimensional television will open up broad application prospects in many fields, for example entertainment, education and training, and virtual reality.
Interactive free-viewpoint television and three-dimensional television systems can be divided into five main functional levels: content acquisition, compression coding, transmission, decoding and display. Compared with present digital television systems, such systems greatly increase the resource consumption and functional requirements of every level, and a key characteristic is that both systems must compress the video streams of several channels efficiently.
A multichannel video stream carries a huge amount of data, which gives the coding and decoding levels a very high computational complexity. However, because every channel captures the same scene, the contents of the channels are strongly correlated in the spatial domain apart from the translation and rotation between channels, and this makes efficient compression of multichannel video streams possible.
At present the multichannel video stream coding methods studied worldwide fall into two classes: the first class is based on the MPEG video coding standards, and the second class is based on depth-image-based rendering (DIBR). Each class has its own characteristics.
The first class of methods is based on the MPEG video coding standards.
In May 2003 the Joint Video Team (JVT), formed jointly by experts from ITU-T and ISO/IEC, finalized the international video coding standard H.264/AVC. H.264 adopts a hybrid coding framework and advanced techniques such as variable-block-size motion prediction down to 4x4 blocks, multiple reference image frames and context-adaptive binary arithmetic coding, and achieves very high efficiency when compressing a single-channel video stream.
JVT is currently studying and defining the Multi-view Video Coding (MVC) international standard. MVC exploits the correlation between image frames within a viewpoint and across different viewpoints and performs compression coding on top of H.264/AVC. Because joint temporal and spatial prediction is used, current experiments show that, compared with simulcast coding in which each viewpoint is coded independently, joint spatio-temporal coding gains 0.5 dB to 3 dB depending on the video content. See: P. Merkle, A. Smolic and K. Muller, Efficient prediction structures for multiview video coding, IEEE Trans. CSVT, vol. 17, no. 11, pp. 1461-1473, 2007.
MVC uses the disparity vectors obtained by disparity prediction to represent the displacement between channels, and the cost function that determines a disparity vector targets minimum bit rate. This is a coarse way of representing displacement: a disparity vector can only express the displacement relation between two channels and cannot be converted into the depth of the objects, and therefore cannot express the displacement relations among several channels. MVC consequently needs a disparity vector between the key frames of every pair of adjacent channels.
On the time axis MVC divides the image frames of a single channel into groups of pictures (GOP) of fixed length; the GOP length reflects the temporal correlation of the image frames of a single video stream. MVC performs disparity prediction between the key frames of two adjacent channels, and the disparity vectors reflect the spatial correlation of the image frames of the two channels; but the GOP length determines the period of disparity prediction and thus limits how well the spatial correlation of the image frames can be exploited.
MVC has to encode every viewpoint channel; its coding structure is rather complex and requires a large amount of computation, long encoding delay and a large reference-frame storage space. When the number of channels to be coded increases, the bit rate increases accordingly. Because MVC codes and transmits all viewpoints, the size of the captured images and the camera spacing are tied to the picture size and viewing distance at the display side, which restricts the flexibility of the viewing position at the display side.
MVC uses disparity prediction to exploit the correlation between viewpoints. However, because the camera mounting positions, shooting positions and illumination conditions are not identical, the same region in the image frames captured from several viewpoints differs in luminance and chrominance. This inconsistency degrades the accuracy of disparity prediction and the coding efficiency; one remedy is to add luminance and chrominance compensation terms to the matching cost function. See: J. H. Hur, S. Cho and Y. L. Lee, Adaptive local illumination change compensation method for H.264/AVC-based multiview video coding, IEEE Trans. CSVT, vol. 17, no. 11, pp. 1496-1505, 2007.
In 2006 AVS (the Audio Video coding Standard) was adopted as China's national video coding standard. AVS likewise adopts a hybrid coding framework, with advanced techniques such as a variable block structure, multiple reference image frames, a pre-scaled integer transform and arithmetic coding. A coding structure based on MVC can also use AVS as its coding method.
The second class of methods is based on the depth-image-based rendering (DIBR) technique.
The advanced three-dimensional television system project (ATTEST, Advanced Three-Dimensional Television System Technology) of the European Information Society Technologies (IST) programme adopted the DIBR method. See: C. Fehn, Depth-Image-Based Rendering (DIBR), compression and transmission for a new approach on 3D-TV, in Proceedings of SPIE, Stereoscopic Displays and Virtual Reality Systems XI, USA, pp. 93-104, 2004.
At the encoder side the ATTEST system codes only the two-dimensional video of a single channel (the main channel) and the depth maps of this channel. At the decoder side the DIBR method is used: according to the depth information and the camera parameters, the decoded main-channel image frames are projected into three-dimensional space and then reprojected onto the imaging plane of a virtual camera, so that several virtual two-dimensional video channels are reconstructed. DIBR uses the depth information of one channel to render several video channels; compared with MVC it can reach a higher compression ratio and does not suffer from the luminance and chrominance mismatches caused by differences in camera position and parameters.
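As an illustration of the DIBR step just described, the following minimal sketch forward-warps one main-channel frame into a horizontally shifted virtual view using a per-pixel depth map. The simplified camera model (rectified views, pure horizontal baseline) and all names such as dibr_warp are assumptions made for this sketch only; they are not the ATTEST implementation.

```python
import numpy as np

def dibr_warp(main_frame, depth, focal_px, baseline_m, hole_value=-1):
    """Forward-warp a main-channel frame to a horizontally shifted virtual view.

    Assumes rectified views and metric depth; disparity = focal_px * baseline_m / depth.
    Pixels that receive no projection keep `hole_value`, marking occlusion holes.
    """
    h, w = depth.shape
    virtual = np.full_like(main_frame, hole_value)
    z_buffer = np.full((h, w), np.inf)            # keep the nearest point per target pixel
    disparity = focal_px * baseline_m / np.maximum(depth, 1e-6)

    for y in range(h):
        for x in range(w):
            xv = int(round(x - disparity[y, x]))  # shift along the baseline
            if 0 <= xv < w and depth[y, x] < z_buffer[y, xv]:
                z_buffer[y, xv] = depth[y, x]
                virtual[y, xv] = main_frame[y, x]
    return virtual

# Toy usage: a 4x4 gray frame with a near object on the right half.
frame = np.arange(16, dtype=float).reshape(4, 4)
depth = np.where(np.arange(4) >= 2, 1.0, 5.0) * np.ones((4, 1))
print(dibr_warp(frame, depth, focal_px=4.0, baseline_m=0.5))
```

The pixels left at the hole value in the warped frame are exactly the occluded regions that the later sections of this description handle with occlusion maps.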
ATTEST codes a depth map for every image frame of the main channel and uses the main-channel image frames and depth maps to render the image frames of the other channels; what it exploits is the spatial correlation between the main channel and the other channels. Because of the distance between channels, the temporal correlation of the other channels' image frames may be stronger than their spatial correlation with the main-channel image frames, and ATTEST ignores this temporal correlation of the other channels.
The DIBR method relies on depth maps. In the depth-map production process, stereo matching is performed between two images; because of occlusion, the depth information of part of the scene is missing from the depth map. During DIBR this missing scene information causes holes to appear inside the rendered virtual-viewpoint image frames, and the resulting drop in virtual-view image quality impairs the viewer's stereoscopic impression at viewing positions away from the centre.
At present there are four ways to alleviate the holes that appear inside rendered and synthesized image frames: first, fill the holes with the surrounding texture; second, smooth the depth map by filtering; third, code and transmit the depth maps of several channels and use the image frames and depth maps of several channels to render the image to be synthesized for the same virtual viewpoint; fourth, use the more complex layered depth image (LDI) technique. See: S. U. Yoon and Y. S. Ho, Multiple color and depth video coding using a hierarchical representation, IEEE Trans. CSVT, vol. 17, no. 11, 2007.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a method for coding a multichannel video stream using depth information.
In the multichannel video stream coding method using depth information, the encoder is organized into four functional units, namely a channel selection unit, a depth generation unit, an image rendering unit and a prediction coding unit, and the input multichannel video stream is coded according to the following steps:
1) the channel selection unit selects the main channels and the secondary channels;
2) the depth generation unit produces the depth maps of the main-channel image frames;
3) the image rendering unit produces the prediction images of the secondary-channel key frames;
4) the prediction coding unit compresses the main-channel image frames and depth maps, the occlusion maps of the secondary-channel key frames and the non-key frames according to a video coding standard method and outputs the coded compressed bit stream; the video coding standard methods include the international video coding standards MPEG-X and H.26X and the national video coding standard AVS.
The cameras corresponding to the several channels to be coded that are input to the channel selection unit are arranged in one or two dimensions, with their optical axes perpendicular to a common plane or converging on the scene being shot.
The channel selection unit measures, according to the camera parameters and the distances between channels, the spatial correlation of the image frames of the different channels at the same time instant and selects one or more main channels; the remaining channels serve as secondary channels. One main channel and several secondary channels form a channel group, and according to the spatial correlation of the image frames of the different channels the input channels are divided into one or more channel groups.
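The patent leaves the exact correlation measure to the implementer. As one hedged illustration of the channel-grouping step, the sketch below splits a one-dimensional camera array into channel groups of a chosen size and takes the channel nearest the centre of each group as its main channel, using camera spacing as a stand-in for spatial correlation; the function names and the grouping heuristic are assumptions.

```python
from typing import List, Tuple

def group_channels(positions: List[float], group_size: int) -> List[Tuple[int, List[int]]]:
    """Split a 1-D camera array into channel groups and pick a main channel per group.

    `positions` are camera coordinates along the rig, one per channel (channel i = index i).
    The channel closest to the centre of each group is taken as the main channel, since it
    has the smallest average distance (a proxy for the strongest spatial correlation) to the
    other channels of the group.
    """
    groups = []
    for start in range(0, len(positions), group_size):
        idx = list(range(start, min(start + group_size, len(positions))))
        centre = sum(positions[i] for i in idx) / len(idx)
        main = min(idx, key=lambda i: abs(positions[i] - centre))
        secondary = [i for i in idx if i != main]
        groups.append((main, secondary))
    return groups

# Example: 5 evenly spaced cameras in one group -> index 2 (channel 3) becomes the main channel.
print(group_channels([0.0, 1.0, 2.0, 3.0, 4.0], group_size=5))
```

With the five-camera example this heuristic reproduces the choice of channel 3 as the main channel made in Embodiment 1 below.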
For a main-channel video stream, the channel selection unit forms a group of pictures from several image frames; in each group of pictures one image frame serves as the key frame and the remaining image frames as non-key frames, and the group of pictures is coded with a hierarchical B-frame prediction structure. For the main-channel video stream the depth-map period is variable and satisfies 1 ≤ P_D ≤ L_MG, where P_D divides L_MG exactly and P_D is a positive integer; here P_D is the depth-map period of the main channel and L_MG is the group-of-pictures length of the main channel.
For a secondary-channel video stream, the channel selection unit likewise forms a group of pictures from several image frames; in each group of pictures one image frame serves as the key frame and the remaining image frames as non-key frames. The group-of-pictures length satisfies P_D ≤ L_AG ≤ L_MG, where P_D divides L_AG exactly, L_AG divides L_MG exactly and L_AG is a positive integer; here L_AG is the group-of-pictures length of the secondary channel, L_MG is the group-of-pictures length of the main channel and P_D is the depth-map period of the main channel.
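The divisibility constraints above can be checked mechanically. The following sketch validates a configuration of main-channel GOP length L_MG, depth-map period P_D and secondary-channel GOP lengths L_AG; the function name and interface are hypothetical.

```python
def valid_configuration(l_mg: int, p_d: int, l_ag_list) -> bool:
    """Check 1 <= P_D <= L_MG with P_D | L_MG, and for every secondary channel
    P_D <= L_AG <= L_MG with P_D | L_AG and L_AG | L_MG."""
    if not (1 <= p_d <= l_mg and l_mg % p_d == 0):
        return False
    return all(p_d <= l_ag <= l_mg and l_ag % p_d == 0 and l_mg % l_ag == 0
               for l_ag in l_ag_list)

# The settings of Embodiment 1 below: L_MG = 8, P_D = 4, secondary GOP lengths 4, 4, 8, 8.
assert valid_configuration(8, 4, [4, 4, 8, 8])
assert not valid_configuration(8, 3, [4])   # 3 does not divide 8
```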
In the channel selection unit, within a single channel group, the key-frame positions of a secondary-channel video stream correspond to main-channel positions at which a depth map exists; if the main-channel image frame at a given time instant has no depth map, the image frames of all secondary channels at that instant are non-key frames.
Within a single channel group, for a key frame of a secondary-channel video stream, the image rendering unit uses the reconstructed image frame and depth map of the main channel at that time instant and applies depth-image-based rendering to produce the prediction image of the secondary channel's key frame at that instant.
For a key frame of a secondary-channel video stream, if the input multichannel video stream is divided into several channel groups, the image rendering unit selects the main channels of the several channel groups adjacent to this secondary channel whose depth-map periods divide the group-of-pictures length of this secondary channel exactly, and applies depth-image-based rendering to the reconstructed image frames and depth maps of these main channels at that time instant to produce the prediction image of the key frame of this secondary channel.
The prediction coding unit compresses the main-channel video stream according to a video coding standard method. For the main-channel depth maps it predicts the motion information of each depth map from the motion information of the corresponding main-channel image frame and compresses the depth maps according to the video coding standard method. For a secondary-channel video stream it subtracts the prediction image from the key frame to produce the occlusion map of the key frame and compresses the occlusion map according to the video coding standard method; the reconstructed key frames serve as reference frames, and the non-key frames within a group of pictures are compressed with a hierarchical B-frame prediction structure according to the video coding standard method.
The present invention addresses the content correlation that a multichannel video stream exhibits in the spatial domain and proposes a coding method that uses depth information. From a main-channel image frame and its depth map, prediction images of several secondary-channel image frames adjacent to the main channel can be rendered quickly. Compared with MVC, a depth map can be used to predict several secondary channels, whereas disparity information can only be used for the channel it was computed for, so multichannel video coding based on depth information achieves higher compression efficiency.
The depth-map period of the main channel reflects the content correlation between the main channel and the secondary channels in the spatial domain. This spatial content correlation is closely related to the camera parameters and in particular to the distances between channels: the shorter the distance between two channels, the stronger the correlation of their image frames at the same time instant, and vice versa. The period of the main-channel depth maps should therefore be variable. When the spatial correlation is stronger than the temporal correlation, the main-channel depth-map period is small and more secondary-channel image frames are obtained by depth-based rendering; when the spatial correlation is weaker than the temporal correlation, the depth-map period is large and more secondary-channel image frames are obtained by prediction in the temporal domain.
In the present invention the main-channel depth-map period is variable: the period P_D ranges over 1 ≤ P_D ≤ L_MG and divides L_MG exactly, where L_MG is the GOP length of the main channel. When the spatial correlation between channels is very strong, P_D equals 1: every image frame of the main channel produces a depth map and every image frame of a secondary channel obtains a prediction image rendered from the main channel. When the spatial correlation between channels is very weak, P_D equals L_MG: the main channel produces depth maps only at its key frames, only the key frames of the secondary channels obtain prediction images rendered from the main channel, and the non-key frames of the secondary channels are predictively coded using temporal correlation.
In the present invention the GOP length of a secondary channel is variable, and different secondary channels may have different GOP lengths. The secondary-channel GOP length L_AG satisfies P_D ≤ L_AG ≤ L_MG, where P_D divides L_AG exactly, L_AG divides L_MG exactly and L_AG is a positive integer; L_MG is the GOP length of the main-channel video stream and P_D is its depth-map period. When the spatial correlation between a secondary channel and the main channel is strong, L_AG is small and more secondary-channel image frames obtain prediction images from the main channel through DIBR; when the spatial correlation is very weak, L_AG equals L_MG and the secondary channel obtains prediction images only at the positions of the main-channel key frames.
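The patent does not specify how a correlation estimate is mapped to a concrete period or GOP length. Purely as an illustration, the sketch below enumerates the admissible secondary-channel GOP lengths for a given L_MG and P_D and picks one from a spatial-correlation score; the linear mapping from score to GOP length is an assumption, not part of the method.

```python
def admissible_gop_lengths(l_mg: int, p_d: int):
    """All secondary-channel GOP lengths allowed by the constraints of the method:
    multiples of P_D that divide L_MG and lie in [P_D, L_MG]."""
    return [l for l in range(p_d, l_mg + 1)
            if l % p_d == 0 and l_mg % l == 0]

def choose_secondary_gop(spatial_corr: float, l_mg: int, p_d: int) -> int:
    """Map a spatial-correlation score in [0, 1] to a GOP length: the stronger the
    correlation with the main channel, the shorter the GOP (more DIBR-rendered key frames).
    The linear mapping is an illustrative assumption."""
    options = admissible_gop_lengths(l_mg, p_d)        # e.g. [4, 8] for L_MG=8, P_D=4
    rank = int(round((1.0 - spatial_corr) * (len(options) - 1)))
    return options[rank]

print(choose_secondary_gop(0.9, l_mg=8, p_d=4))  # near channel  -> 4
print(choose_secondary_gop(0.2, l_mg=8, p_d=4))  # far channel   -> 8
```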
The present invention codes and transmits the occlusion information of the prediction images of the secondary-channel key frames obtained with the DIBR method. The secondary-channel image frames rendered from depth maps contain holes. The holes are caused by occlusion: part of the picture content is visible in a secondary channel but invisible in the main channel, so the main-channel depth map has no corresponding points for the secondary-channel image frame at the same instant. One way to deal with occlusion is to code and transmit the hole information. In addition, because the camera parameters and illumination conditions of the main and secondary channels differ, the rendered prediction image of a secondary-channel key frame differs from the captured image. Both the hole information and these differences therefore need to be coded and transmitted in order to obtain high-quality reconstructed image frames.
When there are several channel groups (GOV, Group of View), the present invention compresses the several main channels independently with a video coding standard method, and the prediction images of the secondary-channel key frames are obtained by rendering from several depth maps. A hole in the prediction image rendered from one depth map can be compensated from the prediction images rendered from other depth maps, so rendering the same secondary-channel key frame from several depth maps yields a better prediction image and reduces the occlusion information that has to be coded.
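A minimal sketch of combining two DIBR predictions of the same secondary-channel key frame is given below: pixels that are holes in one rendering are taken from the other, and pixels visible in both renderings are averaged. The merge rule and the hole marker are assumptions for illustration.

```python
import numpy as np

def merge_predictions(pred_a, pred_b, hole_value=-1):
    """Combine two rendered predictions of the same key frame so that holes in one
    are filled from the other; remaining holes stay marked for occlusion-map coding."""
    a_hole = pred_a == hole_value
    b_hole = pred_b == hole_value
    merged = np.where(a_hole, pred_b, pred_a).astype(float)
    both = ~a_hole & ~b_hole
    merged[both] = (pred_a[both] + pred_b[both]) / 2.0   # visible in both views: average
    return merged

p1 = np.array([[10., -1.], [30., 40.]])
p2 = np.array([[12., 22.], [-1., 38.]])
print(merge_predictions(p1, p2))   # [[11. 22.] [30. 39.]]
```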
The present invention uses the depth-image-rendering technique to exploit the spatial correlation between the main channel and several secondary channels. According to the strength of the spatial correlation between the main channel and the secondary channels, it decouples the main-channel depth-map period from the main-channel GOP length and couples the main-channel depth-map period with the secondary-channel GOP length, so as to fully exploit both the spatial and the temporal correlation. It renders the same image frame from several depth images to reduce the holes caused by occlusion in the rendered images, and it codes and transmits the occlusion information of the image frames to obtain high-quality reconstructed image frames. Compared with ATTEST, the present invention guarantees reconstructed image quality; compared with MVC, it achieves higher compression performance.
Description of drawings
Fig. 1 is a schematic diagram of the multichannel video stream encoder using depth information according to the present invention;
Fig. 2 shows the coding structure in the spatial domain of the 5-channel video streams of the first example;
Fig. 3 shows the coding structure in the temporal and spatial domains of the 5-channel video streams of the first example;
Fig. 4 shows the coding structure in the spatial domain of the 8-channel video streams of the second example;
Fig. 5 shows the coding structure in the temporal and spatial domains of the 8-channel video streams of the second example;
Fig. 6 shows the coding structure in the spatial domain of the 9 two-dimensionally arranged channels of video streams of the third example;
Fig. 7 shows the coding structure in the spatial domain of the 36 two-dimensionally arranged channels of video streams of the fourth example.
Embodiments
Embodiments of the invention are described below with reference to the accompanying drawings.
In the multichannel video stream coding method using depth information, the encoder is organized into four functional units, namely a channel selection unit, a depth generation unit, an image rendering unit and a prediction coding unit, and the input multichannel video stream is coded according to the following steps:
1) the channel selection unit selects the main channels and the secondary channels;
2) the depth generation unit produces the depth maps of the main-channel image frames;
3) the image rendering unit produces the prediction images of the secondary-channel key frames;
4) the prediction coding unit compresses the main-channel image frames and depth maps, the occlusion maps of the secondary-channel key frames and the non-key frames according to a video coding standard method and outputs the coded compressed bit stream; the video coding standard methods include the international video coding standards MPEG-X and H.26X and the national video coding standard AVS.
The cameras corresponding to the several channels to be coded that are input to the channel selection unit are arranged in one or two dimensions, with their optical axes perpendicular to a common plane or converging on the scene being shot.
The channel selection unit measures, according to the camera parameters and the distances between channels, the spatial correlation of the image frames of the different channels at the same time instant and selects one or more main channels; the remaining channels serve as secondary channels. One main channel and several secondary channels form a channel group, and according to the spatial correlation of the image frames of the different channels the input channels are divided into one or more channel groups.
For a main-channel video stream, the channel selection unit forms a group of pictures from several image frames; in each group of pictures one image frame serves as the key frame and the remaining image frames as non-key frames, and the group of pictures is coded with a hierarchical B-frame prediction structure. For the main-channel video stream the depth-map period is variable and satisfies 1 ≤ P_D ≤ L_MG, where P_D divides L_MG exactly and P_D is a positive integer; here P_D is the depth-map period of the main channel and L_MG is the group-of-pictures length of the main channel.
For a secondary-channel video stream, the channel selection unit likewise forms a group of pictures from several image frames; in each group of pictures one image frame serves as the key frame and the remaining image frames as non-key frames. The group-of-pictures length satisfies P_D ≤ L_AG ≤ L_MG, where P_D divides L_AG exactly, L_AG divides L_MG exactly and L_AG is a positive integer; here L_AG is the group-of-pictures length of the secondary channel, L_MG is the group-of-pictures length of the main channel and P_D is the depth-map period of the main channel.
In the channel selection unit, within a single channel group, the key-frame positions of a secondary-channel video stream correspond to main-channel positions at which a depth map exists; if the main-channel image frame at a given time instant has no depth map, the image frames of all secondary channels at that instant are non-key frames.
Within a single channel group, for a key frame of a secondary-channel video stream, the image rendering unit uses the reconstructed image frame and depth map of the main channel at that time instant and applies depth-image-based rendering to produce the prediction image of the secondary channel's key frame at that instant.
For a key frame of a secondary-channel video stream, if the input multichannel video stream is divided into several channel groups, the image rendering unit selects the main channels of the several channel groups adjacent to this secondary channel whose depth-map periods divide the group-of-pictures length of this secondary channel exactly, and applies depth-image-based rendering to the reconstructed image frames and depth maps of these main channels at that time instant to produce the prediction image of the key frame of this secondary channel.
The prediction coding unit compresses the main-channel video stream according to a video coding standard method. For the main-channel depth maps it predicts the motion information of each depth map from the motion information of the corresponding main-channel image frame and compresses the depth maps according to the video coding standard method. For a secondary-channel video stream it subtracts the prediction image from the key frame to produce the occlusion map of the key frame and compresses the occlusion map according to the video coding standard method; the reconstructed key frames serve as reference frames, and the non-key frames within a group of pictures are compressed with a hierarchical B-frame prediction structure according to the video coding standard method.
Fig. 1 is a schematic diagram of the multichannel video stream encoder using depth information according to the present invention. The multichannel video stream encoder compresses the input multichannel video stream and outputs the coded compressed bit stream. The encoder comprises a channel selection unit 11, a depth generation unit 12, an image rendering unit 13 and a prediction coding unit 14.
The present invention is described in detail below with reference to the drawings and embodiments.
Embodiment 1 takes 5 channels of video streams to be coded, shot with a one-dimensional linear or arc camera arrangement, as an example and elaborates the four functional units of the multichannel video stream encoder and the corresponding coding method.
(1) Channel selection unit 11
Referring to Fig. 2, the 5 input channels are numbered 1, 2, 3, 4 and 5 according to their relative positions. Channel 3 is selected as the main channel and the remaining 4 channels serve as secondary channels; the 5 channels form one GOV.
Referring to Fig. 3, the GOP length of the main channel (channel 3) is set to 8, so a key frame appears every 7 image frames; the depth-map period of the main channel is set to 4, so a depth map is produced every 3 image frames. The GOP length of channels 2 and 4, which are closer to the main channel, is set to 4, and the GOP length of channels 1 and 5 is set to 8. The main-channel GOP length, depth-map period and secondary-channel GOP lengths chosen above serve only to describe the coding method clearly; in fact they can be set to any positive integers that satisfy claims 4 and 5.
(2) Depth generation unit 12
This unit produces the depth map of a main-channel image frame from that image frame and the image frames, at the same time instant, of the secondary channels adjacent to the main channel.
Referring to Fig. 3, the horizontal axis is the channel direction V, the 5 channels from left to right being V1, V2, V3, V4 and V5; the vertical axis is the time direction T, and for convenience of description Fig. 3 draws 9 image frames along the time direction. Marking the horizontal and vertical axes uniquely identifies the position of an image frame in the space-time domain; for example, V1T1 identifies the image frame in the top-left corner of Fig. 3, and the positions of the other image frames are determined similarly by combinations of V and T, although their labels are not drawn in Fig. 3.
Referring to Fig. 3, the depth-map period of the main channel (channel 3) is 4; an image frame marked D has a depth map at this position of the main channel, and an image frame marked K is a key frame of the channel it belongs to (V1T1 is not marked K but is also a key frame).
Referring to Fig. 3, among the 9 image frames of the main channel, V3T1, V3T5 and V3T9 need corresponding depth maps. Taking V3T1 as an example, the depth generation unit 12 performs stereo matching between V3T1 and V4T1 (or V2T1) and produces the depth map of V3T1, denoted D_V3T1.
(3) Image rendering unit 13
This unit applies the DIBR method to the main-channel image frames and depth maps to produce the prediction images of the secondary-channel key frames at the same time instant.
Referring to Fig. 3, the horizontal arcs represent the DIBR method; the two channel image frames connected by an arc are the input and the output of DIBR. The end of an arc without an arrow connects to a main-channel image frame and its depth map, and the end of the arc with an arrow points to a secondary-channel key frame.
Referring to Fig. 3, taking the key frame V4T1 of channel 4 as an example, the image rendering unit 13 applies the DIBR method to the main-channel image frame V3T1 and its depth map D_V3T1 and produces the prediction image of V4T1, denoted P_V4T1.
(4) Prediction coding unit 14
This unit compresses the main-channel image frames and depth maps, and the occlusion maps and non-key frames of the secondary channels, according to a video coding standard method.
Referring to Fig. 3, the GOP length of the main channel is 8; each GOP contains 1 key frame and 7 non-key frames, and the GOP is coded with a hierarchical B-frame structure. The vertical arcs represent motion prediction in the temporal domain, and the two vertical arcs pointing to a non-key frame indicate that this non-key frame can reference two image frames. Taking one GOP of the main channel (image frames V3T1 to V3T8) as an example, the key frames V3T1 and V3T9 are intra coded, and the non-key frames (V3T2 to V3T7) are coded as B frames, each predictively coded with one past and one future image frame along the time direction as reference frames. For example, the non-key frame V3T5 selects V3T1 and V3T9 as reference frames for inter prediction.
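For reference, the following sketch enumerates the coding order and reference frames of one such GOP: the key frames are intra coded and every non-key B frame references the nearest already coded frame before and after it. The dyadic splitting order is a common choice assumed here for illustration; the patent itself only requires the hierarchical B structure.

```python
def hierarchical_b_order(gop_len=8):
    """Yield (frame_index, reference_pair) for one GOP with key frames at 0 and gop_len.
    Frames are split dyadically: each B frame references the nearest already-coded
    frames before and after it, as in the hierarchical B structure of the main channel."""
    order = [(0, None), (gop_len, None)]            # key frames, intra coded
    intervals = [(0, gop_len)]
    while intervals:
        lo, hi = intervals.pop(0)
        if hi - lo <= 1:
            continue
        mid = (lo + hi) // 2
        order.append((mid, (lo, hi)))               # B frame referencing lo and hi
        intervals += [(lo, mid), (mid, hi)]
    return order

for frame, refs in hierarchical_b_order(8):
    print(frame, refs)
# 0 None, 8 None, 4 (0, 8), 2 (0, 4), 6 (4, 8), 1 (0, 2), 3 (2, 4), 5 (4, 6), 7 (6, 8)
```

With frame 0 standing for V3T1 and frame 8 for V3T9, frame 4 (V3T5) indeed references V3T1 and V3T9, as in the example above.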
Referring to Fig. 3, the depth-map period of the main channel is 4, so one depth map has to be coded every 3 image frames. One GOP of the main channel contains 2 depth maps, which form one GOD (Group of Depth). A GOD is coded similarly to a GOP. Taking one GOD of the main channel (containing the depth maps D_V3T1 and D_V3T5) as an example, D_V3T1 and D_V3T9 are intra coded; D_V3T5 is coded as a B frame, the motion vectors of D_V3T5 are predicted from the motion vectors of the image frame V3T5, and D_V3T5 is inter coded with D_V3T1 and D_V3T9 selected as reference frames.
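The reuse of the image frame's motion information for the co-located depth map can be viewed as motion-vector prediction: the motion vector of the corresponding texture block serves as the predictor and only the small residual would be coded. The block-level interface below is an assumption for illustration.

```python
def code_depth_motion(depth_mvs, texture_mvs):
    """Predict each depth-map block motion vector from the co-located main-channel
    image-frame motion vector and return the residual vectors that would be coded."""
    residuals = []
    for (dx, dy), (tx, ty) in zip(depth_mvs, texture_mvs):
        residuals.append((dx - tx, dy - ty))          # usually near zero: depth and
    return residuals                                   # texture move together

# Toy example: depth motion closely follows the texture motion of frame V3T5.
texture = [(4, 0), (3, -1), (0, 2)]
depth   = [(4, 0), (2, -1), (0, 2)]
print(code_depth_motion(depth, texture))   # [(0, 0), (-1, 0), (0, 0)]
```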
Referring to Fig. 3, the prediction images of the secondary-channel key frames are produced by the image rendering unit 13; subtracting its prediction image from a secondary-channel key frame yields the occlusion map of that key frame, and the occlusion maps and non-key frames of the secondary channels are then compressed. Taking one GOP of secondary channel 4 (image frames V4T1 to V4T4) as an example, the occlusion maps of the key frames V4T1 and V4T5 are denoted R_V4T1 and R_V4T5; R_V4T1 and R_V4T5 are coded with the DCT transform, quantization and entropy coding of the video coding standard method, and the non-key frames (V4T2 to V4T4) are coded as B frames, each predictively coded with one past and one future image frame along the time direction as reference frames. For example, the non-key frame V4T3 selects V4T1 and V4T5 as reference frames for inter prediction.
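A minimal sketch of the occlusion-map step follows: the residual between the captured key frame and its rendered prediction is coarsely quantized (a stand-in for the DCT, quantization and entropy coding of the standard codec), and the decoder adds it back to the prediction. The names and the simple quantizer are assumptions.

```python
import numpy as np

def encode_occlusion_map(key_frame, prediction, qstep=4):
    """Occlusion map R = key frame minus its rendered prediction, coarsely quantized
    (stand-in for the DCT / quantization / entropy coding of the standard codec)."""
    residual = key_frame.astype(int) - prediction.astype(int)
    return np.round(residual / qstep).astype(int)

def reconstruct_key_frame(prediction, occlusion_q, qstep=4):
    """Decoder side: prediction plus dequantized occlusion map."""
    return prediction.astype(int) + occlusion_q * qstep

key  = np.array([[100, 102], [90, 200]])
pred = np.array([[101, 103], [88, 120]])   # large error where the prediction had a hole
r_q  = encode_occlusion_map(key, pred)
print(reconstruct_key_frame(pred, r_q))    # close to `key`
```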
Embodiment 2 takes 8 channels of video streams to be coded, shot with a one-dimensional linear or arc camera arrangement, as an example and explains only where the coding of the encoder of the invention differs from Embodiment 1.
(1) Channel selection unit 11
Referring to Fig. 4, the 8 input channels are numbered 1 to 8 according to their relative positions. Channels 3 and 6 are selected as main channels and the remaining 6 channels serve as secondary channels. The 8 channels are divided into two GOVs: channels 1, 2, 3 and 4 form one GOV and channels 5, 6, 7 and 8 form the other.
Referring to Fig. 5, the GOP length of the main channels (channels 3 and 6) is set to 8, so a key frame appears every 7 image frames; the depth-map period of the main channels is set to 4, so a depth map is produced every 3 image frames. The GOP length of channels 2, 4, 5 and 7 is set to 4, and the GOP length of channels 1 and 8 is set to 8. The main-channel GOP length, depth-map period and secondary-channel GOP lengths chosen above serve only to describe the coding method clearly; in fact they can be set to any positive integers that satisfy claims 4 and 5.
(2) Depth generation unit 12
This unit produces the depth map of a main-channel image frame from that image frame and the image frames, at the same time instant, of the secondary channels adjacent to the main channel. It works in the same way as in Embodiment 1.
(3) Image rendering unit 13
This unit applies the DIBR method to the image frame and depth map, at the same time instant, of the main channel adjacent to a secondary-channel key frame and produces the prediction image of that key frame. When a secondary channel has several adjacent main channels, DIBR is performed from each of these main channels separately to predict the same key frame of this secondary channel, and the prediction image of this secondary-channel key frame is obtained from these renderings.
Referring to Fig. 5, the horizontal arcs represent the DIBR method; the two channel image frames connected by an arc are the input and the output of DIBR. The end of an arc without an arrow connects to a main-channel image frame and its depth map, and the end of the arc with an arrow points to a secondary-channel key frame.
Referring to Fig. 5, taking the key frame V4T1 of channel 4 as an example, the image rendering unit 13 applies the DIBR method to the image frame V3T1 of main channel 3 and its depth map D_V3T1 to produce one prediction of V4T1, and to the image frame V6T1 of main channel 6 and its depth map D_V6T1 to produce another prediction of V4T1; the prediction image of the key frame V4T1 of secondary channel 4 is obtained from these two predictions.
(4) Prediction coding unit 14
This unit compresses the main-channel image frames and depth maps, and the occlusion maps and non-key frames of the secondary channels, according to a video coding standard method. It works in the same way as in Embodiment 1.
Embodiment 3 takes 9 channels of video streams to be coded, arranged in two dimensions, as an example and explains only where the coding of the encoder of the invention differs from Embodiment 1.
(1) Channel selection unit 11
Referring to Fig. 6, the 9 input channels are numbered 1 to 9 according to their relative positions. Channel 5 is selected as the main channel and the remaining 8 channels serve as secondary channels; the 9 channels form one GOV.
As in Embodiment 1, the GOP length and depth-map period of the main channel and the GOP lengths of the secondary channels are determined.
(2) Depth generation unit 12
This unit produces the depth map of a main-channel image frame from that image frame and the image frames, at the same time instant, of the secondary channels adjacent to the main channel. It works in the same way as in Embodiment 1.
(3) Image rendering unit 13
This unit applies the DIBR method to the image frame and depth map, at the same time instant, of the main channel adjacent to a secondary-channel key frame and produces the prediction image of that key frame. It works in the same way as in Embodiment 1.
(4) Prediction coding unit 14
This unit compresses the main-channel image frames and depth maps, and the occlusion maps and non-key frames of the secondary channels, according to a video coding standard method. It works in the same way as in Embodiment 1.
Embodiment 4 takes 36 channels of video streams to be coded, arranged in two dimensions, as an example and explains only where the coding of the encoder of the invention differs from Embodiment 1.
(1) Channel selection unit 11
Referring to Fig. 7, the 36 input channels are numbered 1 to 36 according to their relative positions. Channels 8, 11, 26 and 29 are selected as main channels and the remaining 32 channels serve as secondary channels. Every 9 channels form one GOV, so the 36 channels are divided into 4 GOVs.
As in Embodiment 1, the GOP length and depth-map period of the main channels and the GOP lengths of the secondary channels are determined.
(2) Depth generation unit 12
This unit produces the depth map of a main-channel image frame from that image frame and the image frames, at the same time instant, of the secondary channels adjacent to the main channel. It works in the same way as in Embodiment 1.
(3) Image rendering unit 13
This unit applies the DIBR method to the image frame and depth map, at the same time instant, of the main channel adjacent to a secondary-channel key frame and produces the prediction image of that key frame. When a secondary channel has several adjacent main channels, DIBR is performed from each of these main channels separately to predict the same key frame of the secondary channel, and the prediction image of this secondary-channel key frame is obtained from these renderings.
Referring to Fig. 7, an arc connecting two channels represents the DIBR method; the end of an arc without an arrow connects to a main channel and the end of the arc with an arrow points to a secondary channel. Taking the 8 secondary channels of GOV1 as an example, secondary channels 1, 2 and 7 are adjacent to main channel 8, and these 3 secondary channels select main channel 8 for DIBR rendering; channels 3 and 9 are adjacent to main channels 8 and 11, and these 2 secondary channels select main channels 8 and 11 for DIBR rendering; channels 13 and 14 are adjacent to main channels 8 and 26, and these 2 secondary channels select main channels 8 and 26 for DIBR rendering; channel 15 is adjacent to main channels 8, 11, 26 and 29, and this secondary channel selects these 4 main channels for DIBR rendering.
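One way to reproduce the adjacency of Fig. 7 programmatically is to select, for each secondary channel of the 6 x 6 grid, the main channels within a Chebyshev distance of 2; under the assumed row-major numbering (channel 1 at the top-left) this rule yields exactly the assignments listed above. The numbering convention and distance threshold are assumptions for this sketch.

```python
def nearby_main_channels(secondary, mains, cols=6, max_dist=2):
    """Return the main channels within Chebyshev distance `max_dist` of a secondary
    channel on a row-major numbered grid (channel 1 = top-left)."""
    def rc(ch):
        return (ch - 1) // cols, (ch - 1) % cols
    sr, sc = rc(secondary)
    return [m for m in mains
            if max(abs(rc(m)[0] - sr), abs(rc(m)[1] - sc)) <= max_dist]

mains = [8, 11, 26, 29]
for ch in [1, 2, 7, 3, 9, 13, 14, 15]:        # the secondary channels of GOV1
    print(ch, nearby_main_channels(ch, mains))
# 1, 2, 7 -> [8];  3, 9 -> [8, 11];  13, 14 -> [8, 26];  15 -> [8, 11, 26, 29]
```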
(4) Prediction coding unit 14
This unit compresses the main-channel image frames and depth maps, and the occlusion maps and non-key frames of the secondary channels, according to a video coding standard method. It works in the same way as in Embodiment 1.

Claims (7)

1. A multichannel video stream coding method using depth information, characterized in that an encoder is organized into four functional units, namely a channel selection unit, a depth generation unit, an image rendering unit and a prediction coding unit, and that the input multichannel video stream is coded according to the following steps:
1) selecting main channels and secondary channels by said channel selection unit;
2) producing, by said depth generation unit, the depth maps of the main-channel image frames from the main-channel image frames and the image frames, at the same time instant, of the secondary channels adjacent to the main channel;
3) producing, by said image rendering unit, the prediction images of the secondary-channel key frames at the same time instant by depth-image-based rendering from the adjacent main-channel image frames and depth maps;
4) compressing, by said prediction coding unit, the main-channel image frames and depth maps, the occlusion maps of the secondary-channel key frames and the non-key frames according to a video coding standard method and outputting the coded compressed bit stream, the video coding standard methods including the international video coding standards MPEG-X and H.26X and the national video coding standard AVS;
wherein said channel selection unit, within a single channel group, makes the key-frame positions of a secondary-channel video stream correspond to main-channel positions at which a depth map exists; if the main-channel image frame at a given time instant has no depth map, the image frames of all secondary channels at that instant are non-key frames;
and wherein said prediction coding unit compresses the main-channel video stream according to the video coding standard method; for the main-channel depth maps, it predicts the motion information of each depth map from the motion information of the corresponding main-channel image frame and compresses the depth maps according to the video coding standard method; for a secondary-channel video stream, it subtracts the prediction image from the key frame to produce the occlusion map of the key frame and compresses the occlusion map according to the video coding standard method; and for a secondary-channel video stream, the reconstructed key frames serve as reference frames and the non-key frames within a group of pictures are compressed with a hierarchical B-frame prediction structure according to the video coding standard method.
2. The multichannel video stream coding method using depth information according to claim 1, characterized in that the cameras corresponding to the several channels to be coded at said channel selection unit are arranged in one or two dimensions, with their optical axes perpendicular to a common plane or converging on the scene being shot.
3. The multichannel video stream coding method using depth information according to claim 1, characterized in that said channel selection unit measures, according to the camera parameters and the distances between channels, the spatial correlation of the image frames of different channels at the same time instant and selects one or more main channels, the remaining channels serving as secondary channels; one main channel and several secondary channels form a channel group, and according to the spatial correlation of the image frames of the different channels the input channels are divided into one or more channel groups.
4. The multichannel video stream coding method using depth information according to claim 1, characterized in that said channel selection unit forms, for a main-channel video stream, a group of pictures from several image frames; in each group of pictures one image frame serves as the key frame and the remaining image frames as non-key frames, and the group of pictures is coded with a hierarchical B-frame prediction structure; for the main-channel video stream the depth-map period is variable and satisfies 1 ≤ P_D ≤ L_MG, where P_D divides L_MG exactly and P_D is a positive integer, P_D being the depth-map period of the main channel and L_MG the group-of-pictures length of the main channel.
5. The multichannel video stream coding method using depth information according to claim 1, characterized in that said channel selection unit forms, for a secondary-channel video stream, a group of pictures from several image frames; in each group of pictures one image frame serves as the key frame and the remaining image frames as non-key frames; the group-of-pictures length satisfies P_D ≤ L_AG ≤ L_MG, where P_D divides L_AG exactly, L_AG divides L_MG exactly and L_AG is a positive integer, L_AG being the group-of-pictures length of the secondary channel, L_MG the group-of-pictures length of the main channel and P_D the depth-map period of the main channel.
6. The multichannel video stream coding method using depth information according to claim 1, characterized in that said image rendering unit, within a single channel group and for a key frame of a secondary-channel video stream, produces the prediction image of the secondary channel's key frame at a given time instant by depth-image-based rendering from the reconstructed image frame and depth map of the main channel at that instant.
7. The multichannel video stream coding method using depth information according to claim 1, characterized in that said image rendering unit, for a key frame of a secondary-channel video stream and when the input multichannel video stream is divided into several channel groups, selects the main channels of the several channel groups adjacent to this secondary channel whose depth-map periods divide the group-of-pictures length of this secondary channel exactly, and produces the prediction image of the key frame of this secondary channel by depth-image-based rendering from the reconstructed image frames and depth maps of these main channels at that time instant.
CN 200810062864 2008-07-07 2008-07-07 Multichannel video stream encoding method using depth information Active CN100563339C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200810062864 CN100563339C (en) 2008-07-07 2008-07-07 Multichannel video stream encoding method using depth information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200810062864 CN100563339C (en) 2008-07-07 2008-07-07 Multichannel video stream encoding method using depth information

Publications (2)

Publication Number Publication Date
CN101309411A CN101309411A (en) 2008-11-19
CN100563339C true CN100563339C (en) 2009-11-25

Family

ID=40125590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200810062864 Active CN100563339C (en) Multichannel video stream encoding method using depth information

Country Status (1)

Country Link
CN (1) CN100563339C (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2197217A1 (en) * 2008-12-15 2010-06-16 Koninklijke Philips Electronics N.V. Image based 3D video format
KR20120081022A (en) * 2009-05-01 2012-07-18 톰슨 라이센싱 3d video coding formats
KR20100135032A (en) * 2009-06-16 2010-12-24 삼성전자주식회사 Conversion device for two dimensional image to three dimensional image and method thereof
WO2012036901A1 (en) 2010-09-14 2012-03-22 Thomson Licensing Compression methods and apparatus for occlusion data
CN103430549B (en) * 2011-03-18 2017-05-10 索尼公司 Image processing device and image processing method
CN102325259A (en) 2011-09-09 2012-01-18 青岛海信数字多媒体技术国家重点实验室有限公司 Method and device for synthesizing virtual viewpoints in multi-viewpoint video
CN102572440B (en) * 2012-03-15 2013-12-18 天津大学 Multi-viewpoint video transmission method based on depth map and distributed video coding
US20130271565A1 (en) * 2012-04-16 2013-10-17 Qualcomm Incorporated View synthesis based on asymmetric texture and depth resolutions
CN102665109A (en) * 2012-04-19 2012-09-12 中兴通讯股份有限公司 Transmitting and receiving method of multimedia video data and corresponding devices
JP6019729B2 (en) * 2012-05-11 2016-11-02 ソニー株式会社 Image processing apparatus, image processing method, and program
CN104854862A (en) * 2012-12-27 2015-08-19 日本电信电话株式会社 Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium
KR101737595B1 (en) 2012-12-27 2017-05-18 니폰 덴신 덴와 가부시끼가이샤 Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, and image decoding program
CN105474640B (en) * 2013-07-19 2019-03-15 寰发股份有限公司 The method and apparatus that the camera parameters of 3 d video encoding transmit
CN108596964B (en) * 2018-05-02 2020-07-03 厦门美图之家科技有限公司 Depth data acquisition method and device and readable storage medium
CN112633324A (en) * 2020-11-27 2021-04-09 中山大学 System, method and medium for matching stereoscopic vision around the eyes based on neural network
CN117015970A (en) * 2020-12-14 2023-11-07 浙江大学 Decoding method, inter-view prediction method, decoder and encoder
CN115623224A (en) * 2021-07-12 2023-01-17 华为技术有限公司 Data processing method and system and electronic equipment
CN115731633A (en) * 2021-08-30 2023-03-03 成都纵横自动化技术股份有限公司 Visualization method and system for multiple data acquired by sensor
CN113891019A (en) * 2021-09-24 2022-01-04 深圳Tcl新技术有限公司 Video encoding method, video encoding device, shooting equipment and storage medium

Also Published As

Publication number Publication date
CN101309411A (en) 2008-11-19

Similar Documents

Publication Publication Date Title
CN100563339C (en) Multichannel video stream encoding method using depth information
CN100563340C (en) Multichannel video stream encoder and decoder based on depth image rendering
CN102017628B (en) Coding of depth signal
CN102918836B (en) Frame packing for asymmetric stereo video
US8345751B2 (en) Method and system for encoding a 3D video signal, enclosed 3D video signal, method and system for decoder for a 3D video signal
CN103155571B (en) Decoding stereo video data
CN100496121C (en) Image signal processing method of the interactive multi-view video system
Yea et al. View synthesis prediction for multiview video coding
CN103493483B (en) Decoding multi-view video plus depth content
CN102939763B (en) Calculating disparity for three-dimensional images
CN101170702B (en) Multi-view video coding method
CN105308965A (en) Harmonized inter-view and view synthesis prediction for 3D video coding
KR100738867B1 (en) Method for Coding and Inter-view Balanced Disparity Estimation in Multiview Animation Coding/Decoding System
CN107277550A (en) Multi-view signal codec
CN105308969B (en) View synthesis in 3 D video
CN104243966A (en) method and device for generating, storing, transmitting, receiving and reproducing depth maps
CN102413332B (en) Multi-viewpoint video coding method based on time-domain-enhanced viewpoint synthesis prediction
JP2014528190A (en) Camera and / or depth parameter signaling
CN106105218B (en) The method of camera parameter is handled in 3 dimension Video codings
Kovács et al. Overview of the applicability of H. 264/MVC for real-time light-field applications
CN106664423A (en) Depth picture coding method and device in video coding
Lee et al. A framework of 3D video coding using view synthesis prediction
Merkle et al. Efficient compression of multi-view depth data based on MVC
CN102625097B (en) Method for intra-frame prediction of three-dimensional video and coding and decoding methods
Vetro et al. Analysis of 3D and multiview extensions of the emerging HEVC standard

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160620

Address after: 518000 new energy building, Nanhai Road, Shenzhen, Guangdong, Nanshan District A838

Patentee after: Meng Qi media (Shenzhen) Co. Ltd.

Address before: 310027 Hangzhou, Zhejiang Province, Zhejiang Road, No. 38

Patentee before: Zhejiang University

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20160831

Address after: 518000, 101, 2, Fengyun technology building, Fifth Industrial Zone, North Ring Road, Shenzhen, Guangdong, Nanshan District

Patentee after: World wide technology (Shenzhen) Limited

Address before: 518000 new energy building, Nanhai Road, Shenzhen, Guangdong, Nanshan District A838

Patentee before: Meng Qi media (Shenzhen) Co. Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20180903

Address after: 518000 B unit 101, Fengyun mansion 5, Xili street, Nanshan District, Shenzhen, Guangdong.

Patentee after: Wan D display technology (Shenzhen) Co., Ltd.

Address before: 518000 2 of Fengyun tower, Fifth Industrial Zone, Nanshan District North Ring Road, Shenzhen, Guangdong, 101

Patentee before: World wide technology (Shenzhen) Limited