CN102625124A

CN102625124A - Stereo encoding device, decoding device and system

Info

Publication number: CN102625124A
Application number: CN201210055895XA
Authority: CN
Inventors: Bai Huihui (白慧慧); Zhao Yao (赵耀)
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2012-03-05
Filing date: 2012-03-05
Publication date: 2012-08-01
Anticipated expiration: 2032-03-05
Also published as: CN102625124B

Abstract

The invention discloses a stereo encoding device, a decoding device and a system. The encoding device comprises a plurality of identical but mutually independent encoder pairs, and each encoder pair comprises a left-view channel and a right-view channel. Each channel comprises a parity (odd/even) frame separation module, a CS encoder, a standard encoder and a mode selection module. The left-view signal is separated into odd frames and even frames by the parity frame separation module; the odd frames are encoded by the CS encoder to obtain CS codes, the mode selection module controls the working mode of the CS encoder, and the even frames are encoded by the standard encoder to obtain key frames. The right-view signal is likewise separated into odd frames and even frames; the even frames are encoded by the CS encoder to obtain CS codes, the mode selection module controls the working mode of the CS encoder, and the odd frames are encoded by the standard encoder to obtain key frames. The invention further provides the corresponding decoding device and system, which can be applied to products with multi-view or binocular stereo display.

Description

Stereo encoding device, decoding device and system

Technical field

The present invention relates to the field of video coding and decoding technology, and in particular to a stereo encoding device, a stereo decoding device and a stereo coding/decoding system.

Background art

Because 3D video offers users a high-quality, immersive multimedia experience, it has attracted broad research interest from both industry and academia. 3D video is generally divided into multi-view video, with N viewpoints, and stereo (binocular) video, with two viewpoints. The basic format of stereo video consists of a left view and a right view captured simultaneously by two closely spaced cameras. Owing to its simplicity and practicality, stereo video is currently the most widely used format in the 3D video market. However, the huge data volume of stereo or multi-view video poses greater challenges for acquisition, compression and transmission, especially in wireless video sensor networks. In many such applications, the low power budget of the cameras requires low-complexity video encoding and rules out communication between cameras. It is therefore necessary to develop a system that combines high compression efficiency with low encoder complexity, without any communication between cameras.

In recent years, ITU-T VCEG and ISO/IEC MPEG have proposed an extension of the H.264/MPEG-4 AVC standard for multi-view video coding (MVC). MVC is likewise based on block-based predictive coding and can effectively exploit both inter-view correlation and temporal correlation within the same view. Exploiting these two correlations, a stereo video coding scheme with an adaptive prediction structure was proposed in L. Meng, Y. Zhao, A. Wang, J. Pan and H. Bai, "Compatible Stereo Video Coding with Adaptive Prediction Structure," IEICE Trans. on Information and Systems, vol. E94-D, no. 7, pp. 1506-1509, 2011. In addition, L. Ding, S. Chien and L. Chen, "Joint Prediction Algorithm and Architecture for Stereo Video Hybrid Coding Systems," IEEE Trans. on Circuits and Systems for Video Technology, vol. 16, no. 11, pp. 1324-1337, 2006, proposed a joint prediction algorithm and designed a stereo video encoder around it. Although these algorithms achieve high compression efficiency, the encoder needs considerable power to support predictive coding, and the cameras also need a transmission channel to communicate with each other. In practical applications, such an inter-camera communication channel is difficult to provide.

Summary of the invention

The technical problem solved by the present invention is how to reduce video encoding power consumption without requiring a transmission channel for communication between the cameras.

To solve the above problems, the invention provides a stereo encoding device, characterized in that it comprises a plurality of identical and independent encoder pairs, each encoder pair comprising a left-view channel and a right-view channel, and each channel comprising the following modules: a parity frame separation module, a CS encoder, a standard encoder and a mode selection module. The left-view signal passes through the parity frame separation module and is separated into odd frames and even frames; the odd frames are encoded by the CS encoder to obtain CS codes, the mode selection module controls the working mode of the CS encoder, and the even frames are encoded by the standard encoder to obtain key frames. The right-view signal passes through the parity frame separation module and is separated into odd frames and even frames; the even frames are encoded by the CS encoder to obtain CS codes, the mode selection module controls the working mode of the CS encoder, and the odd frames are encoded by the standard encoder to obtain key frames.

Further, as a preferred solution, the stereo encoding is multi-view coding with multiple viewpoints or binocular coding with two viewpoints.

Further, as a preferred solution, the mode selection module includes a SKIP mode: the mean square error between the current block and the co-located block in the adjacent key frame is first calculated; if this value is less than a threshold t0, the current block is skipped and no measurements need to be transmitted.

The present invention also provides a stereo decoding device comprising a plurality of identical and independent decoder pairs, each decoder pair comprising a left-view channel and a right-view channel, and each channel comprising the following modules: a CS reconstruction module, a standard decoder, a joint dictionary and a parity frame interleaving module. In the left-view channel, the received CS frames pass through the CS reconstruction module to obtain the odd frames, the received key frames pass through the standard decoder to obtain the even frames, and the odd and even frames are interleaved to output the left-view video sequence. In the right-view channel, the received CS frames pass through the CS reconstruction module to obtain the even frames, the received key frames pass through the standard decoder to obtain the odd frames, and the odd and even frames are interleaved to output the right-view video sequence. The joint dictionary is generated from the correlation among the data decoded by the standard decoders of the different views, and it controls the CS reconstruction module.

The present invention also provides a stereo coding/decoding system composed of the above encoding device and decoding device.

By using independent, low-complexity encoders, the invention reduces video encoding power consumption and requires no transmission channel for communication between the cameras.

Description of drawings

The invention and many of its attendant advantages will be more completely understood by reference to the following detailed description when considered in conjunction with the accompanying drawings. The drawings described here are provided for further understanding of the invention and form a part of it; the illustrative embodiments and their description serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 is a block diagram of the joint-dictionary-based distributed compressed sensing stereo video coding system;

Fig. 2 shows the reference blocks in the SINGLE mode;

Fig. 3 compares rate-distortion performance: (a) "rabbit"; (b) "soccer".

Embodiment

Embodiments of the present invention are described below with reference to Figs. 1 to 3.

To make the above objects, features and advantages more readily understood, the invention is described in further detail below with reference to the drawings and embodiments.

Because every view is processed in the same way, this embodiment extends easily to multi-view video coding. To keep the encoder low-complexity, each view is first split by frame parity into two sequences, which serve respectively as the CS frames and the key frames of distributed compressed sensing coding. At the decoder, inter-view correlation and temporal correlation within each view are exploited to build a joint dictionary from the key frames, which yields better reconstruction.

Embodiment 1

A stereo encoding device comprises a plurality of identical and independent encoder pairs 10; each encoder pair 10 comprises a left-view channel and a right-view channel, and each channel comprises the following modules: a parity frame separation module 1, a CS encoder 3, a standard encoder 4 and a mode selection module 2. The left-view signal passes through the parity frame separation module 1 and is separated into odd frames and even frames; the odd frames are encoded by the CS encoder 3 to obtain CS codes, the mode selection module 2 controls the working mode of the CS encoder 3, and the even frames are encoded by the standard encoder 4 to obtain key frames. The right-view signal passes through the parity frame separation module 1 and is separated into odd frames and even frames; the even frames are encoded by the CS encoder 3 to obtain CS codes, the mode selection module 2 controls the working mode of the CS encoder 3, and the odd frames are encoded by the standard encoder 4 to obtain key frames.
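The frame routing performed by the parity frame separation module 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: frames are assumed to be numbered from 1 (so list index 0 holds an odd frame), and the names `split_parity` and `route_view` are hypothetical. It makes the complementary arrangement visible: at every time instant one view supplies a key frame and the other a CS frame.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def split_parity(frames: Sequence[Frame]) -> Tuple[List[Frame], List[Frame]]:
    """Split one view into (odd frames, even frames), numbering frames from 1 (assumption)."""
    odd = list(frames[0::2])   # frames 1, 3, 5, ...
    even = list(frames[1::2])  # frames 2, 4, 6, ...
    return odd, even


def route_view(frames: Sequence[Frame], view: str) -> Tuple[List[Frame], List[Frame]]:
    """Return (frames for the CS encoder, frames for the standard encoder) of one view."""
    odd, even = split_parity(frames)
    if view == "left":
        return odd, even    # left view: odd -> CS frames, even -> key frames
    if view == "right":
        return even, odd    # right view: even -> CS frames, odd -> key frames
    raise ValueError(f"unknown view: {view!r}")


cs_left, key_left = route_view(["L1", "L2", "L3", "L4"], "left")
cs_right, key_right = route_view(["R1", "R2", "R3", "R4"], "right")
print(cs_left, key_left)    # ['L1', 'L3'] ['L2', 'L4']
print(cs_right, key_right)  # ['R2', 'R4'] ['R1', 'R3']
```

Each view is routed without looking at the other, which mirrors the point that the two channels never communicate.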

As shown in Fig. 1, on the encoding side each view is encoded independently, and no communication is needed between the two views.

Embodiment 2

A stereo decoding device comprises a plurality of identical and independent decoder pairs 20; each decoder pair 20 comprises a left-view channel and a right-view channel, and each channel comprises the following modules: a CS reconstruction module 6, a standard decoder 5, a joint dictionary 7 and a parity frame interleaving module 8. In the left-view channel, the received CS frames pass through the CS reconstruction module 6 to obtain the odd frames, the received key frames pass through the standard decoder 5 to obtain the even frames, and the odd and even frames are interleaved by the parity frame interleaving module 8 to output the left-view video sequence. In the right-view channel, the received CS frames pass through the CS reconstruction module 6 to obtain the even frames, the received key frames pass through the standard decoder 5 to obtain the odd frames, and the odd and even frames are interleaved by the parity frame interleaving module 8 to output the right-view video sequence. The joint dictionary 7 is generated from the correlation among the data decoded by the standard decoders 5 of the different views, and it controls the CS reconstruction module 6.
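The parity frame interleaving module 8 simply undoes the encoder-side split. A minimal sketch, assuming frame numbering starts at 1 and that the inputs are already-reconstructed frames (outputs of CS reconstruction module 6 and standard decoder 5); `interleave_parity` and `assemble_view` are hypothetical names:

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def interleave_parity(odd: Sequence[Frame], even: Sequence[Frame]) -> List[Frame]:
    """Merge odd- and even-numbered frames (numbered from 1) back into display order."""
    if len(odd) not in (len(even), len(even) + 1):
        raise ValueError("odd/even frame counts do not match")
    merged: List[Frame] = []
    for i, frame in enumerate(odd):
        merged.append(frame)
        if i < len(even):
            merged.append(even[i])
    return merged


def assemble_view(cs_frames: Sequence[Frame], key_frames: Sequence[Frame], view: str) -> List[Frame]:
    """Combine CS-reconstructed frames and standard-decoded key frames of one view."""
    if view == "left":
        return interleave_parity(cs_frames, key_frames)   # left: CS frames are the odd ones
    if view == "right":
        return interleave_parity(key_frames, cs_frames)   # right: key frames are the odd ones
    raise ValueError(f"unknown view: {view!r}")


print(assemble_view(["L1", "L3"], ["L2", "L4"], "left"))   # ['L1', 'L2', 'L3', 'L4']
print(assemble_view(["R2", "R4"], ["R1", "R3"], "right"))  # ['R1', 'R2', 'R3', 'R4']
```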

The CS encoder and CS reconstruction work as follows:

Suppose $x \in \mathbb{R}^n$ is a discrete signal and $u$ is its coefficient vector under an orthogonal basis $\Psi$, so that $x = \Psi^T u$. If only $k$ of the $n$ coefficients are nonzero, $x$ is said to be $k$-sparse under $\Psi$. According to CS theory, the encoder does not need to locate and encode the $k$ nonzero coefficients as a conventional codec does; the CS encoder simply computes the measurements:

$$y = \Phi x \tag{1}$$

Here $\Phi$ is an $m \times n$ matrix and $y \in \mathbb{R}^m$. Because $m < n$, the original signal $x$ is compressed. In CS reconstruction, $u$ is recovered by solving the following optimization problem:

$$\min_u \|u\|_1 \quad \text{subject to} \quad y = \Phi \Psi^T u \tag{2}$$

The original signal is then reconstructed as $x = \Psi^T u$.
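The key reasoning step is that measuring $x$ is the same as measuring $u$ through the combined matrix $\Phi\Psi^T$, which is why Eq. (2) can be posed on the sparse coefficients. The toy check below illustrates this with $n = 4$, $k = 2$, $m = 2$; the 4-point Haar basis and the particular $\pm 1$ matrix are illustrative choices, not taken from the patent, and the $\ell_1$ solve itself is not shown.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


# Orthonormal 4-point Haar basis; each ROW of Psi is one basis vector, so x = Psi^T u.
h, s = 0.5, 2 ** -0.5
Psi = [[h, h, h, h], [h, h, -h, -h], [s, -s, 0.0, 0.0], [0.0, 0.0, s, -s]]

u = [4.0, 0.0, 0.0, 2.0]                                # k = 2 nonzero coefficients, n = 4
x = matvec(transpose(Psi), u)                           # x = Psi^T u
Phi = [[1.0, -1.0, 1.0, 1.0], [-1.0, 1.0, 1.0, -1.0]]   # m = 2 < n = 4, entries +1/-1
y = matvec(Phi, x)                                      # Eq. (1): y = Phi x
A = matmul(Phi, transpose(Psi))                         # Phi Psi^T from the constraint of Eq. (2)

# The true coefficients satisfy the constraint of Eq. (2): A u reproduces y.
assert all(abs(a - b) < 1e-12 for a, b in zip(matvec(A, u), y))
print([round(v, 4) for v in y])                         # [4.0, 2.8284]
```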

In this embodiment, frames to be encoded as CS frames are processed block by block with a block size of $16 \times 16$. Each block is raster-scanned into an $n \times 1$ column vector $x$, so $n = 256$. Each row of the matrix $\Phi$ consists of independent samples from a symmetric Bernoulli distribution; that is, every entry is $\pm 1$, with $+1$ and $-1$ each occurring with probability 1/2. Note that the same matrix $\Phi$ is used for all blocks, which keeps the computational complexity low. According to Eq. (1), the measurement vector $y$, an $m \times 1$ column vector, is obtained; $y$ is then scalar-quantized and transmitted over the channel. Rather than a fixed orthogonal basis $\Psi$, this patent uses a specially designed joint dictionary as $\Psi$ in CS reconstruction. Eq. (2) is solved with a general log-barrier algorithm.
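A minimal sketch of the block-based CS encoder described above, using only the Python standard library. Assumptions, not from the patent: a shared pseudo-random seed stands in for however encoder and decoder agree on $\Phi$, the quantizer is uniform with a hypothetical `step`, and a single $m$ is used for every block (in the patent, $m$ depends on the selected mode).

```python
import random
from typing import Dict, List, Tuple

BLOCK = 16              # block size used in the embodiment (16 x 16)
N = BLOCK * BLOCK       # n = 256 after raster scanning


def bernoulli_matrix(m: int, n: int, seed: int) -> List[List[int]]:
    """m x n matrix of i.i.d. symmetric Bernoulli entries: +1 or -1, each with probability 1/2.
    A shared seed is an assumed stand-in for how encoder and decoder agree on Phi."""
    rng = random.Random(seed)
    return [[rng.choice((1, -1)) for _ in range(n)] for _ in range(m)]


def raster_scan(frame: List[List[int]], top: int, left: int, size: int = BLOCK) -> List[int]:
    """Read a size x size block row by row into an n x 1 vector (a flat list)."""
    return [frame[top + r][left + c] for r in range(size) for c in range(size)]


def measure(phi: List[List[int]], x: List[int]) -> List[int]:
    """Eq. (1): y = Phi x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def quantize(y: List[int], step: float) -> List[int]:
    """Uniform scalar quantizer; `step` is a hypothetical parameter."""
    return [round(v / step) for v in y]


def encode_cs_frame(frame, phi, step) -> Dict[Tuple[int, int], List[int]]:
    """CS-encode every 16 x 16 block of a frame, reusing the same Phi for all blocks."""
    codes = {}
    for top in range(0, len(frame) - BLOCK + 1, BLOCK):
        for left in range(0, len(frame[0]) - BLOCK + 1, BLOCK):
            codes[(top, left)] = quantize(measure(phi, raster_scan(frame, top, left)), step)
    return codes


phi = bernoulli_matrix(64, N, seed=2012)        # m = 64 is an arbitrary example value
frame = [[128] * 32 for _ in range(32)]         # toy 32 x 32 grey frame -> four blocks
codes = encode_cs_frame(frame, phi, step=8.0)
print(len(codes), len(codes[(0, 0)]))           # 4 64
```

Generating $\Phi$ once and reusing it for every block keeps the per-block encoder cost to $m \times n$ additions and subtractions, since the entries are only $\pm 1$.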

The joint dictionary is generated as follows:

According to CS theory, the matrix $\Psi$ should be chosen so that the signal is as sparse as possible under it, which effectively reduces the number of measurements to be transmitted. Most CS applications use a fixed orthogonal basis as $\Psi$, such as the discrete cosine transform or the discrete wavelet transform. Given the temporal correlation of video signals, however, the current block can be predicted from its reference blocks; if the current block is represented as a linear combination of its reference blocks, it can be regarded as a sparse signal. For 3D video there is also strong correlation between views. Therefore, when designing the dictionary $\Psi$ for each block, this patent uses both inter-view correlation and temporal correlation within the same view to build a joint dictionary. For example, if the current block $x$ lies in the odd frame $F_k$ of the left view, then reference blocks from the even frame $F_{k+1}$ of the left view and from the odd frame $F_k$ of the right view are both used to generate the joint dictionary. The selected reference blocks are all possible blocks within a $w \times w$ window of the reference frame centered on the position of the current block $x$. Each selected block is raster-scanned and used as a row of the joint dictionary matrix $\Psi$.
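One possible reading of the dictionary construction is sketched below. Assumptions, not stated in the patent: the $w \times w$ window is interpreted as all block displacements of at most $p$ pixels (with $w = 2p + 1$, as in the SINGLE mode description), candidates that would cross the frame border are skipped, and the names `candidate_blocks` and `joint_dictionary` are hypothetical. Both reference frames are decoded key frames, so the decoder can build $\Psi$ without any help from the encoder.

```python
from typing import List

Frame = List[List[int]]


def candidate_blocks(ref: Frame, top: int, left: int, p: int, size: int = 16) -> List[List[int]]:
    """Raster-scanned size x size blocks displaced by at most p pixels (vertically and
    horizontally) from (top, left): a w x w window of positions with w = 2p + 1.
    Candidates that would extend past the frame border are skipped (assumption)."""
    height, width = len(ref), len(ref[0])
    rows = []
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            t, l = top + dy, left + dx
            if 0 <= t <= height - size and 0 <= l <= width - size:
                rows.append([ref[t + r][l + c] for r in range(size) for c in range(size)])
    return rows


def joint_dictionary(temporal_ref: Frame, interview_ref: Frame,
                     top: int, left: int, p: int, size: int = 16) -> List[List[int]]:
    """Rows of Psi for the block at (top, left).

    temporal_ref:  decoded adjacent key frame of the same view (e.g. left F_{k+1})
    interview_ref: decoded key frame of the other view at the same instant (e.g. right F_k)
    """
    return (candidate_blocks(temporal_ref, top, left, p, size)
            + candidate_blocks(interview_ref, top, left, p, size))


# Toy 32 x 32 key frames; an interior block with p = 2 gives 2 * (2p + 1) ** 2 = 50 rows.
left_next = [[r * 32 + c for c in range(32)] for r in range(32)]
right_same = [[(r * 32 + c) % 251 for c in range(32)] for r in range(32)]
Psi = joint_dictionary(left_next, right_same, top=8, left=8, p=2)
print(len(Psi), len(Psi[0]))  # 50 256
```

Because atoms are rows of $\Psi$, a block is modeled as $x = \Psi^T u$, a weighted sum of candidate reference blocks, consistent with the synthesis relation used in Eq. (2).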

Mode selection works as follows:

To improve rate-distortion performance, this patent designs three modes at the encoder side.

Mode 1 is the SKIP mode. The mean square error (MSE) between the current block and the co-located block in the adjacent key frame is first calculated. If this value is less than a threshold $t_0$, the current block is skipped and no measurements need to be transmitted. To decode a SKIP block, the decoder simply copies the co-located block of the adjacent key frame.
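A minimal sketch of the SKIP decision and its decoding, with blocks given as raster-scanned lists. It assumes the encoder compares against the original adjacent key frame (available at the same camera) while the decoder copies the decoded one; the patent does not spell this out, and the function names are hypothetical.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square error between two raster-scanned blocks of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def is_skip(block: Sequence[float], colocated_key: Sequence[float], t0: float) -> bool:
    """Encoder: SKIP when the MSE to the co-located block of the adjacent key frame is below t0."""
    return mse(block, colocated_key) < t0


def reconstruct_skip(colocated_key_decoded: Sequence[float]) -> List[float]:
    """Decoder: a skipped block is a copy of the co-located block of the decoded key frame."""
    return list(colocated_key_decoded)


print(is_skip([10] * 256, [11] * 256, t0=2.0))  # True  (MSE = 1)
print(is_skip([10] * 256, [13] * 256, t0=2.0))  # False (MSE = 9)
```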

Mode 2 is the SINGLE mode. For the current block $x$ in frame $F_k$, four reference blocks $x_t$, $x_b$, $x_l$ and $x_r$ are selected in the adjacent key frame $F_{k+1}$, as shown in Fig. 2. These four reference blocks are chosen from a $w \times w$ window centered on the position of $x$: if each reference block is offset by $p$ from $x$, then $w = 2p + 1$. The minimum mean square error (MMSE) between $x$ and the four reference blocks is then calculated. If this value is less than a threshold $t_1$, block $x$ is CS-encoded with $m_1$ measurements. Otherwise, $x$ is encoded with $m_2$ measurements ($m_2 > m_1$); this is mode 3, the L1 mode. In SINGLE-mode CS reconstruction, the $m_1$ received measurements are compared with the measurements of the reference block represented by each row of the dictionary, and the block with the minimum mean square error is selected as the CS reconstruction. The SINGLE mode thus effectively reduces decoder complexity. In the L1 mode, CS reconstruction solves Eq. (2) using the $m_2$ received measurements.
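The SINGLE/L1 decision and the SINGLE-mode decoder can be sketched as follows. Assumptions, since Fig. 2 is not reproduced here: $x_t$, $x_b$, $x_l$ and $x_r$ are taken to be displaced by $p$ pixels up, down, left and right, and `phi` in `single_decode` is the $m_1 \times n$ matrix the encoder used for that block. Thresholds and measurement counts in the usage lines are illustrative values only.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def block_at(frame: Frame, top: int, left: int, size: int) -> List[int]:
    return [frame[top + r][left + c] for r in range(size) for c in range(size)]


def choose_mode(frame_k: Frame, key_next: Frame, top: int, left: int, p: int,
                t1: float, m1: int, m2: int, size: int = 16) -> Tuple[str, int]:
    """Encoder: choose SINGLE (m1 measurements) or L1 (m2 > m1) for a non-skipped block."""
    if m2 <= m1:
        raise ValueError("the L1 mode must use more measurements than SINGLE")
    x = block_at(frame_k, top, left, size)
    height, width = len(key_next), len(key_next[0])
    # x_t, x_b, x_l, x_r: assumed displaced by p pixels up, down, left and right (cf. Fig. 2).
    offsets = [(-p, 0), (p, 0), (0, -p), (0, p)]
    errors = [mse(x, block_at(key_next, top + dy, left + dx, size))
              for dy, dx in offsets
              if 0 <= top + dy <= height - size and 0 <= left + dx <= width - size]
    mmse = min(errors, default=float("inf"))
    return ("SINGLE", m1) if mmse < t1 else ("L1", m2)


def single_decode(y: Sequence[float], phi: Sequence[Sequence[float]],
                  dictionary: Sequence[Sequence[float]]) -> List[float]:
    """Decoder: return the dictionary row whose measurements Phi d are closest (MSE) to y."""
    def measurements(d):
        return [sum(a * v for a, v in zip(row, d)) for row in phi]
    return list(min(dictionary, key=lambda d: mse(y, measurements(d))))


key_next = [[r * 32 + c for c in range(32)] for r in range(32)]
frame_k = [[(r + 2) * 32 + c for c in range(32)] for r in range(32)]  # content shifted 2 rows
print(choose_mode(frame_k, key_next, 8, 8, p=2, t1=10.0, m1=32, m2=128))  # ('SINGLE', 32)
```

The SINGLE decoder replaces the iterative $\ell_1$ solve with one matrix-vector product per dictionary row followed by an arg-min, which is where its complexity saving comes from.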

Referring to Fig. 3, two standard video sequences, "rabbit" and "soccer", were selected to test the system of this patent. Both test sequences have a resolution of 720 × 480 and a frame rate of 30 frames per second. The standard codec is the H.264 reference encoder, JM version 10.2. To demonstrate the performance of the joint dictionary, the same experimental parameters were used in the CS encoder and the H.264 encoder. Fig. 3 shows that the proposed system outperforms a system without the joint dictionary (in which the dictionary is formed only from the adjacent key frames of the same view): in Fig. 3(a), the proposed scheme gains about 0.5 dB in PSNR for the "rabbit" sequence, and in Fig. 3(b) the gain for the "soccer" sequence exceeds 0.5 dB. A possible reason is that "soccer" contains faster motion, so its inter-view correlation is larger than its temporal correlation, and the joint dictionary, which considers both inter-view and temporal correlation, therefore performs better.

In addition, Fig. 3 shows a PSNR gain of 0.5 to 1 dB for "rabbit" at bit rates of 50 to 300 kbps, and a PSNR gain of 0.5 to 1 dB for "soccer" at bit rates of 200 to 1800 kbps.
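To put these gains in perspective, PSNR is $10 \log_{10}(\text{peak}^2 / \text{MSE})$, so a fixed dB gain corresponds to a fixed ratio of reconstruction MSE. The sketch assumes 8-bit samples (peak 255); the patent does not state whether PSNR is computed on luma only or how it is averaged over frames.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], decoded: Sequence[float], peak: float = 255.0) -> float:
    """PSNR in dB for 8-bit samples: 10 * log10(peak^2 / MSE)."""
    err = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE must shrink to raise PSNR by gain_db."""
    return 10 ** (gain_db / 10.0)


print(round(mse_ratio_for_gain(0.5), 3))  # 1.122
print(round(mse_ratio_for_gain(1.0), 3))  # 1.259
```

So a 0.5 dB gain means roughly 11% lower MSE, and a 1 dB gain roughly 21% lower MSE, at the same bit rate.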

The embodiments of the present invention have been described in detail above, but it will be readily apparent to those skilled in the art that many modifications are possible without substantially departing from the inventive concept and effects of the invention. All such modifications therefore also fall within the protection scope of the invention.

Claims

1. A stereo encoding device, characterized in that it comprises a plurality of identical and independent encoder pairs, each encoder pair comprising a left-view channel and a right-view channel, and each channel comprising the following modules: a parity frame separation module, a CS encoder, a standard encoder and a mode selection module; the left-view signal passes through the parity frame separation module and is separated into odd frames and even frames, the odd frames are encoded by the CS encoder to obtain CS codes, the mode selection module controls the working mode of the CS encoder, and the even frames are encoded by the standard encoder to obtain key frames; the right-view signal passes through the parity frame separation module and is separated into odd frames and even frames, the even frames are encoded by the CS encoder to obtain CS codes, the mode selection module controls the working mode of the CS encoder, and the odd frames are encoded by the standard encoder to obtain key frames.

2. The stereo encoding device according to claim 1, characterized in that the stereo encoding is multi-view coding with multiple viewpoints or binocular coding with two viewpoints.

3. The stereo encoding device according to claim 1, characterized in that the mode selection module includes a SKIP mode: the mean square error between the current block and the co-located block in the adjacent key frame is first calculated, and if this value is less than a threshold t0, the current block is skipped and no measurements need to be transmitted.

4. A stereo decoding device, characterized in that it comprises a plurality of identical and independent decoder pairs, each decoder pair comprising a left-view channel and a right-view channel, and each channel comprising the following modules: a CS reconstruction module, a standard decoder, a joint dictionary and a parity frame interleaving module; the received CS frames pass through the CS reconstruction module to obtain the odd frames, the received key frames pass through the standard decoder to obtain the even frames, and the odd and even frames are interleaved by the parity frame interleaving module to output the left-view video sequence; the received CS frames pass through the CS reconstruction module to obtain the even frames, the received key frames pass through the standard decoder to obtain the odd frames, and the odd and even frames are interleaved by the parity frame interleaving module to output the right-view video sequence; the joint dictionary is generated from the correlation among the data decoded by the standard decoders of the different views and is used to control the CS reconstruction module.

5. The stereo decoding device according to claim 4, characterized in that the stereo decoding is multi-view decoding with multiple viewpoints or binocular decoding with two viewpoints.

6. A stereo coding/decoding system, characterized in that it consists of the stereo encoding device according to any one of claims 1 to 3 and the stereo decoding device according to any one of claims 4 to 5.