WO2010021496A2

WO2010021496A2 - Method and apparatus for decoding video image

Info

Publication number: WO2010021496A2
Application number: PCT/KR2009/004632
Authority: WO
Inventors: 서덕영; 박정아; 박광훈; 김규헌
Original assignee: 경희대학교 산학협력단
Priority date: 2008-08-21
Filing date: 2009-08-20
Publication date: 2010-02-25
Also published as: KR101075328B1; WO2010021496A3; KR20100023764A

Abstract

According to an embodiment of the present invention, a method for decoding a video image comprises: receiving partial information about original image data; generating a plurality of pieces of side information by predicting the original image data; generating final side information by applying the partial information to each piece of side information; and decoding the original image data using the partial information and the final side information. When the original image data is decoded in a decoder, the data with the fewest errors relative to the original image data can be selected from the plurality of pieces of side information generated by the decoder, so that the decoded image quality may be improved. Because no additional information for confirming the accuracy of the decoded original image data needs to be transmitted, the compression performance of distributed video coding can be improved.

Description

Method and apparatus for decoding video

The present invention relates to decoding video, and more particularly, to a method for decoding distributed video coding using a plurality of auxiliary information and an apparatus using the same.

A common feature required for a variety of emerging applications, such as portable digital cameras, mobile wireless surveillance cameras and capsule endoscopes, is that the encoder must be simple. Therefore, existing video compression techniques, such as MPEG and H.264, which perform most of complex operations in the encoder, are difficult to apply to such applications.

Distributed video coding is a video encoding technique that performs a motion prediction / compensation process, which takes most of the complexity in encoding a video, by a decoder rather than an encoder. In other words, distributed video coding enables the implementation of a system that performs a motion compensation process that requires the most computation amount on the decoder side, thereby laying the foundation for the realization of the above-described applications.

The theoretical background of distributed video coding is based on the work of Slepian and Wolf in the 1970s (D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, vol. IT-19, pp. 471-480, July 1973) and on the information theory of Wyner and Ziv.

Slepian and Wolf showed, within Shannon's coding theory, that two or more statistically correlated random sequences can be decoded by exploiting their statistical dependency on the decoder side, even when the sequences are encoded independently.

FIG. 1 shows the parity rate as a function of the bit error rate (BER).

In FIG. 1, the horizontal axis represents the bit error rate (BER), and the vertical axis represents the parity transmission rate. X and Y are arbitrary sequences having a statistical correlation (the same applies hereinafter), and the numbers displayed on the 'puncturing period' graph indicate the puncturing intervals of the parity bits. For example, a puncturing interval of 2 denotes that one parity bit is transmitted for every two bits. The closer Y is to X, the smaller the entropy H(X|Y). In other words, the closer Y is to X, the fewer bits are needed to transmit X. The entropy H(X|Y) can be regarded as the ideal parity rate.

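FIG. 2 shows the ideal parity rate and the actual parity rate as a function of BER. According to information theory, the ideal parity rate can be expressed as Equation 1 below.

$$H(X \mid Y) = -p \log_2 p - (1 - p) \log_2 (1 - p) \qquad \text{(Equation 1)}$$

In Equation 1, p is the BER (bit error rate) of Y with respect to X.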
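As a rough numerical check of the ideal parity rate, the following minimal sketch evaluates the binary entropy function at the two BER values discussed for FIG. 2. It assumes that Y differs from X by independent bit flips with probability p, so that H(X|Y) reduces to the binary entropy of p; the function name `ideal_parity_rate` is chosen here for illustration and does not come from the original text.

```python
import math


def ideal_parity_rate(p):
    """Binary entropy of p in bits per source bit.

    Assumption: Y is X observed through independent bit flips with
    probability p, so H(X|Y) equals the binary entropy of p.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0  # no uncertainty when bits never (or always) flip
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


if __name__ == "__main__":
    for ber in (0.05, 0.10):
        print(f"BER = {ber:.2f}: ideal parity rate = {ideal_parity_rate(ber):.3f}")
```

Under this assumption the ideal rate is roughly 0.29 bits per source bit at a BER of 5% and roughly 0.47 at 10%.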

Referring to FIG. 2, it can be seen that the actual parity rate R is close to the ideal parity rate H(X|Y) at a BER of 5% (puncturing interval 6) or 10% (puncturing interval 4).

With the advent of the Wyner-Ziv coding scheme, which extends the above-described theories for lossless compression to lossy compression through quantization (A. D. Wyner, "Recent Results in the Shannon Theory," IEEE Transactions on Information Theory, vol. 20, no. 1, pp. 2-10, Jan. 1974), the theoretical background of distributed video coding, that is, distributed coding used as a video compression coder, was established. Conventional codecs use entropy coders for video compression, while distributed video coding uses channel codes (Y. Zhao and J. Garcia-Frias, "Turbo compression / joint source-channel coding of correlated binary sources with hidden Markov correlation," EURASIP Signal Processing Journal, Elsevier, vol. 86, no. 11, pp. 3115-3122, Nov. 2006).

Distributed video coding applies the above-described theory to video coding. Of the two sequences X and Y, X is the data of the original image to be compressed, and Y, the data referred to in order to compress X, is generally called side information (SI) in distributed video coding (also referred to below as auxiliary information). Y can be generated through the motion prediction and compensation process in the decoder. The accuracy of the auxiliary information (i.e., the degree of correlation with the original image frame) determines the compression performance of distributed video coding.

In the decoding of distributed video coding, the compression rate can be increased as the auxiliary information becomes more similar to the data of the original image. However, since the decoder does not have the data of the original image, there is a problem in that the degree of similarity cannot be known accurately.

Provided are a method and an apparatus for selecting data most similar to an original image in auxiliary information in decoding distributed video coding.

According to an aspect of the present invention, there is provided a method of decoding a video, the method comprising: receiving partial information about original image data; predicting the original image data to generate a plurality of pieces of auxiliary information; generating final auxiliary information by applying the partial information to each of the plurality of pieces of auxiliary information; and decoding the original image data using the partial information and the final auxiliary information.

A decoder according to another aspect of the present invention includes: an auxiliary information generator for predicting original image data to generate a plurality of pieces of auxiliary information; a first turbo decoder configured to receive partial information about the original image data from an external source and to turbo decode each of the plurality of pieces of auxiliary information to generate decoded auxiliary information; a bit selector for receiving the decoded auxiliary information and generating final auxiliary information based on the number of toggles; and a second turbo decoder for reconstructing the original image data using the partial information about the original image data and the final auxiliary information.

When the original image data is decoded by the decoder of distributed video coding, the data having the least error relative to the original image data can be selected from the plurality of pieces of auxiliary information generated by the decoder, so that the quality of the video decoded with the same amount of transmitted information can be improved.

Since the information for confirming the accuracy of the decoded data of the original image is not transmitted separately, the compression performance of distributed video coding can be improved.

FIG. 1 shows a parity rate according to a bit error rate (BER).

FIG. 2 shows an ideal parity rate and an actual parity rate for BER.

FIG. 3 is a block diagram illustrating a structure in which an encoder / decoder of distributed video coding and a conventional intra frame encoder / decoder are combined.

FIG. 4 is a block diagram illustrating a DVC decoder.

FIG. 5 is a flowchart illustrating a decoding method of distributed video coding.

FIG. 6 is a graph illustrating the relationship between the average number of toggles for each piece of auxiliary information Y₁, Y₂, Y₃, and Y₄ and its mean square error (MSE) with respect to the original image.

FIG. 7 is a diagram illustrating the final auxiliary information when each of four candidate frames generated for an original image is divided into four quadrants and the peak signal-to-noise ratio (PSNR) of each quadrant is shown.

FIG. 8 is a diagram illustrating the number of the candidate frame selected for each block after the total of 396 blocks of a 'Foreman' image are divided into quadrants containing the same number of blocks.

FIG. 9 is a diagram illustrating the BER, with respect to the original image data X, of the plurality of pieces of auxiliary information and of the decoded auxiliary information.

FIG. 10 is a diagram illustrating the BER, with respect to the original image data X, of the final auxiliary information Y_∞ selected by using the number of toggles under each selection criterion and of the decoded original image data X′.

The method and apparatus for decoding a video according to the present invention can be used for decoding a video in distributed video coding (DVC). The following description mainly addresses DVC, but this is not a limitation.

FIG. 3 is a block diagram illustrating a structure in which an encoder / decoder of distributed video coding and a conventional intra frame encoder / decoder are combined according to an embodiment of the present invention.

Referring to FIG. 3, X may be data for an image frame to be compressed (i.e., data for an original image), for example, data for a Wyner-Ziv frame (WZ frame). Alternatively, when the original image includes a plurality of blocks, X may be a bit stream generated by performing a discrete cosine transform (DCT) on each block in a predetermined order and arranging the DCT coefficients by bit plane. However, this is not a limitation, and there is no restriction on how the bit stream is generated.

X is input to the DVC encoder 100, and the DVC encoder 100 transmits a parity bit string P for X that is punctured to have a parity rate R. The parity bit string P can be interpreted as partial information about X.
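Puncturing can be pictured as keeping only a regularly spaced subset of the parity bits. The sketch below is an illustration only: the function name `puncture` is hypothetical, and the choice of retaining the first bit of every group of `interval` bits is an assumption, since the text does not say which position within each period is kept.

```python
def puncture(parity_bits, interval):
    """Keep one parity bit out of every `interval` bits.

    Assumption: the first bit of each group is the one retained.
    """
    if interval < 1:
        raise ValueError("puncturing interval must be at least 1")
    return parity_bits[::interval]


if __name__ == "__main__":
    parity = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    for m in (2, 4, 6):
        kept = puncture(parity, m)
        print(f"interval {m}: {len(kept)} of {len(parity)} parity bits sent -> {kept}")
```

A larger interval sends fewer parity bits, which raises the compression rate but leaves the decoder with less information for correcting the auxiliary information.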

K may be data about an intra frame selected from a plurality of frames constituting a moving image (referred to as a key frame). K is encoded by intra frame encoder 300 and passed to intra frame decoder 400. Here, the intra frame encoder 300 and the intra frame decoder 400 may be a conventional encoder / decoder.

K′ decoded by the intra frame decoder 400 is input to the DVC decoder 200. The DVC decoder 200 may generate a plurality of pieces of auxiliary information by using K′.

The DVC decoder 200 turbo decodes the parity bit string P together with each of the plurality of pieces of auxiliary information to generate decoded auxiliary information. At this time, the number of toggles generated during the turbo decoding process and information on the candidate frame are each stored. The final auxiliary information selected from the decoded auxiliary information and the parity bit string are then turbo decoded to generate decoded original image data X′.

FIG. 4 is a block diagram illustrating a DVC decoder. FIG. 5 is a flowchart illustrating a decoding method of distributed video coding.

Referring to FIGS. 4 and 5, the transmission information X (e.g., a bit string formed from the DCT coefficients of the original image for each bit plane) passes through the DVC encoder 100 (e.g., a WZ encoder), which generates and transmits a parity bit string P matched to the parity rate R. The DVC decoder 200 receives this parity bit string (S100). The DVC decoder 200 may include the auxiliary information generator 210, the first turbo decoder 220, the bit selector 230, and the second turbo decoder 240. The first turbo decoder 220 receives the parity bit string.

The auxiliary information generator 210 generates a plurality of auxiliary information (S200). The plurality of auxiliary information may be made in different ways. For example, various methods such as linear motion interpolation, object-based separation, and multi-frame prediction may be used. There is no limitation on the method of generating the auxiliary information.

The auxiliary information generator 210 generates a plurality of candidate frames using K′ input from the intra frame decoder and/or frames stored in a memory (not shown), and forms a bit string for each candidate frame in the same manner as in the DVC encoder 100, thereby generating a plurality of pieces of auxiliary information (represented by Y₁, Y₂, Y₃, and Y₄ in FIG. 4). The auxiliary information may be, for example, a bit stream in which the DCT coefficients of the candidate image frames generated using K′ and/or the frames stored in the memory are arranged for each bit plane. The generated plurality of pieces of auxiliary information are provided to the first turbo decoder 220.

The final auxiliary information is generated by applying partial information about the original image, for example, the parity bit string P, to the plurality of pieces of auxiliary information (S300). First, the first turbo decoder 220 turbo decodes each of the plurality of pieces of auxiliary information Y₁, Y₂, Y₃, and Y₄ together with the parity bit string. For example, the first turbo decoder 220 may turbo decode the auxiliary information Y₁ and the parity bit string P. In this case, the first turbo decoder 220 decodes the auxiliary information Y₁ repeatedly, alternating between the original parity bit string and the interleaved parity bit string. As more iterations are performed, the probability that the decoded data matches the original information increases, but the decoded data may change several times in this process. The first turbo decoder 220 performs the same process for the auxiliary information Y₂, Y₃, and Y₄, and stores the number of toggles generated in the turbo decoding process and information about the candidate frame for each.

For example, suppose that in the turbo decoding process of the first turbo decoder 220, errors are detected in the 10th data (assumed to be 1) and the 20th data (assumed to be 0) among 100 data of Y₁, so that the 10th data is changed 1→0 and the 20th data is changed 0→1. During the second turbo decoding iteration, the two data may change again (the 10th data 0→1 and the 20th data 1→0). This phenomenon, in which a data value changes several times during the iterations of turbo decoding, is defined as a 'toggle'. A part of the auxiliary information in which toggles occur is less reliable than a part that does not change, because its value keeps changing. Therefore, a portion of the auxiliary information in which toggles occur can be regarded as having a high probability of containing an error introduced into the original data by the channel.
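Toggle counting can be sketched independently of any particular turbo decoder by recording the hard decision for every bit after each iteration and counting how often each bit changes. The following is a minimal illustration that reproduces the 10th/20th-data example; the function name `count_toggles` and the list-of-snapshots input are assumptions made here, and counting every change between consecutive snapshots (including the first correction) is one possible reading of the definition.

```python
def count_toggles(decisions):
    """Count value changes per bit position across decoding iterations.

    decisions[k][i] is the hard decision for bit i after iteration k,
    with decisions[0] being the auxiliary information before decoding.
    """
    n = len(decisions[0])
    toggles = [0] * n
    for prev, curr in zip(decisions, decisions[1:]):
        for i in range(n):
            if prev[i] != curr[i]:
                toggles[i] += 1
    return toggles


if __name__ == "__main__":
    y1 = [0] * 100
    y1[9] = 1            # 10th data assumed to be 1, 20th data assumed to be 0
    after_first = list(y1)
    after_first[9], after_first[19] = 0, 1    # 1->0 and 0->1
    after_second = list(after_first)
    after_second[9], after_second[19] = 1, 0  # 0->1 and 1->0
    t = count_toggles([y1, after_first, after_second])
    print(t[9], t[19], sum(t))
```

In this example the 10th and 20th positions each accumulate two toggles while the other 98 positions accumulate none, so those two positions would be treated as the least reliable.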

In FIG. 4, the numbers of bit toggles of the auxiliary information generated in the turbo decoding process are represented by T₁, T₂, T₃, and T₄. In addition, the results of turbo decoding the parity bit string P with the plurality of pieces of auxiliary information, that is, the decoded auxiliary information, are represented as X′₁, X′₂, X′₃, and X′₄. Depending on the case, X′₁, X′₂, X′₃, and X′₄ may each represent a bit or a part of a bit string.

The first turbo decoder 220 provides the decoded auxiliary information X′₁, X′₂, X′₃, X′₄ and the numbers of toggles T₁, T₂, T₃, T₄ to the bit selector 230. The bit selector 230 generates the final auxiliary information Y_∞ based on the number of toggles. When X′₁, X′₂, X′₃, and X′₄ each represent a bit, the final auxiliary information Y_∞ may be generated by selecting the bit having the minimum number of toggles. If a plurality of bits among X′₁, X′₂, X′₃, and X′₄ share the minimum number of toggles, a selection method is required (the same is true when X′₁, X′₂, X′₃, and X′₄ each represent a part of a bit string). This will be described later.
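The bit selector can be sketched as a per-position minimum over the toggle counts. The code below is a minimal illustration, not the patented implementation: the function name `select_bits` and the nested-list layout are assumptions, and ties are resolved here simply by taking the lowest candidate index as a placeholder for the selection criteria described later.

```python
def select_bits(decoded, toggles):
    """Build the final auxiliary information bit by bit.

    decoded[c][i] is bit i of decoded auxiliary information c, and
    toggles[c][i] is the number of toggles observed for that bit.
    Returns the selected bits and the index of the chosen candidate per bit.
    """
    final_bits, chosen = [], []
    for i in range(len(decoded[0])):
        # min() returns the first minimum, so ties go to the lowest index
        # (a placeholder assumption, not one of the described criteria).
        best = min(range(len(decoded)), key=lambda c: toggles[c][i])
        final_bits.append(decoded[best][i])
        chosen.append(best)
    return final_bits, chosen


if __name__ == "__main__":
    decoded = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 0]]
    toggles = [[0, 3, 1, 2], [2, 0, 4, 2], [1, 1, 0, 0]]
    print(select_bits(decoded, toggles))
```

The second return value makes it possible to inspect which candidate contributed each position, which is the kind of per-block map that FIG. 8 is described as showing.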

The final auxiliary information Y_∞ is provided to the second turbo decoder 240. The second turbo decoder 240 generates X′, the decoded original image data, using the parity bit string P, which is partial information about the original image, and the final auxiliary information Y_∞ (S400). Of course, the decoded original image data X′ may in turn be used as auxiliary information for a further turbo decoding together with the parity bit string P.

Hereinafter, a selection method will be described for the case in which a plurality of bit strings (that is, blocks) having the minimum number of toggles exist (the same applies when a plurality of bits have the minimum number of toggles). This is the case in which the number of toggles at the same block position is zero for all candidate frames (no toggles occur at all), or is the same nonzero number for the blocks of at least two candidate frames (e.g., when four candidate frames are used). In this case, a criterion is required for determining which candidate frame's block should be selected.

First, 1) the block (or bit) of the candidate frame having the smallest number of toggles in units of candidate frames may be selected, where the number of toggles of a candidate frame is obtained by accumulating the number of toggles of each block within the candidate frame from left to right and from top to bottom.

Alternatively, 2) the number of toggles of the neighboring blocks adjacent to the blocks having the same number of toggles may be used. That is, among blocks with the same number of toggles, the block whose neighboring blocks have the smaller number of toggles is selected. Here, the extent of the neighborhood may vary according to the puncturing period of the parity bit string P used in the DVC encoder 100. If the numbers of toggles of the neighboring blocks are also the same, the block of the candidate frame having the smallest number of toggles in units of candidate frames may be selected.

The two selection criteria described above take advantage of the fact that information which is decoded well overall (in units of frames) is more likely to be decoded well in its parts (in units of blocks). The selection criteria can be applied flexibly according to the performance of the candidate frames or the pattern of the numbers of toggles.
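The two tie-breaking criteria can be sketched for a grid of per-block toggle counts as follows. This is an illustration under stated assumptions: the function names are hypothetical, the neighborhood is taken to be the four directly adjacent blocks (the text says only that its extent may vary with the puncturing period), and any tie that survives both criteria goes to the lowest candidate index.

```python
def frame_totals(toggles):
    """Total number of toggles per candidate frame (used by criterion 1)."""
    return [sum(sum(row) for row in frame) for frame in toggles]


def neighbor_sum(frame, r, c):
    """Toggle count summed over the 4-connected neighbors of block (r, c)."""
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(frame) and 0 <= cc < len(frame[0]):
            total += frame[rr][cc]
    return total


def select_candidate(toggles, r, c, rule=2):
    """Pick the candidate frame whose block (r, c) is used.

    toggles[k][r][c] is the toggle count of block (r, c) in candidate k.
    """
    totals = frame_totals(toggles)
    lowest = min(frame[r][c] for frame in toggles)
    tied = [k for k, frame in enumerate(toggles) if frame[r][c] == lowest]
    if rule == 1:
        return min(tied, key=lambda k: totals[k])
    # Criterion 2, falling back to the frame total when neighbors also tie.
    return min(tied, key=lambda k: (neighbor_sum(toggles[k], r, c), totals[k]))


if __name__ == "__main__":
    a = [[0, 5], [5, 0]]   # frame total 10
    b = [[0, 1], [1, 9]]   # frame total 11
    print(select_candidate([a, b], 0, 0, rule=1))
    print(select_candidate([a, b], 0, 0, rule=2))
```

In the small example both candidates have zero toggles at block (0, 0); criterion 1 prefers the first candidate because its frame total is lower, while criterion 2 prefers the second because its neighboring blocks toggled less.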

Hereinafter, a test example is described to explain the effect of the present invention. In the following description, it is assumed that a 'Foreman' video having a size of 176 x 144 pixels is used. Each frame of this 'Foreman' video contains 396 blocks of 8 x 8 pixels.

First, the relationship between the number of toggles and the difference between the X and Y sequences is examined.

Referring to FIG. 6, as the MSE increases, the average number of toggles also increases. That is, the occurrence of a large number of toggles indicates low reliability of the auxiliary information. This supports the validity of selecting the final auxiliary information from among the plurality of pieces of decoded auxiliary information based on the number of toggles.

Referring to FIG. 7, candidate frame 1 has the highest PSNR in the first quadrant, candidate frame 2 in the second quadrant, candidate frame 3 in the third quadrant, and candidate frame 4 in the fourth quadrant. In this case, in the embodiment according to the present invention, the final auxiliary information Y_∞ should be composed of the bits of candidate frame 1 in the first quadrant, candidate frame 2 in the second quadrant, candidate frame 3 in the third quadrant, and candidate frame 4 in the fourth quadrant.
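The per-quadrant comparison can be reproduced offline when the original frame is available, which is how a reference such as FIG. 7 could be obtained; the decoder itself has no access to the original and therefore relies on toggle counts instead. The sketch below assumes 8-bit samples (peak value 255) and numbers the quadrants in raster order (top-left, top-right, bottom-left, bottom-right); both are assumptions, as are all function names.

```python
import math


def mse(a, b):
    """Mean square error between two equally sized 2-D sample arrays."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, peak=255):
    """PSNR in dB; infinite when the two arrays are identical."""
    m = mse(a, b)
    return math.inf if m == 0 else 10 * math.log10(peak ** 2 / m)


def quadrants(img):
    """Split an image into four quadrants in raster order (assumed numbering)."""
    hh, hw = len(img) // 2, len(img[0]) // 2
    return [
        [row[:hw] for row in img[:hh]],
        [row[hw:] for row in img[:hh]],
        [row[:hw] for row in img[hh:]],
        [row[hw:] for row in img[hh:]],
    ]


def best_candidate_per_quadrant(original, candidates):
    """Index of the candidate frame with the highest PSNR in each quadrant."""
    orig_q = quadrants(original)
    cand_q = [quadrants(c) for c in candidates]
    return [
        max(range(len(candidates)), key=lambda k: psnr(orig_q[q], cand_q[k][q]))
        for q in range(4)
    ]


if __name__ == "__main__":
    original = [[10, 20], [30, 40]]
    candidates = [[[10, 25], [30, 45]], [[15, 20], [35, 40]]]
    print(best_candidate_per_quadrant(original, candidates))
```

Comparing this oracle choice with the toggle-based choice per quadrant is one way to check how often the toggle count identifies the candidate that is actually closest to the original.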

FIG. 8 shows the number of the candidate frame selected for each block after the total of 396 blocks of a 'Foreman' image are divided into quadrants containing the same number of blocks. Referring to FIG. 8, it can be seen that candidate frame 2 is selected in the second quadrant, candidate frame 3 in the third quadrant, and candidate frame 4 in the fourth quadrant. In the first quadrant, all blocks are marked with 0, which means that any of the four candidate frames may be selected. FIG. 8 thus shows the feasibility of the method of generating the final auxiliary information using the number of toggles.

FIG. 9 is a diagram comparing the BER of the plurality of pieces of auxiliary information and of the decoded auxiliary information.

In the experiment, each of the plurality of pieces of auxiliary information (SI-1 = Y₁, SI-2 = Y₂, SI-3 = Y₃, SI-4 = Y₄) is DCT-transformed block by block, each DCT coefficient is represented with 11 bits and converted into bit planes, and only the upper 5 bit planes are used. In this case, the total length is 396 blocks × 5 bits = 1980 bits. In FIG. 9, the parity rate is a numerical value related to the compression rate. Since the BER is about 0.05 (5%) before turbo decoding the four pieces of auxiliary information, [...] will be sent.
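The bit-length arithmetic and the bit-plane extraction can be checked with a short sketch. It assumes one non-negative 11-bit coefficient per block, which is what the count 396 × 5 = 1980 suggests but is not stated explicitly; sign handling and the choice of coefficient are left out, and the function name `upper_bit_planes` is hypothetical.

```python
def upper_bit_planes(coeffs, total_bits=11, planes=5):
    """Return the `planes` most significant bit planes of the coefficients.

    Assumption: each coefficient is a non-negative integer that fits in
    `total_bits` bits. Plane 0 is the most significant bit.
    """
    result = []
    for k in range(planes):
        shift = total_bits - 1 - k
        result.append([(value >> shift) & 1 for value in coeffs])
    return result


if __name__ == "__main__":
    blocks = (176 // 8) * (144 // 8)   # 8 x 8 blocks in a 176 x 144 frame
    print(blocks, blocks * 5)          # number of blocks and total bits used
    planes = upper_bit_planes([0b10110000000, 0b00001111111])
    for k, plane in enumerate(planes):
        print(f"plane {k}: {plane}")
```

Discarding the six least significant planes keeps the bits that carry most of the coefficient magnitude, which is a form of quantization.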

FIG. 10 is a diagram illustrating the BER of the final auxiliary information Y_∞ selected by using the number of toggles under each selection criterion and of the decoded original image data X′. In FIG. 10, Rule 1 and Rule 2 indicate the selection criteria. Rule 1 selects the block (or bit) of the candidate frame having the smallest number of toggles in units of candidate frames, where the toggle counts of all blocks are accumulated while proceeding from left to right and from top to bottom within the frame. Rule 2 uses the number of toggles of the neighboring blocks adjacent to the blocks in which the same number of toggles occurred.

Referring to FIG. 10, when the puncturing interval M is 6, the BER becomes 0 once turbo decoding is finally performed one more time, even without using a selection criterion. Therefore, the effect of the selection criterion cannot be observed in this case. Also, to increase the compression rate, the BER is fixed and the puncturing interval is adjusted to reduce the amount of parity.

Referring to FIG. 9, the BERs of the decoded auxiliary information when M = 6 are 0.059, 0.038, 0.040, and 0.029. In FIG. 10, when M = 6 and the final auxiliary information is determined using the number of toggles according to the selection criteria, the BER drops to 0.032, 0.032, and 0.018. In particular, when Rule 2 is used, the BER of 0.018 is better than the 0.029 of SI-4 (M = 6) in FIG. 9. The decoded original image data X′, generated by turbo decoding once more the final auxiliary information Y_∞, which is closer to the transmission information X, shows an error rate of 0%.

In the same way, when M = 7 and the final auxiliary information is generated using the number of toggles according to the selection criteria, the BER becomes 0.021, 0.013, and 0.011, which is lower than that of the decoded auxiliary information. The BER of the finally decoded original image data X′ is larger than when M = 6 but remains below 1%, so even if the compression ratio is slightly increased (that is, the amount of the parity bit string is reduced), the effect is negligible.

Since candidate frames have different characteristics, even if the auxiliary information of a particular candidate frame is, as a whole, reliable information close to the transmission information X, the information of another candidate frame may be closer to the transmission information X in some parts. By applying the method of selecting the final auxiliary information using the number of toggles according to the present invention, it can be seen that a decoded X′ having a low BER can be produced without difficulty even if the amount of the parity bit string is reduced. If two or more candidates have the same number of toggles, the result depends on the selection criterion used. Regardless of the final decoding performance and compression ratio, the BER is highest when no selection criterion is used, lower with Rule 1, and lowest with Rule 2. If there is no selection criterion, the candidate is selected at random, so the quality of the finally generated X′ is lower than when a selection criterion is used.

Although the present invention has been described above with reference to the embodiments, those skilled in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the present invention. Therefore, the present invention is not limited to the above-described embodiments, and includes all embodiments within the scope of the following claims.

Claims

1. A video decoding method comprising:

receiving partial information about original image data;

predicting the original image data to generate a plurality of pieces of auxiliary information;

generating final auxiliary information by applying the partial information to each of the plurality of pieces of auxiliary information; and

decoding the original image data using the partial information and the final auxiliary information.

2. The method of claim 1, wherein the partial information is a parity bit string for the original image data.

3. The method of claim 1, wherein the auxiliary information is generated in units of frames.

4. The method of claim 1, wherein the final auxiliary information comprises a plurality of bits selected from each of the decoded auxiliary information generated by turbo decoding each of the plurality of pieces of auxiliary information, and the selected bits are selected based on the number of toggles generated during the turbo decoding process.

5. The method of claim 4, wherein, when each of the plurality of pieces of auxiliary information is configured as a bit string, the final auxiliary information is generated by selecting the bit having the smallest number of toggles in each of the decoded auxiliary information.

6. The method of claim 4, wherein the final auxiliary information is selected and generated in units of blocks from each of the decoded auxiliary information.

7. The method of claim 6, wherein, if blocks having the same number of toggles arise from different pieces of decoded auxiliary information, the final auxiliary information is generated by selecting the block of the decoded auxiliary information having the smallest number of toggles in units of frames.

8. The method of claim 6, wherein, if blocks having the same number of toggles arise from different pieces of decoded auxiliary information, the final auxiliary information is generated by selecting, among the blocks having the same number of toggles, the block whose adjacent blocks have the smaller number of toggles.

9. A decoder comprising:

an auxiliary information generator for predicting an original image and generating a plurality of pieces of auxiliary information;

a first turbo decoder configured to receive partial information about the original image data from an external source and to turbo decode each of the plurality of pieces of auxiliary information to generate decoded auxiliary information;

a bit selector for receiving the decoded auxiliary information and generating final auxiliary information based on the number of toggles; and

a second turbo decoder which restores the original image data by using the partial information about the original image data and the final auxiliary information.

10. The decoder of claim 9, wherein the final auxiliary information comprises a plurality of bits selected from each of the decoded auxiliary information generated by turbo decoding each of the plurality of pieces of auxiliary information with the partial information, and the selected bits are selected based on the number of toggles generated during the turbo decoding process.