WO2012093879A2

WO2012093879A2 - Competition-based multiview video encoding/decoding device and method thereof

Info

Publication number: WO2012093879A2
Application number: PCT/KR2012/000136
Authority: WO
Inventors: 이진영; 김동현; 류승철; 서정동; 손광훈; 위호천
Original assignee: 삼성전자주식회사; 연세대학교 산학협력단
Priority date: 2011-01-06
Filing date: 2012-01-06
Publication date: 2012-07-12
Also published as: WO2012093879A3

Abstract

Disclosed are a competition-based multiview video encoding/decoding device and a method thereof. The device can improve encoding efficiency by extracting a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector corresponding to a current block and determining, among them, the prediction vector with the best encoding performance.

Description

Apparatus and method for competition-based multiview video encoding / decoding

One embodiment of the present invention relates to an apparatus and method for multi-view video encoding / decoding, and to an apparatus and method for encoding / decoding a current block using a spatial prediction vector, a time-base prediction vector, or a view-axis prediction vector.

A stereoscopic image is a 3D image that simultaneously provides shape information about depth and space. A stereo image merely provides images of different viewpoints to the left and right eyes, whereas a stereoscopic image provides an image as if seen from a different direction whenever the viewer changes viewpoint. Therefore, in order to generate a stereoscopic image, images captured at various viewpoints are required.

Images taken from various viewpoints to generate stereoscopic images amount to a large volume of data. Therefore, considering the network infrastructure and terrestrial bandwidth available for stereoscopic video, it is almost impossible to achieve sufficient compression even with encoding devices optimized for single-view video coding, such as MPEG-2, H.264 / AVC, and HEVC.

However, since images taken at each viewpoint viewed by the observer are related to each other, there is a lot of overlapping information. Accordingly, a smaller amount of data may be transmitted by using an encoding apparatus optimized for a multiview image capable of removing inter-view redundancy.

Therefore, a multi-view image encoding apparatus optimized for generating a stereoscopic image is required. In particular, there is a need for technology that efficiently reduces temporal redundancy and inter-view redundancy.

A multiview video encoding apparatus according to a first embodiment of the present invention includes a prediction vector extracting unit which extracts a spatial prediction vector of a current block to be encoded; and an index transmitter for transmitting an index for identifying the spatial prediction vector of the current block to a multiview video decoding apparatus through a bitstream.

A multiview video encoding apparatus according to a second embodiment of the present invention includes a prediction vector extracting unit which extracts a temporal prediction vector of a current block to be encoded; and an index transmitter for transmitting an index for identifying the temporal prediction vector of the current block to a multiview video decoding apparatus through a bitstream.

A multiview video encoding apparatus according to a third embodiment of the present invention includes a prediction vector extracting unit which extracts a viewpoint prediction vector of a current block to be encoded; and an index transmitter for transmitting an index for identifying the viewpoint prediction vector of the current block to a multiview video decoding apparatus through a bitstream.

A multiview video encoding apparatus according to a fourth embodiment of the present invention includes a prediction vector extracting unit which extracts a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector of a current block to be encoded; and an index transmitter for transmitting, to a multiview video decoding apparatus through a bitstream, an index for identifying the prediction vector used to encode the current block among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector.

A multiview video decoding apparatus according to a first embodiment of the present invention includes an index extractor which extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and a prediction vector determiner that determines a spatial prediction vector as a final prediction vector for reconstructing a current block based on the index.

A multiview video decoding apparatus according to a second embodiment of the present invention includes an index extractor which extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and a prediction vector determiner that determines a temporal prediction vector as a final prediction vector for reconstructing a current block based on the index.

A multiview video decoding apparatus according to a third embodiment of the present invention includes an index extractor which extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and a prediction vector determiner that determines a viewpoint prediction vector as a final prediction vector for reconstructing a current block based on the index.

A multiview video decoding apparatus according to a fourth embodiment of the present invention includes an index extractor which extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and a prediction vector determiner configured to determine, based on the index, a final prediction vector for reconstructing a current block among a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector.

A multiview video encoding method according to a first embodiment of the present invention includes extracting a spatial prediction vector of a current block to be encoded; and transmitting an index for identifying the spatial prediction vector of the current block to a multiview video decoding apparatus through a bitstream.

A multiview video encoding method according to a second embodiment of the present invention includes extracting a temporal prediction vector of a current block to be encoded; and transmitting an index for identifying the temporal prediction vector of the current block to a multiview video decoding apparatus through a bitstream.

A multiview video encoding method according to a third embodiment of the present invention includes extracting a viewpoint prediction vector of a current block to be encoded; and transmitting an index for identifying the viewpoint prediction vector of the current block to a multiview video decoding apparatus through a bitstream.

A multiview video encoding method according to a fourth embodiment of the present invention includes extracting a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector of a current block to be encoded; and transmitting, to a multiview video decoding apparatus through a bitstream, an index for identifying the prediction vector used to encode the current block among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector.

A multiview video decoding method according to a first embodiment of the present invention includes extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and determining the spatial prediction vector as the final prediction vector for reconstructing the current block based on the index.

A multiview video decoding method according to a second embodiment of the present invention includes extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and determining the temporal prediction vector as the final prediction vector for reconstructing the current block based on the index.

A multiview video decoding method according to a third embodiment of the present invention includes extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and determining the viewpoint prediction vector as the final prediction vector for reconstructing the current block based on the index.

A multiview video decoding method according to a fourth embodiment of the present invention includes extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and determining, based on the index, a final prediction vector for reconstructing a current block among a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector.

According to an embodiment of the present invention, candidate spatial, temporal, and viewpoint prediction vectors are selected for the current block to be encoded, the prediction vector having the best compression performance is determined among them, and the current block is encoded using the determined prediction vector. Coding efficiency can thereby be improved.

FIG. 1 is a diagram for describing an operation of a multiview video encoding apparatus and a multiview video decoding apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating a detailed configuration of a multiview video encoding apparatus according to an embodiment of the present invention.

FIG. 3 is a block diagram illustrating a detailed configuration of a multiview video decoding apparatus according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating a structure of a multiview video according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an example of a reference picture used to encode a current block according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating types of prediction vectors corresponding to a current block according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating a multiview video encoding apparatus operating in an inter mode / intra mode according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating a multiview video encoding apparatus operating in a skip mode according to an embodiment of the present invention.

FIG. 9 is a diagram illustrating a multiview video decoding apparatus operating in an inter mode / intra mode according to an embodiment of the present invention.

FIG. 10 is a diagram illustrating a multiview video decoding apparatus operating in a skip mode according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The multi-view video encoding apparatus 101 according to an embodiment of the present invention can remove temporal redundancy and inter-view redundancy more efficiently by defining new motion / disparity vectors and encoding the multi-view video with them.

The multi-view video encoding apparatus 101 may encode the input video according to various encoding modes. Here, the multi-view video encoding apparatus 101 may encode the input video by using a prediction vector that indicates the prediction block most similar to the current block in a frame whose viewpoint or time differs from that of the frame including the current block to be encoded. Accordingly, the more similar the current block and the prediction block are, the higher the encoding performance the multi-view video encoding apparatus 101 can achieve. The result of encoding the input video is transmitted to the multi-view video decoding apparatus 102 through the bitstream.

The multi-view video encoding apparatus 101 according to an embodiment of the present invention can improve the encoding performance of the current block by defining a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector to be used when encoding the input video.

Hereinafter, a motion vector (MV) or a disparity vector (DV) associated with a spatial prediction vector, a temporal prediction vector, or a viewpoint prediction vector is defined as follows. The motion vector of a specific block is determined based on the prediction block that the specific block indicates in a frame different in time from the frame including the specific block. The disparity vector of a specific block is determined based on the prediction block that the specific block indicates in a frame different in viewpoint from the frame including the specific block.
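The distinction between the two vector types can be illustrated with a minimal Python sketch. The `FrameId` type, its integer `view` and `time` fields, and the function name `vector_kind` are hypothetical names introduced only for illustration. The sketch also assumes that a reference frame differs from the current frame along exactly one axis (time or viewpoint), which the description does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameId:
    view: int  # viewpoint (camera) index; hypothetical numbering
    time: int  # temporal position; hypothetical numbering


def vector_kind(current: FrameId, reference: FrameId) -> str:
    """Name the vector that points from a block in `current` into `reference`."""
    if reference.view == current.view and reference.time != current.time:
        return "motion"      # reference frame differs in time
    if reference.view != current.view and reference.time == current.time:
        return "disparity"   # reference frame differs in viewpoint
    # Assumption of this sketch: exactly one of time or view must differ.
    raise ValueError("reference must differ from current in time or in view only")


print(vector_kind(FrameId(view=0, time=5), FrameId(view=0, time=4)))
print(vector_kind(FrameId(view=0, time=5), FrameId(view=1, time=5)))
```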

Referring to FIG. 2, the multi-view video encoding apparatus 101 may include a prediction vector extractor 201 and an index transmitter 202.

Hereinafter, a multi-view video encoding apparatus 101 operating according to four embodiments will be described.

The prediction vector extractor 201 may extract a spatial prediction vector of the current block to be encoded. Here, the spatial prediction vector of the current block may be extracted using a frame including the current block.

In one example, the spatial prediction vector may include at least one of a first motion vector corresponding to the left block of the current block, a second motion vector corresponding to the upper block of the current block, a third motion vector corresponding to the upper left block of the current block, a fourth motion vector corresponding to the upper right block of the current block, and a fifth motion vector obtained by applying a median filter to the first, second, third, and fourth motion vectors.

In another example, the spatial prediction vector may include at least one of a first disparity vector corresponding to the left block of the current block, a second disparity vector corresponding to the upper block of the current block, a third disparity vector corresponding to the upper left block of the current block, a fourth disparity vector corresponding to the upper right block of the current block, and a fifth disparity vector obtained by applying a median filter to the first, second, third, and fourth disparity vectors.

When the spatial prediction vector is extracted, the index transmitter 202 may transmit an index for identifying the spatial prediction vector of the current block to the multi-view video decoding apparatus 102 through the bitstream.
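As a rough illustration of how the spatial candidates could be assembled, the following Python sketch collects the four neighboring vectors and a component-wise median. The function name, the dictionary keys, and the integer `(x, y)` vector representation are hypothetical. The description does not say how the median of four values is resolved, so the sketch assumes `statistics.median_low`, which takes the lower of the two middle values and keeps the result an integer vector.

```python
from statistics import median_low
from typing import Dict, Tuple

Vector = Tuple[int, int]  # (horizontal, vertical) components; assumed representation


def spatial_candidates(left: Vector, upper: Vector,
                       upper_left: Vector, upper_right: Vector) -> Dict[str, Vector]:
    """Collect the four neighboring vectors and their component-wise median."""
    neighbours = [left, upper, upper_left, upper_right]
    # Assumption: the median filter is applied to each component separately,
    # and ties between the two middle values resolve to the lower one.
    med = (median_low([v[0] for v in neighbours]),
           median_low([v[1] for v in neighbours]))
    return {
        "left": left,
        "upper": upper,
        "upper_left": upper_left,
        "upper_right": upper_right,
        "median": med,
    }


cands = spatial_candidates((4, 0), (2, 1), (6, -1), (3, 2))
print(cands["median"])
```

The same routine applies whether the neighboring vectors are motion vectors or disparity vectors, since the two examples above differ only in the type of vector collected.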

The prediction vector extractor 201 may extract a temporal prediction vector of the current block to be encoded. Here, the temporal prediction vector of the current block may be extracted using a frame located at a position different in time from the frame including the current block.

As an example, the temporal prediction vector may include a motion vector or a disparity vector of a target block at the same position as the current block in a frame corresponding to a different time than the frame including the current block. For example, if the current block exists at the position (x, y) of frame 1, the temporal prediction vector of the current block may include the motion vector or the disparity vector of the target block at the position (x, y) of frame 2, which is different in time from frame 1.

As another example, the temporal prediction vector may include a motion vector or a disparity vector of neighboring blocks adjacent to the target block at the same position as the current block in a frame corresponding to a different time than the frame including the current block. For example, if the current block exists at the position (x, y) of frame 1, the temporal prediction vector of the current block may include the motion vector or the disparity vector of neighboring blocks adjacent to the target block at the position (x, y) of frame 2, which is different in time from frame 1. Here, the neighboring blocks may include an upper block of the target block, a left block of the target block, an upper right block of the target block, or an upper left block of the target block.

As another example, the temporal prediction vector may include a motion vector or a disparity vector of a target block that is most similar to the current block in a frame corresponding to a time different from a frame including the current block. Here, the target block most similar to the current block refers to a block having a high association with the pixel property and position of the current block.

When the temporal prediction vector is extracted, the index transmitter 202 may transmit an index for identifying the temporal prediction vector of the current block to the multi-view video decoding apparatus 102 through the bitstream.
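The co-located lookups described for the temporal prediction vector can be sketched as follows. This is a minimal illustration, not the apparatus itself: the `VectorField` dictionary, the function names, and the use of block-unit coordinates with y increasing downward are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Vector = Tuple[int, int]
BlockPos = Tuple[int, int]             # (x, y) in block units; y grows downward (assumed)
VectorField = Dict[BlockPos, Vector]   # vectors stored for an already coded frame


def colocated_vector(field: VectorField, pos: BlockPos) -> Optional[Vector]:
    """Vector of the target block at the same position in the other frame."""
    return field.get(pos)


def colocated_neighbour_vectors(field: VectorField, pos: BlockPos) -> List[Vector]:
    """Vectors of the upper, left, upper right, and upper left blocks of the target block."""
    x, y = pos
    offsets = [(0, -1), (-1, 0), (1, -1), (-1, -1)]
    return [field[(x + dx, y + dy)]
            for dx, dy in offsets
            if (x + dx, y + dy) in field]  # skip neighbours outside the frame


frame2 = {(1, 1): (2, 0), (1, 0): (1, 0), (0, 1): (3, 1), (2, 0): (0, 0), (0, 0): (5, 5)}
print(colocated_vector(frame2, (1, 1)))
print(colocated_neighbour_vectors(frame2, (1, 1)))
```

Passing the vector field of a frame of a different viewpoint instead of a different time would give the corresponding lookups for the viewpoint prediction vector.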

The prediction vector extractor 201 may extract a viewpoint prediction vector of the current block to be encoded. Here, the viewpoint prediction vector of the current block may be extracted using a frame that is different in viewpoint from the frame including the current block.

As an example, the viewpoint prediction vector may include a motion vector or a disparity vector of a target block at the same position as the current block in a frame corresponding to a different viewpoint than the frame including the current block. For example, if the current block exists at the position (x, y) of frame 1, the viewpoint prediction vector of the current block may include the motion vector or the disparity vector of the target block at the position (x, y) of frame 2, which is different in viewpoint from frame 1.

As another example, the viewpoint prediction vector may include a motion vector or a disparity vector of neighboring blocks adjacent to a target block at the same position as the current block in a frame corresponding to a different viewpoint than the frame including the current block. For example, if the current block exists at the position (x, y) of frame 1, the viewpoint prediction vector of the current block may include the motion vector or the disparity vector of neighboring blocks adjacent to the target block at the position (x, y) of frame 2, which is different in viewpoint from frame 1. Here, the neighboring blocks may include an upper block of the target block, a left block of the target block, an upper right block of the target block, or an upper left block of the target block.

As another example, the viewpoint prediction vector may include a motion vector or a disparity vector of a target block that is most similar to the current block in a frame corresponding to a different viewpoint than a frame including the current block. Here, the target block most similar to the current block refers to a block having a high association with the pixel property and position of the current block.

When the viewpoint prediction vector is extracted, the index transmitter 202 may transmit an index for identifying the viewpoint prediction vector of the current block to the multiview video decoding apparatus 102 through the bitstream.

The prediction vector extractor 201 may extract a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector of the current block to be encoded.

Then, the index transmitter 202 may transmit, to the multi-view video decoding apparatus 102 through the bitstream, an index for identifying the final prediction vector determined for encoding the current block among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector of the current block. For example, the index transmitter 202 may transmit an index for identifying the prediction vector having the best encoding performance among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, in consideration of at least one of a threshold value, a distance of the prediction vector, a bit amount required when compressing the prediction vector, a degree of image quality degradation, or a cost function when compressing the prediction vector.
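One way to picture the competition among candidates is a cost comparison. The Python sketch below assumes a Lagrangian rate-distortion cost of the form distortion plus lambda times bits, which is a common choice of cost function but is not specified at this point in the description. The index numbering (0 for spatial, 1 for temporal, 2 for viewpoint), the function names, and the sample numbers are hypothetical.

```python
from typing import Dict, Tuple


def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Assumed cost function: image quality degradation plus weighted bit amount."""
    return distortion + lam * bits


def select_prediction_vector(candidates: Dict[int, Tuple[float, float]],
                             lam: float) -> int:
    """Return the index of the candidate with the lowest cost.

    `candidates` maps an index to (distortion, bits) measured when the
    current block is encoded with that prediction vector.
    """
    return min(candidates, key=lambda i: rd_cost(*candidates[i], lam))


# Hypothetical indices: 0 = spatial, 1 = temporal, 2 = viewpoint.
measured = {0: (100.0, 10.0), 1: (90.0, 20.0), 2: (80.0, 12.0)}
print(select_prediction_vector(measured, lam=1.0))
```

Only the winning index needs to be signaled; the candidate vectors themselves can be derived again at the decoder from already decoded blocks.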

According to the above-described embodiment, the information included in the bitstream may vary according to the encoding mode of the current block.

If the current block is encoded according to the skip mode, an index for identifying a spatial prediction vector, a temporal prediction vector, or a viewpoint prediction vector is transmitted through the bitstream. At this time, when the current block is included in the P-frame, the index indicates a skip mode (SKIP Mode) associated with the current block. When the current block is included in the B-frame, the index indicates a direct skip mode included in the direct mode associated with the current block.

When the current block is encoded in an encoding mode other than the skip mode (for example, an inter mode), the bitstream may include not only an index for identifying a spatial prediction vector, a temporal prediction vector, or a viewpoint prediction vector, but also the residual signal, which is the difference between the current block and the prediction block indicated by the prediction vector determined to encode the current block. Since the number of bits required to encode the residual signal decreases as the prediction block and the current block become more similar, the encoding performance of the current block can be improved.
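The mode-dependent content of the bitstream can be summarized with a small Python sketch. The `BlockPayload` structure, its field names, the mode strings, and the flat list of pixel values are hypothetical simplifications; a real bitstream would carry transformed, quantized, and entropy-coded data rather than raw differences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlockPayload:
    mode: str                             # "skip" or "inter" (hypothetical labels)
    pv_index: int                         # index identifying the prediction vector
    residual: Optional[List[int]] = None  # absent in skip mode


def build_payload(mode: str, pv_index: int,
                  current: List[int], prediction: List[int]) -> BlockPayload:
    """Assemble what is signaled for one block, depending on the encoding mode."""
    if mode == "skip":
        # Skip mode: only the prediction vector index is transmitted.
        return BlockPayload(mode, pv_index)
    # Other modes: index plus the difference between current and prediction block.
    residual = [c - p for c, p in zip(current, prediction)]
    return BlockPayload(mode, pv_index, residual)


print(build_payload("skip", 1, [10, 12], [9, 12]))
print(build_payload("inter", 1, [10, 12], [9, 12]))
```

A prediction block closer to the current block yields a residual with smaller values, which is why a better prediction vector lowers the bit amount.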

Referring to FIG. 3, the multiview video decoding apparatus 102 may include an index extractor 301 and a prediction vector determiner 302.

Hereinafter, a multi-view video decoding apparatus 102 that operates according to four embodiments will be described.

The index extractor 301 may extract the index of the prediction vector from the bitstream received from the multiview video encoding apparatus 101. Then, the prediction vector determiner 302 may determine the spatial prediction vector as the final prediction vector for reconstructing the current block based on the index.

The index extractor 301 may extract the index of the prediction vector from the bitstream received from the multiview video encoding apparatus 101. Then, the prediction vector determiner 302 may determine the temporal prediction vector as the final prediction vector for reconstructing the current block based on the index.

The index extractor 301 may extract the index of the prediction vector from the bitstream received from the multiview video encoding apparatus 101. Then, the prediction vector determiner 302 may determine the viewpoint prediction vector as the final prediction vector for reconstructing the current block based on the index.

The index extractor 301 may extract the index of the prediction vector from the bitstream received from the multiview video encoding apparatus 101. Then, the prediction vector determiner 302 may determine, based on the index, a final prediction vector for reconstructing the current block among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector.
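On the decoding side, determining the final prediction vector amounts to a lookup keyed by the extracted index. The following Python sketch assumes a hypothetical index assignment (0 for spatial, 1 for temporal, 2 for viewpoint) and assumes that the decoder has already derived the candidate vectors from previously decoded blocks; neither detail is fixed by the description.

```python
from typing import Dict, Tuple

Vector = Tuple[int, int]

# Hypothetical index assignment shared by encoder and decoder.
PV_NAMES = {0: "spatial", 1: "temporal", 2: "viewpoint"}


def determine_final_prediction_vector(index: int,
                                      candidates: Dict[str, Vector]) -> Vector:
    """Pick the prediction vector identified by the index extracted from the bitstream."""
    if index not in PV_NAMES:
        raise ValueError(f"unknown prediction vector index: {index}")
    return candidates[PV_NAMES[index]]


derived = {"spatial": (3, 0), "temporal": (2, 1), "viewpoint": (-7, 0)}
print(determine_final_prediction_vector(2, derived))
```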

For example, the index may identify the prediction vector that the index transmitter 202 determined to have the best encoding performance among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, in consideration of at least one of a threshold value, a distance of the prediction vector, a bit amount required when compressing the prediction vector, a degree of image quality degradation, or a cost function when compressing the prediction vector.

The spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector will be described in detail with reference to FIG. 6.

FIG. 4 shows a multiview video coding (MVC) method that, when pictures of three views (Left, Center, Right) are received, encodes them with a GOP (Group of Pictures) size of 8. To encode multi-view pictures, hierarchical B pictures are basically applied along the temporal axis and the view axis, thereby reducing redundancy between pictures.

According to the structure of the multiview video illustrated in FIG. 4, the multiview video encoding apparatus 101 may encode the pictures corresponding to the three viewpoints by encoding the left picture (I-view) first, and then the right picture (P-view) and the center picture (B-view) in order. In the present invention, the terms frame and picture are used interchangeably.

In this case, the left picture may be encoded in such a manner that temporal redundancy is removed by searching previous pictures for a similar region through motion estimation. Since the right picture is encoded by using the previously encoded left picture as a reference picture, it may be encoded in such a manner that temporal redundancy is removed based on motion estimation and inter-view redundancy is removed based on disparity estimation. Since the center picture is encoded by using both the already encoded left picture and right picture as reference pictures, inter-view redundancy may be removed through disparity estimation in both directions.

Referring to FIG. 4, in the multiview video encoding scheme, a picture that is encoded without using a reference picture of another view, such as the left picture, is defined as an I-view. A picture that is unidirectionally predicted and encoded from a reference picture of another view, such as the right picture, is defined as a P-view. A picture that is bidirectionally predicted and encoded from reference pictures of the left and right views, such as the center picture, is defined as a B-view.

Frames of MVC are largely classified into six groups according to the prediction structure: an I-view anchor frame for intra coding; an I-view non-anchor frame for temporal inter coding; a P-view anchor frame for inter-view unidirectional inter coding; a P-view non-anchor frame for inter-view unidirectional inter coding and temporal bidirectional inter coding; a B-view anchor frame for inter-view bidirectional inter coding; and a B-view non-anchor frame for temporal bidirectional inter coding.

Referring to FIG. 5, when the multi-view video encoding apparatus 101 encodes a current block located in the current frame, which is the current picture 501, it may use reference pictures 502 and 503 located around the current frame in time and reference pictures 504 and 505 located around the current frame in viewpoint. In detail, the multi-view video encoding apparatus 101 may search the reference pictures 502 to 505 for the prediction block most similar to the current block and encode the residual signal between the current block and the prediction block. The multi-view video encoding apparatus 101 may use the Ref1 picture 502 and the Ref2 picture 503, which are different in time from the current frame including the current block, to search for the prediction block based on the motion vector. In addition, it may use the Ref3 picture 504 and the Ref4 picture 505, which are different in viewpoint from the current frame including the current block, to search for the prediction block based on the disparity vector.
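The search for the most similar prediction block in a reference picture can be illustrated with a small block-matching sketch in Python. The similarity measure (sum of absolute differences), the exhaustive search over a square window, and all function and parameter names are assumptions made for the example; the description does not prescribe a particular search method.

```python
from typing import List, Tuple

Picture = List[List[int]]  # rows of pixel values; assumed representation


def sad(cur: Picture, ref: Picture, bx: int, by: int,
        dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def search_vector(cur: Picture, ref: Picture, bx: int, by: int,
                  size: int, search_range: int) -> Tuple[Tuple[int, int], int]:
    """Exhaustively search `ref` and return ((dx, dy), cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue  # candidate block would fall outside the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


ref = [[0, 0, 0, 0], [0, 9, 8, 0], [0, 7, 6, 0], [0, 0, 0, 0]]
cur = [[9, 8, 0, 0], [7, 6, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(search_vector(cur, ref, bx=0, by=0, size=2, search_range=1))
```

Run against a reference picture that differs in time, the returned displacement plays the role of a motion vector; run against a reference picture that differs in viewpoint, it plays the role of a disparity vector.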

According to an embodiment of the present invention, the multi-view video encoding apparatus 101 may encode a multi-view video through the following process. However, the following process applies to Embodiment 4 of FIGS. 2 and 3; in Embodiments 1 to 3, the step of calculating encoding performance in order to select, through competition, which motion vector or disparity vector to use may be omitted.

(1) Select the reference picture.

(2) Extract and determine the prediction vectors (based on the prediction structure).

(3) Predict the motion vector or disparity vector.

(4) Estimate the motion vector or disparity vector.

(5) Encode the residual signal and entropy-encode the motion / disparity information (this step is omitted when the encoding mode is the SKIP (DIRECT) mode).

(6) Calculate the coding performance (e.g., RD cost).

According to an embodiment of the present invention, the multi-view video encoding apparatus 101 may encode the current block by selecting the prediction vector having the best encoding performance among the prediction vectors corresponding to the current block, that is, the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector. That is, the multi-view video encoding apparatus 101 may select the prediction vector having the best encoding performance based on the competition between the prediction vectors.

The prediction vectors may be classified into three groups: a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector. The prediction vector shown in FIG. 6 may be classified into three groups as shown in Table 1 below.

The spatial prediction vector means a motion vector or a disparity vector corresponding to at least one neighboring block adjacent to the current block to be encoded.

For example, the spatial prediction vector may include at least one of the first motion vector mv_a corresponding to the left block of the current block, the second motion vector mv_b corresponding to the upper block of the current block, the third motion vector mv_d corresponding to the upper left block of the current block, the fourth motion vector mv_c corresponding to the upper right block of the current block, and the fifth motion vector mv_med obtained by applying a median filter to the first, second, third, and fourth motion vectors.

In addition, the spatial prediction vector may include at least one of the first disparity vector dv_a corresponding to the left block of the current block, the second disparity vector dv_b corresponding to the upper block of the current block, the third disparity vector dv_d corresponding to the upper left block of the current block, the fourth disparity vector dv_c corresponding to the upper right block of the current block, and the fifth disparity vector dv_med obtained by applying a median filter to the first, second, third, and fourth disparity vectors.

The temporal prediction vector may be determined based on a previous frame N-1 located earlier than the current frame N that includes the current block to be encoded.

For example, the temporal prediction vector may include the motion vector mv_col1 or the disparity vector dv_col1 of a target block at the same position (x, y) as the current block in a previous frame (Frame N-1) located earlier in time than the current frame (Frame N) including the current block to be encoded.

In another example, the temporal prediction vector may include the motion vector mv_col2 or the disparity vector dv_col2 of at least one neighboring block adjacent to a target block at the same position as the current block in the previous frame. Here, the neighboring blocks may include a left block, an upper left block, an upper block, and an upper right block of the target block.

As another example, the temporal prediction vector may include the motion vector mv_tcor or the disparity vector dv_tcor of the target block most similar to the current block in the previous frame.

The viewpoint prediction vector may be determined based on a neighboring frame (inter-view frame) representing a different viewpoint from the current frame (Frame N) that includes the current block to be encoded.

For example, the viewpoint prediction vector may include a motion vector mv_gdv1 or a disparity vector dv_gdv1 of a target block located at the same position as the current block in a neighboring frame corresponding to a different viewpoint from the current frame including the current block to be encoded.

In another example, the viewpoint prediction vector may include a motion vector mv_gdv2 or a disparity vector dv_gdv2 of neighboring blocks adjacent to the target block located at the same position as the current block in a neighboring frame corresponding to a different viewpoint from the current frame including the current block to be encoded.

As another example, the viewpoint prediction vector may include a motion vector mv_vcor or a disparity vector dv_vcor of the target block that is most similar to the current block in a neighboring frame corresponding to a different viewpoint from the current frame including the current block to be encoded.
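The text does not say how the "most similar" target block is located. One common approach, assumed here only for illustration, is exhaustive block matching with the sum of absolute differences (SAD) inside a small search window. In the Python sketch below, the function names, the frame layout (lists of pixel rows), and the search range are all hypothetical.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def most_similar_block(cur, ref, x, y, size, search_range):
    """Return the top-left (rx, ry) of the block in `ref` that best matches
    the block of `cur` whose top-left corner is (x, y)."""
    height, width = len(ref), len(ref[0])
    best_pos, best_cost = None, None
    for ry in range(max(0, y - search_range), min(height - size, y + search_range) + 1):
        for rx in range(max(0, x - search_range), min(width - size, x + search_range) + 1):
            cost = sad(cur, x, y, ref, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (rx, ry), cost
    return best_pos


# Reference frame (previous frame or neighboring view) and current frame.
ref = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
cur = [
    [9, 8, 0, 0],
    [7, 6, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
match = most_similar_block(cur, ref, x=0, y=0, size=2, search_range=2)
print(match)
```

Under this assumption, the candidate (mv_tcor or dv_tcor for a previous frame, mv_vcor or dv_vcor for a neighboring view) would be the vector stored for the block found at the returned position.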

In one embodiment of the present invention, a motion vector is a vector indicating a specific block (a target block or a neighboring block adjacent to the target block) in a previous frame that represents the same viewpoint as, but a different time from, the current frame including the current block. Here, the previous frame is a reference picture of the current block.

A disparity vector is a vector indicating a specific block (a target block or a neighboring block adjacent to the target block) in a neighboring frame that represents a different viewpoint from, but the same time as, the current frame including the current block. Here, the neighboring frame is a reference picture of the current block.
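The distinction between the two vector types can be expressed as a small Python sketch. It assumes the usual multiview coding reading of the two definitions: a motion vector points into a reference picture of the same viewpoint at a different time, and a disparity vector points into a reference picture of a different viewpoint at the same time. The Picture type and the function name are hypothetical.

```python
from typing import NamedTuple


class Picture(NamedTuple):
    view: int  # viewpoint (camera) index
    time: int  # frame index along the time axis


def vector_kind(current: Picture, reference: Picture) -> str:
    """Classify a vector by the reference picture it points into."""
    if reference.view == current.view and reference.time != current.time:
        return "motion"      # same viewpoint, different time
    if reference.view != current.view and reference.time == current.time:
        return "disparity"   # different viewpoint, same time
    return "other"           # not covered by the two definitions above


current = Picture(view=1, time=5)
print(vector_kind(current, Picture(view=1, time=4)))
print(vector_kind(current, Picture(view=0, time=5)))
```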

According to an embodiment of the present invention, the multi-view video encoding apparatus may extract at least one of a spatial prediction vector, a temporal prediction vector, or a viewpoint prediction vector with respect to a current block to be encoded.

In this case, when a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector are all extracted for the current block to be encoded, the multiview video encoding apparatus 101 may select the prediction vector to be used in the final encoding through a competition among the prediction vectors. According to an embodiment of the present invention, the multi-view video encoding apparatus 101 may select, from the extracted prediction vectors, the prediction vector having the best encoding performance.

For example, the prediction vector determiner 202 may determine the prediction vector having the best encoding performance by considering at least one of (1) a threshold value, (2) the distance between the finally determined motion / disparity vector and the prediction vector, (3) the number of bits required and the degree of image quality degradation when encoding with the prediction vector, or (4) a cost function evaluated when encoding with the prediction vector.

Here, the cost function may be determined according to Equation 1 below.

Here, the sum of squared differences (SSD) is the sum of the squared differences between the current block s and the prediction block r obtained with the prediction vector, and λ is a Lagrangian coefficient. R is the number of bits required to encode the signal obtained as the difference between the current frame to be encoded in a given coding mode and the reference frame derived through motion prediction or disparity prediction. R also includes the index bits indicating the type of the prediction vector.
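Equation 1 is not reproduced in this text. Assuming it takes the usual Lagrangian rate-distortion form J = SSD(s, r) + λ · R, which is consistent with the terms defined above, the competition among prediction vectors could be sketched in Python as follows. The function names, the sample blocks, and the bit counts are hypothetical.

```python
def ssd(block_s, block_r):
    """Sum of squared differences between current block s and prediction block r."""
    return sum(
        (s - r) ** 2
        for row_s, row_r in zip(block_s, block_r)
        for s, r in zip(row_s, row_r)
    )


def rd_cost(block_s, block_r, rate_bits, lam):
    """Assumed form of the cost function: J = SSD(s, r) + lambda * R."""
    return ssd(block_s, block_r) + lam * rate_bits


def best_candidate(block_s, candidates, lam):
    """Pick the candidate with the lowest cost.

    candidates maps a name to (prediction block, rate in bits), where the
    rate is taken to include the index bits for that candidate.
    """
    return min(
        candidates,
        key=lambda name: rd_cost(block_s, candidates[name][0], candidates[name][1], lam),
    )


current_block = [[10, 12], [11, 13]]
candidates = {
    "spatial": ([[10, 12], [11, 12]], 6),
    "temporal": ([[10, 12], [11, 13]], 9),
    "viewpoint": ([[9, 12], [11, 13]], 5),
}
winner = best_candidate(current_block, candidates, lam=2)
print(winner)
```

In this example the temporal candidate has zero distortion but the highest rate, so with λ = 2 the viewpoint candidate wins; a smaller λ would shift the choice toward the lower-distortion candidate.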

To encode competition-based motion information or disparity information, it is important to binarize the index of the prediction vector into index bits. Index bits may be defined as shown in Table 2 below. If the candidate spatial, temporal, and viewpoint prediction vectors are all identical, the multiview video encoding apparatus 101 need not transmit the index bits to the multiview video decoding apparatus 102.
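The signaling rule in the preceding paragraph can be sketched in Python. Table 2 is not reproduced in this text, so the code table below is a purely hypothetical assignment chosen for illustration; only the rule that no index bits are sent when all candidates are identical comes from the description.

```python
# Hypothetical binarization; the actual codes of Table 2 are not available here.
INDEX_BITS = {"spatial": "0", "temporal": "10", "viewpoint": "11"}


def index_bits(chosen, candidate_vectors):
    """Return the bits that signal the chosen prediction vector type.

    candidate_vectors maps a type name to its vector (as a tuple). When every
    candidate vector is identical, the decoder can derive the prediction
    vector without an index, so an empty string is returned.
    """
    if len(set(candidate_vectors.values())) <= 1:
        return ""
    return INDEX_BITS[chosen]


distinct = {"spatial": (1, 0), "temporal": (2, 0), "viewpoint": (1, 0)}
identical = {"spatial": (1, 0), "temporal": (1, 0), "viewpoint": (1, 0)}
print(repr(index_bits("temporal", distinct)))
print(repr(index_bits("temporal", identical)))
```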

In FIG. 7, the inter mode and the intra mode each encode a residual signal, which is the difference between the current block to be encoded and a prediction block indicated by a motion vector obtained through motion prediction. In the inter mode, the prediction block is located in a different frame from the current block; in the intra mode, the current block and the prediction block are located in the same frame. In this case, the spatial prediction vector may be used when encoding in the intra mode, and the temporal prediction vector and the viewpoint prediction vector may be used when encoding in the inter mode.

The multi-view video encoding apparatus 101 according to an embodiment of the present invention may extract a prediction vector corresponding to the current block to be encoded. In this case, the prediction vector may include at least one of a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector.

If two or more prediction vectors are extracted, the multiview video encoding apparatus 101 may encode the input image using the final prediction vector selected through competition among the prediction vectors. In detail, the multi-view video encoding apparatus 101 may determine the final prediction vector for the current frame to be encoded by selecting, from among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, the one having the best encoding performance. Then, the multi-view video encoding apparatus 101 encodes the current block based on the reference frame indicated by the prediction vector.

The multiview video encoding apparatus 101 transmits the bitstream of the multiview video to the multiview video decoding apparatus 102 as a result of encoding. The multiview video encoding apparatus 101 may also transmit an index bit indicating a type of a prediction vector used when encoding the multiview video to the multiview video decoding apparatus 102 through a bitstream.

Unlike the multiview video encoding apparatus of FIG. 7, the multiview video encoding apparatus 101 of FIG. 8 does not encode a residual signal. That is, the multi-view video encoding apparatus 101 of FIG. 8 does not encode the residual signal that is the difference between the current block and a prediction block derived for the current block through motion prediction or disparity prediction. Instead, the multi-view video encoding apparatus 101 may transmit information (index bits) indicating that the current block is encoded in the skip mode to the multi-view video decoding apparatus 102.

Referring to FIG. 9, the bitstream transmitted through the multiview video encoding apparatus 101 may include encoding information of a block to be reconstructed and a residual signal of the block.

For example, when the current block to be reconstructed was encoded in the inter mode or the intra mode, the multiview video decoding apparatus 102 may extract a prediction vector associated with the current block. In this case, the prediction vector associated with the current block may be determined from the index bits included in the bitstream. Then, the multi-view video decoding apparatus 102 may generate a predictive video by performing motion compensation or disparity compensation on the current block based on the prediction vector, and may generate the final output video by combining the predictive video with the residual signal included in the bitstream. In this case, the prediction vector may be any one of a spatial prediction vector, a temporal prediction vector, or a viewpoint prediction vector.

The multi-view video decoding apparatus 102 may generate the predictive video by performing motion compensation or disparity compensation based on the prediction vector associated with the current block to be reconstructed. In this case, the prediction vector may be determined according to the index bits of the current block included in the bitstream.

Since a current block encoded in the skip mode is encoded without a residual signal being transmitted, the predictive video generated by the multi-view video decoding apparatus 102 may be used as the output video as is.
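The two decoding paths described above, with and without a residual signal, can be summarized in a short Python sketch. The function name is hypothetical, blocks are represented as lists of pixel rows, and clipping the result to the 8-bit range is an assumption not stated in the text.

```python
def reconstruct_block(prediction, residual=None):
    """Reconstruct a block from its compensated prediction.

    Skip mode: no residual is transmitted, so the prediction is the output.
    Otherwise the residual is added to the prediction sample by sample and
    clipped to 0..255 (8-bit samples are an assumption).
    """
    if residual is None:
        return [row[:] for row in prediction]
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[100, 102], [250, 3]]
residual = [[1, -2], [10, -5]]
print(reconstruct_block(prediction, residual))
print(reconstruct_block(prediction))
```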

Methods according to an embodiment of the present invention can be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer readable medium. The computer readable medium may include program instructions, data files, data structures, etc. alone or in combination. Program instructions recorded on the media may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well-known and available to those having skill in the computer software arts.

As described above, the present invention has been described with reference to limited embodiments and drawings, but the present invention is not limited to the above embodiments, and those skilled in the art to which the present invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined not only by the claims below but also by the equivalents of the claims.

Claims

A multiview video encoding apparatus comprising:

a prediction vector extracting unit that extracts a spatial prediction vector of a current block to be encoded; and

an index transmitter that transmits an index for identifying the spatial prediction vector of the current block to a multiview video decoding apparatus through a bitstream.
The apparatus of claim 1, wherein the spatial prediction vector includes at least one of a first motion vector corresponding to a left block of the current block, a second motion vector corresponding to an upper block of the current block, a third motion vector corresponding to an upper left block of the current block, a fourth motion vector corresponding to an upper right block of the current block, or a fifth motion vector obtained by applying a median filter to the first motion vector, the second motion vector, the third motion vector, and the fourth motion vector.
The apparatus of claim 1, wherein the spatial prediction vector includes at least one of a first disparity vector corresponding to a left block of the current block, a second disparity vector corresponding to an upper block of the current block, a third disparity vector corresponding to an upper left block of the current block, a fourth disparity vector corresponding to an upper right block of the current block, or a fifth disparity vector obtained by applying a median filter to the first disparity vector, the second disparity vector, the third disparity vector, and the fourth disparity vector.
A multiview video encoding apparatus comprising:

a prediction vector extracting unit that extracts a temporal prediction vector of a current block to be encoded; and

an index transmitter that transmits an index for identifying the temporal prediction vector of the current block to a multiview video decoding apparatus through a bitstream.
The apparatus of claim 4, wherein the temporal prediction vector includes a motion vector (MV) or a disparity vector (DV) of a first target block located at the same position as the current block in a frame corresponding to a time different from that of the frame including the current block.
The apparatus of claim 4, wherein the temporal prediction vector includes a motion vector or a disparity vector of neighboring blocks adjacent to a first target block located at the same position as the current block in a frame corresponding to a time different from that of the frame including the current block.
The apparatus of claim 4, wherein the temporal prediction vector includes a motion vector or a disparity vector of a second target block most similar to the current block in a frame corresponding to a time different from that of the frame including the current block.
A multiview video encoding apparatus comprising:

a prediction vector extracting unit that extracts a viewpoint prediction vector of a current block to be encoded; and

an index transmitter that transmits an index for identifying the viewpoint prediction vector of the current block to a multiview video decoding apparatus through a bitstream.
The apparatus of claim 8, wherein the viewpoint prediction vector includes a motion vector or a disparity vector of a first target block located at the same position as the current block in a frame corresponding to a viewpoint different from that of the frame including the current block.
The apparatus of claim 8, wherein the viewpoint prediction vector includes a motion vector or a disparity vector of neighboring blocks adjacent to a first target block located at the same position as the current block in a frame corresponding to a viewpoint different from that of the frame including the current block.
The apparatus of claim 8, wherein the viewpoint prediction vector includes a motion vector or a disparity vector of a second target block most similar to the current block in a frame corresponding to a viewpoint different from that of the frame including the current block.
A multiview video encoding apparatus comprising:

a prediction vector extracting unit that extracts a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector of a current block to be encoded; and

an index transmitter that transmits an index for identifying the prediction vector used to encode the current block, among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, to a multiview video decoding apparatus through a bitstream.
The apparatus of claim 12, wherein the index transmitter transmits an index for identifying the prediction vector having the best encoding performance among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, in consideration of at least one of a threshold value, a distance of the prediction vector, a number of bits required for compression with the prediction vector, a degree of image quality degradation, or a cost function when compressing with the prediction vector.
A multiview video decoding apparatus comprising:

an index extractor that extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

a prediction vector determiner that determines, based on the index, a spatial prediction vector as a final prediction vector for reconstructing a current block.
The apparatus of claim 14, wherein the spatial prediction vector includes at least one of a first motion vector corresponding to a left block of the current block, a second motion vector corresponding to an upper block of the current block, a third motion vector corresponding to an upper left block of the current block, a fourth motion vector corresponding to an upper right block of the current block, or a fifth motion vector obtained by applying a median filter to the first motion vector, the second motion vector, the third motion vector, and the fourth motion vector.
The apparatus of claim 14, wherein the spatial prediction vector includes at least one of a first disparity vector corresponding to a left block of the current block, a second disparity vector corresponding to an upper block of the current block, a third disparity vector corresponding to an upper left block of the current block, a fourth disparity vector corresponding to an upper right block of the current block, or a fifth disparity vector obtained by applying a median filter to the first disparity vector, the second disparity vector, the third disparity vector, and the fourth disparity vector.
A multiview video decoding apparatus comprising:

an index extractor that extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

a prediction vector determiner that determines, based on the index, a temporal prediction vector as a final prediction vector for reconstructing a current block.
The apparatus of claim 17, wherein the temporal prediction vector includes a motion vector or a disparity vector of a first target block located at the same position as the current block in a frame corresponding to a time different from that of the frame including the current block.
The apparatus of claim 17, wherein the temporal prediction vector includes a motion vector or a disparity vector of neighboring blocks adjacent to a first target block located at the same position as the current block in a frame corresponding to a time different from that of the frame including the current block.
The apparatus of claim 17, wherein the temporal prediction vector includes a motion vector or a disparity vector of a second target block most similar to the current block in a frame corresponding to a time different from that of the frame including the current block.
A multiview video decoding apparatus comprising:

an index extractor that extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

a prediction vector determiner that determines, based on the index, a viewpoint prediction vector as a final prediction vector for reconstructing a current block.
The apparatus of claim 21, wherein the viewpoint prediction vector includes a motion vector or a disparity vector of a first target block located at the same position as the current block in a frame corresponding to a viewpoint different from that of the frame including the current block.
The apparatus of claim 21, wherein the viewpoint prediction vector includes a motion vector or a disparity vector of neighboring blocks adjacent to a first target block located at the same position as the current block in a frame corresponding to a viewpoint different from that of the frame including the current block.
The apparatus of claim 21, wherein the viewpoint prediction vector includes a motion vector or a disparity vector of a second target block most similar to the current block in a frame corresponding to a viewpoint different from that of the frame including the current block.
A multiview video decoding apparatus comprising:

an index extractor that extracts an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

a prediction vector determiner that determines, based on the index, a final prediction vector for reconstructing a current block from among a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector.
The apparatus of claim 25, wherein the index transmitter transmits an index for identifying the prediction vector having the best encoding performance among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, in consideration of at least one of a threshold value, a distance of the prediction vector, a number of bits required for compression with the prediction vector, a degree of image quality degradation, or a cost function when compressing with the prediction vector.
A multiview video encoding method comprising:

extracting a spatial prediction vector of a current block to be encoded; and

transmitting an index for identifying the spatial prediction vector of the current block to a multiview video decoding apparatus through a bitstream.
A multiview video encoding method comprising:

extracting a temporal prediction vector of a current block to be encoded; and

transmitting an index for identifying the temporal prediction vector of the current block to a multiview video decoding apparatus through a bitstream.
A multiview video encoding method comprising:

extracting a viewpoint prediction vector of a current block to be encoded; and

transmitting an index for identifying the viewpoint prediction vector of the current block to a multiview video decoding apparatus through a bitstream.
A multiview video encoding method comprising:

extracting a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector of a current block to be encoded; and

transmitting an index for identifying the prediction vector used to encode the current block, among the spatial prediction vector, the temporal prediction vector, and the viewpoint prediction vector, to a multiview video decoding apparatus through a bitstream.
A multiview video decoding method comprising:

extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

determining, based on the index, a spatial prediction vector as a final prediction vector for reconstructing a current block.
A multiview video decoding method comprising:

extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

determining, based on the index, a temporal prediction vector as a final prediction vector for reconstructing a current block.
A multiview video decoding method comprising:

extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

determining, based on the index, a viewpoint prediction vector as a final prediction vector for reconstructing a current block.
A multiview video decoding method comprising:

extracting an index of a prediction vector from a bitstream received from a multiview video encoding apparatus; and

determining, based on the index, a final prediction vector for reconstructing a current block from among a spatial prediction vector, a temporal prediction vector, and a viewpoint prediction vector.
A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 27-34.