KR20150010249A - Method and apparatus for improving memory efficiency through deriving limited motion information - Google Patents

Method and apparatus for improving memory efficiency through deriving limited motion information

Info

Publication number
KR20150010249A
KR20150010249A (application number KR1020130084981A)
Authority
KR
South Korea
Prior art keywords
motion information
block
motion
prediction
information
Prior art date
Application number
KR1020130084981A
Other languages
Korean (ko)
Inventor
박광훈
이민성
김경용
허영수
이윤진
Original Assignee
경희대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 경희대학교 산학협력단 filed Critical 경희대학교 산학협력단
Priority to KR1020130084981A priority Critical patent/KR20150010249A/en
Publication of KR20150010249A publication Critical patent/KR20150010249A/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0085Motion estimation from stereoscopic image signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed are a method and an apparatus for improving memory efficiency through limited motion information derivation. The method derives the motion information (motion vector, reference picture number, and prediction direction information) for the reference blocks in the reference view only within an arbitrary range.

Description

FIELD OF THE INVENTION [0001] The present invention relates to a method and apparatus for improving memory efficiency through the derivation of limited motion information.

The present invention relates to a video encoding / decoding method and apparatus, and more particularly, to a method and apparatus for improving memory efficiency through limited motion information derivation.

3D video provides the user with a stereoscopic effect, as if seeing and feeling the real world, through a 3D stereoscopic display device. In connection with this research, the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V), the joint standardization group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group), is progressing the 3D video standard. Referring to FIG. 1, the 3D video standard encompasses an advanced data format, which uses a real image and its depth information map to support the reproduction of autostereoscopic as well as stereoscopic images, and the technologies related to it.

The basic 3D video system considered in the 3D video standard is shown in FIG. 1. On the transmitting side, image content of N (N ≥ 2) viewpoints is acquired using a stereo camera, a depth information camera, a multi-view camera, or conversion of two-dimensional images into three-dimensional images. The acquired image content may include N-view video information, its depth map information, and camera-related additional information. The N-view video content is compressed using a multi-view video encoding method, and the compressed bitstream is transmitted to the terminal through a network. The receiving side decodes the transmitted bitstream using a multi-view video decoding method and reconstructs the N-view images. The reconstructed N-view images generate virtual-view images at N or more viewpoints through a depth-image-based rendering (DIBR) process. The generated virtual-view images are reproduced on various stereoscopic display devices to provide the user with stereoscopic images.

The depth information map used to generate the virtual-view image represents the distance between the camera and the real object in the real world (depth information corresponding to each pixel, at the same resolution as the real image) with a fixed number of bits. Methods of generating the depth information map are divided into acquiring it with a depth information camera and generating it automatically from real texture images. With a depth information camera, there is the problem that it operates only within a certain distance. With automatic generation from real images, the depth information map is generated from the disparity between two ordinary images: an arbitrary pixel at the current view is compared with the pixels at neighboring views, the pixel of the best-matching area is found, and the distance between the pixels is expressed as depth information. As an example of a depth information map generated automatically from real images, FIG. 2 shows the "balloons" image used in the MPEG 3D standardization work (FIG. 2(a)) and its depth information map (FIG. 2(b)). The depth information map of FIG. 2 represents the depth information on the screen with 8 bits per pixel.

As an example of a method of encoding a real image and its depth information map, encoding can be performed using H.264/AVC (MPEG-4 Part 10 Advanced Video Coding); as another example, encoding can be performed using HEVC (High Efficiency Video Coding), which has been jointly standardized by MPEG (Moving Picture Experts Group) and VCEG (Video Coding Experts Group).

HEVC performs coding in units of a CU (Coding Unit) to encode an image efficiently. FIG. 3 is a diagram illustrating a method of dividing a CTU (Coding Tree Unit; hereinafter also referred to as an LCU (Largest Coding Unit)) into CUs when coding an image.

As shown in FIG. 3, HEVC first divides an image sequentially into CTU (LCU) units and then determines a partition structure for each CTU (LCU). The partition structure is the distribution of CUs for efficiently encoding the image within a CTU (LCU); this distribution can be determined by deciding whether to divide a CU into four CUs whose width and height are each reduced by half. A partitioned CU can be recursively partitioned in the same manner into four CUs whose width and height are again halved. The division of a CU can be performed up to a predefined depth. Depth information indicates the size of a CU and is stored for every CU. The depth of the base CTU (LCU) is 0, and the depth of the SCU (Smallest Coding Unit) is a predefined maximum depth. The depth of a CU increases by 1 each time it is divided in half horizontally and vertically, starting from the CTU (LCU). At each depth, an unpartitioned CU has size 2Nx2N, and when partitioning is performed it is divided into four CUs of size NxN; the value of N is halved each time the depth increases by 1. FIG. 3 shows an example in which the size of the CTU (LCU) with minimum depth 0 is 64x64 pixels and the maximum depth is 3, so that the SCU is 8x8 pixels. A 64x64-pixel CU (the CTU (LCU)) has depth 0, a 32x32-pixel CU has depth 1, a 16x16-pixel CU has depth 2, and an 8x8-pixel CU (the SCU) has depth 3. Whether a particular CU is divided is represented by split information, which is 1-bit information stored for each CU. This split information is included in every CU except the SCU: '0' is stored when the CU is not split, and '1' when it is split.
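For illustration only (not part of the claimed method), the depth/size relation just described can be sketched in C; the 64x64 CTU and maximum depth of 3 follow the example above:

```c
#include <stdio.h>

/* Minimal sketch of the CU depth/size relation described above:
 * each split halves the width and height, so size = ctu_size >> depth. */
static int cu_size_at_depth(int ctu_size, int depth) {
    return ctu_size >> depth;
}

int main(void) {
    const int ctu_size = 64;  /* assumed CTU (LCU) size, as in the example */
    const int max_depth = 3;  /* the SCU is then 8x8 */
    for (int depth = 0; depth <= max_depth; ++depth) {
        int s = cu_size_at_depth(ctu_size, depth);
        /* split information is present for every CU except the SCU */
        printf("depth %d: %2dx%-2d CU, split flag %s\n",
               depth, s, s, depth < max_depth ? "0 or 1" : "absent (SCU)");
    }
    return 0;
}
```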

A CU is the unit of encoding and can have a coding mode per CU. That is, each CU is coded in either an intra-picture coding mode (also referred to as MODE_INTRA or INTRA) or an inter-picture coding mode (also referred to as MODE_INTER or INTER). The inter-picture coding mode can further be divided into the MODE_INTER mode and the MODE_SKIP (also referred to as SKIP) mode.

A PU (Prediction Unit) is the unit of prediction. As shown in FIG. 4, one CU may be divided into several PUs for prediction. If the coding mode of a CU is INTRA, all PUs in the CU are coded as INTRA; if the coding mode of a CU is INTER, all PUs in the CU are coded as INTER. For example, when a CU is INTRA, its PU partition structure may include only PU 2Nx2N and PU NxN; when a CU is INTER, the PU may take all of the partition structures of FIG. 4.

The real image and its depth information map can be obtained not only from one camera but also from multiple cameras. Images obtained from multiple cameras can be encoded independently, and a general two-dimensional video coding codec can be used for this. However, since images acquired from multiple cameras have correlation between viewpoints, they can be encoded using inter-view prediction to increase coding efficiency. As one embodiment, FIG. 5 shows an example of an inter-view prediction structure for images acquired from three cameras.

Referring to FIG. 5, View 1 is the image acquired from the camera located to the left of View 0, and View 2 is the image acquired from the camera located to its right. View 1 and View 2 perform inter-view prediction using View 0 as a reference picture, so View 0 must be encoded before View 1 and View 2. Since View 0 can be encoded independently of the other views, it is called an independent view. View 1 and View 2, which use View 0 as a reference image, are called dependent views. An independent-view image can be encoded with a general two-dimensional video codec, whereas a dependent-view image must perform inter-view prediction and is therefore encoded with a 3D video codec that includes an inter-view prediction process.

Also, to increase the coding efficiency of View 1 and View 2, encoding can make use of the depth information map. For example, a real image and its depth information map can be encoded/decoded independently of each other. Alternatively, as shown in FIG. 6, they can be encoded/decoded dependently on each other: in one embodiment, the real image can be encoded/decoded using the already encoded/decoded depth information map, and conversely the depth information map can be encoded/decoded using the already encoded/decoded real image.

As one embodiment, FIG. 7 shows an encoding prediction structure for encoding real images obtained from three cameras and their depth information maps. In FIG. 7, the three real images are denoted T0, T1, and T2 according to viewpoint, and the three depth information maps at the same positions are denoted D0, D1, and D2 according to viewpoint. Here, T0 and D0 are images acquired at View 0, T1 and D1 at View 1, and T2 and D2 at View 2. Each picture can be encoded as an I picture (Intra Picture), a P picture (Uni-prediction Picture), or a B picture (Bi-prediction Picture). In FIG. 7, arrows indicate prediction directions; that is, the real images and their depth information maps are encoded/decoded dependently on one another.

Here, the motion information of the current block may mean only the motion vector of the current block in the real image, or the motion vector together with the reference picture number and the prediction direction (unidirectional prediction, bidirectional prediction, inter-view prediction, temporal prediction, or other prediction). The methods for deriving it are divided into temporal prediction and inter-view prediction. Temporal prediction is a prediction method that uses temporal correlation within the same view, and inter-view prediction is a prediction method that uses the correlation with an adjacent view. Temporal prediction and inter-view prediction can be used together within one picture.

A merge method is used as a method of coding motion information in image encoding/decoding. Here, motion information includes at least one of a motion vector, an index of a reference image, and a prediction direction (unidirectional, bidirectional, etc.). The prediction direction can be broadly divided into unidirectional prediction and bidirectional prediction according to the use of the reference picture lists (RefPicList). Unidirectional prediction is divided into forward prediction (Pred_L0: Prediction L0), which uses the forward reference picture list (LIST0), and backward prediction (Pred_L1: Prediction L1), which uses the backward reference picture list (LIST1). Bidirectional prediction (Pred_BI) uses both the forward reference picture list (LIST0) and the backward reference picture list (LIST1), so that both forward prediction and backward prediction exist. The forward reference picture list (LIST0) may also be copied to the backward reference picture list (LIST1), in which case two forward predictions may constitute the bidirectional prediction. The prediction direction can be expressed using predFlagL0 and predFlagL1. For unidirectional forward prediction, predFlagL0 is '1' and predFlagL1 is '0'; for unidirectional backward prediction, predFlagL0 is '0' and predFlagL1 is '1'; for bidirectional prediction, both predFlagL0 and predFlagL1 are '1'.
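As a minimal illustration of how predFlagL0/predFlagL1 encode the three prediction directions just described (the struct and helper are hypothetical, not taken from any codec API):

```c
#include <stdio.h>

/* Illustrative mapping between the prediction directions described above
 * and the predFlagL0/predFlagL1 pair. */
typedef struct {
    int predFlagL0;  /* 1 if the forward list (LIST0) is used */
    int predFlagL1;  /* 1 if the backward list (LIST1) is used */
} PredDir;

static const char *describe(PredDir d) {
    if (d.predFlagL0 && d.predFlagL1) return "bidirectional (Pred_BI)";
    if (d.predFlagL0)                 return "unidirectional forward (Pred_L0)";
    if (d.predFlagL1)                 return "unidirectional backward (Pred_L1)";
    return "not inter-predicted";
}

int main(void) {
    PredDir cases[] = { { 1, 0 }, { 0, 1 }, { 1, 1 } };
    for (int i = 0; i < 3; ++i)
        printf("predFlagL0=%d predFlagL1=%d -> %s\n",
               cases[i].predFlagL0, cases[i].predFlagL1, describe(cases[i]));
    return 0;
}
```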

Merge motion can be performed per coding unit (CU) or per prediction unit (PU). When merge motion is performed in units of CUs or PUs (hereinafter referred to as 'blocks' for convenience of description), it is necessary to transmit information on whether merge motion is performed for each block partition, and information on which of the neighboring blocks of the current block (the block adjacent to the left of the current block, the block adjacent above the current block, the temporally collocated block of the current block, and so on) the current block is merged with.

The merge motion candidate list is a list in which motion information is stored, and it is generated before merge motion is performed. The motion information stored in the merge motion candidate list may be the motion information of a neighboring block adjacent to the current block, or the motion information of the collocated block corresponding to the current block in a reference image. The motion information stored in the merge motion candidate list may also be new motion information created by combining motion information already present in the list.

To build the merge motion candidate list, it is checked whether the motion information of the neighboring blocks A, B, C, D, and E and of the candidate block H (or M) in FIG. 8 is available for the current block; if it is available, the motion information of that block is entered into the merge motion candidate list. If a neighboring block has the same motion information as a preceding one, its motion information is not entered into the list. For example, when generating the merge motion candidate list for block X in FIG. 8, if neighboring block A is available and is included in the list, neighboring block B can be included only if its motion information is not the same as that of neighboring block A. In the same way, neighboring block C can be included only if its motion information is not the same as that of neighboring block B; the same applies to neighboring block D and neighboring block E. Here, 'the same motion information' means using the same motion vector, the same reference picture, and the same prediction direction (unidirectional (forward, backward) or bidirectional). Finally, the merge motion candidate list for block X in FIG. 8 is filled in a predetermined order, for example in the order of blocks A → B → C → D → E → H (or M).
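The availability check and pruning just described can be sketched as follows; the MotionInfo type, the availability flags, and the candidate values are illustrative, and a real codec applies further rules beyond this single comparison with the preceding neighbor:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative candidate record. */
typedef struct {
    int  mv_x, mv_y;  /* motion vector */
    int  ref_idx;     /* reference picture index */
    int  pred_dir;    /* 0: forward, 1: backward, 2: bidirectional */
    bool available;
} MotionInfo;

/* "Same motion information": same motion vector, same reference picture,
 * and same prediction direction, as defined in the text. */
static bool same_motion(const MotionInfo *a, const MotionInfo *b) {
    return a->mv_x == b->mv_x && a->mv_y == b->mv_y &&
           a->ref_idx == b->ref_idx && a->pred_dir == b->pred_dir;
}

/* Scan the candidates in the fixed A -> B -> C -> D -> E -> H (or M) order,
 * skipping unavailable ones and ones equal to the preceding neighbor. */
static int build_merge_list(const MotionInfo *cand, int n,
                            MotionInfo *list, int max_list) {
    int count = 0;
    const MotionInfo *prev = NULL;
    for (int i = 0; i < n && count < max_list; ++i) {
        if (cand[i].available && !(prev && same_motion(&cand[i], prev)))
            list[count++] = cand[i];
        if (cand[i].available)
            prev = &cand[i];  /* each block is compared with its predecessor */
    }
    return count;
}

int main(void) {
    /* A, B, C, D, E, H: B duplicates A and is pruned; E is unavailable. */
    MotionInfo cand[6] = {
        { 4, 0, 0, 0, true }, { 4, 0, 0, 0, true }, { 2, 1, 0, 0, true },
        { 2, 1, 1, 0, true }, { 0, 0, 0, 0, false }, { 8, 2, 0, 2, true },
    };
    MotionInfo list[6];
    printf("merge candidates kept: %d\n", build_merge_list(cand, 6, list, 6));
    return 0;
}
```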

In three-dimensional video coding, motion information of an adjacent view can be used in order to encode motion information efficiently. FIG. 9 shows an example of a method of deriving the motion information of the current block using motion information of an adjacent view.

In FIG. 9, in order to derive the motion information of the current block (the block at the current position X), a target block most similar to the current block (the block at the reference position XR) is found in the adjacent view. Since the picture of the adjacent reference view differs from the current picture of the current view only by the camera distance, the target block corresponding to the current block in the adjacent view can be located using a disparity vector (DV).

To derive the disparity vector, a method using the neighboring blocks of the current block and a method using the depth information map associated with the current block can each be used, and they can also be used in combination. First, since the motion information of an inter-view-predicted block is itself a disparity vector, if inter-view prediction was performed in a neighboring block of the current block, the motion information of that block can be used as the disparity vector. Second, since the depth information map reflects the disparity between two views, the depth value of the depth information map block associated with the current block can be converted into a disparity vector.
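As a hedged sketch of the second method, the depth-to-disparity conversion is often realized with a linear model of the form dv = (scale * depth + offset) >> shift; the scale, offset, and shift constants below are invented for illustration and would be derived from camera parameters in practice:

```c
#include <stdio.h>

/* Linear depth-to-disparity model commonly used with camera parameters;
 * the constants here are illustrative only. */
static int depth_to_disparity(int depth_sample, int scale, int offset, int shift) {
    return (scale * depth_sample + offset) >> shift;
}

int main(void) {
    /* An 8-bit depth sample from the depth map block associated with the
       current block (cf. the 8-bit depth map of FIG. 2). */
    int depth_sample = 128;
    int dv_x = depth_to_disparity(depth_sample, 57, -1024, 8);
    printf("depth %d -> horizontal disparity %d\n", depth_sample, dv_x);
    return 0;
}
```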

Using the disparity vector found in this way, the target block containing the reference position XR, which corresponds to the current position X of the current block, is found in the picture of the reference view, and the motion information of that target block can be used as the motion information of the current block.

The following describes the process of deriving motion information according to the above-described method based on the 3D-HEVC Test Model 3.

Terms and formulas used in 3D-HEVC Test Model 3

- PredMode; Indicates the encoding mode of the current PU (Prediction Unit) block, and MODE_SKIP, MODE_INTRA, and MODE_INTER are present.

- RefPicListX; As a reference picture list, X may be '0' or '1'. For example, if X is 0 (i.e., RefPicList0), the L0 reference picture list is used.

- PicWidthInSamplesL; Width of the current picture

- PicHeightInSamplesL; Height of the current picture

- CtbSizeY; The height of the current CTU (or LCU)

[Equation 1]

Clip3(x, y, z) = x, if z < x; y, if z > y; z, otherwise

The process of deriving a temporal inter-view motion vector

The inputs at this stage are as follows:

- the upper left position of the current PU block; (xP, yP)

- variable indicating the width and height of the current PU block; (nPSW, nPSH)

- Reference picture list indicator; X (for example, if X is 0, the L0 reference picture list is used)

- reference view index; refViewIdx

- disparity vector; mvDisp

- Merge_flag; Indicates whether the current PU block is coded in merge mode. If the current PU block is coded in merge mode, Merge_flag is set to '1'; otherwise, it is set to '0'.

- An index indicating a reference picture in RefPicListX; refIdxLX (where X can be either '0' or '1')

The outputs at this stage are as follows:

- A flag indicating whether the temporal inter-view motion vector is valid; availableFlagLXInterView

- Temporal inter-view motion vector candidates; mvLXInterView (if availableFlagLXInterView is true) (where X can be either '0' or '1')

- An index indicating a reference picture in RefPicListX; refIdxLX

Initialize availableFlagLXInterView to '0'.

Initialize mvL0InterView and mvL1InterView to '0'.

The position of the reference block corresponding to the current block in the reference view is derived as follows using the disparity vector.

[Equation 2]

xRef = Clip3(0, PicWidthInSamplesL - 1, xP + (nPSW >> 1) + ((mvDisp[0] + 2) >> 2))
yRef = Clip3(0, PicHeightInSamplesL - 1, yP + (nPSH >> 1) + ((mvDisp[1] + 2) >> 2))

In the reference view image whose ViewIdx equals the refViewIdx value, let refCU be the coded coding block containing the position (xRef, yRef) derived above.

If the PredMode of refCU is MODE_SKIP or MODE_INTER, the following procedure is performed (where the variable Y takes the values X and (1 - X)).

1. Set the predFlagLY and refIdxLY of refPU to refPredFlagLY and refRefIdxLY, respectively.

2. Set the mvLY of refPU to refMvLY.

3. Set refPicListLY of refPU to refRefPicListLY.

4. If refPredFlagLY is 'true', perform the following steps for i ranging from (mergeFlag ? 0 : refIdxLX) to (mergeFlag ? num_ref_idx_lX_active_minus1 : refIdxLX).

A. If availableFlagLXInterView is '0' and the POC (Picture Order Count) of refRefPicListLY [refRefIdxLY] is equal to the POC of RefPicListLX [i], set availableFlagLXInterView to '1' and do the following:

i. mvLXInterView is derived as follows.

mvLXInterView [0] = refMvLY [0]

mvLXInterView [1] = refMvLY [1]

ii. If mergeFlag is '1', refIdxLX is derived as follows.

refIdxLX = i
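Putting the above steps together, a non-normative C sketch of the derivation (Equation 2 plus steps 1-4) might look as follows; the structures and names are illustrative stand-ins, and the normative behavior is defined by the 3D-HEVC Test Model text itself:

```c
#include <stdbool.h>
#include <stdio.h>

/* Clip3 as defined in Equation 1. */
#define CLIP3(x, y, z) ((z) < (x) ? (x) : ((z) > (y) ? (y) : (z)))

typedef struct { int x, y; } MV;

/* Illustrative stand-in for the PU found at (xRef, yRef) in the reference
 * view; only the fields the procedure reads are modeled. */
typedef struct {
    bool skip_or_inter;  /* PredMode is MODE_SKIP or MODE_INTER */
    bool predFlag[2];    /* refPredFlagLY */
    MV   mv[2];          /* refMvLY */
    int  refPoc[2];      /* POC of refRefPicListLY[refRefIdxLY] */
} RefPUInfo;

static bool derive_interview_mv(int xP, int yP, int nPSW, int nPSH,
                                MV mvDisp, int picW, int picH, int X,
                                const RefPUInfo *refPU,
                                const int *refPicListX_poc, int numActiveX,
                                bool mergeFlag, int *refIdxLX, MV *mvLXInterView) {
    /* Equation 2: centre of the current PU displaced by the quarter-sample
       disparity vector, clipped to the picture boundary. */
    int xRef = CLIP3(0, picW - 1, xP + (nPSW >> 1) + ((mvDisp.x + 2) >> 2));
    int yRef = CLIP3(0, picH - 1, yP + (nPSH >> 1) + ((mvDisp.y + 2) >> 2));
    (void)xRef; (void)yRef;  /* in a real decoder these would select refPU */

    if (!refPU->skip_or_inter)
        return false;  /* MODE_INTRA: availableFlagLXInterView stays 0 */

    int order[2] = { X, 1 - X };  /* Y takes the values X and (1 - X) */
    for (int k = 0; k < 2; ++k) {
        int Y = order[k];
        if (!refPU->predFlag[Y])
            continue;
        int lo = mergeFlag ? 0 : *refIdxLX;
        int hi = mergeFlag ? numActiveX - 1 : *refIdxLX;
        for (int i = lo; i <= hi; ++i) {
            if (refPU->refPoc[Y] == refPicListX_poc[i]) {
                *mvLXInterView = refPU->mv[Y];  /* step i */
                if (mergeFlag)
                    *refIdxLX = i;              /* step ii */
                return true;                    /* availableFlagLXInterView = 1 */
            }
        }
    }
    return false;
}

int main(void) {
    RefPUInfo refPU = {
        .skip_or_inter = true,
        .predFlag = { true, false },
        .mv = { { 5, -3 }, { 0, 0 } },
        .refPoc = { 16, 0 },
    };
    int listPoc[2] = { 8, 16 };  /* POCs of the RefPicList0 entries */
    int refIdx = 0;
    MV mvOut = { 0, 0 };
    bool ok = derive_interview_mv(64, 64, 16, 16, (MV){ 40, 6 },
                                  1024, 768, /*X=*/0, &refPU,
                                  listPoc, 2, /*mergeFlag=*/true,
                                  &refIdx, &mvOut);
    printf("available=%d mv=(%d,%d) refIdx=%d\n", ok, mvOut.x, mvOut.y, refIdx);
    return 0;
}
```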

o Problem caused by the vertical component of the disparity vector

Cameras installed horizontally may have errors in their position and orientation. To minimize the errors of such unaligned multi-view images, vertical parallax correction, or rectification, is performed on them. Even after a multi-view image is rectified, some error may remain. In addition, objects with vertical motion may exist in the scene, and shooting-time errors between the cameras may cause vertical differences in movement between the views. Accordingly, in the process of FIG. 9, the disparity vector (DV) derived from a neighboring block may have both a horizontal component and a vertical component.

FIG. 10 shows an example of the reference blocks found using the disparity vectors (DV) derived from neighboring blocks for arbitrary blocks A, B, and C of the current LCU (or CTU) M.

As shown in FIG. 10, the reference blocks are shifted up and down from their original positions by the vertical components of the disparity vectors (DV) derived from the neighboring blocks. In this case, the motion information (motion vector, reference picture number, and prediction direction information) of the reference blocks in the reference view must reside in internal memory so that it can be accessed quickly and easily. Otherwise, if the motion information of the reference blocks resides in external memory, it must be read from external memory into internal memory whenever needed, which increases the memory bandwidth.

FIG. 11 shows an example of the memory usage required to derive motion information in the coding and decoding steps for arbitrary blocks A, B, and C of the current LCU (or CTU) M.

Referring to FIG. 11, in order to derive the motion information for the arbitrary blocks A, B, and C of the current LCU (or CTU) M, not only the motion information of the LCU row at the same vertical position as the current block (part 1) but also the motion information of the upper blocks (part 2) and the lower blocks (part 3) is required. That is, the motion information of reference blocks spanning at least three LCU rows is required. Moreover, in order not to slow down the encoding and decoding of the current block, the motion information of these reference blocks must be present in internal memory.

However, storing the motion information of the reference blocks in internal memory is constrained by the limited internal memory capacity. Moreover, when the image to be encoded is larger than FHD (1920x1088), the amount of reference-block motion information to be stored in internal memory becomes considerable. Therefore, a method for reducing the amount of reference-block motion information to be stored is needed.

A feature of the present invention is to provide a method of deriving the motion information (motion vector, reference picture number, and prediction direction information) for the reference blocks in the reference view only within an arbitrary range.

Memory efficiency is improved through a method of restricting the motion information (motion vector, reference picture number, and prediction direction information) for the reference blocks in the reference view to an arbitrary range.

Figure 1. Basic structure and data format of a 3D video system
Figure 2. An example of the real image and depth information map of the "balloons" image: (a) real image, (b) depth information map
Figure 3. An example of a method of dividing a CTU (LCU) into CU units
Figure 4. Example of the partition structure of a PU
Figure 5. An example of the inter-view prediction structure in a 3D video codec
Figure 6. Implementation of a 3D video encoder/decoder
Figure 7. Example of the prediction structure of a 3D video codec
Figure 8. An example of the neighboring blocks of the current block used for the merge motion candidate list
Figure 9. An example of a method of deriving the motion information of the current block using motion information of an adjacent view
Figure 10. An example of reference blocks found using disparity vectors (DV) derived from neighboring blocks
Figure 11. An example of the memory usage required to derive motion information
Figure 12. An example of a conceptual diagram of the proposed method
Figure 13. An example of the basic structure of the proposed method

To use the internal memory efficiently, the present invention proposes a method of deriving the motion information (motion vectors, reference picture numbers, and prediction direction information) for the reference blocks in the reference view only within an arbitrary range.

Fig. 12 shows an example of a conceptual diagram of the proposed method.

In FIG. 12, the reference blocks A' and C' of the arbitrary blocks A and C of the current LCU (or CTU) M do not lie within the arbitrary range (for example, a range of one LCU size). Therefore, to keep the reference blocks A' and C' within the arbitrary range, the vertical component of the disparity vector (DV) is clipped to that range. The clipped disparity vector (DV) is then used to derive the reference block for the current block.

FIG. 13 shows an example of the basic structure of the proposed method.

In the " clipping process " in Fig. 13, a variation vector ("motion information ex. DV") and an arbitrary range derived from the neighboring blocks of the current block are input, Clipping the vertical direction component and outputting "changed DV". Next, in the " motion information derivation " process, the " changed DV " and " motion information in the reference view image " are input and the motion information is derived from the reference view image using the changed displacement vector to output & .

The following describes the process of deriving motion information according to the proposed method, based on the 3D-HEVC Test Model 3.

Terms and formulas used in 3D-HEVC Test Model 3

- PredMode; Indicates the encoding mode of the current PU (Prediction Unit) block, and MODE_SKIP, MODE_INTRA, and MODE_INTER are present.

- RefPicListX; As a reference picture list, X may be '0' or '1'. For example, if X is 0 (i.e., RefPicList0), the L0 reference picture list is used.

- PicWidthInSamplesL; Width of the current picture

- PicHeightInSamplesL; Height of the current picture

- CtbSizeY; The height of the current CTU (or LCU)

Clip3 (x, y, z) is expressed by Equation (1).

The process of deriving a temporal inter-view motion vector

The inputs at this stage are:

- the upper left position of the current PU block; (xP, yP)

- variable indicating the width and height of the current PU block; (nPSW, nPSH)

- Reference picture list indicator; X (for example, if X is 0, the L0 reference picture list is used)

- reference view index; refViewIdx

- disparity vector; mvDisp

- Merge_flag; Indicates whether the current PU block is coded in merge mode. If the current PU block is coded in merge mode, Merge_flag is set to '1'; otherwise, it is set to '0'.

- An index indicating a reference picture in RefPicListX; refIdxLX (where X can be either '0' or '1')

The outputs at this stage are as follows:

- A flag indicating whether the temporal inter-view motion vector is valid; availableFlagLXInterView

- Temporal inter-view motion vector candidates; mvLXInterView (if availableFlagLXInterView is true) (where X can be either '0' or '1')

- An index indicating a reference picture in RefPicListX; refIdxLX

Initialize availableFlagLXInterView to '0'.

Initialize mvL0InterView and mvL1InterView to '0'.

The upper left position (xCtb, yCtb) of the current CTU to be encoded is derived.

The position of the reference block corresponding to the current block in the reference view is derived as follows using the disparity vector.

[Equation 3]

xRef = Clip3(0, PicWidthInSamplesL - 1, xP + (nPSW >> 1) + ((mvDisp[0] + 2) >> 2))
yRef = Clip3(yCtb, yCtb + CtbSizeY - 1, yP + (nPSH >> 1) + ((mvDisp[1] + 2) >> 2))

In the reference view image whose ViewIdx equals the refViewIdx value, let refCU be the coded coding block containing the position (xRef, yRef) derived above.

If the PredMode of refCU is MODE_SKIP or MODE_INTER, the following procedure is performed (where the variable Y takes the values X and (1 - X)).

1. Set the predFlagLY and refIdxLY of refPU to refPredFlagLY and refRefIdxLY, respectively.

2. Set the mvLY of refPU to refMvLY.

3. Set refPicListLY of refPU to refRefPicListLY.

4. If refPredFlagLY is 'true', perform the following steps for i ranging from (mergeFlag ? 0 : refIdxLX) to (mergeFlag ? num_ref_idx_lX_active_minus1 : refIdxLX).

A. If availableFlagLXInterView is '0' and the POC (Picture Order Count) of refRefPicListLY [refRefIdxLY] is equal to the POC of RefPicListLX [i], set availableFlagLXInterView to '1' and do the following:

i. mvLXInterView is derived as follows.

mvLXInterView [0] = refMvLY [0]

mvLXInterView [1] = refMvLY [1]

ii. If mergeFlag is '1', refIdxLX is derived as follows.

refIdxLX = i

To verify the superiority of the method proposed in the present invention, experiments were performed on HTM 7.0, the verification model of 3D-HEVC. The JCT-3V test images were used.

[Table 1] Experimental results of the method proposed in the present invention

Figure pat00004

Table 1 compares the proposed method with the existing method. The method proposed in the present invention reduces the required memory to at least 1/3 (one LCU row instead of three) and, relative to storing the motion information of the entire picture, to an even smaller fraction, while having little effect on performance compared with the conventional method.

Alternatively, in order to derive the motion information of the current block, the disparity vector (DV) used to derive the motion information (motion vector, reference picture number, and prediction direction information) of the reference blocks in the reference view may use only its horizontal component, without using the vertical component. For example, '0' can be substituted for the vertical component of the disparity vector (DV) when deriving the motion information (motion vector, reference picture number, and prediction direction information) for the reference blocks in the reference view. Since there is then only horizontal displacement, memory efficiency can be increased.

For example, the proposed method can calculate the position (xRef, yRef) of the reference block in the reference view through the following equation.

[Equation 4]

xRef = Clip3(0, PicWidthInSamplesL - 1, xP + (nPSW >> 1) + ((mvDisp[0] + 2) >> 2))
yRef = Clip3(0, PicHeightInSamplesL - 1, yP + (nPSH >> 1))

Alternatively, the problem can be solved by limiting the vertical range of the motion estimation (ME) used for inter-view prediction. For example, the vertical motion estimation (ME) range may be limited to '0'; in this case, the vertical component of the disparity vector is automatically set to '0'.
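For illustration, a motion-estimation loop whose vertical search range is forced to '0' might look as follows; the cost function and search bounds are placeholders:

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in cost function; a real encoder would compute the SAD between
 * the current block and the candidate reference block. Here the best
 * match is planted at (7, 0) for demonstration. */
static int cost(int dx, int dy) {
    return abs(dx - 7) + 4 * abs(dy);
}

int main(void) {
    const int range_x = 64;  /* horizontal search range */
    const int range_y = 0;   /* vertical search range limited to '0' */
    int best_dx = 0, best_dy = 0, best_cost = INT_MAX;

    for (int dy = -range_y; dy <= range_y; ++dy)      /* single iteration */
        for (int dx = -range_x; dx <= range_x; ++dx)
            if (cost(dx, dy) < best_cost) {
                best_cost = cost(dx, dy);
                best_dx = dx;
                best_dy = dy;
            }

    /* The vertical component is 0 by construction. */
    printf("best disparity: (%d, %d)\n", best_dx, best_dy);
    return 0;
}
```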

To verify the method of limiting the vertical range of motion estimation to '0', experiments were performed on HTM 7.0, the verification model of 3D-HEVC. The JCT-3V test images were used.

[Table 2] Experimental results of the method of limiting the vertical range of motion estimation

Figure pat00006

Table 2 compares the existing method with the method of limiting the vertical range of motion estimation according to the present invention. Limiting the vertical range of motion estimation has the advantage of greatly reducing the memory compared with the conventional method, but the bit amount increases by 0.3% on average and by up to 2.3%.

In another embodiment, the vertical motion estimation (ME) range may be limited to arbitrary values (A, B) within the range allowed by the memory. To confine the vertical component of the disparity vector within the range allowed by the memory, the position (xRef, yRef) of the reference block in the reference view can be calculated by the following equation. Here, A and B are arbitrary values; for example, A can be "CtbSizeY" and B can be "2 * CtbSizeY - 1", where CtbSizeY is the height of the current CTU (or LCU).

[Equation 5]

xRef = Clip3(0, PicWidthInSamplesL - 1, xP + (nPSW >> 1) + ((mvDisp[0] + 2) >> 2))
yRef = Clip3(yCtb - A, yCtb + B, yP + (nPSH >> 1) + ((mvDisp[1] + 2) >> 2))

In another embodiment, the vertical motion estimation (ME) range may be limited to an arbitrary value (T) within the range allowed by the memory; in this case the vertical component of the disparity vector is confined to the arbitrary value (T), and this can be performed in combination with the method proposed above. Accordingly, to confine the vertical component of the disparity vector within the arbitrary value (T) allowed by the memory, the position (xRef, yRef) of the reference block in the reference view can be calculated by the following equation.

[Equation 6]

xRef = Clip3(0, PicWidthInSamplesL - 1, xP + (nPSW >> 1) + ((mvDisp[0] + 2) >> 2))
yRef = Clip3(0, PicHeightInSamplesL - 1, yP + (nPSH >> 1) + Clip3(-T, T, (mvDisp[1] + 2) >> 2))

The above-described methods can be used in 3D-HEVC (High Efficiency Video Coding), which is currently being jointly standardized by MPEG (Moving Picture Experts Group) and VCEG (Video Coding Experts Group). The application range of the above methods can vary with the block size, the CU (Coding Unit) depth, or the TU (Transform Unit) depth, as in the example of Table 3. The variable determining the application range (i.e., the size or depth information) can be set so that the encoder and the decoder use a predetermined value, so that a value determined according to the profile or level is used, or so that, when the encoder writes the value into the bitstream, the decoder reads this value from the bitstream. When the application range varies with the CU depth, as illustrated in Table 3 and in the sketch following it, there can be a method (A) applied only to depths greater than or equal to a given depth, a method (B) applied only to depths less than or equal to the given depth, and a method (C) applied only at the given depth.

[Table 3] Examples of application-range determination methods when the given CU (or TU) depth is 2 (O: applied at that depth, X: not applied at that depth)

Figure pat00009
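The three application-range rules of Table 3 can be sketched as follows (the enum names and the handling of the depth threshold are illustrative):

```c
#include <stdbool.h>
#include <stdio.h>

/* The three application-range rules described above. */
typedef enum { METHOD_A, METHOD_B, METHOD_C } RangeMethod;

static bool applies(RangeMethod m, int depth, int given_depth) {
    switch (m) {
    case METHOD_A: return depth >= given_depth;  /* given depth and deeper */
    case METHOD_B: return depth <= given_depth;  /* given depth and shallower */
    case METHOD_C: return depth == given_depth;  /* given depth only */
    }
    return false;
}

int main(void) {
    const int given_depth = 2;  /* as in the example of Table 3 */
    for (int depth = 0; depth <= 4; ++depth)
        printf("depth %d: A=%c B=%c C=%c\n", depth,
               applies(METHOD_A, depth, given_depth) ? 'O' : 'X',
               applies(METHOD_B, depth, given_depth) ? 'O' : 'X',
               applies(METHOD_C, depth, given_depth) ? 'O' : 'X');
    return 0;
}
```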

When the methods of the present invention are not applied to any depth, this may be indicated with a separate flag, or it may be expressed by signaling, as the CU depth value indicating the application range, a value one greater than the maximum CU depth.

As a further feature of the present invention, whether the above methods are applied, and the arbitrary range, may be included in the bitstream and may be signaled in the SPS (Sequence Parameter Set), the PPS (Picture Parameter Set), or the slice header syntax, as in the following examples.

[Table 4] Examples applied to SPS

Figure pat00010

[Table 5] Examples applied to PPS

Figure pat00011

[Table 6] Examples applied to Slice Header Syntax

Figure pat00012

[Table 7] Another example applied to Slice Header Syntax

Figure pat00013

Here, "restricted_dv_enable_flag" indicates whether or not the proposed method is applied. When the proposed method is applied, "restricted_dv_enable_flag" becomes "1", and when not applied, "restricted_dv_enable_flag" becomes "0". The opposite is also possible.

Also, "restricted_dv_info" is a syntax that is activated when the proposed method is applied (or when "restricted_dv_enable_flag" is true), and when signaling an arbitrary range for clipping a "vertical direction component" in units of sequence, picture and slice The syntax to use. Here, these values can be encoded (se (v)) into a form having a positive or negative sign. Alternatively, these values can be encoded (ue (v)) with a 0 and a positive sign.

Claims (3)

1. A method of deriving motion information, comprising deriving the motion information (motion vector, reference picture number, and prediction direction information) for the reference blocks in the reference view only within an arbitrary range.

2. The method of claim 1, further comprising: receiving the "motion information (ex. DV)" and the "arbitrary range" and outputting a "changed DV" by clipping the vertical component of the disparity vector to the arbitrary range; and receiving the changed DV and the motion information in the reference view image and outputting derived motion information by deriving motion information from the reference view image using the changed disparity vector.

3. The method of claim 1, further comprising: limiting the vertical motion estimation (ME) range to an arbitrary value (T) within the range allowed by the memory; receiving the "motion information (ex. DV)", the "arbitrary range", and the "arbitrary value (T)", and outputting a "changed DV" by clipping the vertical component of the disparity vector to the arbitrary range and the arbitrary value (T); and receiving the changed DV and the motion information in the reference view image and outputting derived motion information by deriving motion information from the reference view image using the changed disparity vector.

KR1020130084981A 2013-07-18 2013-07-18 Method and apparatus for improving memory efficiency through deriving limited motion information KR20150010249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130084981A KR20150010249A (en) 2013-07-18 2013-07-18 Method and apparatus for improving memory efficiency through deriving limited motion information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130084981A KR20150010249A (en) 2013-07-18 2013-07-18 Method and apparatus for improving memory efficiency through deriving limited motion information

Publications (1)

Publication Number Publication Date
KR20150010249A true KR20150010249A (en) 2015-01-28

Family

ID=52482111

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130084981A KR20150010249A (en) 2013-07-18 2013-07-18 Method and apparatus for improving memory efficiency through deriving limited motion information

Country Status (1)

Country Link
KR (1) KR20150010249A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020060351A1 (en) * 2018-09-21 2020-03-26 엘지전자 주식회사 Method and apparatus for deriving motion vector
US11647203B2 (en) 2018-09-21 2023-05-09 Lg Electronics Inc. Method and apparatus for deriving motion vector

Similar Documents

Publication Publication Date Title
JP2023085341A (en) Efficient multi-view coding using estimation and update of depth map
JP6446488B2 (en) Video data decoding method and video data decoding apparatus
US10412403B2 (en) Video encoding/decoding method and apparatus
KR101706309B1 (en) Method and apparatus of inter-view candidate derivation for three-dimensional video coding
BR112016007760B1 (en) VIDEO DATA DECODING METHOD AND APPARATUS AND VIDEO DATA CODING METHOD
JP7183362B2 (en) Method and apparatus for guiding temporal inter-viewpoint motion information for each sub-prediction unit
WO2015131387A1 (en) Simplified sub-prediction unit (sub-pu) motion parameter inheritence (mpi)
CN106664423B (en) Depth picture compiling method in video compiling
US9838712B2 (en) Method of signaling for depth-based block partitioning
KR20160086941A (en) Video coding method, video decoding method, video coding device, video decoding device, video coding program, and video decoding program
JP6571646B2 (en) Multi-view video decoding method and apparatus
KR20160118363A (en) Image encoding device and method, image decoding device and method, and programs therefor
US9716884B2 (en) Method of signaling for mode selection in 3D and multi-view video coding
KR20150010249A (en) Method and apparatus for improving memory efficiency through deriving limited motion information
KR101672008B1 (en) Method And Apparatus For Estimating Disparity Vector
KR20160002194A (en) Adaptive merging candidate selection method and apparatus
KR102378087B1 (en) Apparatus And Method For Deriving Merge Candidate Using Disparity Vector
KR20150043164A (en) merge motion candidate list construction method of 2d to 3d video coding

Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination