KR20080007086A - A method and apparatus for decoding/encoding a video signal

Info

Publication number: KR20080007086A
Application number: KR1020070033404A
Authority: KR
Inventors: 전용준; 전병문; 구한서; 박승욱
Original assignee: 엘지전자 주식회사
Priority date: 2006-07-14
Filing date: 2007-04-04
Publication date: 2008-01-17

Abstract

A method and an apparatus for decoding/encoding a video signal are provided. Because an anchor picture and a non-anchor picture have different inter-view dependencies, coding them separately according to anchor picture identification information allows more efficient coding. A video signal decoding method comprises: obtaining flag information indicating whether a video signal is a multi-view video coded bitstream; if so, obtaining identification information indicating whether the coded picture of the current NAL (Network Abstraction Layer) unit is an anchor picture; and decoding information about the multi-view video according to the identification information.

Description

A method and apparatus for decoding/encoding a video signal

FIG. 1 is a schematic block diagram of a video signal decoding apparatus to which the present invention is applied.

FIG. 2 shows, as an embodiment to which the present invention is applied, the structure of a NAL unit that includes anchor picture identification information.

FIG. 3 shows, as an embodiment to which the present invention is applied, a prediction structure for explaining the concept of an anchor picture.

FIG. 4 shows, as an embodiment to which the present invention is applied, the structure of a NAL unit that includes anchor picture identification information in a bitstream compatible with H.264/AVC.

FIG. 5 is, as an embodiment to which the present invention is applied, a schematic block diagram of an apparatus that decodes multi-view video using anchor picture identification information.

FIG. 6 shows, as an embodiment to which the present invention is applied, a prediction structure for explaining the concept of a newly defined anchor picture.

The present invention relates to a method and apparatus for decoding/encoding a video signal.

The video broadcast content that is mainstream today is single-view video acquired with one camera. Multi-view video, by contrast, is a field of three-dimensional (3D) image processing in which images captured by one or more cameras are geometrically corrected and, through spatial synthesis and similar techniques, the user is offered a variety of viewpoints in multiple directions. Multi-view video gives the user greater freedom of viewpoint and covers a larger area than can be captured with a single camera.

When such multi-view video is randomly accessed according to its overall coding structure, the complexity of that structure causes long delays. To solve this problem, a new picture type called an anchor picture needs to be used efficiently.

An object of the present invention is to provide a method and apparatus for efficiently decoding multi-view video data.

Another object of the present invention is to perform random access of multi-view video efficiently by adding anchor picture identification information in a standardized manner.

Still another object of the present invention is to perform inter-view prediction more efficiently by using anchor picture identification information.

Yet another object of the present invention is to manage reference pictures for inter-view prediction more efficiently by using anchor picture identification information.

To achieve the above objects, the present invention provides a video signal decoding method comprising: obtaining, from a video signal, flag information indicating whether the video signal is a multi-view video coded bitstream; when the flag information indicates that the video signal is a multi-view video coded bitstream, obtaining identification information indicating whether the coded picture of the current NAL is an anchor picture; and decoding information about the multi-view video according to the identification information.

The present invention also provides a multi-view based video signal encoding method comprising: generating view dependency information indicating the prediction structure between views; and generating, according to the view dependency information, identification information indicating whether the coded picture of the current NAL is an anchor picture.

The present invention also provides a video signal decoding apparatus comprising: a bitstream determination unit that obtains, from a video signal, flag information indicating whether the video signal is a multi-view video coded bitstream; an anchor picture identification information acquisition unit that, when the flag information indicates that the video signal is a multi-view video coded bitstream, obtains identification information indicating whether the coded picture of the current NAL is an anchor picture; and a multi-view video decoding unit that decodes information about the multi-view video according to the identification information.

The above objects and structural features will become clearer from the following detailed description taken in conjunction with the accompanying drawings. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The terms used in the present invention are, as far as possible, general terms in wide current use. In certain cases, however, terms have been chosen by the applicant, and their meanings are set out in detail in the relevant part of the description. The present invention should therefore be understood by the meaning the terms carry rather than by their names alone.

The decoding apparatus broadly includes a parsing unit 100, an entropy decoding unit 200, an inverse quantization/inverse transform unit 300, an intra prediction unit 400, a deblocking filter unit 500, a decoded picture buffer unit 600, an inter prediction unit 700, and the like. The decoded picture buffer unit 600 in turn includes a reference picture list generator 610, a reference picture manager 620, and the like.

The parsing unit 100 parses the received video in NAL units in order to decode it. In general, one or more sequence parameter sets and picture parameter sets are transmitted to the decoder before slice headers and slice data are decoded. The NAL header area, or the extension area of the NAL header, may contain various kinds of attribute information, for example temporal level information, view level information, anchor picture identification information, and view identifier information.

Here, anchor picture identification information is information that identifies whether the coded picture of the current NAL unit is an anchor picture (③). An anchor picture is a coded picture in which all slices reference only slices in frames of the same time instant. For example, it is a coded picture that references only slices in other views and does not reference slices in the current view. In decoding multi-view video, random access between views is essential, so access to an arbitrary view should be possible with minimal decoding effort. Anchor picture identification information may be needed to realize such efficient random access.

In addition, under the overall coding structure of multi-view video, anchor pictures and non-anchor pictures have different inter-view dependencies, so they need to be distinguished according to the anchor picture identification information. The anchor picture identification information may therefore be used to add reference pictures for inter-view prediction when a reference picture list is generated. It may also be used to manage the reference pictures added for inter-view prediction. For example, the reference pictures may be divided into anchor pictures and non-anchor pictures, and reference pictures that are not used for inter-view prediction may be marked as unused. The anchor picture identification information may also be applied in a hypothetical reference decoder.

Temporal level information is information about a hierarchical structure for providing temporal scalability from a video signal. With this temporal level information, the user can be offered video at various time instants. View level information is information about a hierarchical structure for providing view scalability from a video signal. In multi-view video, levels for time and view need to be defined so that the user can be offered video at various times and views. View identification information is information for distinguishing a picture in the current view from a picture in a different view. When a video signal is coded, POC (Picture Order Count) and frame_num are used to identify each picture. In multi-view video, prediction is performed between views, so identification information is needed to distinguish a picture in the current view from a picture in a different view.

The parsed bitstream is entropy decoded by the entropy decoding unit 200, and the coefficients, motion vectors, and the like of each macroblock are extracted. The inverse quantization/inverse transform unit 300 multiplies the received quantized values by a fixed constant to obtain transform coefficient values, and inverse transforms those coefficient values to reconstruct pixel values. Using the reconstructed pixel values, the intra prediction unit 400 performs intra prediction from decoded samples within the current picture. The deblocking filter unit 500 applies a filter to each coded macroblock to reduce block distortion. The filter smooths block edges and thereby improves the quality of the decoded frame. The choice of filtering process depends on the boundary strength and on the gradient of the image samples around the boundary. Filtered pictures are output, or stored in the decoded picture buffer unit 600 for use as reference pictures.

The decoded picture buffer unit 600 stores or releases previously coded pictures in order to perform inter prediction. The frame_num and POC (Picture Order Count) of each picture are used when storing pictures in, or releasing them from, the decoded picture buffer unit 600. In MVC, some of the previously coded pictures belong to a view different from that of the current picture, so to use such pictures as reference pictures, view information identifying the view of a picture may be used together with frame_num and POC. The decoded picture buffer unit 600 includes a reference picture list generator 610 and a reference picture manager 620.

The reference picture list generator 610 generates a list of reference pictures for inter prediction. In doing so, it may generate the reference picture list by distinguishing anchor pictures from non-anchor pictures. Because inter-view prediction may be performed in multi-view video coding, a reference picture list for inter-view prediction may need to be generated when the current picture references a picture in a different view. The reference picture list generator 610 may use view information to generate that list.
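As a minimal sketch of how a reference picture list might differ for anchor and non-anchor pictures, the Python example below appends inter-view references chosen by the anchor flag. The two dependency tables (`anchor_deps`, `non_anchor_deps`), the function name, and the ordering (temporal references first) are assumptions made for illustration; the description states only that the two picture kinds may be distinguished and that view information may be used.

```python
from typing import Dict, List, Tuple

Ref = Tuple[int, int]  # (view_id, poc); illustrative picture identity


def build_reference_list(
    view_id: int,
    poc: int,
    anchor_pic_flag: bool,
    temporal_refs: List[Ref],
    anchor_deps: Dict[int, List[int]],
    non_anchor_deps: Dict[int, List[int]],
) -> List[Ref]:
    """Return temporal references followed by inter-view references.

    The inter-view references are pictures of other views at the same
    POC as the current picture. Which views are used depends on whether
    the current picture is an anchor picture (hypothetical tables).
    """
    deps = anchor_deps if anchor_pic_flag else non_anchor_deps
    inter_view_refs = [(ref_view, poc) for ref_view in deps.get(view_id, [])]
    return list(temporal_refs) + inter_view_refs
```

With `anchor_deps = {1: [0, 2]}` and `non_anchor_deps = {1: [0]}`, an anchor picture of view 1 receives inter-view references from views 0 and 2, while a non-anchor picture of view 1 receives one only from view 0.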

The reference picture manager 620 manages reference pictures so that inter prediction can be realized more flexibly. In doing so, it may manage the reference pictures by distinguishing anchor pictures from non-anchor pictures. For example, the adaptive memory management method (Memory Management Control Operation Method) and the sliding window method may be used. These unify the memory for reference and non-reference pictures into a single memory and manage it efficiently with a small amount of memory. In multi-view video coding, pictures along the view direction have the same picture order count, so information identifying the view of each picture may be used to mark them. Reference pictures managed in this way may be used by the inter prediction unit 700.
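The point that pictures in different views share a picture order count can be illustrated with a small buffer keyed by both view and POC. This is only a sketch under assumed names (`DecodedPictureBuffer`, `inter_view_ref`); it does not implement the sliding window or memory management control operations themselves.

```python
from typing import Dict, List, Tuple


class DecodedPictureBuffer:
    """Toy buffer that identifies pictures by (view_id, poc).

    Keying by POC alone would make pictures of different views at the
    same time instant collide, so the view identifier is part of the key.
    """

    def __init__(self) -> None:
        self._pics: Dict[Tuple[int, int], Dict[str, bool]] = {}

    def store(self, view_id: int, poc: int, anchor_pic_flag: bool) -> None:
        # Every stored picture starts out usable for inter-view prediction.
        self._pics[(view_id, poc)] = {
            "anchor": anchor_pic_flag,
            "inter_view_ref": True,
        }

    def mark_unused_for_inter_view(self, view_id: int, poc: int) -> None:
        # Marking needs the view identifier to pick the right picture.
        self._pics[(view_id, poc)]["inter_view_ref"] = False

    def inter_view_candidates(self, poc: int) -> List[int]:
        # Views that still offer an inter-view reference at this POC.
        return sorted(
            view
            for (view, pic_poc), state in self._pics.items()
            if pic_poc == poc and state["inter_view_ref"]
        )
```

After storing views 0, 1, and 2 at POC 8 and marking view 1 as unused, `inter_view_candidates(8)` returns the remaining views 0 and 2.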

The inter prediction unit 700 performs inter prediction using the reference pictures stored in the decoded picture buffer unit 600. An inter-coded macroblock can be divided into macroblock partitions, and each macroblock partition can be predicted from one or two reference pictures.

In H.264/AVC, the bitstream is defined with a layered structure that separates the VCL (Video Coding Layer), which handles the video encoding process itself, from the NAL (Network Abstraction Layer), which sits between the VCL and the lower-level system that transmits and stores the coded information. The output of the encoding process is VCL data, which is mapped into NAL units before transmission or storage. Each NAL unit contains an RBSP (Raw Byte Sequence Payload, the result data of video compression), which is either compressed video data or data corresponding to header information.

A NAL unit basically consists of two parts: the NAL header and the RBSP. The NAL header contains flag information (nal_ref_idc) indicating whether the NAL unit contains a slice that serves as a reference picture, and an identifier (nal_unit_type) indicating the type of the NAL unit. The RBSP stores the compressed data, and RBSP trailing bits are appended at the end of the RBSP so that its length is a multiple of 8 bits. NAL unit types include IDR (Instantaneous Decoding Refresh) pictures, SPS (Sequence Parameter Set), PPS (Picture Parameter Set), SEI (Supplemental Enhancement Information), and so on.
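The two header fields named above can be read from the first byte of an H.264/AVC NAL unit. The sketch below assumes the standard one-byte layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type); the forbidden_zero_bit field, the `NalHeader` type, and the function name are not mentioned in the description and are included only for illustration.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


# A few of the NAL unit types named in the description.
NAL_TYPE_NAMES = {5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first byte of a NAL unit into its three fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in 8 bits")
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )
```

For example, the byte `0x65` yields nal_ref_idc 3 and nal_unit_type 5 (an IDR slice), and `0x67` yields nal_unit_type 7 (an SPS).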

In this NAL unit structure, the anchor picture identification information may be obtained from a header area. The header area may include, for example, the NAL header, the extension area of the NAL header, or the slice header. For example, obtaining the anchor picture identification information from the extension area of the NAL header may apply when the NAL type indicates a picture for multi-view video, such as nal_unit_type = 20 or 21.

FIG. 3 shows, as an embodiment to which the present invention is applied, the overall prediction structure of a multi-view video signal for explaining the concept of an anchor picture.

As shown in FIG. 3, T0 to T100 on the horizontal axis denote frames over time, and S0 to S100 on the vertical axis denote frames by view. For example, the pictures at T0 are images captured by different cameras at the same time instant (T0), and the pictures at S0 are images captured by a single camera at different time instants. The arrows in the figure indicate the prediction direction and order of the pictures. For example, the P0 picture of view S2 at time T0 is predicted from I0, and it serves as a reference picture for the P0 picture of view S4 at time T0. It also serves as a reference picture for the B1 and B2 pictures of view S2 at times T4 and T2.

In decoding multi-view video, random access between views is essential, so access to an arbitrary view should be possible with minimal decoding effort. The concept of an anchor picture may be needed to realize such efficient random access. An anchor picture is a coded picture in which all slices reference only slices in frames of the same time instant. For example, it is a coded picture that references only slices in other views and does not reference slices in the current view. In FIG. 3, if the I0 picture of view S0 at time T0 is an anchor picture, then all pictures of the other views at the same time instant T0 are also anchor pictures. As another example, if the I0 picture of view S0 at time T8 is an anchor picture, then all pictures of the other views at time T8 are also anchor pictures. Likewise, all pictures at T16, …, T96, T100 are examples of anchor pictures. After an anchor picture has been decoded, all subsequently coded pictures are decoded without inter-prediction from any picture decoded before the anchor picture.
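The definition above can be expressed as a simple check over a prediction structure. The following is a sketch only: pictures are modelled as hypothetical `(view, time)` pairs, and the small reference table in the usage note is a toy structure, not a transcription of FIG. 3.

```python
from typing import Dict, List, Tuple

Pic = Tuple[str, str]  # (view, time), for example ("S2", "T0")


def is_anchor_picture(pic: Pic, refs: Dict[Pic, List[Pic]]) -> bool:
    """Return True if every reference of `pic` lies in another view
    at the same time instant.

    A picture without references (such as an I picture) satisfies the
    condition trivially.
    """
    view, time = pic
    return all(
        ref_time == time and ref_view != view
        for ref_view, ref_time in refs.get(pic, [])
    )
```

With `refs = {("S2", "T0"): [("S0", "T0")], ("S2", "T4"): [("S2", "T0"), ("S2", "T8")]}`, the picture `("S2", "T0")` qualifies as an anchor picture, while `("S2", "T4")` does not because it references other time instants of its own view.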

Thus, under the overall coding structure of the multi-view video of FIG. 3, anchor pictures and non-anchor pictures have different inter-view dependencies, so they need to be distinguished according to the anchor picture identification information.

At least one view sequence is needed for compatibility with an H.264/AVC decoder. It is therefore necessary to define views that can be decoded independently for fast random access; these are called base views. A base view serves as the basis of coding among the multiple views and corresponds to a reference view. In MVC (Multiview Video Coding), the video of the base view is encoded with a conventional video coding scheme (MPEG-2, MPEG-4, H.263, H.264, etc.) and formed into an independent bitstream.

For example, when nal_unit_type = 1 or 5 is obtained from the NAL header of NAL unit 1, the RBSP of that NAL can be regarded as an H.264/AVC-compatible NAL. Anchor picture identification information therefore cannot be coded in that NAL, so the anchor picture identification information value of the preceding NAL is learned from the NAL that follows it. The following NAL (NAL unit 2) is called a suffix NAL, and a suffix NAL may contain only descriptive information about the preceding NAL. Accordingly, NAL unit 2 has nal_unit_type = 20 or 21, and the anchor picture identification information can be obtained from the extension area of its NAL header. Here, "svc_mvc_flag = 1" denotes a multi-view video coded bitstream, and "view_level = 0" means that the NAL corresponds to the base view.
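The suffix NAL mechanism can be sketched as a pass over already parsed NAL units. In this illustration the `Nal` record and `resolve_anchor_flags` are hypothetical names, and the sketch makes the simplifying assumption that a NAL of type 20 or 21 immediately following a NAL of type 1 or 5 is its suffix NAL; it does not parse any real bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AVC_SLICE_TYPES = {1, 5}   # H.264/AVC-compatible coded slices
MVC_TYPES = {20, 21}       # NAL types carrying the header extension


@dataclass
class Nal:
    nal_unit_type: int
    anchor_pic_flag: Optional[int] = None  # present only for types 20/21


def resolve_anchor_flags(nals: List[Nal]) -> List[Tuple[int, Optional[bool]]]:
    """Pair each AVC-compatible slice NAL with the anchor flag carried
    by the suffix NAL that directly follows it (None if there is none)."""
    result: List[Tuple[int, Optional[bool]]] = []
    pending: Optional[int] = None  # index of the slice awaiting a suffix
    for nal in nals:
        if nal.nal_unit_type in AVC_SLICE_TYPES:
            result.append((nal.nal_unit_type, None))
            pending = len(result) - 1
        elif nal.nal_unit_type in MVC_TYPES and pending is not None:
            slice_type, _ = result[pending]
            result[pending] = (slice_type, bool(nal.anchor_pic_flag))
            pending = None
        else:
            pending = None  # any other NAL ends the pairing window
    return result
```

A base-view slice that is not followed by a suffix NAL keeps `None`, which signals that its anchor status could not be determined from the stream in this sketch.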

The decoding apparatus in this embodiment includes a bitstream determination unit 510, an anchor picture identification information acquisition unit 520, and a multi-view video decoding unit 530. When a bitstream is input, the bitstream determination unit 510 determines whether it is a scalable video coded bitstream or a multi-view video coded bitstream. This can be determined from flag information carried in the bitstream.

If the determination shows that the bitstream is a multi-view video coded bitstream, the anchor picture identification information acquisition unit 520 may obtain the anchor picture identification information. If the obtained anchor picture identification information is true, the coded slice in the current NAL belongs to an anchor picture; if it is false, the slice belongs to a non-anchor picture. This anchor picture identification information may be obtained from the extension area of the NAL header, or from the slice layer area.

The multi-view video decoding unit 530 decodes the multi-view video according to the anchor picture identification information. Because anchor pictures and non-anchor pictures have different inter-view dependencies under the overall coding structure of multi-view video, the anchor picture identification information may be used, for example, to add reference pictures for inter-view prediction when a reference picture list is generated. It may also be used to manage the reference pictures added for inter-view prediction. The anchor picture identification information may also be applied in a hypothetical reference decoder.
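The three units of this embodiment can be sketched as a short decision flow. The example below assumes the header fields have already been parsed into a dictionary; the field names svc_mvc_flag and anchor_pic_flag come from the description and claims, while the function names and the string results are placeholders for illustration.

```python
from typing import Dict, Optional


def get_anchor_pic_flag(header: Dict[str, int]) -> Optional[bool]:
    """Bitstream determination followed by anchor flag acquisition.

    Returns None when the flag information does not indicate a
    multi-view video coded bitstream (svc_mvc_flag != 1).
    """
    if header.get("svc_mvc_flag") != 1:
        return None
    return bool(header["anchor_pic_flag"])


def select_decoding_path(header: Dict[str, int]) -> str:
    """Choose how the multi-view decoder would treat the current NAL."""
    anchor = get_anchor_pic_flag(header)
    if anchor is None:
        return "not-mvc"
    return "anchor" if anchor else "non-anchor"
```

A real decoder would use the result to pick, for example, the inter-view reference pictures and the marking process appropriate to anchor or non-anchor pictures.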

An anchor picture is a coded picture in which all slices reference only slices in frames of the same time instant. For example, it is a coded picture that references only slices in other views and does not reference slices in the current view. In the overall prediction structure of MVC, a GOP can start from an I picture, and that I picture is compatible with H.264/AVC. All anchor pictures compatible with H.264/AVC can therefore always be I pictures. If those I pictures are replaced with P pictures, however, more efficient coding becomes possible. That is, a prediction structure in which the GOP starts with an H.264/AVC-compatible P picture allows more efficient coding.

In that case the anchor picture is redefined as a coded picture in which all slices may reference not only slices in frames of the same time instant but also slices of the same view at other time instants. Referencing slices of the same view at other time instants, however, may be restricted to anchor pictures that are compatible with H.264/AVC. For example, in FIG. 6 the P picture of view S0 at time T8 can be a newly defined anchor picture, as can the P picture of view S0 at time T96 and the P picture of view S0 at time T100. Alternatively, the anchor picture may be defined in this way only for the base view.
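The redefined condition can be sketched as follows. As before, pictures are modelled as hypothetical `(view, time)` pairs, and the H.264/AVC-compatible view is assumed to be the base view passed in as `base_view`; the description mentions this restriction only as one option.

```python
from typing import List, Tuple

Pic = Tuple[str, str]  # (view, time)


def is_redefined_anchor(pic: Pic, refs: List[Pic], base_view: str) -> bool:
    """Check the redefined anchor condition for one picture.

    Inter-view references at the same time instant are always allowed.
    Temporal references within the same view are allowed only when the
    picture belongs to the (assumed H.264/AVC-compatible) base view.
    """
    view, time = pic
    for ref_view, ref_time in refs:
        same_instant_other_view = ref_time == time and ref_view != view
        temporal_in_base_view = (
            ref_view == view and ref_time != time and view == base_view
        )
        if not (same_instant_other_view or temporal_in_base_view):
            return False
    return True
```

Under this sketch, a P picture of view S0 at T8 that references the S0 picture at T0 still counts as an anchor picture when S0 is the base view, whereas the same temporal reference in view S1 would disqualify the picture.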

As described above, under the overall coding structure of multi-view video, anchor pictures and non-anchor pictures have different inter-view dependencies, so coding them separately according to the anchor picture identification information can make coding more efficient. In addition, newly defining the anchor picture so that it may be a P picture as well as the conventional I picture can make coding more efficient.

Claims

A video signal decoding method comprising:

obtaining, from a video signal, flag information indicating whether the video signal is a multi-view video coded bitstream;

when the flag information indicates that the video signal is a multi-view video coded bitstream, obtaining identification information indicating whether the coded picture of the current NAL is an anchor picture; and

decoding information about the multi-view video according to the identification information.

The method of claim 1, wherein the information about the multi-view video includes information about the dependency between views.

The method of claim 1, wherein the information about the multi-view video includes information for generating a reference picture list.

The method of claim 1, wherein the information about the multi-view video includes information for a reference picture marking process.

The method of claim 1, wherein the identification information is obtained from a header area of the video signal.

The method of claim 5, wherein the header area is an extension area of a NAL header.

The method of claim 5, wherein the header area is an extension area of a slice layer.

The method of claim 1, further comprising:

obtaining, from the video signal, view level information (view_level) indicating view scalability,

wherein, when the current NAL corresponds to the base view according to the view level information, the anchor picture identification information obtained from the header of the current NAL indicates the anchor picture identification information of the preceding NAL.

The method of claim 1, wherein the anchor picture is a picture that can be temporally predicted from pictures of the same view at different time instants.

The method of claim 9, wherein, when the current picture is the anchor picture, all pictures of other views at the same time instant as the current picture are also anchor pictures.

A multi-view based video signal encoding method comprising:

generating view dependency information indicating the prediction structure between views; and

generating, according to the view dependency information, identification information indicating whether the coded picture of the current NAL is an anchor picture.

A video signal decoding apparatus comprising:

a bitstream determination unit that obtains, from a video signal, flag information indicating whether the video signal is a multi-view video coded bitstream;

an anchor picture identification information acquisition unit that, when the flag information indicates that the video signal is a multi-view video coded bitstream, obtains identification information (anchor_pic_flag) indicating whether the coded picture of the current NAL is an anchor picture; and

a multi-view video decoding unit that decodes information about the multi-view video according to the identification information.