KR100813064B1

KR100813064B1 - Method and Apparatus, Data format for decoding and coding of video sequence

Info

Publication number: KR100813064B1
Application number: KR1020060045261A
Authority: KR
Inventors: 박승욱; 전병문; 전용준
Original assignee: 엘지전자 주식회사
Priority date: 2006-05-19
Filing date: 2006-05-19
Publication date: 2008-03-14
Also published as: KR20070111880A

Abstract

The present invention relates to a method and apparatus for efficiently decoding and encoding a video signal, and to a data format therefor.

Provided are a method and apparatus for decoding a video signal, the method comprising: extracting base view identification information (view_dependency_flag) from a bitstream; extracting anchor picture identification information (anchor_pic_flag) according to the extracted base view identification information; and decoding the corresponding bitstream based on the extracted anchor picture identification information. The corresponding data format, and an encoding method that performs the inverse process, are also provided. When random access is performed on a multi-view video, the present invention resolves the time-delay problem while remaining compatible with existing H.264/AVC, thereby enabling efficient decoding and encoding.

Multi-view, base views, anchor picture

Description

Method and apparatus for video image decoding / coding, data format {Method and Apparatus, Data format for decoding and coding of video sequence}

FIG. 1 shows the structure of a NAL (Network Abstraction Layer) unit in H.264/AVC.

FIG. 2 shows a multi-view sequence encoding and decoding system to which the present invention is applied.

FIG. 3 shows a prediction structure of pictures, for explaining the overall encoding process of a multi-view video signal to which the present invention is applied.

FIG. 4 shows an example to which the present invention is applied, in which base view identification information is added to the nal unit header in the syntax.

FIG. 5A shows a data format to which the present invention is applied, namely the NAL structure used when the current picture belongs to a base view.

FIG. 5B shows a data format to which the present invention is applied, namely the NAL structure used when the current picture does not belong to a base view.

FIG. 6 is a flowchart illustrating a video signal decoding method to which the present invention is applied.

FIG. 7 shows an example to which the present invention is applied, in which anchor picture identification information is added to the slice layer in the syntax.

FIG. 8 shows part of a video signal decoding apparatus to which the present invention is applied.

<Description of the main reference numerals in the drawings>

10: multi-view image generator
20: pre-processing unit
30: encoder
40: decoder
50: post-processing unit
60: display unit
61: two-dimensional display
63: stereo-type display
65: display providing M views as a stereoscopic image
501: first NAL header
502: first slice layer
503: second NAL header
504: second slice layer
510: NAL header
520: slice layer
810: first identification information extraction unit
820: second identification information extraction unit
830: decoding unit

The present invention relates to a method and apparatus for decoding and encoding video, and to a data format therefor.

Mainstream broadcast video today is single-view video captured with one camera. Even video shot with several cameras is edited and handled as a single video. Multi-view video, by contrast, is a field of three-dimensional (3D) image processing in which images captured by one or more cameras are geometrically corrected and spatially combined to offer the user various viewpoints in multiple directions. Multi-view video gives the user greater freedom of viewpoint and covers a larger area than the image area obtainable with a single camera. Such multi-view video is acquired by moving a camera, by arranging multiple cameras in several directions, or by using special equipment such as reflectors.

Recently, systems that encode, transmit, decode, and display multi-view video captured by multiple cameras in this way have been actively studied.

The Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) developed a new standard that promises better video compression performance than the earlier MPEG-4 and H.263 standards. The new standard was named Advanced Video Coding (AVC) and was jointly published as MPEG-4 Part 10 and ITU-T Recommendation H.264.

In H.264/AVC, the bitstream is organized into separate layers: the VCL (Video Coding Layer), which handles the video encoding process itself, and the NAL (Network Abstraction Layer), which sits between the VCL and the lower-level system that transmits and stores the encoded information. The output of the encoding process is VCL data, which is mapped into NAL units before transmission or storage. Each NAL unit contains an RBSP (Raw Byte Sequence Payload, the data resulting from video compression), which is either compressed video data or data corresponding to header information.

FIG. 1 shows the structure of a NAL unit in H.264/AVC. A NAL unit basically consists of two parts, a NAL header and an RBSP. The NAL header contains flag information (nal_ref_idc) indicating whether the NAL unit includes a slice of a reference picture, and an identifier (nal_unit_type) indicating the type of the NAL unit. The RBSP stores the compressed data of the original, and RBSP trailing bits are appended to the end of the RBSP so that its length is a multiple of 8 bits.
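The NAL unit layout described above can be illustrated with a minimal Python sketch. The bit widths used here (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) and the stop bit that precedes the alignment zeros come from the H.264/AVC specification rather than from this document, and the function names are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit; 0 in a conforming stream
    nal_ref_idc: int         # 2 bits; non-zero when the unit carries reference data
    nal_unit_type: int       # 5 bits; identifies the kind of NAL unit


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte H.264/AVC NAL unit header into its three fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("NAL header must be a single byte")
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x1,
        nal_ref_idc=(first_byte >> 5) & 0x3,
        nal_unit_type=first_byte & 0x1F,
    )


def add_rbsp_trailing_bits(bits: str) -> str:
    """Append a '1' stop bit, then '0' bits until the length is a multiple of 8."""
    padded = bits + "1"
    padded += "0" * (-len(padded) % 8)
    return padded
```

For instance, a header byte of 0x65 splits into nal_ref_idc = 3 and nal_unit_type = 5, the type the description associates with a slice of an IDR picture.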

NAL unit types include the IDR (Instantaneous Decoding Refresh) picture, the SPS (Sequence Parameter Set), the PPS (Picture Parameter Set), and SEI (Supplemental Enhancement Information).

When random access is performed under the overall coding structure of MVC, the complexity of that structure causes a long delay. To solve this problem, the minimum number of frames needed for random access must be reduced. This can be achieved by letting the decoder know the dependencies between views and by defining a new picture type called an anchor picture.

An object of the present invention is to provide a method and apparatus for efficiently decoding and encoding multi-view video data, and a data format therefor.

Another object of the present invention is to provide a method and apparatus that perform encoding and decoding efficiently by adding anchor picture identification information or base view identification information in a standardized manner, and a data format therefor.

Another object of the present invention is to perform random access to multi-view video efficiently by adding anchor picture identification information to the syntax.

Another object of the present invention is to maintain compatibility with existing H.264/AVC by adding base view identification information to the syntax.

To achieve the above objects, the present invention provides a method of decoding a video signal, comprising: extracting base view identification information (view_dependency_flag) from a bitstream; extracting anchor picture identification information (anchor_pic_flag) according to the extracted base view identification information; and decoding the corresponding bitstream based on the extracted anchor picture identification information.

The present invention also provides an apparatus for decoding a video signal, comprising: a first identification information extraction unit that extracts base view identification information from a bitstream; a second identification information extraction unit that extracts anchor picture identification information according to the extracted base view identification information; and a decoding unit that decodes the corresponding bitstream based on the extracted anchor picture identification information.

The present invention also provides a data format of a video signal, comprising: a first NAL header including attribute information of a current picture; a first slice layer including data information of the current picture; and, appended consecutively after the first slice layer, a second NAL header including identification information on the base view and a second slice layer including identification information on an anchor picture.

The present invention also provides a data format of a video signal, comprising: a NAL header including attribute information of a current picture and base view identification information; and a slice layer including data information of the current picture and anchor picture identification information.

The present invention also provides a video signal encoding method characterized by adding base view identification information to the nal unit header.

The present invention also provides a video signal encoding method characterized by adding anchor picture identification information to the slice layer.

The above objects and features will become clearer from the following detailed description taken in conjunction with the accompanying drawings. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The terms used in the present invention are, wherever possible, general terms in wide current use. In certain cases, however, terms were chosen arbitrarily by the applicant; their meanings are set out in detail in the relevant part of the description, so the present invention should be understood through the meaning of each term rather than through its name alone.

As shown in FIG. 2, the multi-view video encoding system to which the present invention is applied comprises a multi-view image generator 10, a pre-processing unit 20, and an encoder 30. The decoding system comprises a decoder 40, a post-processing unit 50, and a display unit 60.

The multi-view image generator 10 has as many image acquisition devices (for example, cameras #1 to #N) as there are views, and acquires an independent image for each view. When multi-view video data is input, the pre-processing unit 20 removes noise and resolves imbalance (imbalancing) problems, raising the correlation among the multi-view video data through pre-processing. The encoder 30 includes motion estimation/compensation, inter-view disparity estimation/compensation, bit rate control, a difference image encoding unit, and the like. A generally known scheme may be used for the encoder 30.

The decoder 40 receives the bitstream encoded in the manner described above and decodes it by the inverse process. The post-processing unit 50 improves the reliability and resolution of the decoded data. Finally, the display unit 60 presents the decoded data to the user in a manner that depends on the capabilities of the display, in particular its ability to handle multi-view video. For example, it may be a 2D display 61 that provides only planar two-dimensional images, a stereo-type display 63 that provides two views as a stereoscopic image, or a display 65 that provides M views (M > 2) as a stereoscopic image.

In FIG. 3, T0 to T100 on the horizontal axis denote frames by time, and S0 to S100 on the vertical axis denote frames by view. For example, the pictures at T0 are images captured by different cameras at the same time instant (T0), and the pictures at S0 are images captured by a single camera at different time instants. The arrows in the figure indicate the prediction direction and order of the pictures. For example, the P0 picture in view S2 at time T0 is predicted from I0, and it in turn serves as a reference picture for the P0 picture in view S4 at time T0. It also serves as a reference picture for the B1 and B2 pictures in view S2 at times T4 and T2.

In decoding multi-view video, random access between views is essential. Access to an arbitrary view must therefore be possible with minimal decoding effort. To realize efficient random access, the concept of an anchor picture needs to be explained. An anchor picture is a coded picture in which all slices reference only slices in frames of the same time instant. For example, it is a coded picture that references only slices in other views and does not reference slices in the current view. In FIG. 3, if the I0 picture in view S0 at time T0 is an anchor picture, then all pictures at the same time instant, that is, the pictures in the other views at time T0, are also anchor pictures. As another example, if the I0 picture in view S0 at time T8 is an anchor picture, then all pictures in the other views at time T8 are also anchor pictures. Likewise, all pictures at T16, …, T96, and T100 are examples of anchor pictures.

After an anchor picture is decoded, all subsequently coded pictures are decoded without inter-prediction from pictures decoded before the anchor picture.
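The anchor picture definition can be expressed as a small check, sketched here in Python. The picture representation (a pair of view index and time index) and the reference map are hypothetical constructs for illustration; the entries only mirror the dependencies that the description of FIG. 3 states explicitly.

```python
from typing import Dict, List, Tuple

Picture = Tuple[int, int]  # (view index, time index), e.g. (2, 0) is view S2 at time T0


def is_anchor_picture(picture: Picture, references: Dict[Picture, List[Picture]]) -> bool:
    """Return True if every picture referenced by `picture` lies at the same time instant."""
    _, time = picture
    return all(ref_time == time for _, ref_time in references.get(picture, []))


# Partial reference map (illustrative; not the complete prediction structure).
REFERENCES: Dict[Picture, List[Picture]] = {
    (0, 0): [],        # I0 in view S0 at T0 has no references
    (2, 0): [(0, 0)],  # P0 in view S2 at T0 is predicted from I0
    (4, 0): [(2, 0)],  # P0 in view S4 at T0 references P0 in view S2
    (2, 4): [(2, 0)],  # a B picture in view S2 at T4 references a picture at T0
}
```

Under this map the three pictures at T0 satisfy the definition, while the picture at T4 does not, because it references a different time instant.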

Before describing FIG. 4, the concept of base views needs to be explained. At least one view sequence is required for compatibility with H.264/AVC decoders. Views that can be decoded independently therefore need to be defined for fast random access, and these are called base views. A base view serves as the basis for encoding among the multiple views and corresponds to a reference view. In MVC (Multiview Video Coding), the video of a base view is encoded by a conventional video coding scheme (MPEG-2, MPEG-4, H.263, H.264, etc.) and formed into an independent bitstream.

The video of a base view may or may not be compatible with H.264/AVC. However, the video of a view that is compatible with H.264/AVC is always a base view. The present invention therefore defines "view_dependency_flag" in the nal unit header of the MVC NAL as a flag that identifies whether the current picture belongs to a base view. For example, view_dependency_flag = 0 means that the current picture or current slice belongs to a base view, and view_dependency_flag ≠ 0 means that it does not. In addition, a new NAL unit type is defined for each slice type of the new MVC slices: type 22 for non-IDR slices and type 23 for IDR slices (described in more detail with reference to FIG. 5A). With view_dependency_flag added, a decoder can determine from the received bitstream whether the current picture belongs to a base view. The anchor picture identification information can therefore be determined in a way that remains compatible with H.264/AVC. The anchor picture identification information, which indicates whether the current picture is an anchor picture, is described below.
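The flag semantics and the two new NAL unit types can be captured in a few helper functions. This is only a sketch in Python; the constant and function names are hypothetical, while the values 0, 22, and 23 are the ones given in the description.

```python
MVC_NON_IDR_SLICE = 22  # new NAL unit type for non-IDR MVC slices
MVC_IDR_SLICE = 23      # new NAL unit type for IDR MVC slices


def is_base_view(view_dependency_flag: int) -> bool:
    """view_dependency_flag == 0 marks a picture or slice that belongs to a base view."""
    return view_dependency_flag == 0


def is_mvc_slice(nal_unit_type: int) -> bool:
    """True for either of the two NAL unit types defined for MVC slices."""
    return nal_unit_type in (MVC_NON_IDR_SLICE, MVC_IDR_SLICE)


def is_mvc_idr_slice(nal_unit_type: int) -> bool:
    """True only for the MVC IDR slice type."""
    return nal_unit_type == MVC_IDR_SLICE
```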

The data format to which the present invention is applied includes a first NAL header 501, a first slice layer 502, a second NAL header 503, and a second slice layer 504. The first NAL header 501 contains attribute information of the current picture, for example nal_ref_idc and nal_unit_type: nal_ref_idc is flag information indicating whether the NAL unit includes a slice of a reference picture, and nal_unit_type is an identifier indicating the type of the NAL unit. The first slice layer 502 contains the compressed result data. The NAL unit formed by the first NAL header 501 and the first slice layer 502 has nal_unit_type 1 or 5, which marks it as a slice for H.264/AVC compatibility: nal_unit_type = 5 means the current slice is a slice of an IDR picture, and nal_unit_type = 1 means it is a slice of a picture other than an IDR picture. An IDR (Instantaneous Decoding Refresh) picture is the first picture of a video sequence. At an IDR picture, all state needed to decode the picture bitstream is initialized.

The second NAL header 503 and the second slice layer 504 are appended, in that order, immediately after the first slice layer. The second NAL header 503 contains the base view identification information, namely view_dependency_flag, which identifies whether the current picture belongs to a base view. The second slice layer 504 contains only the anchor picture identification information, so whether the current picture is an anchor picture is determined from anchor_pic_flag. The appended NAL unit formed by the second NAL header 503 and the second slice layer 504 has nal_unit_type 22 or 23, which marks it as a slice for MVC: nal_unit_type = 22 means the current slice is a slice of a non-IDR picture in MVC, and nal_unit_type = 23 means it is a slice of an IDR picture in MVC. A NAL unit with nal_unit_type = 22 follows a NAL unit with nal_unit_type = 1, and a NAL unit with nal_unit_type = 23 follows a NAL unit with nal_unit_type = 5.
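The ordering rule for base-view NAL units (type 22 after type 1, type 23 after type 5) lends itself to a simple consistency check. The Python sketch below assumes that every H.264/AVC slice NAL unit in the stream is immediately followed by its MVC companion; that assumption and the function name are illustrative and not stated as a requirement in the description.

```python
from typing import Sequence

# Companion MVC NAL unit type expected right after each H.264/AVC slice type.
COMPANION_TYPE = {1: 22, 5: 23}


def base_view_pairs_ok(nal_unit_types: Sequence[int]) -> bool:
    """Check that each type-1 or type-5 unit is directly followed by type 22 or 23."""
    for index, unit_type in enumerate(nal_unit_types):
        expected = COMPANION_TYPE.get(unit_type)
        if expected is None:
            continue  # not an H.264/AVC slice NAL unit, so nothing to check
        has_next = index + 1 < len(nal_unit_types)
        follower = nal_unit_types[index + 1] if has_next else None
        if follower != expected:
            return False
    return True
```

Units of type 22 or 23 that appear on their own are accepted, since non-base views carry their slice data directly in such units.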

The NAL header 510 contains attribute information of the current picture and base view identification information. The attribute information includes, for example, nal_ref_idc and nal_unit_type: nal_ref_idc is flag information indicating whether the NAL unit includes a slice of a reference picture, and nal_unit_type is an identifier indicating the type of the NAL unit. The header also contains view_dependency_flag, which identifies whether the current picture belongs to a base view. The slice layer 520 contains the compressed result data and the anchor picture identification information. The NAL unit formed by the NAL header 510 and the slice layer 520 has nal_unit_type 22 or 23, which marks it as a slice for MVC: nal_unit_type = 22 means the current slice is a slice of a non-IDR picture in MVC, and nal_unit_type = 23 means it is a slice of an IDR picture in MVC. Because this NAL structure can be decoded only by an MVC decoder, that is, because views other than the base view need not be compatible with H.264/AVC, the anchor picture identification information can be placed inside the slice.

FIG. 6 is a flowchart illustrating a video signal decoding method to which the present invention is applied.

Base view identification information, which determines whether the current picture or current slice belongs to a base view, is extracted from the received bitstream (610), and it is determined from the extracted information whether view_dependency_flag = 0 (620). If view_dependency_flag = 0, the current picture or current slice belongs to a base view. In this case the NAL structure is the new one in which a further NAL (503, 504) is appended to the NAL (501, 502) of the current picture, and the anchor picture identification information is extracted from the slice layer 504 of the appended NAL (630). From the extracted anchor picture identification information, it is determined whether the current picture is an anchor picture (650): for example, anchor_pic_flag = 1 indicates that the current picture is an anchor picture, and anchor_pic_flag ≠ 1 indicates that it is not. Once this has been determined, the corresponding bitstream is decoded accordingly (660).

If, on the other hand, step 620 finds that view_dependency_flag ≠ 0, the current picture or current slice does not belong to a base view. In this case, the NAL header (510) of the current NAL structure contains the attribute information of the current picture and the base view identification information, and the slice layer (520) contains the compressed result data and the anchor picture identification information. In this NAL structure, the anchor picture identification information is extracted from the slice layer (520) (640). Because this NAL structure can be decoded only by an MVC decoder, that is, because views other than the base view need not be compatible with H.264/AVC, the anchor picture identification information can be placed inside the slice. It is then determined from the extracted anchor picture identification information whether the current picture is an anchor picture (650): for example, anchor_pic_flag = 1 indicates that the current picture is an anchor picture, and anchor_pic_flag ≠ 1 indicates that it is not. Once this has been determined, the bitstream is decoded accordingly (660).
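The branch at step 620 and the two extraction paths (630 and 640) can be sketched as follows. This is only an illustration under stated assumptions: the `NalUnit` class, its field names, and the helper functions are hypothetical stand-ins for values that a real parser would read from the bitstream, and no actual H.264/AVC or MVC bit-level parsing is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NalUnit:
    """Hypothetical, already-parsed view of the NAL data for one picture."""
    view_dependency_flag: int  # 0 means the picture belongs to a base view
    # Anchor flag carried in the slice layer (504) of the added NAL (base view).
    added_nal_anchor_pic_flag: Optional[int] = None
    # Anchor flag carried in the picture's own slice layer (520) (non-base view).
    slice_anchor_pic_flag: Optional[int] = None


def extract_anchor_pic_flag(nal: NalUnit) -> int:
    """Steps 620-640: choose where anchor_pic_flag is read from."""
    if nal.view_dependency_flag == 0:
        flag = nal.added_nal_anchor_pic_flag  # step 630
    else:
        flag = nal.slice_anchor_pic_flag  # step 640
    if flag is None:
        raise ValueError("anchor_pic_flag missing from the expected location")
    return flag


def is_anchor_picture(nal: NalUnit) -> bool:
    """Step 650: anchor_pic_flag = 1 marks an anchor picture."""
    return extract_anchor_pic_flag(nal) == 1


base_view_picture = NalUnit(view_dependency_flag=0, added_nal_anchor_pic_flag=1)
other_view_picture = NalUnit(view_dependency_flag=1, slice_anchor_pic_flag=0)
```

Here `is_anchor_picture(base_view_picture)` takes the base-view path through the added NAL, while `is_anchor_picture(other_view_picture)` reads the flag from the picture's own slice layer; the decoding step (660) would then act on the returned value.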

When a multi-view video signal is input, the encoder generates a bitstream according to the syntax, adding anchor picture identification information within the slice layer function of that syntax. Because compatibility with existing H.264/AVC must be maintained when adding the anchor picture identification information, it is first necessary to determine whether the base view identification information, view_dependency_flag, is 0. The syntax therefore begins with an "if (view_dependency_flag == 0)" test to determine whether the current picture belongs to a base view; if it does, only the anchor picture identification information, anchor_pic_flag, is sent. If the current picture does not belong to a base view, the slice_header_in_mvc_extension() and slice_data_in_mvc_extension() functions, among others, are called, and anchor_pic_flag is extracted within slice_header_in_mvc_extension(). That is, views other than the base view need not be compatible with H.264/AVC, so the anchor picture identification information can be included in the slice. By first determining whether the picture belongs to a base view and then adding the anchor picture identification information in the corresponding way, compatibility with H.264/AVC is preserved, and random access in multi-view video can reach the picture of any view with minimal decoding.
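The shape of the slice layer syntax described above can be sketched as a small function that lists the syntax elements emitted in each case. The names slice_header_in_mvc_extension, slice_data_in_mvc_extension, view_dependency_flag and anchor_pic_flag come from the description; everything else (the `write_slice_layer` function, the list-of-tuples representation, the `payload` parameter) is a hypothetical simplification, and the other fields those functions would carry are omitted.

```python
from typing import List, Tuple

SyntaxElement = Tuple[str, object]


def slice_header_in_mvc_extension(anchor_pic_flag: int) -> List[SyntaxElement]:
    # Only the anchor flag is modelled; other header fields are omitted.
    return [("anchor_pic_flag", anchor_pic_flag)]


def slice_data_in_mvc_extension(payload: bytes) -> List[SyntaxElement]:
    # Stand-in for the compressed slice data.
    return [("slice_data", payload)]


def write_slice_layer(view_dependency_flag: int,
                      anchor_pic_flag: int,
                      payload: bytes = b"") -> List[SyntaxElement]:
    """Return the syntax elements emitted for one slice layer (illustrative)."""
    elements: List[SyntaxElement] = []
    if view_dependency_flag == 0:
        # Base view: send only anchor_pic_flag, keeping the base-view
        # picture data itself decodable by an H.264/AVC decoder.
        elements.append(("anchor_pic_flag", anchor_pic_flag))
    else:
        # Other views: MVC-only syntax, with the flag inside the slice header.
        elements += slice_header_in_mvc_extension(anchor_pic_flag)
        elements += slice_data_in_mvc_extension(payload)
    return elements
```

For a base-view picture the function yields a single anchor_pic_flag element; for any other view it yields the MVC slice header (containing the flag) followed by the slice data.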

The apparatus to which the present invention is applied includes a first identification information extracting unit, a second identification information extracting unit, and a decoding unit. The first identification information extracting unit (810) extracts base view identification information (view_dependency_flag) from the received bitstream. The second identification information extracting unit (820) extracts anchor picture identification information according to the extracted base view identification information. The decoding unit (830) decodes the bitstream based on the extracted anchor picture identification information.

As described above, when randomly accessing multi-view video captured by multiple cameras, the present invention uses the concept of an anchor picture so that decoding can proceed without inter-prediction from previously decoded pictures, which solves the time delay problem. In addition, by setting base views and adding the anchor picture identification information in a different manner depending on the view, compatibility with existing H.264/AVC can be maintained. By exploiting these features, the present invention can decode and encode multi-view video signals more efficiently.

Claims

A video signal decoding method comprising:

obtaining first identification information indicating a type of a current NAL from a video signal;

obtaining, according to the first identification information, second identification information indicating whether a coded picture of the current NAL is an anchor picture; and

decoding the video signal based on the second identification information.

The method of claim 1, wherein

the second identification information is obtained from a header of the current NAL.

The method of claim 1, further comprising

obtaining base view identification information indicating whether the current picture belongs to a base view,

wherein the second identification information is obtained based on the base view identification information.

A video signal decoding apparatus comprising:

a first identification information obtaining unit which obtains first identification information indicating a type of a current NAL from a video signal;

a second identification information obtaining unit which obtains, according to the first identification information, second identification information indicating whether a coded picture of the current NAL is an anchor picture; and

a decoder which decodes the video signal based on the second identification information.

The apparatus of claim 4, wherein

the second identification information is obtained from a header of the current NAL.

The apparatus of claim 4,

further comprising a third identification information obtaining unit which obtains base view identification information indicating whether the current picture belongs to a base view,

delete

A video signal encoding method comprising adding base view identification information to a NAL unit header.

A video signal encoding method comprising adding anchor picture identification information to a slice layer.