KR102537024B1

KR102537024B1 - A method for encoding/decoding a virtual reality video

Info

Publication number: KR102537024B1
Application number: KR1020170127531A
Authority: KR
Inventors: 임화섭; 전대석; 김현호; 박도현; 윤용욱; 김재곤
Original assignee: 한국항공대학교산학협력단
Priority date: 2017-09-29
Filing date: 2017-09-29
Publication date: 2023-05-25
Also published as: KR20190037817A; WO2019066529A1

Abstract

An image processing method according to an embodiment of the present invention includes a step of encoding or decoding a virtual reality image, the step including frame packing of the virtual reality image.

Description

Method and apparatus for encoding/decoding virtual reality video providing frame packing

The present invention relates to a method and apparatus for encoding/decoding an image. More specifically, the present invention relates to a method and apparatus for encoding/decoding a virtual reality image that provide frame packing.

With recent advances in digital image processing and computer graphics, research on virtual reality (VR) technology, which reproduces the real world and lets users experience it realistically, is being actively conducted.

In particular, recent VR systems such as head-mounted displays (HMDs) are attracting much attention because they can not only present a three-dimensional stereoscopic image to both eyes of the user but also track the user's viewpoint in all directions, and can therefore provide realistic VR image content that can be viewed through a full 360-degree rotation.

However, 360 VR content consists of simultaneous omnidirectional multi-view image information in which the temporal and binocular images are spatially and compositely synchronized. In producing and transmitting such content, two large images synchronized over the binocular space of every viewpoint are therefore encoded, compressed, and delivered. This increases the complexity and bandwidth burden. In the decoding device in particular, decoding is performed even for regions that lie outside the user's viewpoint and are never actually viewed, so processing is wasted.

Accordingly, there is a need for an encoding method that reduces the amount of transmitted image data and the complexity, and that is also efficient in terms of bandwidth and the battery consumption of the decoding device.

The present invention is intended to solve the above problems, and its object is to provide a method and apparatus that use frame packing of a virtual reality image to efficiently encode/decode virtual reality images such as 360-degree camera images or VR images.

As a technical means for achieving the above technical object, an image encoding method according to an embodiment of the present invention includes: obtaining a virtual reality image; performing frame packing processing on the virtual reality image; and encoding the virtual reality image.

In addition, the method may be implemented as a computer-readable recording medium on which a program for executing the method on a computer is recorded.

According to an embodiment of the present invention, frame packing processing optimized for encoding and transmission is applied to a virtual reality image, so that the amount of transmitted image data, the bandwidth, and the complexity can be reduced efficiently.

FIG. 1 is a block diagram schematically showing an entire system according to an embodiment of the present invention.
FIGS. 2 to 4 show image encoding and decoding based on spatial structure information according to an embodiment of the present invention.
FIGS. 5 to 9 illustrate frame packing according to an embodiment of the present invention.
FIGS. 10 and 11 show encoding and decoding devices according to an embodiment of the present invention.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily practice them. However, the present application may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present application clearly, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between.

Throughout this specification, when a member is said to be located on another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms of degree used throughout this specification, such as "about" and "substantially", are used to mean at, or close to, the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting disclosures in which exact or absolute figures are stated to aid understanding of this application. As used throughout this specification, the term "step of (doing)" or "step of" does not mean "step for".

Throughout this specification, the term "a combination thereof" in a Markush-type expression means a mixture or combination of one or more selected from the group consisting of the components recited in the Markush-type expression, that is, it means including one or more selected from the group consisting of those components.

In an embodiment of the present invention, as one example of a method of encoding a virtual reality image, encoding may be performed using High Efficiency Video Coding (HEVC), which was jointly standardized by the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) and has the highest coding efficiency among the video coding standards developed to date, or using a coding technology currently under standardization. However, the invention is not limited thereto.

Typically, an encoding device includes both an encoding process and a decoding process, while a decoding device includes a decoding process. The decoding process of the decoding device is the same as the decoding process of the encoding device. Therefore, the following description focuses on the encoding device.

FIG. 1 shows the overall system structure according to an embodiment of the present invention.

Referring to FIG. 1, the entire system according to an embodiment of the present invention includes a pre-processing device 10, an encoding device 100, a decoding device 200, and a post-processing device 20.

The system according to an embodiment of the present invention may process virtual reality image information. A virtual reality image is an image that gives the user the experience of actually being there. It may be an image that is synchronized with the user's view and can represent all directions, and it may also be called a 360 video or a virtual reality video.

The system may include: a pre-processing device 10 that obtains a synchronized video frame by pre-processing a plurality of per-viewpoint images through operations such as merging or stitching; an encoding device 100 that encodes the synchronized video frame and outputs a bitstream; a decoding device 200 that receives the bitstream and decodes the synchronized video frame; and a post-processing device 20 that post-processes the video frame so that the synchronized image for each viewpoint is output to the corresponding display.

Here, the input image may include individual images for each of multiple viewpoints. For example, it may include sub-image information of various viewpoints captured by one or more cameras in a time- and space-synchronized state. Accordingly, the pre-processing device 10 may obtain synchronized virtual reality image information by spatially merging or stitching the acquired multi-view sub-image information over time.

The encoding device 100 then scans and predictively encodes the synchronized virtual reality image information to generate a bitstream, and the generated bitstream may be transmitted to the decoding device 200. In particular, the encoding device 100 according to an embodiment of the present invention may extract spatial structure information from the synchronized image information and signal it to the decoding device 200.

Here, because one or more sub-images from the pre-processing device 10 are merged into a single video frame, the spatial structure information (spatial layout information) may include basic information about the attributes and arrangement of each sub-image. It may further include additional information about each sub-image and the relationships between sub-images, which is described later.

Accordingly, the spatial structure information according to an embodiment of the present invention may be delivered to the decoding device 200. The decoding device 200 may then refer to the spatial structure information and user viewpoint information to determine which parts of the virtual reality image bitstream to decode and in what order, which can lead to efficient decoding.

The decoded video frame is then separated by the post-processing device 20 into sub-images for each display and provided to a plurality of synchronized display systems such as an HMD, so that the user is presented with a realistic virtual reality image.

FIG. 2 is a block diagram showing the configuration of a virtual reality image encoding device according to an embodiment of the present invention.

Referring to FIG. 2, the encoding device 100 according to an embodiment of the present invention includes a virtual reality image acquisition unit 110, a spatial structure information generation unit 120, a spatial structure information signaling unit 130, an image encoder, and a transmission processing unit 150.

The virtual reality image acquisition unit 110 obtains a virtual reality image using a virtual reality image acquisition means such as a 360-degree camera. The virtual reality image may include a plurality of sub-images synchronized in time and space, and may be received from the pre-processing device 10 or from a separate external input device.

The spatial structure information generation unit 120 divides the virtual reality image into video frames in units of time and extracts spatial structure information for each video frame. The spatial structure information may be determined according to the attributes and arrangement of the individual sub-images, or according to information obtained from the pre-processing device 10.

The spatial structure information signaling unit 130 performs the information processing needed to signal the spatial structure information to the decoding device 200. For example, the spatial structure information signaling unit 130 may perform one or more processes to include the information in the image data encoded by the image encoder, to construct a separate data format, or to include the information in the metadata of the encoded image.

The image encoder encodes the virtual reality image over time. The image encoder may also use the spatial structure information generated by the spatial structure information generation unit 120 as reference information to determine the image scanning order, the reference images, and the like.

Accordingly, the image encoder may perform encoding using High Efficiency Video Coding (HEVC) as described above, but the encoding may be improved to be more efficient for virtual reality images by using the spatial structure information.

The transmission processing unit 150 may combine the encoded image data with the spatial structure information inserted by the spatial structure information signaling unit 130 and perform one or more conversion and transmission processes for sending the result to the decoding device 200.

FIGS. 3 and 4 are diagrams for explaining a method of signaling spatial structure information according to various embodiments of the present invention.

As described above, the sub-images of the input image may be arranged in various ways. Accordingly, the spatial structure information may separately include a table index for signaling the arrangement information. For example, as shown in FIG. 11, depending on the conversion method, a virtual reality image may have a layout such as Equirectangular (ERP), Cubemap (CMP), Equal-area (EAP), Octahedron (OHP), Viewport generation using rectilinear projection, Icosahedron (ISP), Craster Parabolic Projection for CPP-PSNR calculation, Truncated Square Pyramid (TSP), Segmented Sphere Projection (SSP), Adjusted Cubemap Projection (ACP), or Rotated Sphere Projection (RSP), and the table index shown in FIG. 12 that corresponds to each layout may be inserted into the spatial structure information.

More specifically, a three-dimensional image in a coordinate system covering 360 degrees may be projected onto a two-dimensional image according to each piece of spatial structure information.

ERP projects a 360-degree image onto a single face. It may include converting a sampling position of the two-dimensional image into a position in the u, v coordinate system, and converting that u, v position into longitude and latitude coordinates on the sphere. Accordingly, the spatial structure information may include an ERP index and single-face information (for example, a face index set to 0).
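The two ERP conversions described above can be sketched as follows. This is a minimal illustration, not the method defined by this specification: the function names, the pixel-centre sampling offset of 0.5, and the sign conventions for longitude and latitude are assumptions based on a commonly used equirectangular convention.

```python
import math


def erp_sample_to_sphere(m, n, width, height):
    """Map an ERP sample position (m, n) to (longitude, latitude) in radians.

    Assumed convention: samples sit at pixel centres, longitude grows to the
    right, latitude is +pi/2 at the top row and -pi/2 at the bottom row.
    """
    # Step 1: sampling position -> normalised (u, v) coordinates in [0, 1].
    u = (m + 0.5) / width
    v = (n + 0.5) / height
    # Step 2: (u, v) -> longitude and latitude on the sphere.
    longitude = (u - 0.5) * 2.0 * math.pi
    latitude = (0.5 - v) * math.pi
    return longitude, latitude


def sphere_to_erp_sample(longitude, latitude, width, height):
    """Inverse mapping: (longitude, latitude) back to a sample position."""
    u = longitude / (2.0 * math.pi) + 0.5
    v = 0.5 - latitude / math.pi
    return u * width - 0.5, v * height - 0.5
```

Under these assumptions the centre of the picture maps to longitude 0 and latitude 0, and applying the inverse function returns the original sample position.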

CMP projects a 360-degree image onto the six square faces of a cube, and the sub-images projected onto the faces may be arranged according to face indices (face index, f) corresponding to PX, PY, PZ, NX, NY, and NZ (P denotes positive and N denotes negative). For example, a CMP image may include an image obtained by converting an ERP image into a 3 x 2 cubemap image.

Accordingly, the spatial structure information may include a CMP index and the face index information corresponding to each sub-image. The post-processing device 20 may process the two-dimensional position information on a sub-image according to its face index to calculate the corresponding position in the three-dimensional coordinate system, and may output the result inversely transformed into a three-dimensional 360-degree image.
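As a rough illustration of how a face index relates a three-dimensional direction to a position on a cube face, the following sketch picks the face from the dominant axis of the direction vector. The numeric face indices (taken in the order PX, PY, PZ, NX, NY, NZ listed above), the function name, and the orientation of the face-local (u, v) axes are assumptions made for this example only.

```python
# Assumed numbering: the order in which the faces are listed in the text.
FACE_INDEX = {"PX": 0, "PY": 1, "PZ": 2, "NX": 3, "NY": 4, "NZ": 5}


def direction_to_cube_face(x, y, z):
    """Return (face name, u, v) for a 3D direction, with u and v in [-1, 1].

    The face is the one pierced by the direction vector, i.e. the axis with
    the largest absolute component. The (u, v) axis orientation per face is
    a simplification and does not follow any particular standard.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:
        return ("PX" if x > 0 else "NX"), y / ax, z / ax
    if ay >= az:
        return ("PY" if y > 0 else "NY"), x / ay, z / ay
    return ("PZ" if z > 0 else "NZ"), x / az, y / az
```

The inverse direction, which a post-processing step would need, scales (u, v) back onto the face plane and normalises the resulting vector.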

Like CMP, ACP projects a 360-degree image onto the six square faces of a cube, but it applies functions adjusted for the three-dimensional curvature distortion to the projection into two dimensions and to the inverse transformation into three dimensions, respectively. Although the processing functions differ, the spatial structure information used may include an ACP index and face index information for each sub-image. Accordingly, the post-processing device 20 may inversely transform the two-dimensional position information on a sub-image according to its face index using the adjusted function, calculate the corresponding position in the three-dimensional coordinate system, and output the resulting three-dimensional 360-degree image.

EAP, like ERP, is a transformation that projects onto a single face, and may include converting a sampling position of the two-dimensional image directly into the corresponding longitude and latitude coordinates on the sphere. The spatial structure information may include an EAP index and single-face information.
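For comparison with ERP, a minimal sketch of an equal-area mapping is given here. It assumes the cylindrical equal-area convention in which the vertical coordinate is proportional to the sine of the latitude; the specification does not state the exact formula, so the function name and the formula are assumptions.

```python
import math


def eap_uv_to_sphere(u, v):
    """Map normalised EAP coordinates (u, v) in [0, 1] to (longitude, latitude).

    Assumed convention: v = 0 is the north pole, v = 1 the south pole, and
    rows are spaced uniformly in sin(latitude) so each row covers equal area.
    """
    longitude = (u - 0.5) * 2.0 * math.pi
    latitude = math.asin(1.0 - 2.0 * v)
    return longitude, latitude
```

Unlike ERP, where rows are uniform in latitude, rows here are uniform in sin(latitude), so the polar regions occupy fewer rows.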

OHP projects a 360-degree image onto the eight triangular faces of an octahedron using six vertices, and the sub-images projected using the faces {F0, F1, F2, F3, F4, F5, F6, F7} and the vertices (V0, V1, V2, V3, V4, V5) may be arranged in the transformed image.

Accordingly, the spatial structure information may include an OHP index, the face index information corresponding to each sub-image, and one or more pieces of vertex index information matched to the face index information. In addition, the sub-image arrangement of the transformed image may be either compact or non-compact, so the spatial structure information may further include information identifying whether the arrangement is compact. For example, the matching between face indices and vertex indices, and the inverse transformation process, may be determined differently for the non-compact and compact cases. For example, face index 4 may be matched to vertex indices V0, V5, V1 in the non-compact case, and to V1, V0, V5 in the compact case.
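The face-to-vertex matching can be pictured as a lookup keyed by the compact flag. The sketch below contains only the one entry that the text actually gives (face index 4); the table name and helper functions are hypothetical, and the remaining faces are deliberately left out because the text does not list them.

```python
# Vertex order for face index 4, exactly as stated in the text.
# Keys: False = non-compact arrangement, True = compact arrangement.
FACE4_VERTICES = {
    False: ("V0", "V5", "V1"),
    True: ("V1", "V0", "V5"),
}


def face4_vertex_order(is_compact):
    """Return the vertex triple matched to face index 4."""
    return FACE4_VERTICES[bool(is_compact)]


def is_cyclic_rotation(a, b):
    """True if tuple b is tuple a rotated by some number of positions."""
    return len(a) == len(b) and any(a[i:] + a[:i] == b for i in range(len(a)))
```

For this entry the two orders are cyclic rotations of each other, so they describe the same triangle with the same winding but a different starting vertex, which is consistent with the sub-image being placed in a different orientation in the compact layout.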

The post-processing device 20 may inversely transform the two-dimensional position information on a sub-image according to the face index and vertex index to calculate the corresponding vector information in the three-dimensional coordinate system, and may output the result inversely transformed into a three-dimensional 360-degree image.

ISP projects a 360-degree image using 20 faces and 12 vertices, and the sub-images resulting from each transformation may be arranged in the transformed image. As with OHP, the spatial structure information may include at least one of an ISP index, a face index, a vertex index, and compact identification information.
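The face and vertex counts quoted for ISP follow from the geometry of a regular icosahedron. As a sanity check, the sketch below builds the 12 vertices from the standard golden-ratio coordinates and recovers the 20 triangular faces as triples of mutually adjacent vertices. The vertex and face numbering produced here is arbitrary and is not the indexing used by this specification.

```python
import itertools
import math


def icosahedron():
    """Return (vertices, faces) of a regular icosahedron with edge length 2."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    vertices = []
    # Cyclic permutations of (0, +-1, +-phi) give the 12 vertices.
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            vertices += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]

    def adjacent(i, j):
        # Two vertices share an edge when they are exactly one edge length apart.
        return abs(math.dist(vertices[i], vertices[j]) - 2.0) < 1e-9

    # A face is any triple of vertices that are pairwise adjacent.
    faces = [
        triple
        for triple in itertools.combinations(range(len(vertices)), 3)
        if all(adjacent(i, j) for i, j in itertools.combinations(triple, 2))
    ]
    return vertices, faces
```

Calling `icosahedron()` yields 12 distinct vertices and 20 faces, matching the counts in the text.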

SSP processes the sphere of a 360-degree image by dividing it into three segments: the north pole, the equator, and the south pole. The north and south poles are each mapped to one of two circles identified by indices, the corners between the two pole segments are treated as gray inactive samples, and the equator may use the same projection as ERP. Accordingly, the spatial structure information may include an SSP index and a face index corresponding to each of the equator, north pole, and south pole segments.
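The three-way split can be sketched as a simple latitude test. The boundary of 45 degrees between the pole segments and the equator segment, the function name, and the segment labels are assumptions for illustration; the text does not state where the segments are divided.

```python
def ssp_segment(latitude_deg, pole_boundary_deg=45.0):
    """Classify a latitude (degrees, +90 = north pole) into an SSP segment.

    pole_boundary_deg is an assumed parameter: latitudes beyond it belong to
    a pole segment, everything else belongs to the equator segment.
    """
    if not -90.0 <= latitude_deg <= 90.0:
        raise ValueError("latitude must lie in [-90, 90]")
    if latitude_deg > pole_boundary_deg:
        return "north_pole"
    if latitude_deg < -pole_boundary_deg:
        return "south_pole"
    return "equator"
```

A sample classified as "equator" would then be mapped with the ERP formulas, while pole samples would be mapped into their respective circles.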

RSP may include dividing the sphere of a 360-degree image into two sections of equal size, unfolding the divided images onto the two-dimensional transformed image, and arranging them in two rows. RSP may implement this arrangement using six faces with a 3 x 2 aspect ratio similar to that of CMP. Accordingly, the transformed image may include a first section image in the upper segment and a second section image in the lower segment. The spatial structure information may include at least one of an RSP index, a section image index, and a face index.

TSP may include deforming and projecting a frame, obtained by projecting a 360-degree image onto the six faces of a cube, so that it corresponds to the faces of a truncated square pyramid. Accordingly, the sub-images corresponding to the faces may all differ in size and shape. The spatial structure information may include at least one of TSP identification information and a face index.

Viewport generation using rectilinear projection converts a 360-degree image into a two-dimensional image projected with the viewing angle as the Z axis. The spatial structure information may further include index information for viewport generation using rectilinear projection and viewport information indicating the viewpoint.

Meanwhile, the spatial structure information may further include information on the interpolation filter to be applied in the image conversion. For example, the interpolation filter information may differ for each projection conversion method and may include at least one of a nearest-neighbor filter, a bilinear filter, a bicubic filter, and a Lanczos filter.
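To show why an interpolation filter is needed, the sketch below samples an image at a fractional position, which is what a projection conversion produces when it maps a target pixel back into the source image. Only the two simplest filters named above are shown; the function names and the list-of-rows image representation are assumptions for this example.

```python
import math


def nearest_sample(image, x, y):
    """Nearest-neighbour filter: take the closest integer sample position."""
    height, width = len(image), len(image[0])
    xi = min(max(int(math.floor(x + 0.5)), 0), width - 1)
    yi = min(max(int(math.floor(y + 0.5)), 0), height - 1)
    return image[yi][xi]


def bilinear_sample(image, x, y):
    """Bilinear filter: weight the four surrounding samples by distance."""
    height, width = len(image), len(image[0])
    # Clamp the position to the valid sample range.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1.0 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1.0 - fx) + image[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```

Bicubic and Lanczos filters follow the same pattern with larger neighbourhoods and different weighting kernels.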

Meanwhile, a conversion method for evaluating the processing performance of the pre-processing transformation and the post-processing inverse transformation, and its index, may be defined separately. For example, the performance evaluation may be used by the pre-processing device 10 to determine the pre-processing method, and one example is the CPP method, in which two different transformed images are converted into the Craster Parabolic Projection (CPP) domain and the PSNR is measured.
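The final measurement step of such an evaluation is an ordinary PSNR computation, sketched below. The conversion of both images into the CPP domain is not shown and would have to be performed beforehand; the function name, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    PSNR = 10 * log10(max_value^2 / MSE), where MSE is the mean squared
    difference over all samples. Identical images give infinity.
    """
    squared_errors = [
        (r - t) ** 2
        for row_r, row_t in zip(reference, test)
        for r, t in zip(row_r, row_t)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because both images are resampled to the same domain before comparison, the score does not depend on how differently the two projection formats distribute samples over the sphere.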

However, the table shown in FIG. 4 is arranged arbitrarily according to the input image, and may be changed according to coding efficiency, the distribution of content on the market, and the like.

Accordingly, the decoding device 200 may parse the separately signaled table index and use it in the decoding process.
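As a purely hypothetical illustration of signaling and parsing, the sketch below writes a layout table index and a face count into a two-byte header and reads them back. The specification does not define this byte layout; the field widths, field names, and function names are invented for the example, and an actual system would carry the information in the bitstream syntax or metadata chosen by the signaling unit.

```python
import struct

# Hypothetical header: one unsigned byte for the layout table index,
# one unsigned byte for the number of faces (sub-images).
_HEADER_FORMAT = ">BB"


def pack_layout_header(layout_index, face_count):
    """Encoder side: serialise the two fields into bytes."""
    return struct.pack(_HEADER_FORMAT, layout_index, face_count)


def parse_layout_header(payload):
    """Decoder side: recover the fields from the start of the payload."""
    layout_index, face_count = struct.unpack_from(_HEADER_FORMAT, payload, 0)
    return {"layout_index": layout_index, "face_count": face_count}
```

For example, `parse_layout_header(pack_layout_header(1, 6))` returns the same two values that were packed, after which the decoder can look up the layout in its copy of the table.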

In particular, in an embodiment of the present invention, each piece of layout information may be useful for partially decoding an image. That is, sub-image arrangement information such as CUBIC LAYOUT may be used to distinguish independent sub-images from dependent sub-images, and may accordingly be used to determine an efficient encoding and decoding scanning order or to perform partial decoding for a specific viewpoint.
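One way to picture viewpoint-based partial decoding with a cubic layout is to select only the faces that face toward the viewing direction. The sketch below is a coarse heuristic under stated assumptions: the face normals, the function name, and the 90-degree selection threshold are not taken from the specification, and a real decoder would also account for the field of view and for dependencies between sub-images.

```python
import math

# Outward normals of the six cube faces (assumed axis convention).
FACE_NORMALS = {
    "PX": (1, 0, 0), "PY": (0, 1, 0), "PZ": (0, 0, 1),
    "NX": (-1, 0, 0), "NY": (0, -1, 0), "NZ": (0, 0, -1),
}


def faces_to_decode(view_direction, max_angle_deg=90.0):
    """Return the faces whose normal lies within max_angle_deg of the view.

    view_direction is any non-zero 3D vector; it is normalised internally.
    """
    length = math.sqrt(sum(c * c for c in view_direction))
    if length == 0:
        raise ValueError("view direction must be non-zero")
    unit = [c / length for c in view_direction]
    threshold = math.cos(math.radians(max_angle_deg))
    return [
        name
        for name, normal in FACE_NORMALS.items()
        if sum(u * n for u, n in zip(unit, normal)) > threshold
    ]
```

Looking straight along the positive X axis selects only the PX face, while a diagonal view between X and Y selects both PX and PY, so the remaining faces could be skipped.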

Hereinafter, frame packing according to an embodiment of the present invention is described with reference to FIGS. 5 to 9.

Referring to FIG. 5, an efficient projection face rearrangement technique has been proposed for ISP, one of the image format techniques for encoding 360 video.

Encoding a 360-degree video requires projection onto a 2D image. Various projection formats exist, such as ERP (Equirectangular Projection), and rearranging the projection faces within a format reduces the inactive area and the discontinuous boundaries.

ISP is a projection technique that converts a spherical image by projecting it onto 20 faces, each of which is a triangle. As shown in FIG. 5, the projected faces can be arranged and represented as a 2D image. An ISP whose faces have not been rearranged can generally be represented as in Figure 1(a). The inactive area and the discontinuities between projection faces that exist in a general ISP reduce coding efficiency and degrade subjective image quality.

Accordingly, as shown in FIG. 5, the projection faces can be rearranged to remove the inactive area and reduce discontinuity. This is called CISP (Compact ISP). During the rearrangement, some projection faces are split into two pieces, or are placed flipped horizontally or vertically. If two projection faces are adjacent both on the icosahedron and in the 2D projected image, no discontinuity occurs. However, if two projection faces are not adjacent on the icosahedron but are adjacent in the 2D projected image, a discontinuity occurs. Padding a margin between the two projection faces simplifies encoding and reduces the visual artifacts that appear in the decoded image. The margin is filled with pixels interpolated from the nearest pixels on the boundaries of the neighboring projection faces, and when there is no neighboring projection face it is filled by copying the pixels on the boundary. The current CISP is frame-packed with a total of eight discontinuous boundaries, four horizontal and four diagonal. These discontinuous boundaries are the main cause of the reduced coding efficiency of CISP.
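The margin-filling rule can be illustrated in one dimension, for a single row of pixels crossing the gap between two faces. The sketch assumes linear interpolation between the two nearest boundary pixels; the text only says the margin pixels are interpolated, so the interpolation type, the function name, and the parameters are assumptions.

```python
def fill_margin(near_pixel, far_pixel, margin_width):
    """Fill a margin of margin_width pixels between two face boundaries.

    near_pixel: boundary pixel of the face on one side of the margin.
    far_pixel: boundary pixel of the neighbouring face, or None when there
        is no neighbouring face on the other side.
    """
    if far_pixel is None:
        # No neighbouring face: replicate the boundary pixel.
        return [near_pixel] * margin_width
    # Neighbouring face present: interpolate linearly across the gap.
    step = (far_pixel - near_pixel) / (margin_width + 1)
    return [near_pixel + step * (i + 1) for i in range(margin_width)]
```

With boundary values 0 and 30 and a two-pixel margin, the margin becomes 10 and 20, which turns an abrupt edge into a gradual transition that is cheaper to code.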

위 문제를 해결하기 위해 다양한 프레임 패킹 방법들이 제시되고 있다. 도5는 그 방법 중 하나로 주관적 화질을 고려하여 재정렬된 CISP가 제시된다.To solve the above problem, various frame packing methods have been proposed. As one of the methods, FIG. 5 presents a rearranged CISP in consideration of subjective picture quality.

그러나, 기존 CISP 방법에서 발생하는 불연속면은 이웃한 화소간의 상관성이 낮아 영상을 부호화 하는 과정에서 부호화 효율을 감소시킨다.However, the discontinuity generated in the existing CISP method reduces the coding efficiency in the process of encoding an image due to low correlation between neighboring pixels.

이를 해결하기 위해, 본 발명은 대각선 불연속면을 유사성이 높은 삼각형 사이에서 발생시켜 부호화 효율을 높일 수 있고, 또한 적도구역을 하나로 붙여 주관적 화질을 향상시킬 수 있는 효율적인 프레임 패킹 기법을 제시할 수 있다.To solve this problem, the present invention can provide an efficient frame packing technique that can increase coding efficiency by generating diagonal discontinuities between triangles with high similarity, and can improve subjective picture quality by attaching equatorial regions into one.

보다 구체적으로, 본 발명에서 제안하는 프레임 패킹 기법은 투영면들의 재정렬을 통해 대각선 방향의 불연속 경계면이 유사성이 높은 동일한 극영역에서 발생하게 한다. 또한 360영상의 주된 정보가 담긴 적도 영역의 삼각형들을 일렬로 배치하여, 적도부분 삼각형들 사이의 불연속성을 최소로 함으로써 주관적 화질 향상을 기대 할 수 있다.More specifically, the frame packing technique proposed in the present invention causes a discontinuous boundary in a diagonal direction to occur in the same polar region with high similarity through realignment of projection planes. In addition, by arranging the triangles in the equatorial region containing the main information of the 360 image in a row and minimizing the discontinuity between the triangles in the equatorial region, subjective image quality can be improved.

도 6은 이를 구현하기 위한 본 발명의 실시 예에 따른 프레임 패킹을 설명하기 위한 도면이다.6 is a diagram for explaining frame packing according to an embodiment of the present invention for implementing this.

More specifically, the present invention can be performed in the step of converting to the ISP format. As described above, CISP is frame-packed with eight discontinuous boundaries, four horizontal and four diagonal. At these boundaries, however, encoding is performed using pixels that have low correlation with the neighboring projection plane, so the encoding is inefficient. In addition, when a 360 image contains large motion, neighboring projection planes may end up with different orientations as a result of the projection plane rearrangement. Because their orientations are dissimilar in the image prediction stage, this also reduces coding efficiency.

따라서, 다음과 같은 방법을 통해 이로 인한 부호화 효율 감소를 줄일 수 있다. 하기 프로세스들은 전처리부(10)의 영상 포맷 변환부(11)의 포맷 변환 프로세스에 의해 처리되는 것이 예시될 수 있다.Therefore, the reduction in encoding efficiency caused by this may be reduced through the following method. The following processes may be exemplified as being processed by the format conversion process of the image format conversion unit 11 of the pre-processing unit 10.

제1 실시 예에 따르면, 포맷 변환 프로세스는 극지방의 유사성이 높은 투영면 사이에서 대각선 불연속면이 발생하도록 투영면을 배치하는 프로세스를 포함할 수 있다.According to the first embodiment, the format conversion process may include a process of arranging projection planes such that a diagonal discontinuity is generated between projection planes having high polar similarity.

제2 실시 예에 따르면, 포맷 변환 프로세스는 이웃한 투영면간 유사한 방향성을 갖도록 투영면을 배치 하는 프로세스를 포함할 수 있다.According to the second embodiment, the format conversion process may include a process of arranging projection planes adjacent to each other to have similar directivity.

제3 실시 예에 따르면, 포맷 변환 프로세스는 투영면 사이의 인접 화소를 N화소 복사하여 불연속면 패딩 하는 프로세스를 포함할 수 있다.According to the third embodiment, the format conversion process may include a process of padding discontinuous areas by copying adjacent pixels between projection planes by N pixels.

제4 실시 예에 따르면, 포맷 변환 프로세스는 투영면 사이의 인접 화소를 N화소 보간하여 불연속면 패딩 하는 프로세스를 포함할 수 있다.According to the fourth embodiment, the format conversion process may include a process of padding discontinuous areas by performing N-pixel interpolation of adjacent pixels between projection planes.

More specifically, step [D1] of FIG. 6 is a 360 image format conversion step. In encoding a 360-degree image, at least one of various formats such as ERP, CMP, ACP, EAP, OHP, ISP, TSP, SSP, and RSP can be used. Among these formats, CISP is the one obtained by arranging the projection planes of the ISP so as to remove the inactive regions and minimize the discontinuities between projection planes. Rearranging the projection planes can reduce the loss of coding efficiency caused by the discontinuous boundaries in the existing CISP.

여기서, [D1-1] 단계가 더 포함될 수 있으며, 이는 상기 투영면 재배치를 진행하는데 있어, 인접면의 유사성을 이용하여 재배치하는 단계를 포함할 수 있다.Here, a step [D1-1] may be further included, which may include a step of rearranging the projection plane by using similarity of adjacent planes in the rearrangement of the projection plane.

도 7은 CISP의 투영면을 재배치한 예시이며, 도 8은 투영면의 재배치를 통해 구성된 CISP 예시 및 불연속면을 나타낸다.FIG. 7 is an example of rearranging the projection plane of CISP, and FIG. 8 shows an example of CISP and a discontinuous plane configured through rearrangement of the projection plane.

[도 7]과 [도 8]의 초록색 선분과 빨간색 선분은 투영면간 불연속성을 의미할 수 있다.The green line segment and the red line segment in [FIG. 7] and [FIG. 8] may mean discontinuity between projection planes.

도 7및 도 8를 참조하면, 영상 포맷 변환부(11)는 극지방의 유사성이 높은 투영면 사이에서 대각선 불연속면이 발생하도록 투영면을 배치할 수 있다.Referring to FIGS. 7 and 8 , the image format conversion unit 11 may arrange the projection planes such that a diagonal discontinuity is generated between the projection planes having high polar similarity.

예를 들어, [도 7]은 투영면의 재배치를 통해 [도 8]로 프레임 패킹될 수 있다. 투영면의 재배치는 화소 간 상관성이 높은 극지방내에서 대각선 불연속면이 발생하도록 구현될 수 있다.For example, [Fig. 7] can be frame-packed into [Fig. 8] through rearrangement of the projection plane. The rearrangement of the projection plane can be implemented so that a diagonal discontinuity is generated in the pole region where the inter-pixel correlation is high.

또한, 영상 포맷 변환부(11)는 이웃한 투영면간 유사한 방향성을 갖도록 투영면을 배치할 수 있다.Also, the image format conversion unit 11 may arrange projection planes so that adjacent projection planes have similar directivity.

예를 들어, 영상이 큰 움직임을 갖는 영상일 경우, 영상 포맷 변환부(11)는 투영면의 재배치를 통해 투영면 간의 방향성을 유사하게 만들어 줄 수 있다.For example, when an image has great motion, the image format conversion unit 11 may make the projection planes have similar orientations through rearrangement of the projection planes.

For example, in [FIG. 7], the orientations of projection planes 14 and 16 and of projection plane 18 can be made similar.

Meanwhile, step [D1] may further include step [D1-2], which may include performing N-pixel padding on the discontinuous boundaries that exist between the rearranged projection planes.

이 경우, 영상 포맷 변환부(11)는 투영면의 인접한 화소를 복사하여 패딩할 수 있다.In this case, the image format conversion unit 11 may copy and pad adjacent pixels of the projection plane.

For example, the image format conversion unit 11 may pad a discontinuous boundary between projection planes by copying adjacent pixels, where the padding width of N pixels can take various values.

또한, 영상 포맷 변환부(11)는 투영면의 인접한 화소를 보간법을 사용하여 패딩할 수 있다.Also, the image format conversion unit 11 may pad adjacent pixels of the projection plane using an interpolation method.

For example, the image format conversion unit 11 may pad a discontinuous boundary between projection planes with N pixels using any of various interpolation methods, such as linear interpolation.
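The two padding options (copying and interpolation) can be sketched in one dimension as follows. This is a minimal illustration that assumes a single row of pixels crossing the boundary; the function names `pad_copy` and `pad_linear` are hypothetical and do not come from 360Lib.

```python
def pad_copy(edge_pixel, n):
    """Fill an n-pixel margin by repeating the boundary pixel."""
    return [edge_pixel] * n


def pad_linear(left_pixel, right_pixel, n):
    """Fill an n-pixel margin between two boundary pixels by linear interpolation."""
    step = (right_pixel - left_pixel) / (n + 1)
    return [round(left_pixel + step * (k + 1)) for k in range(n)]


print(pad_copy(100, 4))         # margin next to a plane with no neighbor
print(pad_linear(100, 140, 3))  # margin between two neighboring planes
```

With N = 3 and boundary values 100 and 140, the interpolated margin steps evenly from one plane to the other, which is what removes the abrupt edge that copying alone would leave between two neighboring planes.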

The proposed technique was implemented in HM16.16_360Lib4.0, and its performance was verified under the CTC (Common Test Condition). It was evaluated using end-to-end (E2E) S-PSNR-NN and WS-PSNR and codec-level (CL) S-PSNR-NN and WS-PSNR [6]. In terms of objective picture quality, BD-rate reductions of 1.0%, 1.0%, 1.55%, and 0.75% relative to the existing CISP were confirmed for E2E S-PSNR-NN, E2E WS-PSNR, CL S-PSNR-NN, and CL WS-PSNR, respectively.

도 9의 (a)는 기존 CISP의 화상 표시 영역이고, (b)는 제안 기법의 화상 표시 영역을 나타낸다. 주관적 화질에 있어 제안된 방법은 기존 CISP 대비 360비디오 정보의 주를 이루는 적도 영역에서 발생하는 시각적 아티팩트가 줄어든 것을 확인 할 수 있었다.(a) of FIG. 9 is an image display area of the existing CISP, and (b) shows an image display area of the proposed technique. In terms of subjective image quality, it was confirmed that the proposed method reduced visual artifacts in the equatorial region, which is the main source of 360 video information, compared to the existing CISP.

According to an embodiment of the present invention, it can be confirmed that frame packing in which projection planes with high similarity are made adjacent through such rearrangement helps improve coding efficiency. In the experiments, objective quality evaluation showed BD-rate reductions of 1.0%, 1.0%, 1.55%, and 0.75% relative to the existing CISP for end-to-end S-PSNR-NN, end-to-end WS-PSNR, codec-level S-PSNR-NN, and codec-level WS-PSNR, respectively. In terms of subjective picture quality, visual artifacts were also reduced in the equatorial region, which carries most of the 360 video information.

도 10 내지 도 11은 본 발명의 실시 예에 따른 부호화 및 복호화 처리를 설명하기 위한 도면들이다.10 to 11 are diagrams for explaining encoding and decoding processes according to an embodiment of the present invention.

FIG. 10 is a block diagram showing the configuration of a video encoding apparatus according to an embodiment of the present invention, which can receive each sub-image or the entire frame of a virtual reality image according to an embodiment of the present invention as an input video signal and process it.

도 10을 참조하면, 본 발명에 따른 동영상 부호화 장치(100)는 픽쳐 분할부(160), 변환부, 양자화부, 스캐닝부, 엔트로피 부호화부, 인트라 예측부(169), 인터 예측부(170), 역양자화부, 역변환부, 후처리부(171), 픽쳐 저장부(172), 감산부 및 가산부(168)를 포함한다.Referring to FIG. 10, a video encoding apparatus 100 according to the present invention includes a picture division unit 160, a transform unit, a quantization unit, a scanning unit, an entropy encoding unit, an intra prediction unit 169, and an inter prediction unit 170. , an inverse quantization unit, an inverse transform unit, a post-processing unit 171, a picture storage unit 172, a subtraction unit, and an addition unit 168.

The picture division unit 160 analyzes the input video signal, divides each largest coding unit (LCU) of the picture into coding units of a predetermined size, determines the prediction mode, and determines the size of the prediction unit for each coding unit.

그리고, 픽쳐 분할부(160)는 부호화할 예측 유닛을 예측 모드(또는 예측 방법)에 따라 인트라 예측부(169) 또는 인터 예측부(170)로 보낸다. 또한, 픽쳐 분할부(160)는 부호화할 예측 유닛을 감산부로 보낸다.Then, the picture division unit 160 sends the prediction unit to be encoded to the intra prediction unit 169 or the inter prediction unit 170 according to the prediction mode (or prediction method). Also, the picture division unit 160 sends a prediction unit to be encoded to the subtraction unit.

픽쳐는 복수의 슬라이스로 구성되고, 슬라이스는 복수개의 최대 부호화 단위(Largest coding unit: LCU)로 구성될 수 있다.A picture is composed of a plurality of slices, and a slice may be composed of a plurality of largest coding units (LCUs).

상기 LCU는 복수개의 부호화 단위(CU)로 분할될 수 있고, 부호기는 분할여부를 나타내는 정보(flag)를 비트스트림에 추가할 수 있다. 복호기는 LCU의 위치를 어드레스(LcuAddr)를 이용하여 인식할 수 있다.The LCU may be divided into a plurality of coding units (CUs), and an encoder may add information (flag) indicating whether division is performed to a bitstream. The decoder can recognize the location of the LCU using the address (LcuAddr).

분할이 허용되지 않는 경우의 부호화 단위(CU)는 예측 단위(Prediction unit: PU)로 간주되고, 복호기는 PU의 위치를 PU인덱스를 이용하여 인식할 수 있다.A coding unit (CU) when division is not allowed is regarded as a prediction unit (PU), and the decoder can recognize the location of the PU using the PU index.

예측 단위(PU)는 복수개의 파티션으로 나뉠 수 있다. 또한 예측 단위(PU)는 복수개의 변환 단위(Transform unit: TU)로 구성될 수 있다.A prediction unit (PU) may be divided into a plurality of partitions. In addition, a prediction unit (PU) may be composed of a plurality of transform units (TUs).

이 경우, 픽쳐 분할부(160)는 결정된 부호화 모드에 따른 소정 크기의 블록 단위(예를 들면, PU 단위 또는 TU 단위)로 영상 데이터를 감산부로 보낼 수 있다.In this case, the picture division unit 160 may send the image data to the subtraction unit in units of blocks (eg, PU units or TU units) of a predetermined size according to the determined coding mode.

동영상 부호화 단위로 CTB (Coding Tree Block)을 사용하며, 이 때 CTB는 다양한 정사각형 모양으로 정의된다. CTB는 코딩단위 CU(Coding Unit)라고 부른다. CTB (Coding Tree Block) is used as a video coding unit, and at this time, CTB is defined in various square shapes. CTB is called a coding unit CU (Coding Unit).

A coding unit (CU) may take the form of a quad tree according to its partitioning. In the case of QTBT (quadtree plus binary tree) partitioning, a coding unit may take the form of the quad tree or of a binary tree obtained by binary partitioning at a terminal node, and its maximum size may range from 256×256 to 64×64 depending on the encoder standard.

For example, when the maximum size is 64×64, the picture division unit 160 starts from the largest coding unit (LCU) at depth 0 and recursively searches for the optimal prediction unit down to depth 3, that is, down to a coding unit (CU) of size 8×8, and performs encoding. Also, for a coding unit at a terminal node of, for example, a QTBT partition, the PU (Prediction Unit) and the TU (Transform Unit) may have the same shape as the partitioned coding unit or a further partitioned shape.
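The recursive descent from a 64×64 LCU at depth 0 to 8×8 CUs at depth 3 can be sketched as below. The split decision is passed in as a callback because the text does not specify it; a real encoder would typically decide by rate-distortion cost, and the `always_top_left` rule here is purely hypothetical.

```python
def split_cu(x, y, size, depth, should_split, max_depth=3):
    """Recursively split a CU into quadrants; return leaves as (x, y, size, depth)."""
    if depth == max_depth or not should_split(x, y, size, depth):
        return [(x, y, size, depth)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_cu(x + dx, y + dy, half, depth + 1,
                               should_split, max_depth)
    return leaves


def always_top_left(x, y, size, depth):
    """Hypothetical decision rule: keep splitting only the top-left CU."""
    return x == 0 and y == 0


leaves = split_cu(0, 0, 64, 0, always_top_left)
for leaf in leaves:
    print(leaf)
```

Each level halves the CU side length, so depth 3 corresponds to 64 / 2^3 = 8 pixels, matching the 8×8 lower bound stated above.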

예측을 수행하는 예측단위는 PU(Prediction Unit)로 정의되며, 각 코딩단위(CU)는 다수개의 블록으로 분할된 단위의 예측이 수행되며, 정사각형과 직사각형의 형태로 나뉘어 예측을 수행한다. A prediction unit that performs prediction is defined as a PU (Prediction Unit), and each coding unit (CU) is divided into a plurality of blocks and prediction is performed, and prediction is performed by dividing into square and rectangular shapes.

The transform unit transforms the residual block, which is the residual signal between the original block of the input prediction unit and the prediction block generated by the intra prediction unit 169 or the inter prediction unit 170. The residual block is composed of a coding unit or a prediction unit. A residual block composed of a coding unit or a prediction unit is divided into optimal transform units and transformed. Different transform matrices may be determined according to the prediction mode (intra or inter). Also, since the residual signal of intra prediction has directionality according to the intra prediction mode, the transform matrix may be determined adaptively according to the intra prediction mode.

변환 단위는 2개(수평, 수직)의 1차원 변환 매트릭스에 의해 변환될 수 있다. 예를 들어, 인터 예측의 경우에는 미리 결정된 1개의 변환 매트릭스가 결정된다.A transformation unit can be transformed by two (horizontal, vertical) one-dimensional transformation matrices. For example, in the case of inter prediction, one predetermined transformation matrix is determined.

On the other hand, in intra prediction, when the intra prediction mode is horizontal, the residual block is more likely to have vertical directionality, so a DCT-based integer matrix is applied in the vertical direction and a DST-based or KLT-based integer matrix is applied in the horizontal direction. When the intra prediction mode is vertical, a DST-based or KLT-based integer matrix is applied in the vertical direction and a DCT-based integer matrix is applied in the horizontal direction.

DC 모드의 경우에는 양방향 모두 DCT 기반 정수 매트릭스를 적용한다. 또한, 인트라 예측의 경우, 변환 단위의 크기에 의존하여 변환 매트릭스가 적응적으로 결정될 수도 있다.In case of DC mode, DCT-based integer matrix is applied to both directions. Also, in the case of intra prediction, a transform matrix may be adaptively determined depending on the size of a transform unit.
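The mode-dependent choice of the horizontal and vertical transforms can be summarized as a small lookup. This sketch covers only the cases named in the text; the string labels, the function name `select_transforms`, and the choice of DCT for the single inter matrix are assumptions made for illustration, and DST stands in for "DST-based or KLT-based".

```python
def select_transforms(prediction, intra_mode=None):
    """Return (horizontal, vertical) transform families for a residual block."""
    if prediction == "inter":
        # Assumption: the single predetermined inter matrix is DCT-based.
        return ("DCT", "DCT")
    if intra_mode == "horizontal":
        # Vertical direction uses DCT, horizontal uses DST (or KLT).
        return ("DST", "DCT")
    if intra_mode == "vertical":
        # Vertical direction uses DST (or KLT), horizontal uses DCT.
        return ("DCT", "DST")
    if intra_mode == "dc":
        return ("DCT", "DCT")
    raise ValueError(f"mode not covered by this sketch: {intra_mode!r}")


print(select_transforms("intra", "horizontal"))
print(select_transforms("intra", "vertical"))
print(select_transforms("intra", "dc"))
```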

양자화부는 상기 변환 매트릭스에 의해 변환된 잔차 블록의 계수들을 양자화하기 위한 양자화 스텝 사이즈를 결정한다. 양자화 스텝 사이즈는 미리 정해진 크기 이상의 부호화 단위(이하, 양자화 유닛이라 함)별로 결정된다.The quantization unit determines a quantization step size for quantizing coefficients of the residual block transformed by the transform matrix. The quantization step size is determined for each coding unit (hereinafter, referred to as a quantization unit) equal to or larger than a predetermined size.

상기 미리 정해진 크기는 8x8 또는 16x16일 수 있다. 그리고, 결정된 양자화 스텝 사이즈 및 예측 모드에 따라 결정되는 양자화 매트릭스를 이용하여 상기 변환 블록의 계수들을 양자화한다.The predetermined size may be 8x8 or 16x16. Coefficients of the transform block are quantized using a quantization matrix determined according to the determined quantization step size and prediction mode.

양자화부는 현재 양자화 유닛의 양자화 스텝 사이즈 예측자로서 현재 양자화 유닛에 인접한 양자화 유닛의 양자화 스텝 사이즈를 이용한다.The quantization unit uses a quantization step size of a quantization unit adjacent to the current quantization unit as a quantization step size predictor of the current quantization unit.

양자화부는 현재 양자화 유닛의 좌측 양자화 유닛, 상측 양자화 유닛, 좌상측 양자화 유닛 순서로 검색하여 1개 또는 2개의 유효한 양자화 스텝 사이즈를 이용하여 현재 양자화 유닛의 양자화 스텝 사이즈 예측자를 생성할 수 있다.The quantization unit may generate a quantization step size predictor of the current quantization unit by using one or two effective quantization step sizes by searching in the order of the left quantization unit, the upper quantization unit, and the upper left quantization unit of the current quantization unit.

예를 들어, 상기 순서로 검색된 유효한 첫번째 양자화 스텝 사이즈를 양자화 스텝 사이즈 예측자로 결정할 수 있다. 또한, 상기 순서로 검색된 유효한 2개의 양자화 스텝 사이즈의 평균값을 양자화 스텝 사이즈 예측자로 결정할 수도 있고, 1개만이 유효한 경우에는 이를 양자화 스텝 사이즈 예측자로 결정할 수 있다.For example, the first effective quantization step size retrieved in the above order may be determined as a quantization step size predictor. In addition, an average value of two valid quantization step sizes retrieved in the above order may be determined as a quantization step size predictor, and if only one is valid, this may be determined as a quantization step size predictor.

상기 양자화 스텝 사이즈 예측자가 결정되면, 현재 부호화 단위의 양자화 스텝 사이즈와 상기 양자화 스텝 사이즈 예측자 사이의 차분값을 엔트로피 부호화부로 전송한다.When the quantization step size predictor is determined, a difference between the quantization step size of the current coding unit and the quantization step size predictor is transmitted to the entropy encoder.

한편, 현재 코딩 유닛의 좌측 코딩 유닛, 상측 코딩 유닛, 좌상측 코딩 유닛 모두가 존재하지 않을 가능성이 있다. 반면에 최대 코딩 유닛 내의 부호화 순서 상으로 이전에 존재하는 코딩 유닛이 존재할 수 있다.Meanwhile, there is a possibility that all of the left coding unit, the above coding unit, and the top left coding unit of the current coding unit do not exist. On the other hand, a previously existing coding unit may exist in the coding order within the largest coding unit.

Therefore, the quantization step sizes of the quantization units adjacent to the current coding unit, and of the quantization unit immediately preceding it in coding order within the largest coding unit, can be candidates.

In this case, priority may be given in the following order: 1) the left quantization unit of the current coding unit, 2) the upper quantization unit of the current coding unit, 3) the upper-left quantization unit of the current coding unit, and 4) the quantization unit immediately preceding in coding order. The order may be changed, and the upper-left quantization unit may be omitted.
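The candidate search can be sketched as follows. This is one of the variants described above (average of the first two valid candidates, or the single valid one); `None` marks an unavailable quantization unit, and the function name and the numeric values are hypothetical.

```python
def predict_qstep(left=None, above=None, above_left=None, previous=None):
    """Predict the quantization step size from candidates in priority order."""
    candidates = [q for q in (left, above, above_left, previous) if q is not None]
    if not candidates:
        return None  # no predictor available
    if len(candidates) == 1:
        return candidates[0]
    # Average of the first two valid candidates in priority order.
    return (candidates[0] + candidates[1]) / 2


current_qstep = 26
predictor = predict_qstep(left=20, above=24)
delta = current_qstep - predictor  # value handed to the entropy encoder
print(predictor, delta)
```

Only the difference between the actual step size and the predictor is sent to the entropy encoder, so a good predictor keeps that difference small.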

상기 양자화된 변환 블록은 역양자화부와 스캐닝부로 제공된다.The quantized transform block is provided to an inverse quantization unit and a scanning unit.

스캐닝부는 양자화된 변환 블록의 계수들을 스캐닝하여 1차원의 양자화 계수들로 변환한다. 양자화 후의 변환 블록의 계수 분포가 인트라 예측 모드에 의존적일 수 있으므로, 스캐닝 방식은 인트라 예측 모드에 따라 결정된다.The scanning unit scans the coefficients of the quantized transform block and transforms them into one-dimensional quantization coefficients. Since the coefficient distribution of the transform block after quantization may depend on the intra prediction mode, the scanning method is determined according to the intra prediction mode.

In addition, the coefficient scanning method may be determined differently according to the size of the transform unit. The scan pattern may vary according to the directional intra prediction mode. The quantization coefficients are scanned in the reverse direction.
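Converting a 2D block of quantized coefficients into a 1D sequence in reverse order can be sketched as below. The text leaves the pattern mode-dependent; the anti-diagonal pattern used here is one possible choice picked for illustration, and the function names are hypothetical.

```python
def diagonal_scan_order(n):
    """Positions (row, col) of an n x n block ordered along anti-diagonals."""
    order = []
    for s in range(2 * n - 1):
        for row in range(n):
            col = s - row
            if 0 <= col < n:
                order.append((row, col))
    return order


def scan_reverse(block):
    """Read the block along the diagonal pattern, last position first."""
    order = diagonal_scan_order(len(block))
    return [block[r][c] for r, c in reversed(order)]


block = [[9, 3],
         [2, 0]]
print(scan_reverse(block))
```

Scanning in reverse places the high-frequency positions, which are usually zero after quantization, at the start of the 1D sequence and the DC coefficient at the end.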

상기 양자화된 계수들이 복수개의 서브셋으로 분할된 경우에는 각각의 서브셋 내의 양자화 계수들에 동일한 스캔패턴을 적용한다. 서브셋 간의 스캔패턴은 지그재그 스캔 또는 대각선 스캔을 적용한다. 스캔 패턴은 DC를 포함하는 메인 서브셋으로부터 순방향으로 잔여 서브셋들로 스캔하는 것이 바람직하나, 그 역방향도 가능하다.When the quantized coefficients are divided into a plurality of subsets, the same scan pattern is applied to the quantized coefficients in each subset. A zigzag scan or a diagonal scan is applied to the scan pattern between the subsets. The scan pattern preferably scans from the main subset including DC to the remaining subsets in a forward direction, but the reverse direction is also possible.

또한, 서브셋 내의 양자화된 계수들의 스캔패턴과 동일하게 서브셋 간의 스캔패턴을 설정할 수도 있다. 이 경우, 서브셋 간의 스캔패턴이 인트라 예측 모드에 따라 결정된다. 한편, 부호기는 상기 변환 유닛내의 0이 아닌 마지막 양자화 계수의 위치를 나타낼 수 있는 정보를 복호기로 전송한다.In addition, a scan pattern between subsets may be set identically to a scan pattern of quantized coefficients within the subset. In this case, a scan pattern between subsets is determined according to an intra prediction mode. Meanwhile, the encoder transmits information indicating the position of the last non-zero quantization coefficient in the transform unit to the decoder.

각 서브셋 내의 0이 아닌 마지막 양자화 계수의 위치를 나타낼 수 있는 정보도 복호기로 전송할 수 있다.Information indicating the position of the last non-zero quantization coefficient in each subset may also be transmitted to the decoder.

The inverse quantization unit 135 inversely quantizes the quantized coefficients. The inverse transform unit restores the inversely quantized transform coefficients to a residual block in the spatial domain. The adder generates a reconstructed block by adding the residual block restored by the inverse transform unit to the prediction block received from the intra prediction unit 169 or the inter prediction unit 170.
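The adder step can be written out as a short sketch. Clipping to the valid sample range is an assumption added here (the text only mentions the addition), as are the 8-bit default and the function name `reconstruct`.

```python
def reconstruct(residual, prediction, bit_depth=8):
    """Add residual and prediction blocks sample by sample."""
    max_value = (1 << bit_depth) - 1
    return [
        # Assumption: results are clipped to [0, max_value].
        [min(max(r + p, 0), max_value) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(residual, prediction)
    ]


residual = [[-5, 10], [300, 0]]
prediction = [[3, 250], [0, 7]]
print(reconstruct(residual, prediction))
```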

The post-processing unit 171 performs a deblocking filtering process for removing blocking artifacts that occur in the reconstructed picture, an adaptive offset application process for compensating for the difference from the original image on a per-pixel basis, and an adaptive loop filtering process for compensating for the difference from the original image on a per-coding-unit basis.

The deblocking filtering process is preferably applied to the boundaries of prediction units and transform units whose size is equal to or larger than a predetermined size. The size may be 8x8. The deblocking filtering process includes determining a boundary to be filtered, determining a boundary filtering strength to be applied to the boundary, determining whether to apply the deblocking filter, and, when it is determined that the deblocking filter is to be applied, selecting a filter to be applied to the boundary.

Whether to apply the deblocking filter is determined by i) whether the boundary filtering strength is greater than 0 and ii) whether a value representing the degree of change in pixel values at the boundary between the two blocks (P block and Q block) adjacent to the boundary to be filtered is smaller than a first reference value determined by the quantization parameter.
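The two-part decision can be sketched as below. The text does not define the "degree of change" measure, so this sketch assumes a second-difference measure on three pixels per side, in the style of HEVC-like deblocking; the first reference value is passed in as `beta` rather than derived from the quantization parameter, and all names are hypothetical.

```python
def should_deblock(bs, p, q, beta):
    """Decide whether to filter one line across a block edge.

    p, q: three pixels on each side, with p[0] and q[0] nearest the edge.
    """
    if bs <= 0:
        return False  # condition i): boundary filtering strength must exceed 0
    # Assumed activity measure: sum of second differences on both sides.
    activity = abs(p[2] - 2 * p[1] + p[0]) + abs(q[2] - 2 * q[1] + q[0])
    return activity < beta  # condition ii)


print(should_deblock(1, [100, 100, 100], [120, 120, 120], beta=8))
print(should_deblock(1, [100, 110, 100], [120, 120, 120], beta=8))
```

Flat areas on both sides give a low activity value, so a step across the edge is treated as a coding artifact and filtered; a busy area is left alone because the step is more likely real image detail.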

At least two such filters are preferably provided. When the absolute value of the difference between two pixels located at the block boundary is greater than or equal to a second reference value, a filter that performs relatively weak filtering is selected.

상기 제2 기준값은 상기 양자화 파라미터 및 상기 경계 필터링 강도에 의해 결정된다.The second reference value is determined by the quantization parameter and the boundary filtering strength.

적응적 오프셋 적용 과정은 디블록킹 필터가 적용된 영상내의 화소와 원본 화소간의 차이값(distortion)을 감소시키기 위한 것이다. 픽쳐 또는 슬라이스 단위로 상기 적응적 오프셋 적용 과정을 수행할지 여부를 결정할 수 있다.The process of applying the adaptive offset is to reduce a distortion between a pixel in an image to which a deblocking filter is applied and an original pixel. It may be determined whether to perform the adaptive offset application process in units of pictures or slices.

픽쳐 또는 슬라이스는 복수개의 오프셋 영역들로 분할될 수 있고, 각 오프셋 영역별로 오프셋 타입이 결정될 수 있다. 오프셋 타입은 미리 정해진 개수(예를 들어, 4개)의 에지 오프셋 타입과 2개의 밴드 오프셋 타입을 포함할 수 있다.A picture or slice may be divided into a plurality of offset regions, and an offset type may be determined for each offset region. Offset types may include a predetermined number (eg, 4) of edge offset types and two band offset types.

오프셋 타입이 에지 오프셋 타입일 경우에는 각 화소가 속하는 에지 타입을 결정하여, 이에 대응하는 오프셋을 적용한다. 상기 에지 타입은 현재 화소와 인접하는 2개의 화소값의 분포를 기준으로 결정한다.If the offset type is an edge offset type, an edge type to which each pixel belongs is determined, and an offset corresponding thereto is applied. The edge type is determined based on the distribution of values of two pixels adjacent to the current pixel.
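Edge classification from the two neighboring pixels can be sketched as follows. The category numbering is modeled on the edge offset of HEVC sample adaptive offset and is an assumption, since the text only says that the edge type is determined from the two adjacent pixel values; the offset values are hypothetical.

```python
def edge_category(a, c, b):
    """Classify pixel c against its two neighbors a and b along one direction."""
    def sign(d):
        return (d > 0) - (d < 0)

    s = sign(c - a) + sign(c - b)
    # 1: local minimum, 2/3: edge of a step, 4: local maximum, 0: no offset.
    return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s]


def apply_edge_offset(a, c, b, offsets):
    """Add the offset of the matching category to pixel c."""
    return c + offsets[edge_category(a, c, b)]


offsets = {0: 0, 1: 2, 2: 1, 3: -1, 4: -2}  # hypothetical values
print(apply_edge_offset(10, 5, 10, offsets))
print(apply_edge_offset(10, 12, 10, offsets))
```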

The adaptive loop filtering process may perform filtering based on a value obtained by comparing the original image with the reconstructed image that has undergone the deblocking filtering process or the adaptive offset application process. The determined adaptive loop filter (ALF) may be applied to all pixels included in a 4x4 or 8x8 block.

적응적 루프 필터의 적용 여부는 코딩 유닛별로 결정될 수 있다. 각 코딩 유닛에 따라 적용될 루프 필터의 크기 및 계수는 달라질 수 있다. 코딩 유닛별 상기 적응적 루프 필터의 적용 여부를 나타내는 정보는 각 슬라이스 헤더에 포함될 수 있다.Whether to apply the adaptive loop filter may be determined for each coding unit. The size and coefficient of the loop filter to be applied may vary according to each coding unit. Information indicating whether the adaptive loop filter is applied for each coding unit may be included in each slice header.

In the case of chrominance signals, whether to apply the adaptive loop filter may be determined on a per-picture basis. Unlike for luminance, the loop filter may also have a rectangular shape.

적응적 루프 필터링은 슬라이스별로 적용 여부를 결정할 수 있다. 따라서, 현재 슬라이스에 적응적 루프 필터링이 적용되는지 여부를 나타내는 정보는 슬라이스 헤더 또는 픽쳐 헤더에 포함된다.Whether adaptive loop filtering is applied may be determined for each slice. Accordingly, information indicating whether adaptive loop filtering is applied to the current slice is included in the slice header or picture header.

현재 슬라이스에 적응적 루프 필터링이 적용됨을 나타내면, 슬라이스 헤더 또는 픽쳐 헤더는 추가적으로 적응적 루프 필터링 과정에 사용되는 휘도 성분의 수평 및/또는 수직 방향의 필터 길이를 나타내는 정보를 포함한다.If it indicates that adaptive loop filtering is applied to the current slice, the slice header or picture header additionally includes information indicating the horizontal and/or vertical filter length of the luminance component used in the adaptive loop filtering process.

슬라이스 헤더 또는 픽쳐 헤더는 필터 세트의 수를 나타내는 정보를 포함할 수 있다. 이때 필터 세트의 수가 2 이상이면, 필터 계수들이 예측 방법을 사용하여 부호화될 수 있다. 따라서, 슬라이스 헤더 또는 픽쳐 헤더는 필터 계수들이 예측 방법으로 부호화되는지 여부를 나타내는 정보를 포함할 수 있으며, 예측 방법이 사용되는 경우에는 예측된 필터 계수를 포함한다.A slice header or a picture header may include information indicating the number of filter sets. In this case, if the number of filter sets is 2 or more, filter coefficients may be coded using a prediction method. Accordingly, a slice header or a picture header may include information indicating whether filter coefficients are coded using a prediction method, and includes predicted filter coefficients when the prediction method is used.

한편, 휘도 뿐만 아니라, 색차 성분들도 적응적으로 필터링될 수 있다. 따라서, 색차 성분 각각이 필터링되는지 여부를 나타내는 정보를 슬라이스 헤더 또는 픽쳐 헤더가 포함할 수 있다. 이 경우, 비트수를 줄이기 위해 Cr과 Cb에 대한 필터링 여부를 나타내는 정보를 조인트 코딩(즉, 다중화 코딩)할 수 있다.Meanwhile, color difference components as well as luminance may be adaptively filtered. Accordingly, the slice header or the picture header may include information indicating whether each chrominance component is filtered. In this case, in order to reduce the number of bits, joint coding (ie, multiplex coding) of information indicating whether Cr and Cb are filtered may be performed.

For the chrominance components, the case in which neither Cr nor Cb is filtered is likely to be the most frequent, because filtering is often skipped to reduce complexity; entropy encoding is therefore performed by assigning the smallest index to the case in which neither Cr nor Cb is filtered.

그리고, Cr 및 Cb를 모두 필터링하는 경우에 가장 큰 인덱스를 할당하여 엔트로피 부호화를 수행한다.And, when both Cr and Cb are filtered, entropy encoding is performed by assigning the largest index.
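The joint index assignment for the two chrominance flags can be sketched as a small mapping. The smallest and largest indices follow the text; the order of the two middle cases is not specified there and is an assumption, as is the function name.

```python
def joint_chroma_index(filter_cr, filter_cb):
    """Map the (Cr, Cb) filtering flags to one jointly coded index."""
    if not filter_cr and not filter_cb:
        return 0  # most frequent case gets the smallest index
    if filter_cr and filter_cb:
        return 3  # both filtered gets the largest index
    # Assumption: Cr-only is index 1 and Cb-only is index 2.
    return 1 if filter_cr else 2


for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, joint_chroma_index(*flags))
```

Giving the most probable case the smallest index lets the entropy coder spend the fewest bits on it.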

The picture storage unit 172 receives post-processed image data from the post-processing unit 171, and reconstructs and stores the image in units of pictures. A picture may be a frame-based image or a field-based image. The picture storage unit 172 includes a buffer (not shown) capable of storing a plurality of pictures.

The inter prediction unit 170 performs motion estimation using at least one reference picture stored in the picture storage unit 172, and determines a motion vector and a reference picture index indicating the reference picture.

Then, according to the determined reference picture index and motion vector, it extracts and outputs a prediction block corresponding to the prediction unit to be encoded from the reference picture used for motion estimation, among the plurality of reference pictures stored in the picture storage unit 172.

The intra prediction unit 169 performs intra prediction encoding using reconstructed pixel values within the picture that contains the current prediction unit.

The intra prediction unit 169 receives the current prediction unit to be predictively encoded and performs intra prediction by selecting one of a preset number of intra prediction modes according to the size of the current block.

The intra prediction unit 169 adaptively filters reference pixels to generate an intra prediction block. When reference pixels are not available, they may be generated from the reference pixels that are available.

The entropy encoding unit entropy-encodes the quantized coefficients produced by the quantization unit, the intra prediction information received from the intra prediction unit 169, the motion information received from the inter prediction unit 170, and the like.

Although not shown, the inter prediction encoding apparatus may include a motion information determination unit, a motion information encoding mode determination unit, a motion information encoding unit, a prediction block generation unit, a residual block generation unit, a residual block encoding unit, and a multiplexer.

The motion information determination unit determines the motion information of the current block. The motion information includes a reference picture index and a motion vector. The reference picture index indicates one of the previously encoded and reconstructed pictures.

When the current block is unidirectionally inter-prediction encoded, the reference picture index indicates one of the reference pictures belonging to list 0 (L0). When the current block is bidirectionally prediction encoded, on the other hand, the motion information may include a reference picture index indicating one of the reference pictures of list 0 (L0) and a reference picture index indicating one of the reference pictures of list 1 (L1).

In addition, when the current block is bidirectionally prediction encoded, the motion information may include an index indicating one or two of the reference pictures of a combined list (LC) generated by combining list 0 and list 1.

The motion vector indicates the position of the prediction block within the picture indicated by each reference picture index. The motion vector may be in pixel units (integer units) or in sub-pixel units.

For example, it may have a resolution of 1/2, 1/4, 1/8, or 1/16 pixel. When the motion vector is not in integer units, the prediction block is generated from integer-unit pixels.
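
As a small illustration of sub-pixel resolution, the sketch below splits one motion vector component into an integer pixel offset and a fractional phase. Storing the vector as an integer in units of 1/4 pixel (two fractional bits) is an assumption; the description only lists the possible resolutions, and the function name is hypothetical.

```python
def split_motion_vector(mv, frac_bits=2):
    """Split one motion vector component into (integer offset, fractional phase).

    `mv` is assumed to be stored in units of 1 / 2**frac_bits pixel, so
    frac_bits=2 means quarter-pixel resolution and frac_bits=4 means 1/16 pixel.
    """
    integer_part = mv >> frac_bits                 # floors, also for negative values
    fractional_part = mv & ((1 << frac_bits) - 1)  # phase in [0, 2**frac_bits)
    return integer_part, fractional_part
```

A fractional phase of zero means the prediction block can be copied directly; any other phase requires interpolation from integer-unit pixels.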

The motion information encoding mode determination unit determines whether the motion information of the current block is encoded in skip mode, merge mode, or AMVP mode.

Skip mode is applied when a skip candidate having the same motion information as the current block exists and the residual signal is zero. Skip mode is also applied only when the current block has the same size as the coding unit. The current block may be regarded as a prediction unit.

Merge mode is applied when a merge candidate having the same motion information as the current block exists. Merge mode is applied when the current block differs in size from the coding unit, or when it has the same size and a residual signal exists. The merge candidates and skip candidates may be the same.

AMVP mode is applied when neither skip mode nor merge mode applies. The AMVP candidate whose motion vector is most similar to that of the current block is selected as the AMVP predictor.
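
The mode selection rules above can be sketched as a simple decision function. This is a minimal illustration under the stated conditions only; the function and parameter names are hypothetical, and a real encoder would typically also weigh rate-distortion cost, which the description does not cover here.

```python
def select_motion_coding_mode(has_skip_candidate, has_merge_candidate,
                              residual_is_zero, block_equals_cu):
    """Pick "skip", "merge", or "amvp" from the conditions in the description.

    has_skip_candidate / has_merge_candidate: a candidate with the same motion
        information as the current block exists.
    residual_is_zero: the residual signal of the current block is zero.
    block_equals_cu: the current block has the same size as the coding unit.
    """
    # Skip: matching candidate, no residual, and block size equals the CU size.
    if has_skip_candidate and residual_is_zero and block_equals_cu:
        return "skip"
    # Merge: matching candidate, and either the block differs in size from the
    # CU or it has the same size but a residual signal exists.
    if has_merge_candidate and (not block_equals_cu or not residual_is_zero):
        return "merge"
    # AMVP: neither skip nor merge applies.
    return "amvp"
```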

The motion information encoding unit encodes the motion information according to the method determined by the motion information encoding mode determination unit. When the motion information encoding mode is skip mode or merge mode, a merge motion vector encoding process is performed. When the motion information encoding mode is AMVP, an AMVP encoding process is performed.

The prediction block generation unit generates a prediction block using the motion information of the current block. When the motion vector is in integer units, the prediction block of the current block is generated by copying the block at the position indicated by the motion vector within the picture indicated by the reference picture index.

When the motion vector is not in integer units, however, the pixels of the prediction block are generated from the integer-unit pixels of the picture indicated by the reference picture index.

In this case, prediction pixels may be generated using an 8-tap interpolation filter for luminance pixels and a 4-tap interpolation filter for chrominance pixels.
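
The following sketch shows how a half-pixel luminance sample could be interpolated from integer-unit pixels with an 8-tap filter. The description does not give filter coefficients; the taps below are those of the HEVC half-sample luma filter and are used here as an assumption. The single-stage rounding, the clamping at the row ends, and the function name are likewise simplifications for illustration.

```python
# Assumed coefficients: the HEVC half-sample luma filter (the taps sum to 64).
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_sample(row, x, bit_depth=8):
    """Interpolate the sample halfway between row[x] and row[x + 1].

    `row` is a list of integer-position luminance samples. Positions outside
    the row are clamped to the nearest edge sample (a simplification).
    """
    acc = 0
    for k, tap in enumerate(HALF_PEL_TAPS):
        pos = min(max(x - 3 + k, 0), len(row) - 1)  # taps cover x-3 .. x+4
        acc += tap * row[pos]
    value = (acc + 32) >> 6                          # round, then divide by 64
    return min(max(value, 0), (1 << bit_depth) - 1)  # clip to the sample range
```

On a flat row the filter returns the same value, and on a linear ramp it returns the midpoint of the two neighboring samples.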

The residual block generation unit generates a residual block using the current block and the prediction block of the current block. When the size of the current block is 2Nx2N, the residual block is generated using the current block and the corresponding 2Nx2N prediction block.

However, when the size of the current block used for prediction is 2NxN or Nx2N, a prediction block is obtained for each of the two 2NxN blocks constituting the 2Nx2N block, and a final 2Nx2N prediction block may then be generated from the two 2NxN prediction blocks.

A 2Nx2N residual block may then be generated using this 2Nx2N prediction block. To resolve the discontinuity at the boundary between the two 2NxN prediction blocks, the pixels at the boundary may be overlap-smoothed.
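
A minimal sketch of this step, assuming blocks are represented as lists of rows: two 2NxN prediction blocks are stacked into one 2Nx2N prediction block, and the residual is the sample-wise difference from the current block. The overlap smoothing at the partition boundary is omitted, and the function names are hypothetical.

```python
def stack_partitions(upper_pred, lower_pred):
    """Join two 2NxN prediction blocks (lists of rows) into one 2Nx2N block."""
    return [list(row) for row in upper_pred] + [list(row) for row in lower_pred]


def residual_block(current, prediction):
    """Return the sample-wise difference current - prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]
```

For example, with N = 1 the upper partition `[[10, 10]]` and the lower partition `[[20, 20]]` form a 2x2 prediction block that is subtracted from the 2x2 current block.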

The residual block encoding unit divides the generated residual block into one or more transform units. Each transform unit is then transform-encoded, quantized, and entropy-encoded. The size of the transform unit may be determined in a quadtree manner according to the size of the residual block.

The residual block encoding unit transforms the residual block generated by the inter prediction method using an integer-based transform matrix. The transform matrix is an integer-based DCT matrix.

The residual block encoding unit uses a quantization matrix to quantize the coefficients of the residual block transformed by the transform matrix. The quantization matrix is determined by a quantization parameter.

The quantization parameter is determined for each coding unit of a predetermined size or larger. The predetermined size may be 8x8 or 16x16. Therefore, when the current coding unit is smaller than the predetermined size, only the quantization parameter of the first coding unit in coding order among the plurality of coding units within the predetermined size is encoded; the quantization parameters of the remaining coding units are the same as that parameter and need not be encoded.

The coefficients of the transform block are then quantized using a quantization matrix determined according to the determined quantization parameter and the prediction mode.

The quantization parameter determined for each coding unit of the predetermined size or larger is predictively encoded using the quantization parameters of coding units adjacent to the current coding unit. A quantization parameter predictor for the current coding unit may be generated from one or two valid quantization parameters found by searching the left coding unit and then the upper coding unit of the current coding unit.

For example, the first valid quantization parameter found in that order may be determined as the quantization parameter predictor. Alternatively, the left coding unit and then the coding unit immediately preceding in coding order may be searched, and the first valid quantization parameter found may be determined as the quantization parameter predictor.
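
The two search orders in this example can be sketched as follows. Representing an unavailable quantization parameter as `None`, returning `None` when no valid neighbor exists, and the function names are assumptions; the description does not say what happens when no valid parameter is found.

```python
def predict_qp(left_qp, above_qp, previous_qp=None, use_previous=False):
    """Return the first valid quantization parameter in the search order.

    Default order: left coding unit, then upper coding unit.
    With use_previous=True: left coding unit, then the coding unit that
    immediately precedes the current one in coding order.
    An unavailable parameter is passed as None.
    """
    order = (left_qp, previous_qp) if use_previous else (left_qp, above_qp)
    for qp in order:
        if qp is not None:
            return qp
    return None  # no valid neighbor: behavior not specified in the description


def qp_delta(current_qp, predictor):
    """Difference that would be encoded instead of the parameter itself."""
    return current_qp - predictor
```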

The coefficients of the quantized transform block are scanned and converted into one-dimensional quantization coefficients. The scanning method may be set differently according to the entropy encoding mode. For example, when encoding with CABAC, inter-prediction-encoded quantization coefficients may be scanned in one predetermined manner (zigzag, or a raster scan in a diagonal direction). When encoding with CAVLC, on the other hand, they may be scanned in a different manner.

For example, the scanning method may be zigzag for inter prediction and may be determined according to the intra prediction mode for intra prediction. The coefficient scanning method may also be determined differently according to the size of the transform unit.

The scan pattern may vary according to the directional intra prediction mode. The quantization coefficients are scanned in the reverse direction.
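
As an illustration of converting a two-dimensional coefficient block into a one-dimensional sequence, the sketch below implements a conventional zigzag scan and its reversed order. The exact scan tables used by the apparatus are not given in the description, so the traversal below (the familiar JPEG-style zigzag) and the function names are assumptions.

```python
def zigzag_order(n):
    """Return the (row, col) visiting order of a zigzag scan over an n x n block."""
    order = []
    for s in range(2 * n - 1):                    # s = row + col of one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]     # listed with the row increasing
        if s % 2 == 0:
            diagonal.reverse()                    # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order


def scan_coefficients(block, reverse=False):
    """Flatten a square block of quantized coefficients along the zigzag path.

    With reverse=True the same path is traversed from the last position back
    to the first, matching a reverse-direction scan.
    """
    order = zigzag_order(len(block))
    if reverse:
        order = order[::-1]
    return [block[r][c] for r, c in order]
```

A decoder would walk the same order to place the one-dimensional sequence back into a two-dimensional array.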

The multiplexer multiplexes the motion information encoded by the motion information encoding unit and the residual signals encoded by the residual block encoding unit. The motion information may differ according to the encoding mode.

That is, in skip or merge mode, it includes only an index indicating the predictor. In AMVP mode, however, it includes the reference picture index of the current block, the differential motion vector, and the AMVP index.
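
The mode-dependent content of the motion information can be sketched as a small function that lists which fields are passed to the multiplexer. The dictionary keys and the function name are hypothetical and do not correspond to actual bitstream syntax element names.

```python
def motion_syntax_elements(mode, candidate_index, ref_idx=None, mvd=None):
    """Return the motion fields that would be multiplexed for one block.

    mode: "skip", "merge", or "amvp".
    candidate_index: index of the selected predictor (skip, merge, or AMVP).
    ref_idx, mvd: reference picture index and differential motion vector,
        used only in AMVP mode.
    """
    if mode in ("skip", "merge"):
        # Only the index identifying the predictor is sent.
        return {"candidate_index": candidate_index}
    if mode == "amvp":
        return {"ref_idx": ref_idx, "mvd": mvd, "amvp_index": candidate_index}
    raise ValueError(f"unknown motion coding mode: {mode}")
```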

Hereinafter, an embodiment of the operation of the intra prediction unit 169 will be described in detail.

First, prediction mode information and the size of the prediction block are received from the picture division unit 160; the prediction mode information indicates an intra mode. The prediction block may be square, such as 64x64, 32x32, 16x16, 8x8, or 4x4, but is not limited thereto. That is, the prediction block may be non-square rather than square.

Next, reference pixels are read from the picture storage unit 172 in order to determine the intra prediction mode of the prediction block.

Whether to generate reference pixels is determined by checking whether any unavailable reference pixels exist. The reference pixels are used to determine the intra prediction mode of the current block.

When the current block is located at the upper boundary of the current picture, the pixels adjacent to the upper side of the current block are not defined. Likewise, when the current block is located at the left boundary of the current picture, the pixels adjacent to the left side of the current block are not defined.

Such pixels are determined to be unavailable. In addition, when the current block is located at a slice boundary and the pixels adjacent to the upper or left side of the slice have not already been encoded and reconstructed, those pixels are also determined to be unavailable.

When, as described above, no pixels adjacent to the left or upper side of the current block exist, or no previously encoded and reconstructed pixels exist, the intra prediction mode of the current block may be determined using only the available pixels.

Alternatively, reference pixels at unavailable positions may be generated using the available reference pixels of the current block. For example, when the pixels of the upper block are not available, the upper pixels may be generated using some or all of the left pixels, and vice versa.

That is, a reference pixel may be generated by copying the available reference pixel nearest to the unavailable position in a predetermined direction. When no available reference pixel exists in the predetermined direction, the reference pixel may be generated by copying the nearest available reference pixel in the opposite direction.
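
The copy rule for unavailable reference pixels can be sketched on a one-dimensional array of reference pixels. Treating the reference pixels as a single list, marking unavailable positions with `None`, and taking "toward lower indices" as the predetermined direction are all assumptions made for illustration.

```python
def fill_reference_pixels(ref):
    """Replace unavailable reference pixels (None) with copied neighbors.

    For each unavailable position, the nearest available pixel toward lower
    indices (the assumed predetermined direction) is copied. If none exists
    in that direction, the nearest available pixel in the opposite direction
    is copied instead. If no pixel is available at all, None is kept.
    """
    out = list(ref)
    n = len(ref)
    for i in range(n):
        if ref[i] is not None:
            continue
        # Search in the predetermined direction first.
        j = i - 1
        while j >= 0 and ref[j] is None:
            j -= 1
        if j < 0:
            # Nothing found: search in the opposite direction.
            j = i + 1
            while j < n and ref[j] is None:
                j += 1
        if 0 <= j < n:
            out[i] = ref[j]
    return out
```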

Meanwhile, even when pixels above or to the left of the current block exist, they may be determined to be unavailable reference pixels depending on the encoding mode of the block to which they belong.

For example, when the block to which a reference pixel adjacent to the upper side of the current block belongs was inter-encoded and reconstructed, those pixels may be determined to be unavailable.

In this case, usable reference pixels may be generated using pixels belonging to a block that is adjacent to the current block and was intra-encoded and reconstructed. The encoder must then transmit to the decoder information indicating that the usable reference pixels are determined according to the encoding mode.

Next, the intra prediction mode of the current block is determined using the reference pixels. The number of intra prediction modes allowed for the current block may vary depending on the size of the block. For example, when the size of the current block is 8x8, 16x16, or 32x32, there may be 34 intra prediction modes, and when the size of the current block is 4x4, there may be 17 intra prediction modes.

The 34 or 17 intra prediction modes may consist of at least one non-directional mode and a plurality of directional modes.

The one or more non-directional modes may be a DC mode and/or a planar mode. When both the DC mode and the planar mode are included as non-directional modes, there may be 35 intra prediction modes regardless of the size of the current block.

In that case, the modes may include two non-directional modes (the DC mode and the planar mode) and 33 directional modes.

The planar mode generates the prediction block of the current block using the reference pixels and at least one pixel value located at the bottom-right of the current block (or a predicted value of that pixel value, hereinafter referred to as a first reference value).

As described above, the configuration of a video decoding apparatus according to an embodiment of the present invention can be derived from the configuration of the video encoding apparatus described above; for example, it can decode an image by performing the reverse of the encoding process described above.

FIG. 11 is a block diagram showing the configuration of a video decoding apparatus according to an embodiment of the present invention.

Referring to FIG. 11, the video decoding apparatus according to the present invention includes an entropy decoding unit 210, an inverse quantization/inverse transform unit 220, an adder 270, a deblocking filter 250, a picture storage unit 260, an intra prediction unit 230, a motion compensation prediction unit 240, and an intra/inter changeover switch 280.

The entropy decoding unit 210 decodes the encoded bitstream transmitted from the video encoding apparatus and separates it into an intra prediction mode index, motion information, a quantization coefficient sequence, and the like. The entropy decoding unit 210 supplies the decoded motion information to the motion compensation prediction unit 240.

The entropy decoding unit 210 supplies the intra prediction mode index to the intra prediction unit 230 and the inverse quantization/inverse transform unit 220. The entropy decoding unit 210 also supplies the inverse quantization coefficient sequence to the inverse quantization/inverse transform unit 220.

The inverse quantization/inverse transform unit 220 converts the quantization coefficient sequence into a two-dimensional array of inverse quantization coefficients. One of a plurality of scanning patterns is selected for this conversion. The scanning pattern is selected based on at least one of the prediction mode of the current block (i.e., intra prediction or inter prediction) and the intra prediction mode.

The intra prediction mode is received from the intra prediction unit or the entropy decoding unit.

The inverse quantization/inverse transform unit 220 restores the quantization coefficients by applying a quantization matrix, selected from among a plurality of quantization matrices, to the two-dimensional array of inverse quantization coefficients. Different quantization matrices are applied according to the size of the current block to be reconstructed, and even for blocks of the same size, the quantization matrix is selected based on at least one of the prediction mode and the intra prediction mode of the current block.

The residual block is then reconstructed by inverse-transforming the restored quantization coefficients.

The adder 270 reconstructs an image block by adding the residual block reconstructed by the inverse quantization/inverse transform unit 220 and the prediction block generated by the intra prediction unit 230 or the motion compensation prediction unit 240.
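
The operation of the adder can be illustrated with a short sketch that adds a residual block to a prediction block sample by sample. Clipping the result to the valid sample range for a given bit depth is an assumption that is common in video codecs but not stated in the description; the function name is hypothetical.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the residual to the prediction and clip to [0, 2**bit_depth - 1].

    Both blocks are lists of rows with identical dimensions. The clipping
    step is an assumption, not part of the description.
    """
    max_value = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(prediction, residual)]
```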

The deblocking filter 250 applies deblocking filtering to the reconstructed image generated by the adder 270. This reduces the blocking artifacts caused by the image loss of the quantization process.

The picture storage unit 260 is a frame memory that holds the locally decoded image to which deblocking filtering has been applied by the deblocking filter 250.

The intra prediction unit 230 restores the intra prediction mode of the current block based on the intra prediction mode index received from the entropy decoding unit 210, and generates a prediction block according to the restored intra prediction mode.

The motion compensation prediction unit 240 generates a prediction block for the current block from a picture stored in the picture storage unit 260 based on the motion vector information. When fractional-precision motion compensation is applied, the prediction block is generated by applying the selected interpolation filter.

The intra/inter changeover switch 280 provides the adder 270 with the prediction block generated by either the intra prediction unit 230 or the motion compensation prediction unit 240, based on the encoding mode.

The current block is reconstructed using the prediction block of the current block restored in this manner and the decoded residual block of the current block.

A video bitstream according to an embodiment of the present invention is a unit used to store the encoded data of one picture, and may include parameter sets (PS) and slice data.

The parameter sets (PS) are divided into a picture parameter set (hereinafter, PPS), which is data corresponding to the head of each picture, and a sequence parameter set (hereinafter, SPS). The PPS and SPS may include initialization information required to initialize each encoding, and may include spatial layout information according to an embodiment of the present invention.

The SPS is common reference information for decoding all pictures encoded in a random access unit (RAU), and may include the profile, the maximum number of pictures available for reference, the picture size, and the like.

The PPS is reference information for decoding each picture encoded in a random access unit (RAU), and may include the type of variable-length coding method, the initial value of the quantization step, and a plurality of reference pictures.

Meanwhile, the slice header (SH) includes information about the corresponding slice when coding is performed in units of slices.

The method according to the present invention described above may be implemented as a program to be executed on a computer and stored in a computer-readable recording medium. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

The computer-readable recording medium may be distributed over computer systems connected through a network, so that computer-readable code can be stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the method can be readily inferred by programmers in the technical field to which the present invention belongs.

Claims

An image processing method of an image processing device, the method comprising:
processing encoding or decoding of a virtual reality image, including frame packing processing of the virtual reality image,
wherein the virtual reality image includes a virtual reality image frame-packed in a CISP (Compact ISP) format having a certain number of horizontal and diagonal discontinuities, so as to align the projection planes of the virtual reality image converted to the ISP format, remove inactive regions, and reduce discontinuity between projection planes,
wherein the frame packing processing includes
a format conversion process of rearranging the projection planes of the virtual reality image converted to the CISP format so that a diagonal discontinuity is generated in the virtual reality image frame-packed with triangles,
wherein the format conversion process includes:
a first process of rearranging the projection planes of the virtual reality image converted to the CISP format such that a diagonal discontinuity occurs between the projection planes of the triangles in the polar region, where the inter-pixel correlation is higher than a reference value, while the triangles of the equatorial region are arranged in a line so that the discontinuity between the triangles of the equatorial region is minimized;
in performing the first process, a second process of rearranging the projection planes so that adjacent projection planes have similar directionality within a predetermined range;
in performing the second process, a third process of padding at least some of the discontinuous boundaries existing on the rearranged projection planes by copying N pixels of adjacent pixels between projection planes; and
in performing the third process, a fourth process of padding at least some of the discontinuous boundaries existing on the rearranged projection planes by N-pixel interpolation of adjacent pixels between projection planes,
and wherein the method includes processing to reduce visual artifacts in subjective image quality caused by the frame packing of the virtual reality image.