KR20210134555A

KR20210134555A - Apparatus and method for intra-prediction based video encoding or decoding

Info

Publication number: KR20210134555A
Application number: KR1020210146142A
Authority: KR
Inventors: 나태영; 이선영; 김효성; 손세훈; 신재섭
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2018-02-23
Filing date: 2021-10-28
Publication date: 2021-11-10
Also published as: KR20200000548A; KR20210134556A

Abstract

The present invention provides a video decoding method based on intra prediction. The video decoding method comprises the steps of: decoding transform coefficients for a current block to be decoded from a bitstream; constructing input data using a reference region decoded before the current block; generating prediction pixels of the current block by applying a 2-dimensional or 3-dimensional filter coefficient set to the input data; generating residual signals for the current block by inversely transforming the transform coefficients; and reconstructing the current block using the prediction pixels and the residual signals.

Description

Intra prediction-based video encoding or decoding apparatus and method

The present invention relates to an apparatus for encoding or decoding video and, more particularly, to an apparatus and method for encoding or decoding video based on intra prediction.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Because video data is much larger than audio data or still-image data, storing or transmitting it in raw form consumes a large amount of hardware resources such as memory. Video data is therefore generally compressed by an encoder before being stored or transmitted, and the compressed video data is decompressed by a decoder before being played back.

Meanwhile, as demand for video content such as high-capacity games and 360-degree video grows rapidly, picture size, resolution, and frame rate are increasing. The amount of data to be decoded increases accordingly, and the complexity of the decoder increases with it. To address this, next-generation video codecs require techniques that can efficiently extract data from a compressed bitstream without degrading coding efficiency.

According to recent experimental results, replacing the in-loop filter of an existing video encoding or decoding apparatus with a convolutional neural network (CNN) filter, which is a kind of artificial neural network, has been shown to achieve a Bjøntegaard-delta bit rate (BDBR) gain of about 3.57%. Accordingly, video encoding and decoding based on artificial neural networks is attracting attention as a solution to the above-mentioned problem.

The present embodiment aims to provide a video encoding or decoding apparatus and method capable of improving prediction accuracy while maintaining the complexity of the decoder.

According to one aspect of the present embodiment, there is provided a video decoding method based on intra prediction, comprising: decoding transform coefficients for a current block to be decoded from a bitstream; constructing input data using a reference region decoded before the current block; generating prediction pixels of the current block by applying a 2-dimensional or 3-dimensional filter coefficient set to the input data; generating residual signals for the current block by inversely transforming the transform coefficients; and reconstructing the current block using the prediction pixels and the residual signals.

According to another aspect of the present embodiment, there is provided a video encoding method based on intra prediction, comprising: constructing input data from a reference region decoded before a current block to be encoded; generating a prediction block of the current block by applying a 2-dimensional or 3-dimensional filter coefficient set to the input data; generating a residual block by subtracting the prediction block from the current block; and generating encoded data by encoding the residual block.

The video encoding or decoding apparatus according to the present embodiment can improve video encoding or decoding efficiency by using intra prediction to which a 2-dimensional or 3-dimensional filter coefficient set is applied.

FIG. 1 is an exemplary diagram of the structure of a CNN that can be applied to the techniques of the present disclosure.
FIG. 2 is an exemplary diagram of a video encoding apparatus capable of implementing the techniques of the present disclosure.
FIG. 3 is an exemplary diagram of a plurality of intra prediction modes.
FIG. 4 is an exemplary diagram of a video decoding apparatus capable of implementing the techniques of the present disclosure.
FIG. 5 is a block diagram showing the configuration of the CNN prediction unit on the video encoding apparatus side according to the present embodiment.
FIG. 6 is an exemplary diagram of neighboring regions that can be used as input data of the CNN.
FIG. 7 is a diagram showing an example in which an input layer of the CNN is constructed from a plurality of neighboring blocks.
FIG. 8 is an exemplary diagram of the prediction direction of the current block based on the pixel-value pattern of neighboring blocks.
FIG. 9 is an exemplary diagram of a layer configuration of a CNN that includes hint information.
FIG. 10 is a block diagram showing the configuration of the CNN prediction unit on the video decoding apparatus side according to the present embodiment.
FIG. 11 is a flowchart showing the operation of the CNN prediction unit on the video encoding apparatus side of FIG. 5.
FIG. 12 is a flowchart showing the operation of the CNN prediction unit on the video decoding apparatus side of FIG. 10.

Hereinafter, some embodiments of the present invention are described in detail with reference to exemplary drawings. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, a detailed description of a related known configuration or function is omitted where it could obscure the gist of the present invention.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present invention. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless explicitly stated otherwise. In addition, terms such as 'unit' and 'module' used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention are described in more detail with reference to the accompanying drawings.

FIG. 1 is an exemplary diagram of the structure of a CNN that can be applied to the techniques of the present disclosure.

A CNN is a multi-layer neural network with a special connection structure designed for image processing. A CNN algorithm can be divided into a learning process and an inference process, and the learning process can be further divided, according to the learning method, into supervised learning, unsupervised learning, and reinforcement learning. Of these, supervised learning refers to the process of calculating the coefficient values of the convolution kernels using output labels, which are the explicit correct answers for the input data. The coefficient values of the convolution kernels can be updated through an iterative learning process using the error backpropagation algorithm so as to minimize the error between the output data and the output labels.

Thereafter, an inference process is performed in which output data is generated from input data using the coefficient values of the convolution kernels calculated in the learning process. For example, U/V images may be generated from a Y image through an inference process that uses the coefficient values of the convolution kernels.

Referring to FIG. 1, the CNN may include an input layer 110, a hidden layer 130, and an output layer 150. The hidden layer 130 is located between the input layer 110 and the output layer 150 and may include a plurality of convolutional layers 131 to 139. The hidden layer 130 may further include an upsampling layer or a pooling layer to adjust the resolution of the feature map that results from the convolution operation.

Every layer constituting the CNN includes a plurality of nodes, and each node is interconnected with nodes of adjacent layers so that it can pass an output value, to which a predetermined connection weight has been applied, to other nodes as their input.

The convolutional layers 131 to 139 may generate feature maps by performing a convolution operation on the image data input to each layer, using a convolution kernel (that is, a filter) in the form of a 2-dimensional or 3-dimensional matrix. Here, a feature map is image data in which various features of the image data input to each layer are expressed. The number of convolutional layers 131 to 139, the size of the convolution kernels, and so on may be set in advance, before the learning process.
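As a minimal sketch of the convolution operation described above, the following pure-Python function slides a 2-dimensional kernel over an input and produces a feature map. The function name, the absence of padding, the stride of 1, and the example kernel are all assumptions made for illustration; as is usual for CNNs, the kernel is applied without flipping (cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the window whose top-left corner is (x, y).
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# Hypothetical 4x4 input whose values grow by 1 per column and by 4 per row.
image = [[4 * r + c for c in range(4)] for r in range(4)]
# Hypothetical 3x3 kernel that responds to horizontal intensity change.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
feature_map = conv2d_valid(image, kernel)
```

A 3-dimensional kernel would additionally sum over the channel axis; in a trained CNN the kernel values come from the learning process rather than being fixed by hand.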

The output layer 150 may be configured as a fully connected layer. The nodes of the output layer 150 may output image data by combining the various features expressed in the feature maps.

FIG. 2 is an exemplary diagram of a video encoding apparatus capable of implementing the techniques of the present disclosure.

The video encoding apparatus may include a block divider 210, a prediction unit 220, a subtractor 230, a transform unit 240, a quantization unit 245, an encoder 250, an inverse quantization unit 260, an inverse transform unit 265, an adder 270, a filter unit 280, and a memory 290. Each component of the video encoding apparatus may be implemented in hardware or software, or in a combination of hardware and software. In addition, the plurality of units may be integrated into at least one module or chip and implemented by at least one processor, except where each needs to be implemented as separate dedicated hardware.

The block divider 210 splits each picture constituting the video into a plurality of coding tree units (CTUs) and then recursively splits each CTU using a tree structure. A leaf node of the tree structure becomes a coding unit (CU), which is the basic unit of encoding.

The tree structure may be a quadtree (QT) structure, in which an upper node (or parent node) is split into four lower nodes (or child nodes) of equal size; a binary tree (BT) structure, in which an upper node is split into two lower nodes; or a ternary tree (TT) structure, in which an upper node is split into three lower nodes at a 1:2:1 ratio. The tree structure may also combine one or more of the QT, BT, and TT structures, for example a QTBT (QuadTree plus BinaryTree) structure or a QTBTTT (QuadTree plus BinaryTree TernaryTree) structure.
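The three split types can be sketched as a function that maps a parent rectangle to its child rectangles. The function name and mode labels are hypothetical, and the sketch assumes that a BT split produces two equal halves, which the text above does not state explicitly.

```python
def split_block(x, y, w, h, mode):
    """Return the child rectangles (x, y, w, h) produced by one split of a parent block."""
    if mode == "QT":                      # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_VER":                  # two halves side by side (assumed equal)
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "BT_HOR":                  # two halves stacked (assumed equal)
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_VER":                  # widths in a 1:2:1 ratio
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "TT_HOR":                  # heights in a 1:2:1 ratio
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")

qt_children = split_block(0, 0, 64, 64, "QT")
tt_children = split_block(0, 0, 32, 16, "TT_VER")
```

Applying `split_block` recursively to the children, with the mode chosen per node, yields mixed structures such as QTBT or QTBTTT.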

Hereinafter, the block corresponding to the CU to be encoded or decoded (that is, a leaf node of the tree structure used to split the CTU) is referred to as the 'current block'. The pixels constituting the current block may consist of a luma component and two chroma components. The luma block composed of the luma component and the chroma blocks composed of the chroma components may be predicted and encoded separately.

Hereinafter, the term 'channel' is used to distinguish the components. That is, the same channel means the same component, and a different channel means a different component.

Information applied to each CU is encoded as CU syntax, and information applied in common to the CUs included in one CTU is encoded as CTU syntax. Information applied in common to all blocks in one slice is encoded as slice syntax, and information applied to all blocks constituting one picture is encoded in a picture parameter set (PPS). Furthermore, information referenced in common by a plurality of pictures is encoded in a sequence parameter set (SPS), and information referenced in common by one or more SPSs is encoded in a video parameter set (VPS).

The prediction unit 220 predicts the current block to generate a prediction block. The prediction unit 220 includes an intra prediction unit 222 and an inter prediction unit 224.

In general, each of the current blocks in a picture may be predictively coded. Prediction of the current block may generally be performed using an intra prediction technique (which uses data from the picture containing the current block) or an inter prediction technique (which uses data from a picture coded before the picture containing the current block).

The intra prediction unit 222 predicts the pixels in the current block using pixels (reference pixels) located around the current block within the current picture that contains it. There are a plurality of intra prediction modes according to the prediction direction. For example, as shown in FIG. 3, the plurality of intra prediction modes may include non-directional modes, comprising the planar mode and the DC mode, and 65 directional modes. The neighboring pixels and the arithmetic expression used for intra prediction are defined differently for each prediction mode.
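To illustrate how the reference pixels and the arithmetic expression differ by mode, the sketch below implements a simplified DC mode and a purely horizontal directional mode. Both are textbook simplifications assumed here for illustration (for example, no reference smoothing or boundary filtering), not the exact expressions of any particular codec or of this disclosure.

```python
def predict_dc(top, left):
    """DC mode (simplified): fill the block with the rounded mean of the reference pixels."""
    refs = list(top) + list(left)
    dc = (sum(refs) + len(refs) // 2) // len(refs)   # integer mean with rounding
    return [[dc] * len(top) for _ in left]

def predict_horizontal(left, width):
    """Horizontal mode (simplified): copy each left reference pixel across its row."""
    return [[p] * width for p in left]

top = [100, 102, 104, 106]    # reconstructed row above the current block
left = [98, 98, 98, 98]       # reconstructed column to the left of the current block
dc_block = predict_dc(top, left)
hor_block = predict_horizontal(left, width=4)
```

The DC mode uses both the top and left references, whereas the horizontal mode uses only the left column, which is what "defined differently for each prediction mode" amounts to in practice.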

The intra prediction unit 222 may also predict the pixels in the current block from the reference pixels through a CNN-based learning and inference process. In this case, the intra prediction unit 222 may operate a CNN-based intra prediction mode (hereinafter referred to as the 'CNN mode') in parallel with the plurality of intra prediction modes described above with reference to FIG. 3. Alternatively, the intra prediction unit 222 may operate the CNN mode alone.

The intra prediction unit 222 may determine the intra prediction mode to be used for encoding the current block. In some examples, the intra prediction unit 222 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode from the tested modes. For example, the intra prediction unit 222 may calculate rate-distortion values using rate-distortion analysis of the tested intra prediction modes and select the intra prediction mode with the best rate-distortion characteristics among them.
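One common way to carry out such a selection is to minimize a Lagrangian cost J = D + λR over the tested modes. The text does not specify the cost function, so this formulation, the function name, and the distortion and rate numbers below are assumptions made purely for illustration.

```python
def select_mode(candidates, lam):
    """Return the mode that minimizes J = D + lam * R.

    candidates: dict mapping mode name -> (distortion, rate_in_bits).
    lam:        Lagrange multiplier trading distortion against rate (assumed form).
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

# Hypothetical measurements for three tested modes: (distortion, rate in bits).
tested = {
    "planar": (1200, 10),
    "dc":     (1500, 6),
    "cnn":    (900, 40),
}
best_at_high_lambda = select_mode(tested, lam=20)   # rate is expensive
best_at_low_lambda = select_mode(tested, lam=5)     # distortion dominates
```

With these made-up numbers, the lower-distortion but costlier mode wins only when λ is small, which shows why the selected mode depends on the operating point.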

The intra prediction unit 222 predicts the current block using the neighboring pixels (or input data) and the arithmetic expression (or convolution kernel coefficient values) determined according to the selected intra prediction mode.

Intra prediction mode information indicating which of the plurality of intra prediction modes is used as the intra prediction mode of the current block is encoded by the encoder 250 and signaled to the video decoding apparatus.

Meanwhile, to efficiently encode the intra prediction mode information indicating which of the plurality of intra prediction modes is used as the intra prediction mode of the current block, the intra prediction unit 222 may determine some of the modes that are most likely to be the intra prediction mode of the current block as most probable modes (MPMs).

The MPM list may include the intra prediction modes of the neighboring blocks of the current block, the planar mode, and the DC mode. The MPM list may further include the CNN mode.

When the intra prediction mode of the current block is selected from among the MPMs, first intra identification information indicating which of the MPMs is selected as the intra prediction mode of the current block is encoded by the encoder 250 and signaled to the video decoding apparatus.

On the other hand, when the intra prediction mode of the current block is not selected from among the MPMs, second intra identification information indicating which of the remaining non-MPM modes is selected as the intra prediction mode of the current block is encoded by the encoder 250 and signaled to the video decoding apparatus.
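The two signaling paths can be sketched as follows. The mode numbering (0 = planar, 1 = DC, 2 to 66 = directional, 67 = CNN mode), the `mpm_flag` field that tells the two cases apart, and the dictionary-based "syntax" are all hypothetical; only the split into first and second intra identification information comes from the text.

```python
# Assumed numbering: 0 = planar, 1 = DC, 2..66 = 65 directional modes, 67 = CNN mode.
ALL_MODES = list(range(68))

def signal_intra_mode(mode, mpm_list):
    """Encoder side: describe `mode` by an index into the MPMs or into the remaining modes."""
    if mode in mpm_list:
        return {"mpm_flag": 1, "first_intra_id": mpm_list.index(mode)}
    remaining = [m for m in ALL_MODES if m not in mpm_list]
    return {"mpm_flag": 0, "second_intra_id": remaining.index(mode)}

def parse_intra_mode(syntax, mpm_list):
    """Decoder side: recover the mode from the signaled identification information."""
    if syntax["mpm_flag"] == 1:
        return mpm_list[syntax["first_intra_id"]]
    remaining = [m for m in ALL_MODES if m not in mpm_list]
    return remaining[syntax["second_intra_id"]]

mpm_list = [0, 1, 50, 67]                 # hypothetical MPM list including the CNN mode
in_mpm = signal_intra_mode(50, mpm_list)
not_in_mpm = signal_intra_mode(51, mpm_list)
```

Because the MPM list is short, an index into it needs fewer bits than an index into the full mode set, which is the source of the coding gain when the chosen mode is an MPM.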

The inter prediction unit 224 generates a prediction block for the current block through motion estimation and motion compensation. That is, the inter prediction unit 224 searches a reference picture, encoded and decoded before the current picture, for the block most similar to the current block, and generates a prediction block for the current block using the block found. The inter prediction unit 224 then generates a motion vector corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture.
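A minimal sketch of motion estimation is an exhaustive (full) search that minimizes the sum of absolute differences (SAD). The search strategy, the SAD criterion, the integer-pixel precision, and all names below are assumptions; the text only states that the most similar block is searched for.

```python
def sad(cur_block, ref_pic, rx, ry):
    """Sum of absolute differences between cur_block and the ref block at (rx, ry)."""
    return sum(abs(cur_block[j][i] - ref_pic[ry + j][rx + i])
               for j in range(len(cur_block)) for i in range(len(cur_block[0])))

def motion_search(cur_block, ref_pic, bx, by, search_range):
    """Full search around (bx, by); returns (best_sad, dx, dy) with (dx, dy) the motion vector."""
    bh, bw = len(cur_block), len(cur_block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + bw > len(ref_pic[0]) or ry + bh > len(ref_pic):
                continue
            cost = sad(cur_block, ref_pic, rx, ry)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Hypothetical 8x8 reference picture with unique sample values.
ref_pic = [[16 * y + x for x in range(8)] for y in range(8)]
# The current 2x2 block sits at (3, 3) and matches the reference block at (4, 2).
cur_block = [[ref_pic[2][4], ref_pic[2][5]],
             [ref_pic[3][4], ref_pic[3][5]]]
best = motion_search(cur_block, ref_pic, bx=3, by=3, search_range=2)
```

Motion compensation then copies the reference block at `(bx + dx, by + dy)` as the prediction block.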

In general, motion estimation is performed on the luma component, and the motion vector calculated from the luma component is used for both the luma and chroma components. Motion information, including information on the reference picture used to predict the current block and information on the motion vector, is encoded by the encoder 250 and transmitted to the video decoding apparatus.

The subtractor 230 generates a residual block by subtracting the prediction block generated by the intra prediction unit 222 or the inter prediction unit 224 from the current block.

The transform unit 240 transforms the residual signals in the residual block, which has pixel values in the spatial domain, into transform coefficients in the frequency domain. The transform unit 240 may transform the residual signals using the size of the current block as the transform unit, or it may split the residual block into a plurality of smaller subblocks and transform the residual signals in transform units of the subblock size. Various methods may be used to split the residual block into smaller subblocks. For example, the residual block may be split into subblocks of a predefined equal size, or quadtree (QT) partitioning with the residual block as the root node may be used.

The quantization unit 245 quantizes the transform coefficients output from the transform unit 240 and outputs the quantized transform coefficients to the encoder 250.

The encoder 250 generates a bitstream by encoding the quantized transform coefficients using an encoding scheme such as CABAC. The encoder 250 also encodes information related to block splitting, such as the CTU size, the QT split flag, the BT split flag, and the split type, so that the video decoding apparatus can split blocks in the same way as the video encoding apparatus.

The encoder 250 encodes prediction type information indicating whether the current block is encoded by intra prediction or by inter prediction and, according to the prediction type, encodes intra prediction information (that is, information on the intra prediction mode) or inter prediction information (information on the reference picture and the motion vector).

The inverse quantization unit 260 generates transform coefficients by inversely quantizing the quantized transform coefficients output from the quantization unit 245. The inverse transform unit 265 reconstructs the residual block by transforming the transform coefficients output from the inverse quantization unit 260 from the frequency domain to the spatial domain.
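The transform, quantization, inverse quantization, and inverse transform chain can be sketched with an orthonormal 2-D DCT-II on a square block and a uniform quantizer. Both choices are assumptions for illustration; the text names neither the transform nor the quantizer design, and real codecs use integer approximations.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th frequency basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n) * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_transform(residual):
    """Spatial domain -> frequency domain: C * R * C^T (square blocks only)."""
    c = dct_matrix(len(residual))
    return matmul(matmul(c, residual), transpose(c))

def inverse_transform(coeffs):
    """Frequency domain -> spatial domain: C^T * X * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def quantize(coeffs, step):
    return [[round(v / step) for v in row] for row in coeffs]

def dequantize(levels, step):
    return [[level * step for level in row] for row in levels]

residual = [[8] * 4 for _ in range(4)]             # hypothetical flat residual block
levels = quantize(forward_transform(residual), step=4)
reconstructed = inverse_transform(dequantize(levels, step=4))
```

For this flat block all the energy lands in the single DC coefficient, so the round trip is exact; for general content the rounding in `quantize` is where information is lost.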

The adder 270 reconstructs the current block by adding the reconstructed residual block to the prediction block generated by the prediction unit 220. The pixels in the reconstructed current block are used as reference pixels when intra-predicting subsequent blocks.

The filter unit 280 filters the reconstructed pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, and the like that arise from block-based prediction and transform/quantization. The filter unit 280 includes a deblocking filter 282 and an SAO filter 284.

The deblocking filter 282 filters the boundaries between reconstructed blocks to remove the blocking artifacts caused by block-based encoding.

The SAO filter 284 performs additional filtering on the deblocking-filtered picture to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding.

The reconstructed blocks filtered by the deblocking filter 282 and the SAO filter 284 are stored in the memory 290. Once all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter-predicting blocks in pictures to be encoded later.

FIG. 4 is an exemplary diagram of a video decoding apparatus capable of implementing the techniques of the present disclosure.

The video decoding apparatus may include a video reconstructor 4000, which includes a decoder 410, an inverse quantization unit 420, an inverse transform unit 430, a prediction unit 440, an adder 450, and the like, together with a filter unit 460 and a memory 470. Each component of the video decoding apparatus may be implemented in hardware or software, or in a combination of hardware and software. In addition, the plurality of units may be integrated into at least one module or chip and implemented by at least one processor, except where each needs to be implemented as separate dedicated hardware.

The decoder 410 decodes the bitstream received from the video encoding apparatus, extracts information related to block splitting to determine the current block to be decoded, and extracts the prediction information (for example, hint information) and the residual signal information needed to reconstruct the current block.

The decoder 410 extracts information on the CTU size from the sequence parameter set (SPS) or the picture parameter set (PPS), determines the size of the CTU, and divides the picture into CTUs of the determined size. The decoder 410 then sets the CTU as the top layer of the tree structure, that is, the root node, and splits the CTU according to the tree structure by extracting the split information for the CTU.

For example, when a CTU is split using the QTBT structure, a first flag (QT_split_flag) related to QT splitting is extracted first, and each node is split into four nodes of a lower layer. Then, for a node corresponding to a leaf node of the QT, a second flag (BT_split_flag) related to BT splitting and split type (split direction) information are extracted, and the leaf node is split according to the BT structure.

As another example, when a CTU is split using the QTBTTT structure, a first flag (QT_split_flag) related to QT splitting is extracted first, and each node is split into four nodes of a lower layer. Then, for a node corresponding to a leaf node of the QT, a split flag (split_flag) indicating whether the node is further split by BT or TT, split type (or split direction) information, and additional information distinguishing between the BT structure and the TT structure are extracted. In this way, each node at or below a QT leaf node is recursively split according to the BT or TT structure.
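The recursive QTBTTT parsing described above can be illustrated with a minimal sketch. The function names, the order in which flags are read, and the binary encoding of the split direction and the BT/TT choice are assumptions made for illustration only; the sketch also assumes that every flag is explicitly signaled, whereas an actual codec may infer some flags from block-size constraints.

```python
from typing import Iterator, List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


def parse_mtt(bits: Iterator[int], x: int, y: int, w: int, h: int) -> List[Block]:
    """Recursively split a QT leaf node by BT or TT (hypothetical flag order)."""
    if next(bits) == 0:            # split_flag: 0 means this node is a leaf (a CU)
        return [(x, y, w, h)]
    vertical = next(bits) == 1     # assumed: 1 = vertical split, 0 = horizontal
    ternary = next(bits) == 1      # assumed: 1 = TT, 0 = BT
    # Part sizes in quarters of the parent: BT = 1/2 + 1/2, TT = 1/4 + 1/2 + 1/4.
    ratios = (1, 2, 1) if ternary else (2, 2)
    blocks: List[Block] = []
    offset = 0
    for r in ratios:
        if vertical:
            part = w * r // 4
            blocks += parse_mtt(bits, x + offset, y, part, h)
        else:
            part = h * r // 4
            blocks += parse_mtt(bits, x, y + offset, w, part)
        offset += part
    return blocks


def parse_qt(bits: Iterator[int], x: int, y: int, size: int) -> List[Block]:
    """Parse the QT part of the tree; QT leaves are handed to parse_mtt."""
    if next(bits) == 1:            # QT_split_flag: split into four child nodes
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks += parse_qt(bits, x + dx, y + dy, half)
        return blocks
    return parse_mtt(bits, x, y, size, size)
```

Calling `parse_qt(iter(flags), 0, 0, 16)` on a 16×16 CTU returns the list of leaf blocks in parsing order; the areas of the returned blocks always sum to the CTU area.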

Meanwhile, once the decoder 410 has determined the current block to be decoded through the tree-structure splitting, it extracts information on the prediction type indicating whether the current block is intra-predicted or inter-predicted.

When the prediction type information indicates intra prediction, the decoder 410 extracts a syntax element for the intra prediction information (intra prediction mode) of the current block.

When the prediction type information indicates inter prediction, the decoder 410 extracts a syntax element for the inter prediction information, that is, information indicating a motion vector and the reference picture to which the motion vector refers.

Meanwhile, the decoder 410 extracts information on the quantized transform coefficients of the current block as the information on the residual signal.

The inverse quantizer 420 inversely quantizes the quantized transform coefficients, and the inverse transform unit 430 inversely transforms the inversely quantized transform coefficients from the frequency domain to the spatial domain to restore the residual signals, thereby generating a residual block for the current block.

The prediction unit 440 includes an intra prediction unit 442 and an inter prediction unit 444. The intra prediction unit 442 is activated when the prediction type of the current block is intra prediction, and the inter prediction unit 444 is activated when the prediction type of the current block is inter prediction.

The intra prediction unit 442 determines the intra prediction mode of the current block among a plurality of intra prediction modes from the syntax element for the intra prediction mode extracted by the decoder 410, and predicts the current block using reference pixels around the current block according to the determined intra prediction mode.

When the intra prediction mode for the current block is determined to be the CNN mode, the intra prediction unit 442 predicts the current block by performing the CNN inference process using the coefficients of the convolution kernel (i.e., the filter coefficients) determined by the image encoding apparatus.

The inter prediction unit 444 determines the motion vector of the current block and the reference picture to which the motion vector refers by using the syntax element for the inter prediction mode extracted by the decoder 410, and predicts the current block using the determined motion vector and reference picture.

The adder 450 reconstructs the current block by adding the residual block output from the inverse transform unit 430 and the prediction block output from the inter prediction unit 444 or the intra prediction unit 442. Pixels in the reconstructed current block are used as reference pixels for intra prediction of blocks to be decoded later.
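The operation of the adder can be sketched as a per-pixel sum of the prediction block and the residual block. The function name is hypothetical, and the clipping of the result to the valid sample range for the bit depth is an assumption based on common codec practice; the text above does not state it.

```python
from typing import List

Block = List[List[int]]


def reconstruct_block(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add residual to prediction pixel by pixel.

    Clipping to [0, 2**bit_depth - 1] is an assumption (typical codec
    behaviour), not something stated in the description.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]
```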

As the image reconstructor 4000 sequentially reconstructs the current blocks corresponding to CUs, the CTUs composed of the CUs and the picture composed of the CTUs are reconstructed.

The filter unit 460 includes a deblocking filter 462 and an SAO filter 464. The deblocking filter 462 performs deblocking filtering on the boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block decoding. The SAO filter 464 performs additional filtering on the reconstructed block after deblocking filtering to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding. The reconstructed block filtered by the deblocking filter 462 and the SAO filter 464 is stored in the memory 470. When all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in a picture to be decoded later.

Hereinafter, the CNN-based intra prediction unit according to the present embodiment will be described in detail with reference to the accompanying drawings.

FIG. 5 is a block diagram illustrating the configuration of the CNN prediction unit on the image encoding apparatus side according to the present embodiment.

Referring to FIG. 5, the CNN prediction unit 500 may generate a prediction block by performing CNN-based intra prediction on the image to be encoded (i.e., the original image) received from the block splitter and the reconstructed image received from the adder. To this end, the CNN prediction unit 500 may include a CNN setting unit 510 and a CNN execution unit 530.

The CNN setting unit 510 may calculate the filter coefficients, that is, the coefficients of the convolution kernel, by performing supervised learning using a CNN composed of a plurality of layers. Here, the structure of the CNN is as described above with reference to FIG. 1, and the CNN may further include an upsampling layer or a pooling layer to adjust the size of a layer.

The image data input to the input layer (hereinafter referred to as "input data") may be composed of a reference region encoded before the current block.

The reference region may include at least one block (or region) among a neighboring region adjacent to the current block and, among the luma block and chroma blocks constituting the current block, a block of a component encoded before the block of the component to be encoded (hereinafter referred to as a "current block of another channel"). Here, the neighboring region may be a region of the same channel as the current block or a region of a different channel. In addition, the neighboring region may be configured in units of blocks (i.e., neighboring blocks) or in units of pixels (i.e., neighboring pixels or neighboring lines). The reference region may further include a new region (i.e., an average block, average pixel, or average line) generated by averaging the pixel values of the neighboring region.

FIG. 6 is an exemplary diagram of neighboring regions that may be used as input data of the CNN. Specifically, FIG. 6(a) shows neighboring regions in units of blocks, and FIG. 6(b) shows neighboring regions in units of pixels.

Referring to FIG. 6(a), the block-unit reference region, that is, the neighboring blocks, may include a left block (C), an upper block (B), an upper-right block (D), a lower-left block (E), and an upper-left block (A) adjacent to the current block (X). In this specification, the original block (i.e., the unencoded block), the prediction block, and the reconstructed block of each neighboring block are denoted differently. For example, for the upper-left block (A), the original block is denoted "Ao", the prediction block is denoted "Ap", and the reconstructed block is denoted "Ar". In addition, the average block obtained by averaging the pixel values of the neighboring blocks (A, B, C, D, E) is denoted "F".
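As a minimal sketch of how the average block F could be formed, the function below averages co-located pixels of equally sized neighboring blocks. The function name and the rounding rule (round half up in integer arithmetic) are assumptions; the description only says that the pixel values are averaged.

```python
from typing import List

Block = List[List[int]]


def average_block(blocks: List[Block]) -> Block:
    """Element-wise average of equally sized blocks.

    Integer rounding (add n // 2 before dividing) is an assumption made
    for this illustration.
    """
    n = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [
        [(sum(b[r][c] for b in blocks) + n // 2) // n for c in range(cols)]
        for r in range(rows)
    ]
```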

Referring to FIG. 6(b), the pixel-unit reference region may include "1×1" pixels and "1×n" or "n×1" lines adjacent to the current block (X). Note that because the convolution kernel can be applied over a wider area in a block-unit reference region than in a pixel-unit reference region, the block-unit reference region can improve the accuracy of the CNN learning and inference processes. Hereinafter, for convenience of description, the present embodiment is described on the assumption that the reference region is in units of blocks.

Meanwhile, in the YCbCr 4:2:0 or 4:2:2 format, a chroma block may be used at its original size, or may be used after being up-scaled by an upsampling layer so that it has the same size as the luma block.
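The size matching between chroma and luma can be illustrated with a simple nearest-neighbor upscaler. This is only a stand-in: the description uses an upsampling layer of the CNN, whose interpolation method is not specified, and the function and parameter names here are hypothetical.

```python
from typing import List

Block = List[List[int]]


def upsample_chroma(block: Block, fx: int, fy: int) -> Block:
    """Nearest-neighbor upscaling by integer factors.

    Assumed usage: fx = fy = 2 for 4:2:0, and fx = 2, fy = 1 for 4:2:2,
    so that the chroma block matches the luma block size.
    """
    rows, cols = len(block), len(block[0])
    return [
        [block[r // fy][c // fx] for c in range(cols * fx)]
        for r in range(rows * fy)
    ]
```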

When neighboring blocks of a channel different from that of the current block are input to the input layer, one or more of the right block, the lower block, and the lower-right block of the current block (X) (not shown) may additionally be input to the input layer, besides the neighboring blocks (Ar, Br, Cr, Dr, Er) shown in FIG. 6(a). For example, when the current block is a chroma block, the accuracy of intra prediction may be improved by adding, as input data, one or more of the right block, the lower block, and the lower-right block of the current block in the luma channel, which has already been encoded.

FIG. 7 is a diagram illustrating an example in which the input layer of the CNN is configured from a plurality of neighboring blocks.

The input layer may be composed of a plurality of layers, one for each of the neighboring blocks (Ar, Br, Cr, Dr, Er), as shown in FIG. 7(a), or a plurality of neighboring blocks (Ar, Br) may be merged into a single layer, as shown in FIG. 7(b).
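The two input-layer arrangements can be sketched as follows. Both function names are hypothetical, and the way blocks are merged into a single layer (placing them side by side horizontally) is an assumption; the description states only that several neighboring blocks may be merged into one layer.

```python
from typing import List

Block = List[List[int]]


def stack_as_channels(blocks: List[Block]) -> List[Block]:
    """One input layer (channel) per neighboring block, as in FIG. 7(a)."""
    return [[row[:] for row in b] for b in blocks]


def merge_side_by_side(blocks: List[Block]) -> Block:
    """Merge equally tall blocks into one layer, as in FIG. 7(b).

    Horizontal placement is an assumption made for this illustration.
    """
    rows = len(blocks[0])
    return [[v for b in blocks for v in b[r]] for r in range(rows)]
```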

Returning to FIG. 5, the image data output from the output layer (hereinafter referred to as "output data") may be the prediction block of the current block. In this case, the output label may be composed of the original block (i.e., the unencoded block) of the current block, so that supervised learning can be performed by comparison with the output data.

Some configuration examples of the CNN layers are summarized in Table 1. It should be noted, however, that these are merely examples and do not limit the present embodiment.

Table 1
CNN layer example | Input layer data | Output layer data | Output layer label
Example 1 | Neighboring blocks of the same channel as the current block | Prediction block of the current block | Original block of the current block
Example 2 | Current block of a channel different from that of the current block | Prediction block of the current block | Original block of the current block
Example 3 | Neighboring blocks of the same channel as the current block and their average block; current block of another channel | Prediction block of the current block | Original block of the current block
Example 4 | Neighboring blocks of the same channel as the current block and their average block; current block of another channel; neighboring blocks of another channel and their average block | Prediction block of the current block | Original block of the current block

Referring to Table 1, in the configuration examples of the CNN layers, the data of the input layer may be configured in various combinations, the data of the output layer is the prediction block of the current block, and the label of the output layer is the original block of the current block. The input data and the output data must each be configured in the same way in the learning process and in the inference process of the CNN.

Meanwhile, the CNN setting unit 510 may set hint information to minimize the error between the output data and the output label and to improve the accuracy of intra prediction. Here, the hint information may include at least one of direction information of intra prediction, a quantization parameter (QP) of the current block or the reference region, and the sum of absolute values of the transform coefficients or residual signals of a neighboring block (i.e., the amount of residual). The hint information may be transmitted to the image decoding apparatus through the bitstream and used to decode the current block.

FIG. 8 is an exemplary diagram of the prediction direction of the current block based on the pixel value patterns of the neighboring blocks.

In FIG. 8, the neighboring blocks of the current block (X) consist of an upper-left block (A), an upper block (B), and a left block (C).

Referring to FIG. 8(a), in terms of the pixel value patterns of the neighboring blocks (A, B, C), about half of the upper-left block (A) is white and most of the left block (C) is white, whereas most of the upper block (B) is a color other than white. Given that most of the current block (X) is white, performing intra prediction in the horizontal direction yields the highest prediction accuracy.

Referring to FIG. 8(b), in terms of the pixel value patterns of the neighboring blocks (A, B, C), most of the upper-left block (A) and the left block (C) is white, whereas most of the upper block (B) is a color other than white. Given that most of the current block (X) is a color other than white, performing intra prediction in the vertical direction yields the highest prediction accuracy.

Accordingly, the CNN prediction unit 500 according to the present embodiment seeks to improve the accuracy of intra prediction by using the direction information of intra prediction as hint information in the learning and inference processes of the CNN.

The direction information of intra prediction may be an intra prediction mode number indicating one of the 65 directional modes and the non-directional modes described above with reference to FIG. 3. Hint information including one or more pieces of prediction direction information may be encoded by the encoder 250 and transmitted to the image decoding apparatus.

In this case, various methods may be used to minimize the number of bits required to encode the hint information. For example, the CNN setting unit 510 may select some of the 65 prediction directions (e.g., the horizontal, vertical, diagonal down-right, and diagonal up-right directions) as representative directions, and set one of the selected representative directions as the hint information for intra prediction of the current block. The CNN setting unit 510 may then transmit the hint information to the image decoding apparatus in a manner similar to the most probable mode (MPM) scheme.

The hint information may include a quantization parameter (QP) indicating the strength of quantization. Here, the QP may be the QP value applied in the quantization process of the current block or the reference region.

The hint information may include the amount of residual. Here, the amount of residual may be the sum of the absolute values of the transform coefficients or residual signals of a neighboring block.

The hint information may be composed of one or more maps and concatenated to a layer of the CNN. The maps for the hint information may be concatenated at various positions between the input layer and the output layer. For example, the maps for the hint information may be concatenated immediately after the input layer, as shown in FIG. 9, or immediately before the output layer.
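A minimal sketch of the concatenation is given below. It assumes that each hint value (for example a QP or a direction index) is expanded into a constant-valued map with the same height and width as the existing feature maps and appended as an extra channel; the function names and the constant-map construction are assumptions, since the description does not specify how a hint map is filled.

```python
from typing import List

Map2D = List[List[float]]


def make_hint_map(value: float, height: int, width: int) -> Map2D:
    """Constant-valued map carrying one hint value (assumed construction)."""
    return [[value] * width for _ in range(height)]


def concat_hint(feature_maps: List[Map2D], hint_values: List[float]) -> List[Map2D]:
    """Append one hint map per hint value along the channel dimension."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return feature_maps + [make_hint_map(v, height, width) for v in hint_values]
```

For example, concatenating a QP value and a direction index to a layer with five feature maps produces a layer with seven maps, which the next convolution then treats as ordinary input channels.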

Meanwhile, the input data may be configured in various combinations according to the direction of intra prediction. For example, when the direction of intra prediction is horizontal, the input data may be composed of one or more blocks selected from the left-side neighboring blocks (Ar, Cr, Er) of the current block (X) and their average block. Conversely, when the direction of intra prediction is vertical, the input data may be composed of one or more blocks selected from the upper-side neighboring blocks (Ar, Br, Dr) of the current block (X) and their average block.
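The direction-dependent selection can be sketched as follows. The function names, the string encoding of the direction, the choice to take all three blocks on the selected side, and the fallback to all five neighbors for other directions are assumptions made for illustration; the description allows any subset of one or more blocks.

```python
from typing import Dict, List

Block = List[List[int]]

LEFT_SIDE = ("A", "C", "E")    # upper-left, left, lower-left
UPPER_SIDE = ("A", "B", "D")   # upper-left, upper, upper-right
ALL_NEIGHBORS = ("A", "B", "C", "D", "E")


def average_block(blocks: List[Block]) -> Block:
    """Element-wise average with integer rounding (assumed rounding rule)."""
    n = len(blocks)
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [
        [(sum(b[r][c] for b in blocks) + n // 2) // n for c in range(cols)]
        for r in range(rows)
    ]


def build_input(neighbors: Dict[str, Block], direction: str) -> List[Block]:
    """Select reconstructed neighbors by prediction direction, then append their average."""
    if direction == "horizontal":
        names = LEFT_SIDE
    elif direction == "vertical":
        names = UPPER_SIDE
    else:
        names = ALL_NEIGHBORS  # assumed fallback for other directions
    selected = [neighbors[name] for name in names]
    return selected + [average_block(selected)]
```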

The CNN setting unit 510 may calculate the filter coefficients through an iterative learning process using an error backpropagation algorithm in order to minimize the error between the output data and the output label. Specifically, the error between the output data and the output label may be propagated in the reverse direction, from the output layer of the CNN through the hidden layers to the input layer. During error propagation, the connection weights between nodes may be updated in the direction that reduces the error. The CNN setting unit 510 may calculate the filter coefficients by repeating the CNN learning process with the error backpropagation algorithm until the error falls below a predetermined threshold.
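The stopping rule described above (repeat weight updates until the error falls below a threshold) can be illustrated with a deliberately reduced sketch. Instead of a multi-layer CNN, the example trains a single linear filter that mixes input channels, using gradient descent on the mean squared error; the function name, learning rate, threshold, and iteration cap are all assumptions, and a real implementation would backpropagate through every layer.

```python
from typing import List, Sequence, Tuple


def train_filter(
    inputs: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.1,
    threshold: float = 1e-6,
    max_iters: int = 10000,
) -> Tuple[List[float], float]:
    """Gradient descent on one linear filter until the MSE drops below threshold.

    Each sample in `inputs` holds one value per input channel; `labels`
    holds the corresponding original (target) pixel values.
    """
    n_ch = len(inputs[0])
    weights = [0.0] * n_ch
    loss = float("inf")
    for _ in range(max_iters):
        # Forward pass: prediction is the weighted sum of the input channels.
        preds = [sum(w * x for w, x in zip(weights, s)) for s in inputs]
        errors = [p - y for p, y in zip(preds, labels)]
        loss = sum(e * e for e in errors) / len(errors)
        if loss < threshold:
            break
        # Backward pass: gradient of the MSE with respect to each weight.
        for k in range(n_ch):
            grad = 2.0 * sum(e * s[k] for e, s in zip(errors, inputs)) / len(errors)
            weights[k] -= lr * grad
    return weights, loss
```

With normalized inputs and labels generated by a fixed mixing rule, the returned weights approach that rule and the returned loss is below the threshold.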

The filter coefficient calculation process described above may be performed in a predetermined unit (e.g., a CU, CTU, slice, frame, or sequence (group of frames)). For example, the CNN setting unit 510 may calculate the filter coefficients for each current block or for each frame.

When the filter coefficients are calculated in units of frames, the filter coefficients may be used in common for intra prediction of the plurality of current blocks included in the frame. In this case, there may also be a plurality of pieces of prediction direction information, which is one type of hint information. For example, when the intra prediction direction information is configured as a single map, that map may contain a plurality of direction values.

Information on the calculated filter coefficients may be transmitted to the image decoding apparatus through the bitstream and used in the image decoding process.

In addition, the CNN setting unit 510 may construct a filter coefficient set by calculating a plurality of filter coefficients in advance using predetermined sample images. In this case, the CNN setting unit 510 may set one filter coefficient, selected from the set according to a predetermined criterion, as the filter coefficient for the current block. For example, the CNN setting unit 510 may select one filter coefficient from the set based on the similarity of pixel values between the current block and the sample images. Alternatively, the CNN setting unit 510 may select from the set the filter coefficient closest to the filter coefficient calculated through a single learning pass. Selection information for the filter coefficient, such as index information, may be transmitted to the image decoding apparatus through the bitstream and used in the image decoding process.
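The second selection criterion (choosing the pre-computed filter closest to the one obtained from a single learning pass) can be sketched as a nearest-neighbor search over flattened coefficient vectors. The function name and the use of squared Euclidean distance as the closeness measure are assumptions; the description does not name a metric.

```python
from typing import List, Sequence


def select_filter_index(
    filter_set: Sequence[Sequence[float]], target: Sequence[float]
) -> int:
    """Return the index of the filter in the set nearest to `target`.

    Closeness is measured by squared Euclidean distance (an assumed metric).
    The returned index is what would be signaled in the bitstream.
    """
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(filter_set)), key=lambda i: sq_dist(filter_set[i], target))
```

Only the index needs to be transmitted, because the decoder is assumed to hold the same pre-computed filter coefficient set.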

Meanwhile, although FIG. 5 shows the CNN setting unit 510 as being included in the CNN prediction unit 500, it should be noted that this is merely an example and the present embodiment is not limited thereto. That is, the CNN setting unit 510 may be implemented as a unit separate from the CNN prediction unit 500, or may be integrated with the CNN execution unit 530 and implemented as a single unit.

The CNN execution unit 530 may generate the output data, that is, the prediction block for the current block, by performing a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 510, that is, the coefficient values of the convolution kernel. The generated prediction block may be passed to the subtractor of the image encoding apparatus and used to generate a residual block from the current block.

FIG. 10 is a block diagram illustrating the configuration of the CNN prediction unit on the image decoding apparatus side according to the present embodiment. The CNN prediction unit of FIG. 10 differs from the CNN prediction unit of FIG. 5 only in its input signals and in the method of setting the filter coefficients, that is, the coefficient values of the convolution kernel; descriptions of overlapping content are omitted or abbreviated.

Referring to FIG. 10, the CNN prediction unit 1000 may generate a prediction block by performing CNN-based intra prediction on the basis of the reconstructed image. To this end, the CNN prediction unit 1000 may include a CNN setting unit 1010 and a CNN execution unit 1030.

The structure of the CNN is as described above with reference to FIG. 1, and the CNN may further include an upsampling layer or a pooling layer to adjust the size of a layer.

The image data input to the input layer (hereinafter referred to as "input data") may be composed of a reference region decoded before the current block.

The reference region may include at least one block (or region) among a neighboring region adjacent to the current block and, among the luma block and chroma blocks constituting the current block, a block of a component decoded before the block of the component to be decoded (hereinafter referred to as a "current block of another channel"). Here, the neighboring region may be a region of the same channel as the current block or a region of a different channel. In addition, the neighboring region may be configured in units of blocks (i.e., neighboring blocks) or in units of pixels (i.e., neighboring pixels or neighboring lines).

The reference region may further include a new region (i.e., an average block, average pixel, or average line) generated by averaging the pixel values of the neighboring region. For example, the input data may be composed of the neighboring blocks of the same channel as the current block, their average block, and the current block of another channel.

Hereinafter, for convenience of description, the present embodiment is described on the assumption that the reference region is in units of blocks.

As described above with reference to FIG. 7, the input layer may be composed of a plurality of layers, one for each neighboring block, or a plurality of neighboring blocks may be merged into a single layer.

The image data output from the output layer (hereinafter referred to as "output data") may be the prediction block of the current block.

Some configuration examples of the CNN layers are as described above with reference to Table 1. However, it should be noted that these are merely examples and do not limit the present embodiment.

The CNN setting unit 1010 may construct one or more maps using hint information transmitted from the image encoding apparatus, and then concatenate the maps at various positions between the input layer and the output layer.

The hint information is information for improving the accuracy of intra prediction, and may include at least one of prediction directionality information, a quantization parameter (QP) of the current block or the reference region, and an absolute sum of the transform coefficients or residual signals of a neighboring block (i.e., the amount of residual).

The prediction directionality information included in the hint information may be an intra prediction mode number indicating one of the 65 directional modes and the non-directional modes, or index information indicating any one of one or more representative directions selected from the 65 directional modes.
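One way to picture the hint maps and their concatenation is sketched below. Representing each hint as a constant-valued plane of the same size as the feature maps, and treating concatenation as appending planes along the channel axis, are assumptions of this sketch; the names `constant_map`, `residual_amount`, and `concatenate` and all numeric values are hypothetical.

```python
def constant_map(value, height, width):
    """Build a hint plane in which every position carries the same value."""
    return [[value] * width for _ in range(height)]


def residual_amount(residual_block):
    """Absolute sum of residual samples (the 'amount of residual')."""
    return sum(abs(v) for row in residual_block for v in row)


def concatenate(feature_maps, hint_maps):
    """Channel-wise concatenation; each channel is one 2-D plane."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    for plane in hint_maps:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("hint map size must match the feature maps")
    return list(feature_maps) + list(hint_maps)


features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # two 2x2 feature planes
qp_map = constant_map(32, 2, 2)                  # hypothetical QP value
direction_map = constant_map(3, 2, 2)            # hypothetical direction index
amount_map = constant_map(residual_amount([[-2, 1], [0, 3]]), 2, 2)
layer_input = concatenate(features, [qp_map, direction_map, amount_map])
```

Here `layer_input` holds five planes: the two original feature planes followed by the three hint planes.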

Meanwhile, the input data may be configured in various combinations according to the directionality of intra prediction. For example, when the intra prediction direction is horizontal, the input data may consist of one or more blocks selected from the neighboring blocks to the left of the current block and the average block of the selected blocks. Conversely, when the intra prediction direction is vertical, the input data may consist of one or more blocks selected from the neighboring blocks above the current block and the average block of the selected blocks.
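The direction-dependent composition of the input data might be sketched as follows. The direction labels, the function names, and the fallback that combines both sides for any other direction are assumptions of this sketch; the description only gives the horizontal and vertical cases as examples.

```python
def average_block(blocks):
    """Element-wise rounded average of equally sized 2-D blocks."""
    n = len(blocks)
    return [
        [(sum(b[y][x] for b in blocks) + n // 2) // n
         for x in range(len(blocks[0][0]))]
        for y in range(len(blocks[0]))
    ]


def build_input(direction, left_blocks, above_blocks):
    """Select neighboring blocks by direction and append their average."""
    if direction == "horizontal":
        selected = list(left_blocks)
    elif direction == "vertical":
        selected = list(above_blocks)
    else:
        # Assumption: other directions use both sides.
        selected = list(left_blocks) + list(above_blocks)
    return selected + [average_block(selected)]


left = [[[10, 10], [20, 20]], [[30, 30], [40, 40]]]  # two left neighbors
above = [[[50, 60], [50, 60]]]                       # one upper neighbor
horizontal_input = build_input("horizontal", left, above)
vertical_input = build_input("vertical", left, above)
```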

The CNN setting unit 1010 may set the filter coefficients transmitted from the image encoding apparatus as the filter coefficients for intra prediction of the current block. In this case, the filter coefficients may be values calculated by the image encoding apparatus in a predetermined unit, for example, in units of CUs or in units of frames.

When the filter coefficients are set in units of frames, the filter coefficients may be used in common for intra prediction of a plurality of current blocks included in the frame. In this case, the prediction directionality information, which is one type of hint information, may also be plural. For example, the directionality information of intra prediction may be configured as a single map that contains a plurality of directionality values.

When the image encoding apparatus and the image decoding apparatus operate the same set of filter coefficients, the CNN setting unit 1010 may set the filter coefficients for intra prediction of the current block based on index information of the filter coefficients transmitted from the image encoding apparatus.
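Index-based selection from a shared filter coefficient set could look like the sketch below. The table contents, the name `FILTER_COEFFICIENT_SETS`, and the error handling are hypothetical; the only point illustrated is that both sides hold the same table, so only an index needs to be signaled.

```python
# Hypothetical table assumed to be identical on the encoder and decoder sides.
FILTER_COEFFICIENT_SETS = (
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)),  # index 0
    ((0.0, 0.0, 0.0), (0.5, 0.0, 0.5), (0.0, 0.0, 0.0)),  # index 1
    ((0.0, 0.5, 0.0), (0.0, 0.0, 0.0), (0.0, 0.5, 0.0)),  # index 2
)


def select_filter(index):
    """Return the kernel addressed by an index parsed from the bitstream."""
    if not 0 <= index < len(FILTER_COEFFICIENT_SETS):
        raise ValueError(f"filter index {index} is not in the shared set")
    return FILTER_COEFFICIENT_SETS[index]


kernel = select_filter(1)
```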

Meanwhile, although FIG. 10 shows the CNN setting unit 1010 as being included in the CNN prediction unit 1000, it should be noted that this is merely an example and the present embodiment is not limited thereto. That is, the CNN setting unit 1010 may be implemented as a unit separate from the CNN prediction unit 1000. Alternatively, the CNN setting unit 1010 may be integrated with the CNN execution unit 1030 and implemented as a single unit.

The CNN execution unit 1030 may generate output data, that is, a prediction block for the current block, by performing a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 1010, that is, the coefficient values of the convolution kernels.
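As a rough illustration of the inference step, the sketch below applies a single convolution layer to multi-channel input data. A single layer, zero padding, no activation function, and the name `conv2d_same` are all simplifying assumptions; the embodiment's CNN may have several layers configured as in Table 1.

```python
def conv2d_same(channels, kernel, bias=0.0):
    """Apply one multi-channel kernel with zero padding.

    channels: list of C planes (each H x W); kernel: list of C k x k arrays.
    As in common CNN frameworks, the kernel is applied without flipping.
    """
    height, width = len(channels[0]), len(channels[0][0])
    k = len(kernel[0])
    pad = k // 2
    out = [[bias] * width for _ in range(height)]
    for c, plane in enumerate(channels):
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += kernel[c][dy][dx] * plane[yy][xx]
                out[y][x] += acc
    return out


# Two hypothetical reference planes and a kernel that averages them.
plane_a = [[10, 20], [30, 40]]
plane_b = [[30, 40], [50, 60]]
center_half = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
prediction = conv2d_same([plane_a, plane_b], [center_half, center_half])
```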

The generated prediction block may then be transmitted to the adder and added to the residual block, thereby being used to reconstruct the current block.

Hereinafter, an exemplary method of performing CNN-based intra prediction according to the present embodiment is described with reference to FIGS. 11 and 12.

FIG. 11 is a flowchart illustrating the operation of the CNN prediction unit of the image encoding apparatus of FIG. 5.

Referring to FIG. 11, in step S1110, the CNN setting unit 510 may set the input data and the output label of the CNN.

The input data may consist of a reference region encoded before the current block. For example, the input data may consist of neighboring blocks of the same channel as the current block. Alternatively, the input data may consist of neighboring blocks of the same channel as the current block, the average block of those neighboring blocks, and the current block of a channel different from that of the current block.

The data of the output layer is the prediction block of the current block, and the label of the output layer may be the original block of the current block.

The CNN setting unit 510 may set prediction directionality information and the like as hint information in order to improve the accuracy of intra prediction. The hint information thus set may be transmitted to the image decoding apparatus through the bitstream and used to decode the current block. In this case, the input data may be configured in various combinations according to the directionality of intra prediction.

In step S1120, the CNN setting unit 510 may calculate the filter coefficients through a learning process. The CNN setting unit 510 may repeat the learning process using an error backpropagation algorithm in order to improve the accuracy of intra prediction.
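The learning process can be illustrated, under strong simplifications, with a single 3x3 kernel trained by gradient descent on a mean squared error between the prediction and the original block. The single-layer network, the zero initialization, the learning rate, the normalized sample values, and all names below are assumptions of this sketch; with only one layer, backpropagation reduces to the kernel gradient computed in `train`.

```python
def predict(plane, kernel):
    """Zero-padded 3x3 convolution (no kernel flip) of one plane."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(3):
                for dx in range(3):
                    yy, xx = y + dy - 1, x + dx - 1
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[dy][dx] * plane[yy][xx]
    return out


def train(plane, label, steps=200, lr=0.05):
    """Fit the kernel so that predict(plane) approaches the label block."""
    h, w = len(plane), len(plane[0])
    n = h * w
    kernel = [[0.0] * 3 for _ in range(3)]
    history = []  # mean squared error before each update
    for _ in range(steps):
        pred = predict(plane, kernel)
        err = [[pred[y][x] - label[y][x] for x in range(w)] for y in range(h)]
        history.append(sum(e * e for row in err for e in row) / n)
        for dy in range(3):
            for dx in range(3):
                grad = 0.0  # d(MSE) / d(kernel[dy][dx])
                for y in range(h):
                    for x in range(w):
                        yy, xx = y + dy - 1, x + dx - 1
                        if 0 <= yy < h and 0 <= xx < w:
                            grad += 2.0 * err[y][x] * plane[yy][xx] / n
                kernel[dy][dx] -= lr * grad
    return kernel, history


# Hypothetical normalized reference samples and original block (the label).
reference = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
original = [[0.2, 0.3, 0.4], [0.5, 0.6, 0.7], [0.8, 0.9, 1.0]]
learned_kernel, loss_history = train(reference, original)
```

Comparing `loss_history[0]` with `loss_history[-1]` shows how much the error shrinks over the iterations.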

The filter coefficient calculation process may be performed in a predetermined unit, for example, in units of frames or in units of blocks. The CNN setting unit 510 may also construct a filter coefficient set by calculating a plurality of filter coefficients in advance using predetermined sample images. In this case, the CNN setting unit 510 may set one filter coefficient, selected from the set according to a predetermined criterion, as the filter coefficient for the current block.

In step S1130, the CNN execution unit 530 may generate output data, that is, a prediction block for the current block, by performing a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 510, that is, the coefficient values of the convolution kernels. The generated prediction block may then be transmitted to the subtractor of the image encoding apparatus and used to generate a residual block from the current block.
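The subtractor's role can be shown with a minimal sketch, assuming blocks are lists of integer rows; the function name and sample values are hypothetical.

```python
def residual_block(current, prediction):
    """Sample-wise difference between the current block and its prediction."""
    return [
        [c - p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(current, prediction)
    ]


current = [[120, 121], [119, 118]]
predicted = [[118, 122], [119, 121]]
residual = residual_block(current, predicted)
```

In the encoder this residual block would then be passed on to the transform and quantization stages.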

FIG. 12 is a flowchart illustrating the operation of the CNN prediction unit of the image decoding apparatus of FIG. 10.

Referring to FIG. 12, in step S1210, the CNN setting unit 1010 may set the filter coefficients for intra prediction of the current block based on information on the filter coefficients transmitted from the image encoding apparatus.

The input data of the CNN may consist of a reference region decoded before the current block, and the output data is a prediction block for the current block.

When hint information for intra prediction is transmitted from the image encoding apparatus, the CNN setting unit 1010 may construct the hint information extracted by the decoding unit as a single map and concatenate the map to a layer of the CNN.

Meanwhile, the input data may be configured in various combinations according to the directionality of intra prediction.

In step S1220, the CNN execution unit 1030 may generate output data, that is, a prediction block for the current block, by performing a CNN-based inference process on the input data using the filter coefficients set by the CNN setting unit 1010, that is, the coefficient values of the convolution kernels. The generated prediction block may then be transmitted to the adder and added to the residual block, thereby being used to reconstruct the current block.
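The adder's reconstruction step might be sketched as follows. Clipping the sum to the valid sample range and the default 8-bit depth are assumptions of this sketch, added because clipping is customary in video codecs; the description itself only states that the prediction block is added to the residual block.

```python
def reconstruct(prediction, residual, bit_depth=8):
    """Add the residual to the prediction and clip to the sample range."""
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(prediction, residual)
    ]


predicted = [[118, 122], [250, 10]]
decoded_residual = [[2, -1], [10, -20]]
reconstructed = reconstruct(predicted, decoded_residual)
```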

Although FIGS. 11 and 12 describe a plurality of steps as being performed sequentially, this merely illustrates the technical idea of the present embodiment. In other words, a person of ordinary skill in the art to which the present embodiment pertains could apply various modifications and variations, such as changing the order described in FIGS. 11 and 12 or performing some of the steps in parallel, without departing from the essential characteristics of the present embodiment. FIGS. 11 and 12 are therefore not limited to a time-series order.

Meanwhile, each step of the flowcharts shown in FIGS. 11 and 12 may be implemented in the form of a computer program that can be executed through various components on a computer, and such a computer program may be recorded on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored. That is, the computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk, etc.) and optically readable media (e.g., CD-ROM, DVD, etc.). In addition, the computer-readable recording medium may be distributed over computer systems connected through a network, so that computer-readable code is stored and executed in a distributed manner.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the present embodiment pertains could make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present embodiment.

Claims

1. A video decoding method based on intra prediction, comprising:
decoding block size information from a bitstream, decoding partition information for dividing a block having a size determined by the block size information into a tree structure to determine a current block to be decoded, and decoding transform coefficients for the current block;
constructing input data using a reference region decoded before the current block;
generating prediction pixels of the current block by applying a 2-dimensional or 3-dimensional filter coefficient set to the input data;
generating residual signals for the current block by inverse transforming the transform coefficients; and
reconstructing the current block using the prediction pixels and the residual signals.

2. The method of claim 1,
wherein the filter coefficient set consists of kernel coefficients of a convolutional neural network (CNN).

3. The method of claim 2,
wherein the input data further comprises hint information for intra prediction of the current block, and
the hint information includes one or more of prediction directionality information, a quantization parameter (QP) of the current block or the reference region, and an absolute sum of transform coefficients or residual signals of a reconstructed neighboring region adjacent to the current block.

4. The method of claim 1,
further comprising decoding, from the bitstream, first information related to the configuration of the input data,
wherein the input data is configured by combining, based on the first information, pixel values obtained from reference regions adjacent to the left and upper sides of the current block.

5. The method of claim 1,
wherein the reference region includes at least one of reference pixels in a reconstructed neighboring region adjacent to the current block, and an already decoded component among the luma and chroma components constituting the current block.

6. The method of claim 1,
wherein the input data is constructed using pixel values generated by averaging pixel values in the reference region.

7. The method of claim 1,
further comprising decoding, from the bitstream, second information for selecting a filter coefficient set to be applied to the current block from among a plurality of filter coefficient sets configured in two or three dimensions,
wherein the prediction pixels of the current block are generated using the filter coefficient set determined based on the second information.

8. A video encoding method based on intra prediction, comprising:
determining a current block to be encoded by dividing a predetermined block into a tree structure;
constructing input data from a reference region encoded before the current block;
generating a prediction block of the current block by applying a 2-dimensional or 3-dimensional filter coefficient set to the input data;
generating a residual block by subtracting the prediction block from the current block; and
generating encoded data by encoding block size information indicating the size of the predetermined block, partition information related to the tree structure partitioning, and the residual block.

9. The method of claim 8,
wherein the filter coefficient set consists of kernel coefficients of a convolutional neural network (CNN).

10. The method of claim 9,
wherein the input data further comprises hint information for intra prediction of the current block, and
the hint information includes one or more of prediction directionality information, a quantization parameter (QP) of the current block or the reference region, and an absolute sum of transform coefficients or residual signals of a reconstructed neighboring region adjacent to the current block.

11. The method of claim 8,
wherein generating the encoded data includes encoding first information related to the configuration of the input data, and
the input data is configured by combining pixel values obtained from reference regions adjacent to the left and upper sides of the current block so as to conform to the input data configuration method indicated by the first information.

12. The method of claim 8,
wherein the reference region includes at least one of reference pixels in a reconstructed neighboring region adjacent to the current block, and an already decoded component among the luma and chroma components constituting the current block.

13. The method of claim 8,
wherein the input data is constructed using pixel values generated by averaging pixel values in the reference region.

14. The method of claim 8,
wherein generating the prediction block of the current block comprises:
selecting a filter coefficient set to be applied to the current block from among a plurality of filter coefficient sets configured in two or three dimensions; and
generating the prediction block using the selected filter coefficient set,
and wherein generating the encoded data includes
encoding second information for selecting the filter coefficient set to be applied to the current block from among the plurality of filter coefficient sets.

15. A decoder-readable recording medium storing an encoded bitstream of video data,
wherein the bitstream is generated by executing an intra prediction-based video encoding method comprising:
determining a current block to be encoded by dividing a predetermined block into a tree structure;
constructing input data from a reference region encoded before the current block;
generating a prediction block of the current block by applying a 2-dimensional or 3-dimensional filter coefficient set to the input data;
generating a residual block by subtracting the prediction block from the current block; and
encoding block size information indicating the size of the predetermined block, partition information related to the tree structure partitioning, and the residual block.