KR20200000543A

KR20200000543A - Method and apparatus for image enhancement using supervised learning

Info

Publication number: KR20200000543A
Application number: KR1020180072499A
Authority: KR
Inventors: 나태영; 이선영; 신재섭; 손세훈; 김효성
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2018-06-25
Filing date: 2018-06-25
Publication date: 2020-01-03
Also published as: KR102648464B1

Abstract

Disclosed are a method and an apparatus for image enhancement using supervised learning.
According to one aspect of the present embodiment, there is provided a method of decoding an image using a CNN-based filter, the method comprising: inputting a first picture and at least one of a quantization parameter map and a block partition map to the CNN-based filter; and outputting a second picture, wherein the quantization parameter map indicates information about the coding units constituting the first picture, and the block partition map indicates information about the partitioned regions constituting the first picture.

Description

Method and apparatus for image enhancement using supervised learning

The present invention relates to a method and an apparatus for image enhancement using supervised learning.

The content described in this section merely provides background information on the present invention and does not constitute prior art.

Video coding techniques based on deep learning are attracting attention from many experts. Replacing the in-loop filters, namely the bilateral filter, the deblocking filter, and sample adaptive offset (SAO), with a single convolutional neural network filter trained by deep learning achieved a BDBR (Bjontegaard-delta bit rate) gain of 3.57% under the all-intra common testing condition. As experiments demonstrate that deep learning can be combined effectively with existing video encoding and decoding techniques, deep learning is expected to be widely used.

Deep learning is a field of machine learning and refers to a set of methods that learn the core structure within large amounts of data through a combination of nonlinear transformations connected across multiple layers. Depending on how training is carried out, machine learning can be broadly classified into supervised learning, unsupervised learning, and reinforcement learning. Of these, supervised learning refers to training performed when a label, that is, an explicit correct answer for the training data, is given.

A representative supervised learning technique is the artificial neural network (hereinafter 'NN'), an engineering model of the operating structure of the human brain. A structure in which an NN is stacked deeply in multiple layers is called a deep neural network. In general, an NN has a structure in which all nodes of each layer are fully connected; a structure in which the layers are instead connected by convolution kernels, so as to suit image processing, is called a convolutional neural network (hereinafter 'CNN').

FIG. 1 is a diagram illustrating a CNN structure for image enhancement.

Referring to FIG. 1, the CNN includes an input layer 110, an output layer 120, and a convolutional layer 130. The convolutional layer 130 may consist of a plurality of layers 132, 134, 136, 138, and 140. A probability distribution model may be constructed based on all of the convolutional layers.

A CNN operates in two broad phases: a training process and an inference process. In the training process, the image whose quality is to be improved is fed to the input layer as data. The convolution kernel coefficients of each convolutional layer are initialized before training and are then trained by an error backpropagation algorithm so as to minimize the error between the image produced at the output layer and the label of the output layer, that is, the image of original quality. The accuracy of the output layer may vary depending on the design of the input/output layers and the convolutional layers, the error backpropagation algorithm, and the like used in the training process.
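
The training process described above can be illustrated with a deliberately tiny sketch. It assumes a single 3-tap, one-dimensional convolution kernel and a mean-squared-error loss, neither of which is specified in this document; with only one layer, error backpropagation reduces to the direct gradient of the loss with respect to the kernel. The names `conv1d`, `mse`, and `train_kernel` are hypothetical.

```python
def conv1d(signal, kernel):
    """Same-size 1-D correlation with zero padding at the borders."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_kernel(degraded, label, kernel, steps=200, lr=0.05):
    """Gradient descent on the kernel so that conv1d(degraded) approaches label.

    `kernel` is the initial coefficient list (initialized before training).
    The loss (MSE) and the learning rate are assumptions of this sketch.
    """
    kernel = list(kernel)
    half = len(kernel) // 2
    n = len(degraded)
    for _ in range(steps):
        out = conv1d(degraded, kernel)
        grads = [0.0] * len(kernel)
        for i in range(n):
            err = out[i] - label[i]          # error at the output layer
            for k in range(len(kernel)):
                j = i + k - half
                if 0 <= j < n:
                    grads[k] += 2.0 * err * degraded[j] / n
        kernel = [w - lr * g for w, g in zip(kernel, grads)]
    return kernel


label = [0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.4]            # original-quality signal
degraded = [0.25, 0.35, 0.65, 0.7, 0.65, 0.35, 0.25, 0.35]  # signal to be improved
initial = [0.0, 0.0, 0.0]
trained = train_kernel(degraded, label, initial)
```

A real CNN-based filter stacks many two- or three-dimensional kernels with nonlinearities between them; the sketch only shows the label-driven update of kernel coefficients.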

In the subsequent inference process, the convolution kernel coefficients obtained through repeated training are applied to infer a quality-improved image from the image whose quality is to be improved.

A main object of the present embodiment is to provide a method and an apparatus capable of enhancing an image and resolving quantization errors and blocking artifacts in an image encoder and decoder.

According to one aspect of the present embodiment, there is provided a method of decoding an image using a CNN-based filter, the method comprising: inputting a first picture and at least one of a quantization parameter map and a block partition map to the CNN-based filter; and outputting a second picture, wherein the quantization parameter map indicates information about the coding units constituting the first picture, and the block partition map indicates information about the partitioned regions constituting the first picture.

According to another aspect of the present embodiment, there is provided an image decoding apparatus using a CNN-based filter, the apparatus comprising: an input unit configured to receive a first picture and at least one of a quantization parameter map and a block partition map; a filter unit configured to apply coefficients of the CNN-based filter to the first picture and the at least one of the quantization parameter map and the block partition map received by the input unit; and an output unit configured to output a second picture obtained by applying the coefficients of the CNN-based filter to the first picture and the at least one of the quantization parameter map and the block partition map, wherein the quantization parameter map indicates the coding units constituting the first picture, and the block partition map indicates information about the partitioned regions constituting the first picture.

As described above, according to the present embodiment, a filter trained through supervised learning can be used to enhance an image and to resolve quantization errors and blocking artifacts.

FIG. 1 is a diagram illustrating a CNN structure for image enhancement;
FIG. 2 is an exemplary block diagram of an image encoding apparatus capable of implementing the techniques of the present disclosure;
FIG. 3 is an exemplary block diagram of an image decoding apparatus capable of implementing the techniques of the present disclosure;
FIG. 4 is a diagram illustrating a CNN-based filter according to an embodiment of the present invention;
FIGS. 5A to 5C are diagrams illustrating CNN structures according to the position of the concatenation layer, according to an embodiment of the present invention;
FIGS. 6A to 6C are diagrams illustrating data to be input to the input layer according to an embodiment of the present invention;
FIGS. 7A and 7B are diagrams illustrating an example of a block partition map according to an embodiment of the present invention;
FIGS. 8A to 8C are diagrams illustrating another example of a block partition map according to an embodiment of the present invention;
FIGS. 9A to 9C are diagrams illustrating block partition maps for adjusting the strength of deblocking according to an embodiment of the present invention;
FIG. 10 is a flowchart of decoding an image using a CNN-based filter according to the present disclosure; and
FIG. 11 is a diagram schematically illustrating the configuration of an apparatus for decoding an image according to the present disclosure.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components of each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, a detailed description of a related well-known configuration or function is omitted when it is judged that it could obscure the gist of the present invention.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as '... unit' and 'module' used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 2 is an exemplary block diagram of an image encoding apparatus capable of implementing the techniques of the present disclosure.

The image encoding apparatus includes a block partitioning unit 210, a prediction unit 220, a subtractor 230, a transform unit 240, a quantization unit 245, an encoding unit 250, an inverse quantization unit 260, an inverse transform unit 265, an adder 270, a filter unit 280, and a memory 290. Each component of the image encoding apparatus may be implemented as a hardware chip, or may be implemented in software, with one or more microprocessors executing the software functions corresponding to each component.

One video is composed of a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed region by region. For example, one picture is divided into one or more slices and/or tiles, and each slice or tile is divided into one or more coding tree units (CTUs). Each CTU is in turn divided into one or more coding units (CUs) by a tree structure. Information applied to an individual CU is encoded as syntax of that CU, and information applied in common to the CUs contained in one CTU is encoded as syntax of the CTU.

The block partitioning unit 210 determines the size of the coding tree unit (CTU). Information on the CTU size is encoded as syntax of the sequence parameter set (SPS) or picture parameter set (PPS) and transmitted to the image decoding apparatus. The block partitioning unit 210 divides each picture of the video into a plurality of CTUs of the determined size and then recursively partitions each CTU using a tree structure. In the present disclosure, the block partitioning unit 210 may additionally generate a block partition map that represents information about the partitioned regions of each picture.
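
The recursive partitioning and the block partition map can be sketched as follows. This is a minimal illustration under stated assumptions: only quadtree splits are modeled, the split decision is supplied by a hypothetical callback `should_split`, and the map marks the top and left edge samples of every leaf block with 1, which is just one possible encoding and is not taken from this document.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block; return leaf blocks as (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half,
                                         min_size, should_split)
        return leaves
    return [(x, y, size)]


def partition_map(width, height, leaves):
    """Build a width x height map: 1 on block edges (top/left), 0 elsewhere.

    The boundary-marking convention is an assumption of this sketch.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y, size in leaves:
        for j in range(y, y + size):
            for i in range(x, x + size):
                if i == x or j == y:
                    grid[j][i] = 1
    return grid


# Hypothetical decision rule: split the 16x16 CTU, then split only its
# top-left 8x8 child once more.
def example_rule(x, y, size):
    return size == 16 or (size == 8 and x == 0 and y == 0)


leaves = quadtree_split(0, 0, 16, 4, example_rule)
grid = partition_map(16, 16, leaves)
```

In an actual encoder the split decision would come from rate-distortion optimization, and the decoder would rebuild the same leaves from the signaled split flags.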

The prediction unit 220 predicts the current block to generate a prediction block. The prediction unit 220 includes an intra prediction unit 222 and an inter prediction unit 224.

In general, each current block within a picture may be predictively coded. Prediction of the current block is generally performed using either an intra prediction technique, which uses data from the picture containing the current block, or an inter prediction technique, which uses data from a picture coded before the picture containing the current block. Inter prediction includes both unidirectional and bidirectional prediction.

The intra prediction unit 222 predicts the pixels in the current block using the pixels (reference pixels) located around the current block within the current picture that contains it. A plurality of intra prediction modes exist according to the prediction direction.

The intra prediction unit 222 may determine the intra prediction mode to be used to encode the current block. In some examples, the intra prediction unit 222 may encode the current block using several intra prediction modes and select an appropriate mode from among those tested. For example, the intra prediction unit 222 may compute rate-distortion values for the tested intra prediction modes using rate-distortion analysis and select the mode with the best rate-distortion characteristics.
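
One common way to realize this kind of rate-distortion selection is to minimize a Lagrangian cost J = D + λR over the tested modes. The document does not state the cost function, so the formula, the function name `select_mode`, the mode names, and the numbers below are all assumptions used for illustration.

```python
def select_mode(candidates, lam):
    """Return the mode with the smallest Lagrangian cost D + lam * R.

    candidates maps a mode name to (distortion, rate_in_bits).
    """
    return min(candidates,
               key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Hypothetical measurements for three tested intra prediction modes.
tested = {
    "DC": (120.0, 10),
    "planar": (100.0, 14),
    "angular_26": (60.0, 40),
}

best_high_lambda = select_mode(tested, 2.0)   # costs: 140, 128, 140
best_low_lambda = select_mode(tested, 0.5)    # costs: 125, 107, 80
```

A larger λ penalizes bits more heavily and so favors cheaper modes, while a smaller λ favors modes with lower distortion.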

The intra prediction unit 222 selects one intra prediction mode from among the plurality of intra prediction modes and predicts the current block using the neighboring pixels (reference pixels) and the arithmetic expression determined by the selected mode. Information on the selected intra prediction mode is encoded by the encoding unit 250 and transmitted to the image decoding apparatus.

The inter prediction unit 224 generates a prediction block for the current block through a motion compensation process. It searches a reference picture, encoded and decoded before the current picture, for the block most similar to the current block, and generates the prediction block for the current block from the block found. It then generates a motion vector corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture. In general, motion estimation is performed on the luma component, and the motion vector computed from the luma component is used for both the luma and chroma components. Motion information, which includes information on the reference picture used to predict the current block and information on the motion vector, is encoded by the encoding unit 250 and transmitted to the image decoding apparatus.
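
The search for the most similar block can be sketched as an exhaustive block-matching search. The similarity measure is not named in this document; the sketch assumes the sum of absolute differences (SAD) over integer-sample positions, and the names `sad` and `motion_search` are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and an n x n block of ref at (rx, ry)."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def motion_search(cur, ref, bx, by, n, search_range):
    """Full search in a +/- search_range window; return ((dx, dy), cost).

    Candidates that would fall outside the reference picture are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1], best[0]


# Reference luma plane with a non-repeating pattern, and a current picture
# whose 2x2 block at (2, 2) is copied from the reference at (4, 3).
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        cur[2 + j][2 + i] = ref[3 + j][4 + i]

mv, cost = motion_search(cur, ref, 2, 2, 2, 2)
```

Practical encoders use faster search patterns and sub-sample interpolation; the full search is shown only because it is the simplest correct form of the idea.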

The subtractor 230 generates a residual block by subtracting the prediction block generated by the intra prediction unit 222 or the inter prediction unit 224 from the current block.

The transform unit 240 transforms the residual signal in the residual block, which has pixel values in the spatial domain, into transform coefficients in the frequency domain. The transform unit 240 may transform the residual signals in the residual block using the size of the current block as the transform unit, or may divide the residual block into a plurality of smaller sub-blocks and transform the residual signals in sub-block-sized transform units. The residual block can be divided into smaller sub-blocks in various ways. For example, it may be divided into sub-blocks of the same predefined size, or a quadtree (QT) partitioning with the residual block as the root node may be used.

The quantization unit 245 quantizes the transform coefficients output from the transform unit 240 and outputs the quantized transform coefficients to the encoding unit 250.

The encoding unit 250 generates a bitstream by encoding the quantized transform coefficients using a coding scheme such as CABAC. The encoding unit 250 also encodes block partitioning information such as the CTU size, QT split flag, BT split flag, and split type, so that the image decoding apparatus can partition blocks in the same way as the image encoding apparatus.

The encoding unit 250 encodes prediction type information indicating whether the current block was encoded by intra prediction or by inter prediction and, according to the prediction type, encodes either intra prediction information (that is, information on the intra prediction mode) or inter prediction information (information on the reference picture and the motion vector).

The inverse quantization unit 260 inverse-quantizes the quantized transform coefficients output from the quantization unit 245 to generate transform coefficients. The inverse transform unit 265 reconstructs the residual block by transforming the transform coefficients output from the inverse quantization unit 260 from the frequency domain to the spatial domain.
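
The quantization error that the disclosed filter aims to compensate arises in the quantize and inverse-quantize round trip. The sketch below assumes a plain uniform scalar quantizer with a single step size, which is a simplification and not the quantizer of any particular codec; `quantize` and `dequantize` are hypothetical names, and the transform itself is omitted.

```python
def quantize(coeffs, step):
    """Map each transform coefficient to an integer level (uniform quantizer)."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [level * step for level in levels]


coeffs = [101.0, -37.0, 13.0, 3.0]      # hypothetical transform coefficients
step = 8                                # hypothetical quantization step size

levels = quantize(coeffs, step)         # what the encoding unit would entropy-code
recon = dequantize(levels, step)        # what both encoder and decoder reconstruct
errors = [abs(c - r) for c, r in zip(coeffs, recon)]
```

Each reconstruction error is bounded by half the step size, so a larger step (a higher QP) saves bits at the cost of larger errors in the reconstructed residual.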

The adder 270 reconstructs the current block by adding the reconstructed residual block to the prediction block generated by the prediction unit 220. The pixels in the reconstructed current block are used as reference pixels when the next block in order is intra-predicted.

The filter unit 280 filters the reconstructed pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, and the like caused by block-based prediction and transform/quantization. The filter unit 280 may include a deblocking filter 282 and an SAO filter 284.

The deblocking filter 282 filters the boundaries between reconstructed blocks to remove the blocking artifacts caused by block-by-block encoding/decoding, and the SAO filter 284 performs additional filtering on the deblocking-filtered image. The SAO filter 284 is a filter used to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding.

The reconstructed blocks filtered by the deblocking filter 282 and the SAO filter 284 are stored in the memory 290. Once all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter-predicting blocks in pictures to be encoded later.

FIG. 3 is an exemplary block diagram of an image decoding apparatus capable of implementing the techniques of the present disclosure.

The image decoding apparatus includes an image reconstructor 300, which comprises a decoding unit 310, an inverse quantization unit 320, an inverse transform unit 330, a prediction unit 340, an adder 350, and the like, together with a filter unit 360 and a memory 370. As with the image encoding apparatus of FIG. 2, each component of the image decoding apparatus may be implemented as a hardware chip, or may be implemented in software, with a microprocessor executing the software functions corresponding to each component.

The decoding unit 310 decodes the bitstream received from the image encoding apparatus, extracts the information related to block partitioning to determine the current block to be decoded, and extracts the prediction information, the information on the residual signal, and the like needed to reconstruct the current block.

The decoding unit 310 extracts the information on the CTU size from the sequence parameter set (SPS) or picture parameter set (PPS) to determine the size of the CTU and divides the picture into CTUs of the determined size. It then sets the CTU as the top layer of the tree structure, that is, the root node, and partitions the CTU using the tree structure by extracting the partition information for the CTU.

Once the current block to be decoded has been determined through the tree-structure partitioning, the decoding unit 310 extracts prediction type information indicating whether the current block was intra-predicted or inter-predicted.

When the prediction type information indicates intra prediction, the decoding unit 310 extracts the syntax element for the intra prediction information (intra prediction mode) of the current block.

When the prediction type information indicates inter prediction, the decoding unit 310 extracts the syntax elements for the inter prediction information, that is, information indicating the motion vector and the reference picture to which the motion vector refers.

Meanwhile, the decoding unit 310 extracts information on the quantized transform coefficients of the current block as the information on the residual signal.

The inverse quantization unit 320 inverse-quantizes the quantized transform coefficients, and the inverse transform unit 330 inverse-transforms the inverse-quantized transform coefficients from the frequency domain to the spatial domain to reconstruct the residual signals, thereby generating the residual block for the current block.

The prediction unit 340 includes an intra prediction unit 342 and an inter prediction unit 344. The intra prediction unit 342 is activated when the prediction type of the current block is intra prediction, and the inter prediction unit 344 is activated when the prediction type of the current block is inter prediction.

The intra prediction unit 342 determines the intra prediction mode of the current block, from among the plurality of intra prediction modes, using the syntax element for the intra prediction mode extracted by the decoding unit 310, and predicts the current block from the reference pixels around the current block according to that mode.

The inter prediction unit 344 determines the motion vector of the current block and the reference picture to which the motion vector refers using the syntax elements for the inter prediction information extracted by the decoding unit 310, and predicts the current block using the motion vector and the reference picture.

The adder 350 reconstructs the current block by adding the residual block output from the inverse transform unit to the prediction block output from the inter prediction unit or the intra prediction unit. The pixels in the reconstructed current block are used as reference pixels when blocks to be decoded later are intra-predicted.

As the image reconstructor 300 sequentially reconstructs the current blocks corresponding to the CUs, the CTUs composed of those CUs, and the picture composed of those CTUs, are reconstructed.

The filter unit 360 includes a deblocking filter 362 and an SAO filter 364. The deblocking filter 362 applies deblocking filtering to the boundaries between reconstructed blocks to remove the blocking artifacts caused by block-by-block decoding. The SAO filter 364 performs additional filtering on the reconstructed blocks after deblocking filtering to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding. The reconstructed blocks filtered by the deblocking filter 362 and the SAO filter 364 are stored in the memory 370. Once all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter-predicting blocks in pictures to be decoded later.

The present disclosure describes in detail a CNN-based filter that provides the functions of the deblocking filters 282 and 362 and the SAO filters 284 and 364. The CNN-based filter according to the present disclosure may be used in both the image encoding apparatus and the image decoding apparatus.

Although the present disclosure uses YUV as an example of the information constituting a picture, it is equally applicable to RGB, YCbCr, and the like. That is, the YUV whose quality is to be improved may instead be an RGB or YCbCr picture whose quality is to be improved.

FIG. 4 is a diagram illustrating a CNN-based filter according to an embodiment of the present invention.

When a YUV 401 whose quality is to be improved is input to the input layer together with at least one of a quantization parameter (QP) map 403 and a block partition map 405, a YUV difference 421 is output from the output layer. Here, the YUV 401 to be improved may be a YUV reconstructed from a bitstream received from an encoder, and refers to a YUV in which the original YUV has been degraded, whether artificially or not. A hint (not shown) may additionally be input to the input layer.

First, in the training process, the coefficients of the CNN-based filter, that is, the coefficients of the convolution kernels, are learned so that the YUV difference 421 output from the output layer equals the difference between the original YUV and the YUV to be improved. The convolution kernels may be two-dimensional (2D) or three-dimensional (3D). The CNN-based filter 411 serves to improve the quality of the input YUV, and its final output is a YUV, referred to as the quality-improved YUV 431.
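
The relationship between the learned difference and the final output might be sketched as follows. This is a minimal NumPy illustration rather than the disclosed implementation: the function names `residual_target` and `apply_residual`, the 8-bit sample range, and the clipping step are assumptions introduced here, and the sketch assumes that the quality-improved YUV is obtained by adding the predicted difference back to the input.

```python
import numpy as np

def residual_target(original, degraded):
    """Training target: the difference the filter should learn to output."""
    return original.astype(np.int32) - degraded.astype(np.int32)

def apply_residual(degraded, predicted_diff, max_value=255):
    """Add the predicted difference to the input and clip to the sample range.

    The 8-bit range (max_value=255) is an assumption for illustration.
    """
    improved = degraded.astype(np.int32) + np.rint(predicted_diff).astype(np.int32)
    return np.clip(improved, 0, max_value).astype(np.uint8)

# Toy 2x2 luma samples: the original and a degraded (reconstructed) version.
original = np.array([[100, 102], [98, 101]], dtype=np.uint8)
degraded = np.array([[97, 104], [98, 99]], dtype=np.uint8)

target = residual_target(original, degraded)
# A perfectly trained filter would output exactly `target`.
restored = apply_residual(degraded, target)
```

With an ideal prediction, `restored` equals `original`; in practice the predicted difference only approximates the target.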

Here, the YUV to be improved may be filtered channel by channel or with all channels at once.

The QP map is set to the same resolution as the input YUV to be filtered, and its values may be filled with the QP values used by the coding units, for example blocks or sub-blocks, within the YUV plane. If the YUV is filtered channel by channel, a single map may be constructed from the QP values of the channel being filtered. If the YUV is filtered all at once, the QP values of the three channels may be provided as three separate maps or as a single map holding the average QP value.
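
As an illustration of how such a QP map could be assembled, the sketch below fills a plane-sized array with the QP of the block covering each sample. The `(x, y, w, h, qp)` block tuples, the function names, and the sample QP values are hypothetical; the averaging helper additionally assumes that the three per-channel maps share one shape.

```python
import numpy as np

def build_qp_map(height, width, blocks):
    """Fill a plane-sized map with the QP of the block covering each sample.

    blocks: iterable of (x, y, w, h, qp) tuples (hypothetical layout).
    """
    qp_map = np.zeros((height, width), dtype=np.float32)
    for x, y, w, h, qp in blocks:
        qp_map[y:y + h, x:x + w] = qp
    return qp_map

def average_qp_map(qp_y, qp_u, qp_v):
    """Single map holding the mean QP of three channels (same-shape maps assumed)."""
    return (qp_y + qp_u + qp_v) / 3.0

# An 8x8 plane covered by two 4x4 blocks and one 8x4 block.
blocks = [(0, 0, 4, 4, 32), (4, 0, 4, 4, 37), (0, 4, 8, 4, 27)]
qp_map = build_qp_map(8, 8, blocks)
```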

To increase the accuracy of the CNN technique, block mode map information useful for training may be added to the input layer as hint information, in addition to the QP map, the block partition map, and the picture to be improved. Here, the block mode map may be filled with the mode values used by the coding units, for example blocks or sub-blocks. For example, it may be information that distinguishes whether a block was coded in intra mode or in inter mode, and this information may be expressed as a number. In this case, the convolution kernel coefficients resulting from training are determined using the hint as well as the other input-layer data. In principle, the input layer and the output layer must be configured identically for the training process and the inference process of the CNN technique.

Subsequently, in the inference process, the coefficients of the CNN-based filter obtained in the training process are applied to generate a quality-improved YUV from the YUV to be improved, the quantization parameter map, and the block partition map.

FIGS. 5A to 5C are diagrams illustrating CNN structures that differ in the position of the concatenation layer, according to an embodiment of the present invention.

Specifically, the YUV 501 to be improved, the quantization parameter map 503, and the block partition map 505 that are input to the input layer 510 may be concatenated by a concatenation layer 520 within the CNN. The position of the concatenation layer 520 may vary, however.

FIG. 5A shows a CNN structure in which the concatenation layer 520 is located immediately after the input layer 510, so that the YUV 501 to be improved, the quantization parameter map 503, and the block partition map 505 are concatenated immediately after being input to the input layer 510.

FIG. 5B shows a CNN structure in which the concatenation layer 520 is located between convolution layers 530, and FIG. 5C shows a CNN structure in which the concatenation layer 520 is located immediately before the output layer 540.
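
The early-concatenation arrangement of FIG. 5A can be pictured as stacking the same-resolution inputs along a channel axis before any convolution is applied. The sketch below is only an illustration of that idea; the function name `concat_inputs`, the channel-first layout, and the sample values are assumptions. In the arrangements of FIGS. 5B and 5C, the stacking would instead be applied to intermediate feature maps further along the network.

```python
import numpy as np

def concat_inputs(plane, qp_map, partition_map):
    """Stack same-resolution inputs along a new channel axis -> (3, H, W)."""
    if not (plane.shape == qp_map.shape == partition_map.shape):
        raise ValueError("all inputs must have the same resolution")
    return np.stack([plane, qp_map, partition_map], axis=0).astype(np.float32)

h, w = 4, 4
plane = np.full((h, w), 128, dtype=np.float32)      # samples to be improved
qp_map = np.full((h, w), 32, dtype=np.float32)      # QP per sample
partition_map = np.zeros((h, w), dtype=np.float32)  # boundary markers
x = concat_inputs(plane, qp_map, partition_map)
```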

FIGS. 6A to 6C are diagrams illustrating data to be input to the input layer according to an embodiment of the present invention.

Specifically, FIG. 6A shows a Y plane to be improved (for example, a Y coding tree block (CTB)), with the luma pixel values omitted from the drawing; FIG. 6B shows the QP map applied to that Y plane; and FIG. 6C shows the block partition map of that Y plane.

Various structures of the block partition map according to an embodiment of the present invention are described below. The block partition map indicates whether blocks are partitioned, which helps the CNN treat the partition boundaries of a block differently from the interior of the block during training and inference.

FIGS. 7A and 7B are diagrams illustrating an example of a block partition map according to an embodiment of the present invention.

The block partition map is set to the same resolution as the YUV plane to be filtered and may consist of values indicating whether a block is partitioned. For example, when the YUV plane consists of coding tree blocks each containing a plurality of coding blocks (CBs), the block partition map may indicate the partition boundaries of the coding blocks within a coding tree block. FIG. 7A shows a coding tree block partitioned by the quadtree plus binary tree (QTBT) scheme, and FIG. 7B shows the block partition map corresponding to that coding tree block. Referring to FIG. 7B, coding block boundaries are marked '1' and coding block interiors are marked '0' in the block partition map.
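
A binary partition map of this kind could be generated roughly as follows. The sketch assumes that the "boundary" of a coding block is its outermost one-sample ring, which is one possible reading of FIG. 7B; the `(x, y, w, h)` block tuples and the function name are hypothetical.

```python
import numpy as np

def build_partition_map(height, width, blocks):
    """Mark block-boundary samples with 1 and interior samples with 0.

    blocks: iterable of (x, y, w, h) tuples (hypothetical layout).
    Assumption: the boundary is the outermost one-sample ring of each block.
    """
    pmap = np.zeros((height, width), dtype=np.uint8)
    for x, y, w, h in blocks:
        pmap[y, x:x + w] = 1            # top edge
        pmap[y + h - 1, x:x + w] = 1    # bottom edge
        pmap[y:y + h, x] = 1            # left edge
        pmap[y:y + h, x + w - 1] = 1    # right edge
    return pmap

# An 8x8 coding tree block split into two 4x4 blocks and one 8x4 block.
blocks = [(0, 0, 4, 4), (4, 0, 4, 4), (0, 4, 8, 4)]
pmap = build_partition_map(8, 8, blocks)
```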

FIGS. 8A and 8B are diagrams illustrating another example of a block partition map according to an embodiment of the present invention.

In FIGS. 8A and 8B, when the YUV plane consists of coding tree blocks, blocking artifacts at the coding tree block boundary may not be processable, so a margin region (α) may be added to the YUV plane. FIGS. 8A and 8B show an example in which the margin region (α) is set to 2 pixels, but other values may be used. The margin region also allows the boundary of the coding tree block to be marked, and partitioning may likewise be indicated for the area outside the coding tree block. If filtering is performed with the margin region included, the filtered result overlaps with adjacent coding tree blocks, and the overlapping region may be handled by averaging. Specifically, referring to FIG. 8C, when coding tree blocks 801 and 803 that include margin regions are adjacent, an overlapping region 805 arises. The overlapping region 805 may be set to the average of the values from the adjacent coding tree blocks 801 and 803.
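
The averaging of overlapping margin regions might be implemented along the following lines. This is a sketch under assumptions: the `(x, y, data)` tile tuples, the function name `merge_padded_blocks`, and the toy tile sizes are introduced here for illustration, and each tile is assumed to lie entirely within the picture.

```python
import numpy as np

def merge_padded_blocks(tiles, height, width):
    """Merge filtered tiles that overlap, averaging where they coincide.

    tiles: iterable of (x, y, data), where (x, y) is the top-left position
    of the padded tile in the picture and data is a 2-D array
    (hypothetical layout; tiles are assumed to fit inside the picture).
    """
    acc = np.zeros((height, width), dtype=np.float64)
    count = np.zeros((height, width), dtype=np.int32)
    for x, y, data in tiles:
        h, w = data.shape
        acc[y:y + h, x:x + w] += data
        count[y:y + h, x:x + w] += 1
    # Samples covered by no tile stay 0; covered samples get the mean.
    return acc / np.maximum(count, 1)

# Two 4x6 tiles: each is a 4x4 block plus a 2-sample margin toward its neighbour.
left = np.full((4, 6), 10.0)    # covers columns 0..5
right = np.full((4, 6), 20.0)   # covers columns 2..7
merged = merge_padded_blocks([(0, 0, left), (2, 0, right)], 4, 8)
```

In this toy case the columns covered by both tiles take the mean of the two filtered results, while the remaining columns keep the value of the single tile that covers them.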

FIGS. 9A to 9C are diagrams illustrating block partition maps for adjusting the strength of deblocking according to an embodiment of the present invention.

In the preceding embodiments, the coding block boundary was marked with a width of 1 pixel, where a pixel value of 0 indicates the interior of a coding block and a value of 1 indicates its boundary.

In FIG. 9A, the coding block boundary is marked with a number of pixels (or a pixel width, luma sample lines, luma sample length, etc.) in order to adjust the strength of deblocking. The number of pixels may be determined by at least one of the size of the coding block, the value of the quantization parameter, and the encoding mode. For example, as in FIG. 9A, the number of pixels may be set to 2 for a large coding block and to 1 for a small coding block. Likewise, the number of pixels may be set larger when the value of the quantization parameter is large and smaller when it is small. As another example, the number of pixels may be set larger when the encoding mode is intra and smaller when it is inter. Each of these relationships may also be reversed.

The number of pixels may denote the number of pixels at the block boundary that are to be updated by filtering. For example, to update three pixel values within a block at the block boundary, the boundary may be marked with 3 pixels in the block partition map. Alternatively, the number of pixels may denote the number of pixels at the block boundary that are referenced for filtering. For example, to filter with reference to four pixel values within a block at the block boundary, the boundary may be marked with 4 pixels in the block partition map.

In FIG. 9B, the boundary value of the coding block is varied in order to adjust the strength of deblocking. The boundary value may be determined by at least one of the size of the coding block, the value of the quantization parameter, the encoding mode, the number of pixels to be updated, and the number of pixels to be referenced for filtering. As in FIG. 9A, the boundary value may be set large when the coding block is large, the value of the quantization parameter is large, or the encoding mode is intra, and conversely set small when the coding block is small, the value of the quantization parameter is small, or the encoding mode is inter. Each of these relationships may also be reversed.

FIG. 9C marks the coding block boundary using both the number of boundary pixels and the boundary value in order to adjust the strength of deblocking. The details are as described for FIGS. 9A and 9B and are omitted here.
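
A map that varies both the boundary width and the boundary value, in the spirit of FIG. 9C, could be sketched as below. The thresholds (`large_threshold=16`, `qp_threshold=32`), the specific widths and values, the `(x, y, w, h, qp)` block tuples, and all function names are hypothetical choices made for illustration; the disclosure leaves the actual rules open and notes that they may also be reversed.

```python
import numpy as np

def boundary_width(block_size, large_threshold=16):
    """Hypothetical rule: wider boundary for larger blocks (2 vs 1 samples)."""
    return 2 if block_size >= large_threshold else 1

def boundary_value(qp, qp_threshold=32):
    """Hypothetical rule: larger boundary value for a higher QP."""
    return 2 if qp >= qp_threshold else 1

def build_strength_map(height, width, blocks):
    """blocks: iterable of (x, y, w, h, qp) tuples (hypothetical layout)."""
    smap = np.zeros((height, width), dtype=np.uint8)
    for x, y, w, h, qp in blocks:
        n = boundary_width(min(w, h))
        v = boundary_value(qp)
        ring = np.ones((h, w), dtype=bool)
        if h > 2 * n and w > 2 * n:
            ring[n:h - n, n:w - n] = False   # interior keeps the value 0
        region = smap[y:y + h, x:x + w]      # view into smap
        region[ring] = v                     # write the boundary ring
    return smap

# One 16x16 block with QP 37 next to two 8x8 blocks with QP 27.
blocks = [(0, 0, 16, 16, 37), (16, 0, 8, 8, 27), (16, 8, 8, 8, 27)]
smap = build_strength_map(16, 24, blocks)
```

Here the large, coarsely quantized block receives a 2-sample boundary with value 2, while the small blocks receive a 1-sample boundary with value 1.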

A block partition map configured as described above is used in the training process and helps the CNN filter operate as a strong deblocking filter.

FIG. 10 is a flowchart of decoding an image using a CNN-based filter according to the present disclosure.

At least one of a quantization parameter map and a block partition map is input to the CNN-based filter together with the YUV to be improved (1001). The quantization parameter map may be set to the same resolution as the YUV to be improved. The block partition map may mark the partition boundaries of a block and the interior of the block with different values. The number and value of the pixels that mark a partition boundary in the block partition map may be determined by at least one of the size of the coding block, the value of the quantization parameter, the encoding mode, the number of pixels to be updated, and the number of pixels to be referenced for filtering.

The quality-improved YUV is output (1003) using coefficients of the CNN-based filter that were trained with the YUV to be improved, the quantization parameter map, and the block partition map as inputs and the original YUV as the final output. When a hint such as a block mode map is additionally input to the CNN-based filter, the coefficients of the CNN-based filter are likewise trained with that hint as an additional input.

The processes shown in FIG. 10 may be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes any kind of recording device that stores data readable by a computer system. That is, the computer-readable recording medium includes storage media such as magnetic storage media (for example, ROM, floppy disks, hard disks), optically readable media (for example, CD-ROM, DVD), and carrier waves (for example, transmission over the Internet). The computer-readable recording medium may also be distributed across network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

FIG. 11 is a schematic block diagram of an apparatus for decoding an image according to the present disclosure.

The apparatus for decoding an image may include an input unit 1101, a filter unit 1103, and an output unit 1105. It may include other components as well, but descriptions of components not directly related to the present disclosure are omitted.

The input unit 1101 receives the YUV to be improved and at least one of a quantization parameter map and a block partition map. The quantization parameter map may be set to the same resolution as the YUV to be improved, and the block partition map may mark the partition boundaries of a block and the interior of the block with different values. The number and value of the pixels that mark a partition boundary in the block partition map may be determined by at least one of the size of the coding block, the value of the quantization parameter, and the encoding mode.

The filter unit 1103 applies the trained coefficients of the CNN-based filter to the YUV to be improved and to the at least one of the quantization parameter map and the block partition map received by the input unit 1101.

The output unit 1105 outputs the quality-improved YUV generated by applying the trained coefficients of the CNN-based filter to the YUV to be improved and to the at least one of the received quantization parameter map and block partition map.

Although the present disclosure describes the input unit 1101, the filter unit 1103, and the output unit 1105 separately, they may be integrated into a single component, or any one of them may be divided into several components.

The above description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment belongs may make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

Claims

A method of decoding an image using a convolutional neural network (CNN)-based filter, the method comprising:
inputting, to the CNN-based filter, a first picture and at least one of a quantization parameter map and a block partition map; and
outputting a second picture,
wherein the quantization parameter map indicates information about coding units constituting the first picture, and the block partition map indicates information about partitioned regions constituting the first picture.

The method of claim 1,
wherein coefficients of the CNN-based filter are trained using a third picture, an original picture, and at least one of a quantization parameter map and a block partition map.

The method of claim 1,
wherein the quantization parameter map is set to the same resolution as the first picture.

The method of claim 1,
further comprising inputting, to the CNN-based filter, a block mode map indicating an encoding mode.

The method of claim 1,
wherein the block partition map marks a partition boundary of a block and an interior region of the block with different values.

The method of claim 1,
wherein the number of pixels representing a partition boundary of a block in the block partition map is determined by at least one of a size of a coding block, a value of a quantization parameter, an encoding mode, a number of pixels to be updated, and a number of pixels to be referenced for filtering.

The method of claim 1,
wherein a value of a pixel representing a partition boundary of a block in the block partition map is determined by at least one of a size of a coding block, a value of a quantization parameter, an encoding mode, a number of pixels to be updated, and a number of pixels to be referenced for filtering.

The method of claim 1,
wherein coefficients of the CNN-based filter are received from an apparatus that encodes the image.

An apparatus for decoding an image using a convolutional neural network (CNN)-based filter, the apparatus comprising:
an input unit configured to receive a first picture and at least one of a quantization parameter map and a block partition map;
a filter unit configured to apply coefficients of the CNN-based filter to the first picture and to the at least one of the quantization parameter map and the block partition map received by the input unit; and
an output unit configured to output a second picture obtained by applying the coefficients of the CNN-based filter to the first picture and to the at least one of the quantization parameter map and the block partition map,
wherein the quantization parameter map indicates information about coding units constituting the first picture, and the block partition map indicates information about partitioned regions constituting the first picture.

The apparatus of claim 9,
wherein the coefficients of the CNN-based filter are trained using a third picture, an original picture, and at least one of a quantization parameter map and a block partition map.

The apparatus of claim 9,
wherein the quantization parameter map is set to the same resolution as the first picture.

The apparatus of claim 9,
wherein the input unit further receives a block mode map indicating an encoding mode, and
the coefficients of the CNN-based filter are trained with the block mode map as an additional input.

The apparatus of claim 9,
wherein the block partition map marks a partition boundary of a block and an interior region of the block with different values.

The apparatus of claim 9,
wherein the number of pixels representing a partition boundary of a block in the block partition map is determined by at least one of a size of a coding block, a value of a quantization parameter, an encoding mode, a number of pixels to be updated, and a number of pixels to be referenced for filtering.

The apparatus of claim 9,
wherein a value of a pixel representing a partition boundary of a block in the block partition map is determined by at least one of a size of a coding block, a value of a quantization parameter, an encoding mode, a number of pixels to be updated, and a number of pixels to be referenced for filtering.

The apparatus of claim 9,
wherein the coefficients of the CNN-based filter are received from an apparatus that encodes the image.