KR102648464B1

KR102648464B1 - Method and apparatus for image enhancement using supervised learning

Info

Publication number: KR102648464B1
Application number: KR1020180072499A
Authority: KR
Inventors: 나태영; 이선영; 신재섭; 손세훈; 김효성
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2018-06-25
Filing date: 2018-06-25
Publication date: 2024-03-15
Also published as: KR20200000543A

Abstract

A method and apparatus for image enhancement using supervised learning are disclosed.
According to one aspect of the present embodiment, there is provided a method of decoding an image using a CNN-based filter, the method comprising: inputting a first picture and at least one of a quantization parameter map and a block partition map into the CNN-based filter; and outputting a second picture, wherein the quantization parameter map represents information about the coding units constituting the first picture, and the block partition map represents information about the partitioned regions constituting the first picture.

Description

Method and apparatus for image enhancement using supervised learning

The present invention relates to a method and apparatus for image enhancement using supervised learning.

The content described in this section merely provides background information on the present invention and does not constitute prior art.

Video coding based on deep learning is attracting attention from many experts. Replacing the in-loop filters, namely the bilateral filter, the deblocking filter, and sample adaptive offset (SAO), with a single convolutional neural network (CNN) filter trained by deep learning achieved a BDBR (Bjontegaard-delta bit rate) gain of 3.57% under the all-intra common testing condition. As experiments have shown that deep learning can be combined effectively with existing video encoding and decoding techniques, deep learning is expected to be used widely.

Deep learning is a field of machine learning and refers to a set of methods that learn the essential structure in large amounts of data through a combination of non-linear transformations connected across multiple layers. Machine learning can be broadly classified, according to how the learning is performed, into supervised learning, unsupervised learning, and reinforcement learning. Among these, supervised learning refers to learning performed when a label, that is, an explicit correct answer for the training data, is given.

A representative supervised learning technique is the artificial neural network (hereinafter 'NN'), an engineering model of the way the human brain operates. A structure in which an NN is stacked deeply in multiple layers is called a deep neural network. In general, an NN has a structure in which all nodes of each layer are fully connected; a structure in which the layers are instead connected by convolution kernels, so as to suit image processing, is called a convolutional neural network (hereinafter 'CNN').

FIG. 1 is a diagram showing a CNN structure for image enhancement.

Referring to FIG. 1, the CNN includes an input layer 110, an output layer 120, and a convolution layer 130. The convolution layer 130 may be composed of a plurality of layers 132, 134, 136, 138, and 140. A probability distribution model may be constructed based on all of the convolution layers.

The operation of a CNN is broadly divided into a training process and an inference process. In the training process, the image whose quality is to be improved is fed to the input layer as data. The convolution kernel coefficients of each convolution layer are initialized before training and are then learned by the error backpropagation algorithm so that the error between the image produced at the output layer and the label of the output layer, that is, the image of original quality, is minimized. The accuracy of the output layer may vary depending on the input and output layers, the design of the convolution layers, and/or the error backpropagation algorithm used during training.
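The training loop described above can be sketched with a deliberately tiny stand-in: a single 3-tap, 1-D convolution kernel fitted by gradient descent on a mean squared error. This is only an illustration under stated assumptions. The signal, the "true" kernel, the learning rate, and every function name are hypothetical, and with a single layer backpropagation reduces to one gradient expression; the CNN of the disclosure is multi-layer and 2-D or 3-D.

```python
def conv1d(x, w):
    """Valid 1-D correlation of signal x with a 3-tap kernel w."""
    return [sum(w[k] * x[i + k] for k in range(3)) for i in range(len(x) - 2)]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def train(x, label, w, lr, steps):
    """Fit the kernel w by plain gradient descent on the MSE loss."""
    for _ in range(steps):
        out = conv1d(x, w)
        err = [o - t for o, t in zip(out, label)]
        # d(MSE)/d(w[k]) = 2/N * sum_i err[i] * x[i + k]
        grad = [2 * sum(e * x[i + k] for i, e in enumerate(err)) / len(err)
                for k in range(3)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


# Hypothetical data: the label is the input filtered by a known kernel.
x = [0.0, 1.0, 0.5, -0.5, 1.5, 0.2, -1.0, 0.7, 0.3, -0.2]
label = conv1d(x, [0.25, 0.5, 0.25])

w0 = [0.0, 0.0, 0.0]                      # coefficients initialized before training
w = train(x, label, w0, lr=0.1, steps=200)
loss_before = mse(conv1d(x, w0), label)
loss_after = mse(conv1d(x, w), label)
```

After training, `loss_after` is lower than `loss_before`, which is the only property this sketch is meant to show.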

In the subsequent inference process, the convolution kernel coefficients obtained through the training are applied to infer a quality-improved image from the image whose quality is to be improved.

The main object of the present embodiment is to provide a method and apparatus that can enhance images and address quantization errors and blocking artifacts in an image encoder and decoder.

According to one aspect of the present embodiment, there is provided a method of decoding an image using a CNN-based filter, the method comprising: inputting a first picture and at least one of a quantization parameter map and a block partition map into the CNN-based filter; and outputting a second picture, wherein the quantization parameter map represents information about the coding units constituting the first picture, and the block partition map represents information about the partitioned regions constituting the first picture.

According to another aspect of the present embodiment, there is provided an image decoding apparatus using a CNN-based filter, the apparatus comprising: an input unit that receives a first picture and at least one of a quantization parameter map and a block partition map; a filter unit that applies the coefficients of the CNN-based filter to the first picture and the at least one of the quantization parameter map and the block partition map received by the input unit; and an output unit that outputs a second picture obtained by applying the coefficients of the CNN-based filter to the first picture and the at least one of the quantization parameter map and the block partition map, wherein the quantization parameter map represents the coding units constituting the first picture, and the block partition map represents information about the partitioned regions constituting the first picture.

As described above, according to the present embodiment, a filter trained through supervised learning can be used to enhance images and to address quantization errors and blocking artifacts.

FIG. 1 is a diagram showing a CNN structure for image enhancement;
FIG. 2 is an exemplary block diagram of an image encoding apparatus capable of implementing the techniques of the present disclosure;
FIG. 3 is an exemplary block diagram of an image decoding apparatus capable of implementing the techniques of the present disclosure;
FIG. 4 is a diagram showing a CNN-based filter according to an embodiment of the present invention;
FIGS. 5A to 5C are diagrams showing CNN structures according to the position of the concatenation layer, according to an embodiment of the present invention;
FIGS. 6A to 6C are diagrams showing data to be input to the input layer according to an embodiment of the present invention;
FIGS. 7A and 7B are diagrams showing an example of a block partition map according to an embodiment of the present invention;
FIGS. 8A to 8C are diagrams showing another example of a block partition map according to an embodiment of the present invention;
FIGS. 9A to 9C are diagrams showing block partition maps for adjusting the strength of deblocking according to an embodiment of the present invention;
FIG. 10 is a flowchart of decoding an image using a CNN-based filter according to the present disclosure; and
FIG. 11 is a diagram schematically showing the configuration of an apparatus for decoding an image according to the present disclosure.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components of each drawing, identical components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the gist of the invention.

In describing the components of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as '... unit' and 'module' used in the specification refer to a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.

FIG. 2 is an exemplary block diagram of an image encoding apparatus capable of implementing the techniques of the present disclosure.

The image encoding apparatus includes a block partitioning unit 210, a prediction unit 220, a subtractor 230, a transform unit 240, a quantization unit 245, an encoding unit 250, an inverse quantization unit 260, an inverse transform unit 265, an adder 270, a filter unit 280, and a memory 290. Each component of the image encoding apparatus may be implemented as a hardware chip, or may be implemented in software, with one or more microprocessors executing the software functions corresponding to each component.

A video consists of a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed region by region. For example, a picture is divided into one or more slices and/or tiles, and each slice or tile is divided into one or more coding tree units (CTUs). Each CTU is in turn divided into one or more coding units (CUs) by a tree structure. Information applied to an individual CU is encoded as CU syntax, and information applied in common to the CUs contained in one CTU is encoded as CTU syntax.

The block partitioning unit 210 determines the size of the CTU. Information about the CTU size is encoded as syntax of the SPS (Sequence Parameter Set) or PPS (Picture Parameter Set) and transmitted to the image decoding apparatus. The block partitioning unit 210 divides each picture of the video into a plurality of CTUs of the determined size and then recursively partitions each CTU using a tree structure. In the present disclosure, the block partitioning unit 210 may additionally generate a block partition map that represents information about the partitioned regions of each picture.
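The recursive partitioning of a CTU can be illustrated with a minimal quadtree sketch. It assumes square blocks and a caller-supplied split decision; the function names, the 64/16 sizes, and the split rule are hypothetical and are not taken from the disclosure, which also allows other tree types.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block; return the leaves as (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half,
                                         min_size, should_split)
        return leaves
    return [(x, y, size)]


def split_rule(x, y, size):
    # Hypothetical decision: keep splitting only the block at the CTU origin.
    return x == 0 and y == 0


# A 64x64 CTU with a 16x16 minimum block size.
leaves = quadtree_split(0, 0, 64, 16, split_rule)
covered_area = sum(size * size for _, _, size in leaves)
```

The leaves tile the CTU exactly, so `covered_area` equals 64 * 64. In a real encoder the split decision would come from a cost analysis rather than a fixed rule.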

The prediction unit 220 predicts the current block to generate a prediction block. The prediction unit 220 includes an intra prediction unit 222 and an inter prediction unit 224.

In general, each current block in a picture can be coded predictively. Prediction of the current block is generally performed using an intra prediction technique, which uses data from the picture containing the current block, or an inter prediction technique, which uses data from a picture coded before the picture containing the current block. Inter prediction includes both unidirectional and bidirectional prediction.

The intra prediction unit 222 predicts the pixels in the current block using pixels (reference pixels) located around the current block within the current picture that contains it. A plurality of intra prediction modes exist, corresponding to different prediction directions.

The intra prediction unit 222 may determine the intra prediction mode to be used to encode the current block. In some examples, the intra prediction unit 222 may encode the current block using several intra prediction modes and select an appropriate mode from among those tested. For example, the intra prediction unit 222 may calculate rate-distortion values through rate-distortion analysis of the tested intra prediction modes and select the mode with the best rate-distortion characteristics.
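As a rough illustration of rate-distortion selection, the sketch below picks the mode with the lowest Lagrangian cost J = D + λR. The disclosure does not state a cost function, so this common formulation is an assumption, and the mode names, distortion values, and bit counts are hypothetical.

```python
def select_mode(candidates, lam):
    """Return the mode with the lowest cost J = D + lam * R.

    candidates maps a mode name to a (distortion, rate_in_bits) pair.
    """
    return min(candidates,
               key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Hypothetical measurements for three tested intra prediction modes.
candidates = {
    "planar": (120.0, 10),
    "dc": (150.0, 6),
    "angular_26": (90.0, 22),
}

best = select_mode(candidates, lam=4.0)
```

With these numbers the costs are 160, 174, and 178, so `best` is `"planar"`. A larger λ favors cheaper modes and a smaller λ favors lower distortion.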

The intra prediction unit 222 selects one intra prediction mode from the plurality of intra prediction modes and predicts the current block using the neighboring pixels (reference pixels) and the equation determined by the selected mode. Information about the selected intra prediction mode is encoded by the encoding unit 250 and transmitted to the image decoding apparatus.

The inter prediction unit 224 generates a prediction block for the current block through motion compensation. It searches a reference picture, encoded and decoded before the current picture, for the block most similar to the current block and generates the prediction block for the current block from the block found. It then generates a motion vector corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture. In general, motion estimation is performed on the luma component, and the motion vector calculated from the luma component is used for both the luma and chroma components. Motion information, including information about the reference picture and the motion vector used to predict the current block, is encoded by the encoding unit 250 and transmitted to the image decoding apparatus.
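The search for the most similar block can be sketched as an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The disclosure does not name a matching criterion or search strategy, so SAD and full search are assumptions here, as are the function names and the synthetic pictures.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def motion_search(cur, ref, bx, by, n, search):
    """Full search within +/- search pixels; returns ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1], best[0]


# Synthetic 8x8 reference picture in which every 2x2 block is distinct.
ref = [[x * x + 10 * y for x in range(8)] for y in range(8)]

# Current picture: the 2x2 block at (2, 2) is copied from ref at (3, 1).
cur = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        cur[2 + j][2 + i] = ref[1 + j][3 + i]

mv, cost = motion_search(cur, ref, bx=2, by=2, n=2, search=2)
```

Here the search recovers the displacement `(1, -1)` with a cost of zero, because the current block was copied from that position.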

The subtractor 230 generates a residual block by subtracting, from the current block, the prediction block generated by the intra prediction unit 222 or the inter prediction unit 224.

The transform unit 240 transforms the residual signal in the residual block, which has pixel values in the spatial domain, into transform coefficients in the frequency domain. The transform unit 240 may transform the residual signals using the size of the current block as the transform unit, or it may divide the residual block into a plurality of smaller subblocks and transform the residual signals in transform units of the subblock size. There are various ways to divide the residual block into smaller subblocks. For example, it may be divided into subblocks of the same predefined size, or a quadtree (QT) partitioning with the residual block as the root node may be used.

The quantization unit 245 quantizes the transform coefficients output from the transform unit 240 and outputs the quantized transform coefficients to the encoding unit 250.

The encoding unit 250 generates a bitstream by encoding the quantized transform coefficients using a coding scheme such as CABAC. The encoding unit 250 also encodes information related to block partitioning, such as the CTU size, QT split flag, BT split flag, and split type, so that the image decoding apparatus can partition blocks in the same way as the image encoding apparatus.

The encoding unit 250 encodes prediction type information indicating whether the current block was encoded by intra prediction or by inter prediction and, depending on the prediction type, encodes either intra prediction information (that is, information about the intra prediction mode) or inter prediction information (information about the reference picture and the motion vector).

The inverse quantization unit 260 inversely quantizes the quantized transform coefficients output from the quantization unit 245 to generate transform coefficients. The inverse transform unit 265 reconstructs the residual block by transforming the transform coefficients output from the inverse quantization unit 260 from the frequency domain to the spatial domain.

The adder 270 reconstructs the current block by adding the reconstructed residual block to the prediction block generated by the prediction unit 220. The pixels in the reconstructed current block are used as reference pixels when intra-predicting subsequent blocks.
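The path from the subtractor through quantization and back to the adder can be sketched on a single row of samples. This is a simplified illustration: the transform and inverse transform are omitted, the quantizer is a uniform scalar quantizer with a hypothetical step size, and the sample values are invented.

```python
import math


def quantize(values, step):
    """Uniform scalar quantization with round-half-up."""
    return [math.floor(v / step + 0.5) for v in values]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


current = [53, 55, 61, 67]        # hypothetical samples of the current block
prediction = [50, 50, 60, 60]     # hypothetical prediction block
step = 4                          # hypothetical quantization step size

residual = [c - p for c, p in zip(current, prediction)]       # subtractor
levels = quantize(residual, step)                              # quantization
recon_residual = dequantize(levels, step)                      # inverse quantization
reconstructed = [p + r for p, r in zip(prediction, recon_residual)]  # adder
errors = [abs(c - r) for c, r in zip(current, reconstructed)]
```

The reconstructed samples differ from the originals by at most half a step, which is the quantization error that the in-loop filtering stage later tries to reduce.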

The filter unit 280 filters the reconstructed pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, and the like caused by block-based prediction and by transform and quantization. The filter unit 280 may include a deblocking filter 282 and an SAO filter 284.

The deblocking filter 282 filters the boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block encoding and decoding, and the SAO filter 284 performs additional filtering on the deblocking-filtered image. The SAO filter 284 is used to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding.

The reconstructed blocks filtered by the deblocking filter 282 and the SAO filter 284 are stored in the memory 290. Once all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in pictures to be encoded later.

FIG. 3 is an exemplary block diagram of an image decoding apparatus capable of implementing the techniques of the present disclosure.

The image decoding apparatus includes an image reconstructor 300, which comprises a decoding unit 310, an inverse quantization unit 320, an inverse transform unit 330, a prediction unit 340, an adder 350, and the like, together with a filter unit 360 and a memory 370. As with the image encoding apparatus of FIG. 2, each component of the image decoding apparatus may be implemented as a hardware chip, or may be implemented in software, with a microprocessor executing the software functions corresponding to each component.

The decoding unit 310 decodes the bitstream received from the image encoding apparatus, extracts information related to block partitioning to determine the current block to be decoded, and extracts the prediction information and the residual signal information needed to reconstruct the current block.

The decoding unit 310 extracts information about the CTU size from the SPS (Sequence Parameter Set) or PPS (Picture Parameter Set), determines the size of the CTU, and divides the picture into CTUs of the determined size. It then takes the CTU as the top layer of the tree structure, that is, the root node, and partitions the CTU using the tree structure by extracting the partitioning information for the CTU.

Once the current block to be decoded has been determined through the tree-structured partitioning, the decoding unit 310 extracts prediction type information indicating whether the current block was intra-predicted or inter-predicted.

When the prediction type information indicates intra prediction, the decoding unit 310 extracts the syntax elements for the intra prediction information (intra prediction mode) of the current block.

When the prediction type information indicates inter prediction, the decoding unit 310 extracts the syntax elements for the inter prediction information, that is, information indicating the motion vector and the reference picture to which the motion vector refers.

Meanwhile, the decoding unit 310 extracts information about the quantized transform coefficients of the current block as the residual signal information.

The inverse quantization unit 320 inversely quantizes the quantized transform coefficients, and the inverse transform unit 330 inversely transforms the dequantized transform coefficients from the frequency domain to the spatial domain to reconstruct the residual signals, thereby generating a residual block for the current block.

The prediction unit 340 includes an intra prediction unit 342 and an inter prediction unit 344. The intra prediction unit 342 is activated when the prediction type of the current block is intra prediction, and the inter prediction unit 344 is activated when the prediction type of the current block is inter prediction.

The intra prediction unit 342 determines the intra prediction mode of the current block, from among the plurality of intra prediction modes, using the intra prediction mode syntax elements extracted by the decoding unit 310, and predicts the current block using the reference pixels around the current block according to that mode.

The inter prediction unit 344 determines the motion vector of the current block and the reference picture to which the motion vector refers, using the inter prediction information syntax elements extracted by the decoding unit 310, and predicts the current block using the motion vector and the reference picture.

The adder 350 reconstructs the current block by adding the residual block output from the inverse transform unit to the prediction block output from the inter prediction unit or the intra prediction unit. The pixels in the reconstructed current block are used as reference pixels when intra-predicting blocks to be decoded later.

By sequentially reconstructing the current blocks corresponding to the CUs, the image reconstructor 300 reconstructs each CTU composed of CUs and each picture composed of CTUs.

The filter unit 360 includes a deblocking filter 362 and an SAO filter 364. The deblocking filter 362 applies deblocking filtering to the boundaries between reconstructed blocks to remove blocking artifacts caused by block-by-block decoding. The SAO filter 364 performs additional filtering on the reconstructed blocks after deblocking filtering to compensate for the difference between reconstructed pixels and original pixels caused by lossy coding. The reconstructed blocks filtered by the deblocking filter 362 and the SAO filter 364 are stored in the memory 370. Once all blocks in a picture have been reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in pictures to be decoded later.

The present disclosure describes in detail a CNN-based filter that provides the functions of the deblocking filters 282 and 362 and the SAO filters 284 and 364. The CNN-based filter according to the present disclosure can be used in both the image encoding apparatus and the image decoding apparatus.

In addition, although the present disclosure uses YUV as the example of the information constituting a picture, it can also be applied to RGB, YCbCr, and the like. That is, the YUV whose quality is to be improved may instead be an RGB or a YCbCr whose quality is to be improved.

FIG. 4 is a diagram illustrating a CNN-based filter according to an embodiment of the present invention.

When the YUV 401 to be enhanced and at least one of a quantization parameter (QP) map 403 and a block partition map 405 are input to the input layer, a YUV difference 421 is output from the output layer. Here, the YUV 401 to be enhanced may be a YUV reconstructed from a bitstream received from an encoder, and refers to a YUV in which the original YUV has been degraded, whether artificially or not. A hint (not shown) may additionally be input to the input layer.

First, in the training process, the coefficients of the CNN-based filter, that is, the coefficients of the convolution kernels, are learned so that the YUV difference 421 output from the output layer equals the difference between the original YUV and the YUV to be enhanced. The convolution kernels may be two-dimensional (2D) or three-dimensional (3D). The CNN-based filter 411 serves to improve the quality of the input YUV, and its final output is a YUV referred to as the enhanced YUV 431.
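The relationship between the training target and the enhanced output can be illustrated with a minimal sketch. It assumes integer sample planes stored as nested Python lists and assumes the enhanced plane is obtained by adding the predicted difference back to the input, as the claims describe. The function names, the 8-bit default, and the clipping step are illustrative assumptions, not part of the disclosure.

```python
def training_target(original, degraded):
    """Residual the filter is trained to predict: original minus degraded, per sample."""
    return [[o - d for o, d in zip(orow, drow)]
            for orow, drow in zip(original, degraded)]


def apply_difference(degraded, predicted_diff, bit_depth=8):
    """Add the predicted difference to the degraded plane.

    Clipping to the valid sample range is an assumption of this sketch.
    """
    max_val = (1 << bit_depth) - 1
    return [[min(max(d + r, 0), max_val) for d, r in zip(drow, rrow)]
            for drow, rrow in zip(degraded, predicted_diff)]


# Toy 2x2 luma plane: the "degraded" plane stands in for a reconstructed picture.
original = [[100, 102], [98, 255]]
degraded = [[97, 104], [98, 250]]

diff = training_target(original, degraded)
# A perfectly trained filter would output exactly `diff` for this input.
enhanced = apply_difference(degraded, diff)
```

With a perfect prediction the enhanced plane equals the original; a real filter only approximates the difference.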

Here, the YUV to be enhanced may be filtered channel by channel or all channels at once.

The QP map is set to the same resolution as the input YUV to be filtered, and its values may be filled with the QP values used in the coding units, for example blocks or sub-blocks, within the YUV plane. If the YUV is filtered channel by channel, a single map may be constructed from the QP values of the channel being filtered. If the YUV is filtered all at once, the QP values of the three channels may be arranged as three separate maps or as a single map holding the average QP value.
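As a rough illustration of how such a QP map could be assembled, the sketch below fills a plane-sized grid with each block's QP and averages three per-channel maps into one. The block tuple layout `(y, x, height, width, qp)` and both function names are hypothetical, and the averaging helper assumes the three channel maps share one resolution (as in a 4:4:4 format).

```python
def build_qp_map(height, width, blocks):
    """Fill a height x width map with the QP of the block covering each sample.

    blocks: iterable of (y, x, block_height, block_width, qp); hypothetical layout.
    """
    qp_map = [[0] * width for _ in range(height)]
    for y, x, bh, bw, qp in blocks:
        for r in range(y, min(y + bh, height)):
            for c in range(x, min(x + bw, width)):
                qp_map[r][c] = qp
    return qp_map


def average_qp_maps(maps):
    """Average same-sized per-channel QP maps into a single map."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
            for r in range(rows)]


# One 4x4 plane split into a 4x2 block and two 2x2 blocks, each with its own QP.
y_qp = build_qp_map(4, 4, [(0, 0, 4, 2, 30), (0, 2, 2, 2, 32), (2, 2, 2, 2, 34)])
u_qp = build_qp_map(4, 4, [(0, 0, 4, 4, 33)])
v_qp = build_qp_map(4, 4, [(0, 0, 4, 4, 36)])

# Option 1: feed y_qp, u_qp, v_qp as three separate maps.
# Option 2: feed a single averaged map.
avg_qp = average_qp_maps([y_qp, u_qp, v_qp])
```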

To increase the accuracy of the CNN, block mode map information that is useful for training may be added to the input layer as hint information, in addition to the QP map, the block partition map, and the image to be enhanced. Here, the block mode map may be filled with the mode values used in the coding units, for example blocks or sub-blocks. For example, it may be information that distinguishes whether a block was coded in intra mode or inter mode, and this information may be expressed as a number. In this case, the convolution kernel coefficients resulting from the training process are determined using not only the data of the input layer but also the hint. As a rule, the input layer and output layer must be configured identically in the training process and the inference process of the CNN.

Then, in the inference process, the coefficients of the CNN-based filter obtained in the training process are applied to generate the enhanced YUV from the YUV to be enhanced, the quantization parameter map, and the block partition map.

FIGS. 5A to 5C are diagrams illustrating CNN structures that differ in the position of the concatenation layer, according to an embodiment of the present invention.

Specifically, the YUV 501 to be enhanced, the quantization parameter map 503, and the block partition map 505 that are input to the input layer 510 may be concatenated by a concatenation layer 520 within the CNN. The position of the concatenation layer 520, however, may vary.

FIG. 5A shows a CNN structure in which the concatenation layer 520 is located immediately after the input layer 510, so that the YUV 501 to be enhanced, the quantization parameter map 503, and the block partition map 505 are concatenated right after being input to the input layer 510.

FIG. 5B shows a CNN structure in which the concatenation layer 520 is located between convolution layers 530, and FIG. 5C shows a CNN structure in which the concatenation layer 520 is located immediately before the output layer 540.
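The effect of moving the concatenation layer can be sketched with channel-first tensors held as nested lists. This is only an illustration: `halve` is a hypothetical stand-in for a convolution layer, and the sketch assumes that, in the FIG. 5B and FIG. 5C arrangements, each input passes through its own layers before concatenation, which the text does not state explicitly.

```python
def concat_channels(*tensors):
    """Concatenate channel-first tensors (lists of 2-D planes) along the channel axis."""
    merged = []
    for tensor in tensors:
        merged.extend(tensor)
    return merged


def halve(tensor):
    """Hypothetical stand-in for a convolution layer: halves every sample."""
    return [[[v / 2 for v in row] for row in plane] for plane in tensor]


def run_network(inputs, layers_before, layers_after):
    """Apply layers_before to each input separately, concatenate, then apply layers_after."""
    branches = list(inputs)
    for layer in layers_before:
        branches = [layer(branch) for branch in branches]
    merged = concat_channels(*branches)
    for layer in layers_after:
        merged = layer(merged)
    return merged


# Three single-channel 2x2 inputs: picture plane, QP map, block partition map.
yuv = [[[96, 104], [98, 250]]]
qp_map = [[[30, 30], [32, 32]]]
partition_map = [[[1, 1], [1, 0]]]
inputs = [yuv, qp_map, partition_map]

fig_5a = run_network(inputs, [], [halve, halve])   # concatenate right after input
fig_5b = run_network(inputs, [halve], [halve])     # concatenate between layers
fig_5c = run_network(inputs, [halve, halve], [])   # concatenate just before output
```

Because the stand-in layer acts on each sample independently, all three arrangements give the same numbers here. With real convolution layers, which mix channels, the position of the concatenation changes what each layer can learn.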

FIGS. 6A to 6C are diagrams illustrating data to be input to the input layer according to an embodiment of the present invention.

Specifically, FIG. 6A shows a Y plane to be enhanced (for example, a Y coding tree block (CTB)), with the luma pixel values to be enhanced omitted from the drawing; FIG. 6B shows the QP map applied to the Y plane to be enhanced; and FIG. 6C shows the block partition map of the Y plane to be enhanced.

Various structures of the block partition map according to an embodiment of the present invention are described below. The block partition map indicates whether blocks are partitioned, and helps the CNN treat the partition boundaries of blocks and the interior regions of blocks differently during the training and inference processes.

FIGS. 7A and 7B are diagrams illustrating an example of a block partition map according to an embodiment of the present invention.

The block partition map is set to the same resolution as the YUV plane to be filtered and may consist of values indicating whether blocks are partitioned. For example, when the YUV plane consists of a coding tree block containing a plurality of coding blocks (CBs), the block partition map may indicate the partition boundaries of the coding blocks within the coding tree block. FIG. 7A shows a coding tree block partitioned by the quadtree plus binary tree (QTBT) scheme, and FIG. 7B shows the block partition map corresponding to that coding tree block. Referring to FIG. 7B, the boundaries of the coding blocks are marked '1' in the block partition map and the interiors of the coding blocks are marked '0'.
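A binary map of this kind could be generated along the following lines. The sketch assumes a hypothetical block list of `(y, x, height, width)` tuples and marks the outermost ring of samples of every coding block as boundary; whether FIG. 7B marks one or both sides of a shared edge is not stated, so that choice is an assumption.

```python
def build_partition_map(height, width, blocks, boundary=1, interior=0):
    """Mark the outermost samples of each block as boundary and the rest as interior.

    blocks: iterable of (y, x, block_height, block_width); hypothetical layout.
    """
    pmap = [[interior] * width for _ in range(height)]
    for y, x, bh, bw in blocks:
        for r in range(y, y + bh):
            for c in range(x, x + bw):
                on_edge = r in (y, y + bh - 1) or c in (x, x + bw - 1)
                if on_edge:
                    pmap[r][c] = boundary
    return pmap


# An 8x8 coding tree block split into one 8x4 block and two 4x4 blocks.
partition_map = build_partition_map(8, 8, [(0, 0, 8, 4), (0, 4, 4, 4), (4, 4, 4, 4)])

for row in partition_map:
    print("".join(str(v) for v in row))
```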

FIGS. 8A and 8B are diagrams illustrating another example of a block partition map according to an embodiment of the present invention.

In FIGS. 8A and 8B, when the YUV plane consists of a coding tree block, blocking artifacts at the boundary of the coding tree block may not be treatable, so a margin (α) may be added around the YUV plane. FIGS. 8A and 8B show an example in which the margin (α) is set to 2 pixels, but other values may be used. The margin allows the boundary of the coding tree block to be marked, and partitioning may also be indicated for the region outside the coding tree block. If filtering is performed on the block including its margin, the filtered result overlaps with adjacent coding tree blocks, and the overlapping region may be resolved by averaging. Specifically, referring to FIG. 8C, when coding tree blocks 801 and 803 that include margins are adjacent, an overlapping region 805 arises. The overlapping region 805 may be set to the average of the values from the adjacent coding tree blocks 801 and 803.
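The averaging of overlapping regions can be sketched as an accumulate-and-divide pass over the picture. The patch layout `(top, left, samples)`, the function name, and the use of a one-row picture are illustrative assumptions; coordinates may extend outside the picture because of the margin and are simply skipped.

```python
def blend_padded_blocks(pic_height, pic_width, patches):
    """Average filtered patches wherever they overlap.

    patches: list of (top, left, samples), where samples is a 2-D list that
    already includes the margin, so top/left may be negative. Hypothetical layout.
    """
    total = [[0.0] * pic_width for _ in range(pic_height)]
    count = [[0] * pic_width for _ in range(pic_height)]
    for top, left, samples in patches:
        for dy, row in enumerate(samples):
            for dx, value in enumerate(row):
                r, c = top + dy, left + dx
                if 0 <= r < pic_height and 0 <= c < pic_width:
                    total[r][c] += value
                    count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(pic_width)] for r in range(pic_height)]


# Two horizontally adjacent blocks, each 4 samples wide, with a margin of 2.
# The first covers columns -2..5 and the second columns 2..9 of an 8-wide row.
left_patch = (0, -2, [[10] * 8])
right_patch = (0, 2, [[20] * 8])
blended = blend_padded_blocks(1, 8, [left_patch, right_patch])
```

In this toy case columns 2 to 5 are covered by both patches, so they take the mean of the two filtered results.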

FIGS. 9A to 9C are diagrams illustrating block partition maps for adjusting the strength of deblocking according to an embodiment of the present invention.

In the preceding embodiment, the boundary of a coding block was marked with a width of 1 pixel: a pixel value of 0 indicated the interior of a coding block, and a value of 1 indicated its boundary.

In FIG. 9A, the boundary of a coding block is marked with a variable number of pixels (or pixel width, luma sample lines, luma sample length, and the like) in order to adjust the strength of deblocking. The number of pixels may be determined by at least one of the size of the coding block, the value of the quantization parameter, and the coding mode. For example, as in FIG. 9A, the number of pixels may be set to 2 when the coding block is large and to 1 when the coding block is small. Likewise, the number of pixels may be set larger when the quantization parameter value is large and smaller when it is small. As another example, the number of pixels may be set larger when the coding mode is intra and smaller when it is inter. Each of these relationships may also be reversed.

The number of pixels may denote the number of pixels at the block boundary that are to be updated by filtering. For example, to update 3 pixel values within a block at the block boundary, the boundary of the block may be marked with 3 pixels in the block partition map. Alternatively, the number of pixels may denote the number of pixels at the block boundary that are to be referenced for filtering. For example, to filter with reference to 4 pixel values within a block at the block boundary, the boundary of the block may be marked with 4 pixels in the block partition map.
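A variable-width boundary of the FIG. 9A kind could be produced as in the sketch below. The size rule (width 2 for blocks whose shorter side is at least 8 samples, otherwise 1) and the names `boundary_width` and `large_block_size` are assumptions chosen for illustration; the text leaves the actual rule open and allows it to depend on QP or coding mode as well.

```python
def boundary_width(block_height, block_width, large_block_size=8):
    """Hypothetical rule: large blocks get a 2-sample boundary band, small blocks 1."""
    return 2 if min(block_height, block_width) >= large_block_size else 1


def build_partition_map(height, width, blocks):
    """Mark a boundary band whose width depends on the size of each block.

    blocks: iterable of (y, x, block_height, block_width); hypothetical layout.
    """
    pmap = [[0] * width for _ in range(height)]
    for y, x, bh, bw in blocks:
        band = boundary_width(bh, bw)
        for r in range(y, y + bh):
            for c in range(x, x + bw):
                # Distance (in samples) from this sample to the nearest block edge.
                distance = min(r - y, y + bh - 1 - r, c - x, x + bw - 1 - c)
                if distance < band:
                    pmap[r][c] = 1
    return pmap


# One 8x8 block (band of 2) next to two 4x4 blocks (band of 1).
variable_width_map = build_partition_map(
    8, 12, [(0, 0, 8, 8), (0, 8, 4, 4), (4, 8, 4, 4)]
)
```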

In FIG. 9B, the boundaries of coding blocks are marked with different values in order to adjust the strength of deblocking. The boundary value of a coding block may be determined by at least one of the size of the coding block, the value of the quantization parameter, the coding mode, the number of pixels to be updated, and the number of pixels to be referenced for filtering. As in FIG. 9A, the boundary value may be set larger when the coding block is large, the quantization parameter value is large, or the coding mode is intra, and conversely set smaller when the coding block is small, the quantization parameter value is small, or the coding mode is inter. Each of these relationships may also be reversed.
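The FIG. 9B idea of encoding strength in the boundary value itself can be sketched as follows. The scoring rule in `boundary_value` (start at 1, add 1 for a QP at or above a threshold, add 1 for intra mode) is purely hypothetical, as are the block tuple layout and the threshold of 32.

```python
def boundary_value(qp, mode, qp_threshold=32):
    """Hypothetical strength rule: higher QP and intra mode give a larger value."""
    value = 1
    if qp >= qp_threshold:
        value += 1
    if mode == "intra":
        value += 1
    return value


def build_strength_map(height, width, blocks):
    """Write each block's strength value on its outermost ring of samples.

    blocks: iterable of (y, x, block_height, block_width, qp, mode); hypothetical layout.
    """
    smap = [[0] * width for _ in range(height)]
    for y, x, bh, bw, qp, mode in blocks:
        value = boundary_value(qp, mode)
        for r in range(y, y + bh):
            for c in range(x, x + bw):
                if r in (y, y + bh - 1) or c in (x, x + bw - 1):
                    smap[r][c] = value
    return smap


# A high-QP intra block next to a low-QP inter block.
strength_map = build_strength_map(
    4, 8, [(0, 0, 4, 4, 37, "intra"), (0, 4, 4, 4, 27, "inter")]
)
```

Combining this value rule with a variable band width would correspond to the FIG. 9C variant.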

FIG. 9C marks coding block boundaries using both the number of boundary pixels and the boundary value in order to adjust the strength of deblocking. The details are the same as described for FIGS. 9A and 9B and are therefore omitted here.

The block partition map configured as described above is used in the training process and helps the CNN filter operate as a strong deblocking filter.

FIG. 10 is a flowchart illustrating decoding of an image using a CNN-based filter according to the present disclosure.

The YUV to be enhanced and at least one of a quantization parameter map and a block partition map are input to the CNN-based filter (1001). The quantization parameter map may be set to the same resolution as the YUV to be enhanced. The block partition map may mark the partition boundaries of blocks and the interior regions of blocks with different values. The number and value of the pixels that represent the partition boundaries of blocks in the block partition map may be determined by at least one of the size of the coding block, the value of the quantization parameter, the coding mode, the number of pixels to be updated, and the number of pixels to be referenced for filtering.

The enhanced YUV is output (1003) using the coefficients of the CNN-based filter, which have been trained with the YUV to be enhanced, the quantization parameter map, and the block partition map as input and the original YUV as the final output. If a hint such as a block mode map is additionally input to the CNN-based filter, the coefficients of the CNN-based filter are likewise trained with the hint as an additional input.

The processes shown in FIG. 10 can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system. That is, the computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks), optically readable media (e.g., CD-ROM, DVD), and carrier waves (e.g., transmission over the Internet). The computer-readable recording medium may also be distributed across computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

FIG. 11 is a diagram schematically illustrating the configuration of an apparatus for decoding an image according to the present disclosure.

The apparatus for decoding an image may include an input unit 1101, a filter unit 1103, and an output unit 1105. It may include other components as well, but descriptions of components not directly related to the present disclosure are omitted.

The input unit 1101 receives the YUV to be enhanced and at least one of a quantization parameter map and a block partition map. The quantization parameter map may be set to the same resolution as the YUV to be enhanced, and the block partition map may mark the partition boundaries of blocks and the interior regions of blocks with different values. The number and value of the pixels that represent the partition boundaries of blocks in the block partition map may be determined by at least one of the size of the coding block, the value of the quantization parameter, and the coding mode.

The filter unit 1103 applies the trained coefficients of the CNN-based filter to the YUV to be enhanced and to the at least one of the quantization parameter map and the block partition map that were input to the input unit 1101.

The output unit 1105 outputs the enhanced YUV generated by applying the trained coefficients of the CNN-based filter to the input YUV to be enhanced and to the at least one of the quantization parameter map and the block partition map.

Although the present disclosure describes the input unit 1101, the filter unit 1103, and the output unit 1105 separately, they may be integrated into a single component, or any one of them may be divided into several components.
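One way to picture the division of labor among the three units is the sketch below. It is not the disclosed implementation: `FilterInput`, `input_unit`, `filter_unit`, `output_unit`, and `dummy_filter` are hypothetical names, the trained network is replaced by a placeholder callable, and the output step follows the claims in adding the filter output to the input picture.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Plane = List[List[int]]


@dataclass
class FilterInput:
    picture: Plane
    qp_map: Optional[Plane] = None
    partition_map: Optional[Plane] = None


def input_unit(picture, qp_map=None, partition_map=None) -> FilterInput:
    """Collect the inputs and check that every map matches the picture resolution."""
    if qp_map is None and partition_map is None:
        raise ValueError("at least one of qp_map and partition_map is required")
    shape = (len(picture), len(picture[0]))
    for name, side in (("qp_map", qp_map), ("partition_map", partition_map)):
        if side is not None and (len(side), len(side[0])) != shape:
            raise ValueError(f"{name} must match the picture resolution")
    return FilterInput(picture, qp_map, partition_map)


def filter_unit(data: FilterInput, cnn_filter: Callable[[FilterInput], Plane]) -> Plane:
    """Run the (placeholder) CNN-based filter and return its predicted difference."""
    return cnn_filter(data)


def output_unit(data: FilterInput, diff: Plane) -> Plane:
    """Add the predicted difference to the input picture."""
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(data.picture, diff)]


def dummy_filter(data: FilterInput) -> Plane:
    """Placeholder for a trained network: predicts +1 on marked boundary samples."""
    return [[1 if flag else 0 for flag in row] for row in data.partition_map]


data = input_unit([[97, 104], [98, 250]], partition_map=[[1, 0], [1, 0]])
enhanced = output_unit(data, filter_unit(data, dummy_filter))
```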

The above description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present embodiment.

Claims

An image decoding method using a convolutional neural network (CNN)-based filter, the method comprising:
inputting input data including a quantization parameter map, a block partition map, and a first picture into the CNN-based filter; and
generating a second picture by adding output data of the CNN-based filter to the first picture,
wherein the quantization parameter map and the block partition map are set to the same resolution as the first picture,
wherein, in the block partition map, sample positions located in internal areas of blocks constituting the first picture have a first value, and sample positions adjacent to the boundaries of the blocks have values different from the first value, and
wherein the values of the sample positions adjacent to the boundaries of the blocks represent a deblocking strength determined by at least one of the size of the block, the value of the quantization parameter, the encoding mode, the number of samples to be updated by filtering, and the number of samples to be referenced for filtering.

The image decoding method of claim 1,
wherein the coefficients of the CNN-based filter are learned using training data including a quantization parameter map, a block partition map, a third picture, and an original picture corresponding to the third picture.

delete

The image decoding method of claim 1,
wherein the input data input to the CNN-based filter further includes a block mode map indicating an encoding mode.

delete

The image decoding method of claim 1,
wherein, in the block partition map, the range of sample positions adjacent to the boundaries of the blocks is determined by at least one of the size of the coding block, the value of the quantization parameter, the encoding mode, the number of samples to be updated by filtering, and the number of samples to be referenced for filtering.

delete

The image decoding method of claim 1,
wherein the coefficients of the CNN-based filter are received from an image encoding device.

An image decoding device using a convolutional neural network (CNN)-based filter, the device comprising:
an input unit that receives input data including a quantization parameter map, a block partition map, and a first picture;
a filter unit that provides the quantization parameter map, the block partition map, and the first picture received by the input unit to the CNN-based filter; and
an output unit that adds output data of the CNN-based filter to the first picture and outputs a second picture,
wherein the quantization parameter map and the block partition map are set to the same resolution as the first picture,
wherein, in the block partition map, sample positions located in internal areas of blocks constituting the first picture have a first value, and sample positions adjacent to the boundaries of the blocks have values different from the first value, and
wherein the values of the sample positions adjacent to the boundaries of the blocks represent a deblocking strength determined by at least one of the size of the block, the value of the quantization parameter, the encoding mode, the number of samples to be updated by filtering, and the number of samples to be referenced for filtering.

The image decoding device of claim 9,
wherein the coefficients of the CNN-based filter are learned using training data including a quantization parameter map, a block partition map, a third picture, and an original picture corresponding to the third picture.

delete

The image decoding device of claim 9,
wherein the input data further includes a block mode map indicating an encoding mode.

delete

The image decoding device of claim 9,
wherein, in the block partition map, the range of sample positions adjacent to the boundaries of the blocks is determined by at least one of the size of the coding block, the value of the quantization parameter, the encoding mode, the number of samples to be updated by filtering, and the number of samples to be referenced for filtering.

delete

The image decoding device of claim 9,
wherein the coefficients of the CNN-based filter are received from an image encoding device.