KR20240025420A

KR20240025420A - Image decoding apparatus and image encoding apparatus using AI, and methods thereby

Info

Publication number: KR20240025420A
Application number: KR1020220112984A
Authority: KR
Inventors: 김경아; 딘쿠오칸; 박민수; 박민우; 최광표; 표인지
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2022-08-18
Filing date: 2022-09-06
Publication date: 2024-02-27

Abstract

A method of decoding an image according to an embodiment includes: obtaining a motion vector of a current block; obtaining a preliminary prediction block by using a reference block indicated by the motion vector in a reference image; obtaining a final prediction block of the current block by applying, to a neural network, at least one of a POC map including a POC difference between the reference image and a current image including the current block, the preliminary prediction block, or a quantization error map; and reconstructing the current block by using the final prediction block and a residual block obtained from a bitstream, wherein sample values of the quantization error map may be calculated from a quantization parameter for the reference block.

Description

Image decoding device and image encoding device using AI, and methods thereby {IMAGE DECODING APPARATUS AND IMAGE ENCODING APPARATUS USING AI, AND METHODS THEREBY}

The present disclosure relates to a method and device for processing an image and, more specifically, to a method and device for encoding and decoding an image using artificial intelligence (AI).

In codecs such as H.264 Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC), an image is divided into blocks, and each block is predictively encoded and predictively decoded through inter prediction or intra prediction.

Intra prediction compresses an image by removing spatial redundancy within the image, and inter prediction compresses images by removing temporal redundancy between images.

In the encoding process, a prediction block of the current block is generated through intra prediction or inter prediction, a residual block is generated by subtracting the prediction block from the current block, and the residual samples of the residual block are transformed and quantized.

In the decoding process, the quantized transform coefficients of the residual block are inversely quantized and inversely transformed to generate the residual samples of the residual block, and the prediction block generated through intra prediction or inter prediction is added to the residual block to reconstruct the current block. The reconstructed current block may be processed by one or more filtering algorithms before being output.
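The residual coding round trip described above can be sketched in a few lines of Python. This is a toy illustration, not the codec's actual process: the transform is omitted (residual samples are quantized directly), and the helper names and the fixed quantization step `qstep` are hypothetical.

```python
from typing import List

Block = List[List[int]]


def subtract(cur: Block, pred: Block) -> Block:
    # Encoder: residual = current block - prediction block, sample by sample.
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def quantize(res: Block, qstep: int) -> Block:
    # Lossy step: divide by the step size and round to an integer level.
    return [[round(r / qstep) for r in row] for row in res]


def dequantize(levels: Block, qstep: int) -> Block:
    # Decoder: scale the levels back to residual sample values.
    return [[lv * qstep for lv in row] for row in levels]


def add(pred: Block, res: Block) -> Block:
    # Decoder: reconstructed block = prediction block + decoded residual.
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, res)]


cur = [[104, 101], [97, 91]]
pred = [[100, 100], [100, 100]]
res = subtract(cur, pred)                 # [[4, 1], [-3, -9]]
levels = quantize(res, qstep=4)           # [[1, 0], [-1, -2]]
recon = add(pred, dequantize(levels, 4))  # [[104, 100], [96, 92]]
```

The reconstruction differs from the original block because quantization discards information, which is why the reconstructed (not the original) image is what later serves as a reference.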

Codecs such as H.264 AVC and HEVC use rule-based prediction modes for inter prediction of the current block. Rule-based prediction modes may include, for example, skip mode, merge mode, and advanced motion vector prediction (AMVP) mode.

Traditionally, rule-based prediction modes have performed well, but as image resolutions increase and image content becomes more diverse, an AI-based prediction mode that can flexibly account for the characteristics of an image may be required.

A method of decoding an image according to an embodiment may include obtaining a motion vector of a current block.

In an embodiment, the method of decoding an image may include obtaining a preliminary prediction block by using a reference block indicated by the motion vector in a reference image.

In an embodiment, the method of decoding an image may include obtaining a final prediction block of the current block by applying, to a neural network, at least one of a picture order count (POC) map including the POC difference between the reference image and a current image including the current block, the preliminary prediction block, or a quantization error map.

In an embodiment, the method of decoding an image may include reconstructing the current block by using the final prediction block and a residual block obtained from a bitstream.

In an embodiment, the sample values of the quantization error map may be calculated from a quantization parameter for the reference block.
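A minimal sketch of assembling the neural network inputs, under stated assumptions: the POC map is taken to be a block-sized map in which every sample holds the POC difference, and the quantization error map is filled with the quantization step size derived from the reference block's QP via the HEVC/VVC-style relation Qstep = 2^((QP - 4)/6). The actual QP-to-sample mapping is described with reference to FIGS. 5 to 8 and may differ (for example, it need not be constant over the block); `build_nn_input` and the channel order are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def constant_map(h: int, w: int, value: float) -> Map2D:
    # An h x w map in which every sample has the same value.
    return [[value] * w for _ in range(h)]


def qstep_from_qp(qp: int) -> float:
    # Assumed HEVC/VVC-style relation: the step size doubles every 6 QP.
    return 2.0 ** ((qp - 4) / 6.0)


def build_nn_input(pred_block: Map2D, poc_cur: int, poc_ref: int, qp_ref: int) -> List[Map2D]:
    h, w = len(pred_block), len(pred_block[0])
    poc_map = constant_map(h, w, poc_cur - poc_ref)     # POC difference per sample
    qe_map = constant_map(h, w, qstep_from_qp(qp_ref))  # hypothetical quantization error map
    # Channel-first stack (C x H x W) that a convolutional network could consume.
    return [pred_block, poc_map, qe_map]


inputs = build_nn_input([[120, 121], [119, 118]], poc_cur=8, poc_ref=4, qp_ref=28)
```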

A method of encoding an image according to an embodiment may include obtaining a motion vector pointing to a reference block, in a reference image, that corresponds to a current block.

In an embodiment, the method of encoding an image may include obtaining a final prediction block of the current block by applying, to a neural network, at least one of a POC map including the POC difference between the reference image and a current image including the current block, a preliminary prediction block obtained based on the reference block, or a quantization error map.

In an embodiment, the method of encoding an image may include obtaining a residual block by using the current block and the final prediction block.

In an embodiment, the method of encoding an image may include generating a bitstream including information about the residual block.

An image decoding device according to an embodiment may include at least one memory storing at least one instruction, and at least one processor operating according to the at least one instruction.

In an embodiment, the at least one processor of the image decoding device may obtain a motion vector of a current block.

In an embodiment, the at least one processor of the image decoding device may obtain a preliminary prediction block by using a reference block indicated by the motion vector in a reference image.

In an embodiment, the at least one processor of the image decoding device may obtain a final prediction block of the current block by applying, to a neural network, at least one of a POC map including the POC difference between the reference image and a current image including the current block, the preliminary prediction block, or a quantization error map.

In an embodiment, the at least one processor of the image decoding device may reconstruct the current block by using the final prediction block and a residual block obtained from a bitstream.

An image encoding device according to an embodiment may include at least one memory storing at least one instruction, and at least one processor operating according to the at least one instruction.

In an embodiment, the at least one processor of the image encoding device may obtain a motion vector pointing to a reference block, in a reference image, that corresponds to a current block.

In an embodiment, the at least one processor of the image encoding device may obtain a final prediction block of the current block by applying, to a neural network, at least one of a POC map including the POC difference between the reference image and a current image including the current block, a preliminary prediction block obtained based on the reference block, or a quantization error map.

In an embodiment, the at least one processor of the image encoding device may obtain a residual block by using the current block and the final prediction block.

In an embodiment, the at least one processor of the image encoding device may generate a bitstream including information about the residual block.

FIG. 1 is a diagram illustrating the configuration of an image decoding device according to an embodiment.
FIG. 2 is a diagram illustrating the configuration of an AI-based prediction decoder according to an embodiment.
FIG. 3 is a diagram illustrating spatial neighboring blocks and temporal neighboring blocks of a current block according to an embodiment.
FIG. 4 is a diagram illustrating the reference blocks indicated by a motion vector for list 0 and a motion vector for list 1 according to an embodiment.
FIG. 5 is a diagram illustrating a method of obtaining a quantization error map based on a quantization parameter for a reference block according to an embodiment.
FIG. 6 is a diagram illustrating a method of obtaining a quantization error map based on a quantization parameter for a reference block according to an embodiment.
FIG. 7 is a diagram illustrating a method of obtaining a quantization error map based on a quantization parameter for a reference block according to an embodiment.
FIG. 8 is a diagram illustrating a method of obtaining a quantization error map based on a quantization parameter for a reference block according to an embodiment.
FIG. 9 is a diagram illustrating the structure of a neural network according to an embodiment.
FIG. 10 is a diagram illustrating a convolution operation in a convolutional layer according to an embodiment.
FIG. 11 is a diagram illustrating a method of obtaining an extended preliminary prediction block according to an embodiment.
FIG. 12 is a diagram illustrating a method of obtaining an extended preliminary prediction block when a boundary of a reference block corresponds to a boundary of a reference image, according to an embodiment.
FIG. 13 is a diagram illustrating a method of obtaining an extended quantization error map according to an embodiment.
FIG. 14 is a diagram illustrating reconstructed samples and not-yet-reconstructed samples in a current image according to an embodiment.
FIG. 15 is a diagram illustrating a method of obtaining an extended current reconstructed block according to an embodiment.
FIG. 16 is a diagram illustrating a plurality of weight sets according to an embodiment.
FIG. 17 is a diagram illustrating an image decoding method performed by an image decoding device according to an embodiment.
FIG. 18 is a diagram illustrating syntax according to an embodiment.
FIG. 19 is a diagram illustrating the configuration of an image encoding device according to an embodiment.
FIG. 20 is a diagram illustrating the configuration of an AI-based prediction encoder according to an embodiment.
FIG. 21 is a diagram illustrating a method of changing a motion vector of fractional precision into a motion vector of integer precision according to an embodiment.
FIG. 22 is a diagram illustrating an image encoding method performed by an image encoding device according to an embodiment.
FIG. 23 is a diagram illustrating a method of training a neural network according to an embodiment.
FIG. 24 is a diagram illustrating the overall process of encoding and decoding an image according to an embodiment.

Since the present disclosure may be variously modified and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the embodiments of the present disclosure, and the present disclosure should be understood to include all modifications, equivalents, and substitutes within the spirit and technical scope of the various embodiments.

In describing the embodiments, detailed descriptions of related known technologies may be omitted when they could unnecessarily obscure the gist of the present disclosure. In addition, numerals used in the description (e.g., first, second) are merely identifiers for distinguishing one component from another.

In the present disclosure, the expression "at least one of a, b, or c" may refer to "a", "b", "c", "a and b", "a and c", "b and c", "all of a, b, and c", or variations thereof.
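The expression covers every non-empty combination of the listed items. A short Python sketch (the helper name `at_least_one_of` is illustrative only) makes the seven cases for a, b, and c explicit:

```python
from itertools import combinations


def at_least_one_of(*items):
    # Every non-empty combination of the items, smallest combinations first.
    return [combo
            for k in range(1, len(items) + 1)
            for combo in combinations(items, k)]


cases = at_least_one_of("a", "b", "c")
# [('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
```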

In the present disclosure, when a component is referred to as being "coupled" or "connected" to another component, the component may be directly coupled or directly connected to the other component, but, unless otherwise specified, it may also be coupled or connected via another component in between.

In the present disclosure, two or more components expressed as a '~unit' or 'module' may be combined into one component, or one component may be divided into two or more components by more subdivided function. In addition, each component described below may additionally perform some or all of the functions of other components in addition to its own main function, and some of the main functions of each component may be performed exclusively by another component.

In the present disclosure, an 'image' or 'picture' may mean a still image (or frame), a moving image composed of a plurality of consecutive still images, or a video.

In the present disclosure, a 'neural network' is a representative example of an artificial neural network model that mimics the neurons of the brain, and is not limited to an artificial neural network model using a specific algorithm. A neural network may also be referred to as a deep neural network.

In the present disclosure, a 'weight' is a value used in the operation of each layer of a neural network and may be used, for example, when applying an input value to a predetermined operation. A weight is set as a result of training and may be updated with separate training data as needed.

In the present disclosure, a 'current block' refers to the block currently being processed. The current block may be a slice, a tile, a largest coding unit, a coding unit, a prediction unit, or a transform unit split from the current image.

In the present disclosure, a 'sample' is data assigned to a sampling position within data such as an image, a block, a filter kernel, or a map, and is the data to be processed. For example, a sample may include a pixel in a two-dimensional image.

Before the image decoding device 100 and the image encoding device 1900 according to an embodiment are described, the overall image encoding and decoding process is described with reference to FIG. 24.

FIG. 24 is a diagram illustrating the overall process of encoding and decoding an image according to an embodiment.

The encoding device 2410 transmits a bitstream generated by encoding an image to the decoding device 2450, and the decoding device 2450 may reconstruct the image by receiving and decoding the bitstream.

In an embodiment, the prediction encoder 2415 of the encoding device 2410 outputs a prediction block through inter prediction or intra prediction of the current block, and the transform and quantization unit 2420 may transform and quantize the residual samples of the residual block between the prediction block and the current block to output quantized transform coefficients.

The entropy encoder 2425 may encode the quantized transform coefficients and output them as a bitstream.

The quantized transform coefficients may be reconstructed, through the inverse quantization and inverse transform unit 2430, into a residual block including residual samples in the spatial domain. The reconstructed block obtained by adding the prediction block and the residual block may be output as a filtered block after passing through the deblocking filtering unit 2435 and the loop filtering unit 2440. A reconstructed image including the filtered block may be used by the prediction encoder 2415 as a reference image for the next image.

The bitstream received by the decoding device 2450 may be reconstructed into a residual block including residual samples in the spatial domain through the entropy decoder 2455 and the inverse quantization and inverse transform unit 2460.

A reconstructed block is generated by combining the residual block with the prediction block output from the prediction decoder 2475, and the reconstructed block may be output as a filtered block after passing through the deblocking filtering unit 2465 and the loop filtering unit 2470. A reconstructed image including the filtered block may be used by the prediction decoder 2475 as a reference image for the next image.

In an embodiment, the prediction encoder 2415 and the prediction decoder 2475 may predictively encode and predictively decode the current block according to a rule-based prediction mode and/or a neural network-based prediction mode.

In an embodiment, the rule-based prediction modes may include merge mode, skip mode, advanced motion vector prediction (AMVP) mode, bi-directional optical flow (BDOF) mode, bi-prediction with CU-level weights (BCW) mode, and the like.
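As an illustration of one of the listed rule-based modes, BCW in VVC blends the two uni-directional predictions P0 and P1 with a weight w, chosen per coding unit from {-2, 3, 4, 5, 10}, as ((8 - w) * P0 + w * P1 + 4) >> 3. The sketch below applies that formula directly to final-precision samples, which is a simplification: the standard operates on higher-precision intermediate samples with different shifts. The function name is hypothetical.

```python
BCW_WEIGHTS = (-2, 3, 4, 5, 10)  # candidate values of w, in units of 1/8


def bcw_blend(p0, p1, w):
    # Weighted bi-prediction: ((8 - w) * P0 + w * P1 + 4) >> 3 for each sample.
    if w not in BCW_WEIGHTS:
        raise ValueError(f"unsupported BCW weight {w}")
    return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]


equal = bcw_blend([100, 200], [108, 204], 4)    # w = 4 gives the plain average
skewed = bcw_blend([100, 200], [108, 204], 10)  # w = 10 extrapolates toward P1
```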

In an embodiment, the prediction encoder 2415 and the prediction decoder 2475 may apply both a rule-based prediction mode and a neural network-based prediction mode to the current block.

A neural network-based prediction mode according to an embodiment is described in detail below with reference to FIGS. 1 to 23.

Referring to FIG. 1, the image decoding device 100 may include a bitstream parsing unit 110 and a decoder 130. The decoder 130 may include an AI-based prediction decoder 132 and a reconstruction unit 134.

In an embodiment, the bitstream parsing unit 110 may correspond to the entropy decoder 2455 shown in FIG. 24. In an embodiment, the decoder 130 may correspond to the inverse quantization and inverse transform unit 2460, the prediction decoder 2475, the deblocking filtering unit 2465, and the loop filtering unit 2470 shown in FIG. 24.

The bitstream parsing unit 110 and the decoder 130 may be implemented by at least one processor and may operate according to at least one instruction stored in at least one memory.

Although FIG. 1 shows the bitstream parsing unit 110 and the decoder 130 separately, they may be implemented by a single processor. In this case, they may be implemented by a dedicated processor, or by a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). A dedicated processor may include a memory for implementing an embodiment of the present disclosure, or a memory processing unit for using an external memory.

In an embodiment, the bitstream parsing unit 110 and the decoder 130 may be implemented by a plurality of processors. In this case, they may be implemented by a combination of dedicated processors, or by a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs.

The bitstream parsing unit 110 may obtain a bitstream including the result of encoding an image.

The bitstream parsing unit 110 may receive the bitstream from the image encoding device 1900 over a network.

In an embodiment, the bitstream parsing unit 110 may instead obtain the bitstream from a data storage medium, including magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; and magneto-optical media such as floptical disks.

The bitstream parsing unit 110 may parse the bitstream to obtain the information needed to reconstruct the image.

In an embodiment, the bitstream parsing unit 110 may obtain syntax elements for reconstructing the image from the bitstream. Binary values corresponding to the syntax elements may be included in the bitstream according to the hierarchical structure of the image. The bitstream parsing unit 110 may obtain the syntax elements by entropy decoding the binary values included in the bitstream.
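Most syntax elements in HEVC-class codecs are entropy-decoded with context-adaptive binary arithmetic coding, which is too long to sketch here. As a simpler, concrete example of turning bits into a syntax element value, the unsigned Exp-Golomb code ue(v), used by H.264 and HEVC for many header syntax elements, can be parsed as below. This is not the parsing process of the disclosure; `read_ue` and `read_all_ue` are hypothetical helpers operating on a string of '0'/'1' characters.

```python
def read_ue(bits: str, pos: int = 0):
    """Decode one ue(v) codeword from a string of '0'/'1' characters.

    Returns (value, position just after the codeword).
    """
    leading_zeros = 0
    while bits[pos] == "0":   # count the zero prefix
        leading_zeros += 1
        pos += 1
    pos += 1                  # skip the terminating '1'
    suffix = bits[pos:pos + leading_zeros]
    pos += leading_zeros
    info = int(suffix, 2) if suffix else 0
    return (1 << leading_zeros) - 1 + info, pos


def read_all_ue(bits: str):
    # Decode consecutive ue(v) codewords until the bits are exhausted.
    values, pos = [], 0
    while pos < len(bits):
        value, pos = read_ue(bits, pos)
        values.append(value)
    return values


decoded = read_all_ue("1" "010" "011" "00100")  # codewords for 0, 1, 2, 3
```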

In an embodiment, the bitstream parsing unit 110 may pass information about a motion vector and information about a residual block, obtained from the bitstream, to the decoder 130.

The decoder 130 may obtain a current reconstructed block based on the information received from the bitstream parsing unit 110. The current reconstructed block may refer to a block obtained by decoding the encoded current block.

In an embodiment, the AI-based prediction decoder 132 may obtain the final prediction block of the current block by using the information about the motion vector.

The AI-based prediction decoder 132 may obtain the final prediction block of the current block by using AI, for example, a neural network.

The mode in which the AI-based prediction decoder 132 obtains the final prediction block of the current block by using a neural network may be defined as a neural network-based prediction mode.

The reconstruction unit 134 may obtain the residual block of the current block by using the information about the residual block provided by the bitstream parsing unit 110. In an embodiment, the information about the residual block may include information about quantized transform coefficients.

In an embodiment, the reconstruction unit 134 may obtain a residual block in the spatial domain by inversely quantizing and inversely transforming the quantized transform coefficients.

The reconstruction unit 134 may obtain a current reconstructed block corresponding to the current block by using the final prediction block and the residual block. In an embodiment, the reconstruction unit 134 may obtain the current reconstructed block by adding the sample values of the final prediction block and the sample values of the residual block.
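A minimal sketch of the sample-wise reconstruction, assuming (as is usual in codecs, though not stated here) that each reconstructed sample is clipped to the valid range for the bit depth; the function name and the default 10-bit depth are hypothetical.

```python
def reconstruct(pred, resid, bit_depth=10):
    # Add prediction and residual sample by sample, then clip to [0, 2^bit_depth - 1].
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, resid)]


block = reconstruct([[1020, 5], [512, 300]], [[10, -9], [-12, 4]])
```

Here the first two samples overflow and underflow before clipping (1030 and -4), so they are clamped to 1023 and 0.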

Hereinafter, the AI-based prediction decoder 132 is described in more detail with reference to FIG. 2.

FIG. 2 is a diagram illustrating the configuration of the AI-based prediction decoder 132 according to an embodiment.

Referring to FIG. 2, the AI-based prediction decoder 132 may include a motion information acquisition unit 210, a prediction block acquisition unit 220, a neural network setup unit 230, and a neural network 240.

The neural network 240 may be stored in a memory. In an embodiment, the neural network 240 may be implemented as an AI processor.

The motion information acquisition unit 210 may obtain the motion vector of the current block by using the information about the motion vector. As described below, when the motion information acquisition unit 2010 of the image encoding device 1900 has changed the precision of the motion vector of the current block from fractional precision to integer precision, the motion information acquisition unit 210 may obtain a motion vector of integer precision for the current block.
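FIG. 21 describes how the encoder changes a fractional-precision motion vector into an integer-precision one; that method is not reproduced here. Purely as an assumption for illustration, the sketch below takes motion vector components in quarter-sample units (as in HEVC) and rounds each component to the nearest whole sample, with halves rounded away from zero. The rounding rule and the helper name are hypothetical.

```python
def to_integer_precision(mv, frac_bits=2):
    """Round an MV given in 1/2**frac_bits-sample units to whole samples.

    Halves are rounded away from zero; the result is in integer-sample units.
    frac_bits must be at least 1.
    """
    half = 1 << (frac_bits - 1)

    def round_component(v):
        magnitude = (abs(v) + half) >> frac_bits
        return magnitude if v >= 0 else -magnitude

    return tuple(round_component(c) for c in mv)


mv_int = to_integer_precision((5, -6))  # 1.25 -> 1, -1.5 -> -2
```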

The information about the motion vector may include information, for example a flag or an index, that indicates one or more of the motion vector candidates in a motion vector candidate list. In one embodiment, the information about the motion vector may also include information about a residual motion vector, which is the difference between the predicted motion vector of the current block and the motion vector of the current block.

The prediction block acquisition unit 220 may obtain a preliminary prediction block by using a reference image and the motion vector of the current block obtained by the motion information acquisition unit 210.

In one embodiment, the motion information acquisition unit 210 and the prediction block acquisition unit 220 may obtain the preliminary prediction block of the current block according to a rule-based prediction mode.

Rule-based prediction modes may include merge mode, skip mode, AMVP (advanced motion vector prediction) mode, and the like.

In merge mode or skip mode, the motion information acquisition unit 210 may construct a motion vector candidate list whose candidates are the motion vectors of blocks neighboring the current block. It may then select the candidate indicated by information in the bitstream and use it as the motion vector of the current block.

In AMVP mode, the motion information acquisition unit 210 likewise constructs a motion vector candidate list from the motion vectors of the neighboring blocks of the current block. It may use the candidate indicated by information in the bitstream as the predicted motion vector of the current block. The motion information acquisition unit 210 may then determine the motion vector of the current block from the predicted motion vector and the residual motion vector of the current block.
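The difference between merge/skip and AMVP can be sketched as follows. The sketch assumes the candidate list has already been built and that motion vectors are integer (x, y) pairs. The function names are hypothetical and are not taken from any codec API.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components


def derive_mv_merge(candidates: List[MV], idx: int) -> MV:
    """Merge/skip: the signalled candidate is used directly as the motion vector."""
    return candidates[idx]


def derive_mv_amvp(candidates: List[MV], idx: int, mvd: MV) -> MV:
    """AMVP: the signalled candidate is only a predictor; the residual MV is added."""
    px, py = candidates[idx]
    return (px + mvd[0], py + mvd[1])
```

The only difference is whether a residual motion vector is read from the bitstream and added to the selected candidate.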

Merge mode, skip mode, and AMVP mode are examples of rule-based prediction modes. In one embodiment, the rule-based prediction modes may further include decoder-side motion vector refinement (DMVR) mode and the like.

Merge mode, skip mode, and AMVP mode all include a step that constructs a motion vector candidate list. The neighboring blocks whose motion vectors may be included in that list are described with reference to FIG. 3.

Referring to FIG. 3, the neighboring blocks of the current block 300 may include spatial neighboring blocks (A0, A1, B0, B1, B2), which are spatially adjacent to the current block 300, and temporal neighboring blocks (Col, Br), which are temporally adjacent to the current block 300.

In one embodiment, the spatial neighboring blocks may include at least one of a lower-left corner block (A0), a lower-left block (A1), an upper-right corner block (B0), an upper-right block (B1), or an upper-left corner block (B2).

In one embodiment, the temporal neighboring blocks are located in a collocated image whose picture order count (POC) differs from that of the current image containing the current block 300. They may include at least one of the block (Col) at the same position as the current block 300 in that image, or a block (Br) spatially adjacent to the collocated block (Col).

The block Br may be located at the lower right of the collocated block Col. The collocated block Col may be the block of the collocated image that contains the pixel corresponding to the center pixel of the current block 300.

The motion information acquisition unit 210 may check the availability of the neighboring blocks in a predetermined order. Based on the results, it may sequentially add the motion vectors of the available neighboring blocks to the motion vector candidate list as candidates.

In one embodiment, the motion information acquisition unit 210 may determine that a neighboring block is unavailable if that block was intra-predicted.
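The availability check and sequential insertion can be sketched as follows. The neighbor order in the usage (A1, B1, B0, A0, B2, Col) is illustrative only, not necessarily the order this embodiment uses. Three further details are assumptions: a missing neighbor is modelled as `mv=None`, the list is capped at `max_candidates`, and duplicate motion vectors are skipped (pruning).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MV = Tuple[int, int]


@dataclass
class NeighborBlock:
    name: str                 # e.g. "A1", "B1", "Col" (labels from FIG. 3)
    mv: Optional[MV]          # None models a block that does not exist (assumption)
    is_intra: bool = False    # intra-predicted blocks are treated as unavailable


def build_candidate_list(neighbors: List[NeighborBlock], max_candidates: int) -> List[MV]:
    """Visit neighbors in the given (predetermined) order and collect available MVs."""
    candidates: List[MV] = []
    for blk in neighbors:
        if blk.mv is None or blk.is_intra:
            continue                      # unavailable: no motion vector to contribute
        if blk.mv in candidates:
            continue                      # duplicate pruning (illustrative assumption)
        candidates.append(blk.mv)
        if len(candidates) == max_candidates:
            break
    return candidates
```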

In one embodiment, the motion vector obtained by the motion information acquisition unit 210 may be a motion vector for list 0, a motion vector for list 1, or both.

The motion vector for list 0 points to a reference block in a reference image included in list 0 (reference image list 0). The motion vector for list 1 points to a reference block in a reference image included in list 1 (reference image list 1).

The prediction block acquisition unit 220 may obtain the preliminary prediction block by using the reference block indicated by the motion vector in the reference image.

In one embodiment, the preliminary prediction block may be obtained by applying interpolation to the reference block indicated by the motion vector in the reference image. The preliminary prediction block may therefore contain sub-pixels obtained by filtering integer pixels.

In one embodiment, the reference block indicated by the motion vector in the reference image may itself be used as the preliminary prediction block. For example, if the motion vector of the current block has integer precision, the reference block indicated by the motion vector may be determined as the preliminary prediction block.
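The two cases above (copying for an integer-precision motion vector, interpolating for a fractional one) can be sketched as follows. Several details are assumptions:

- Motion vectors are in quarter-sample units.
- Reference samples outside the picture are clamped to the border.
- Bilinear interpolation is used. It is a simple stand-in swapped in for illustration; real codecs such as HEVC use longer separable interpolation filters.

```python
from typing import List, Tuple

Block = List[List[int]]


def fetch_preliminary_block(ref: Block, x0: int, y0: int, w: int, h: int,
                            mv_qpel: Tuple[int, int]) -> Block:
    """Return a w x h preliminary prediction block for a current block at (x0, y0).

    ref:     reference image as rows of integer samples.
    mv_qpel: motion vector in quarter-sample units (assumption).
    """
    H, W = len(ref), len(ref[0])

    def px(x: int, y: int) -> int:
        # Border clamping for out-of-picture positions (assumption).
        return ref[min(max(y, 0), H - 1)][min(max(x, 0), W - 1)]

    mvx, mvy = mv_qpel
    ix, fx = mvx >> 2, mvx & 3   # integer / fractional parts (>> floors negatives)
    iy, fy = mvy >> 2, mvy & 3
    block: Block = []
    for j in range(h):
        row = []
        for i in range(w):
            x, y = x0 + i + ix, y0 + j + iy
            if fx == 0 and fy == 0:
                row.append(px(x, y))          # integer MV: the reference block is used as-is
            else:
                # Bilinear weights in quarter-sample steps (stand-in for codec filters).
                top = px(x, y) * (4 - fx) + px(x + 1, y) * fx
                bot = px(x, y + 1) * (4 - fx) + px(x + 1, y + 1) * fx
                row.append((top * (4 - fy) + bot * fy + 8) >> 4)  # /16 with rounding
        block.append(row)
    return block
```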

When the motion information acquisition unit 210 obtains a motion vector for list 0, the prediction block acquisition unit 220 may obtain the reference block indicated by that motion vector in a reference image included in list 0, and obtain a preliminary prediction block for list 0 from that reference block.

In one embodiment, when the motion information acquisition unit 210 obtains a motion vector for list 1, the prediction block acquisition unit 220 may obtain the reference block indicated by that motion vector in a reference image included in list 1, and obtain a preliminary prediction block for list 1 from that reference block.

In one embodiment, the motion information acquisition unit 210 may obtain both a motion vector for list 0 and a motion vector for list 1. In that case, the prediction block acquisition unit 220 may obtain a preliminary prediction block for list 0 from the reference block indicated by the list-0 motion vector in a reference image included in list 0. It may likewise obtain a preliminary prediction block for list 1 from the reference block indicated by the list-1 motion vector in a reference image included in list 1.

FIG. 4 is a diagram illustrating the reference blocks indicated by a motion vector for list 0 and a motion vector for list 1 according to an embodiment.

As shown in FIG. 4, suppose a motion vector mv1 for list 0 and a motion vector mv2 for list 1 are obtained for the current block 300 in the current image 400. The prediction block acquisition unit 220 may then obtain a first reference block 415 indicated by mv1 in a first reference image 410 included in list 0. It may also obtain a second reference block 435 indicated by mv2 in a second reference image 430 included in list 1. A first preliminary prediction block for list 0 may be obtained from the first reference block 415, and a second preliminary prediction block for list 1 may be obtained from the second reference block 435.
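Fetching the two preliminary prediction blocks for a bidirectionally predicted block can be sketched as follows. To keep the sketch short, it assumes integer motion vectors that keep both blocks inside their reference images; fractional motion vectors would need interpolation. The function names are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]


def fetch_block(ref: Block, x: int, y: int, w: int, h: int) -> Block:
    """Copy a w x h block whose top-left sample is at (x, y); must lie inside ref."""
    if x < 0 or y < 0 or y + h > len(ref) or x + w > len(ref[0]):
        raise ValueError("block outside the reference image (not handled in this sketch)")
    return [row[x:x + w] for row in ref[y:y + h]]


def bi_preliminary_blocks(cur_xy: Tuple[int, int], size: Tuple[int, int],
                          ref_l0: Block, mv_l0: Tuple[int, int],
                          ref_l1: Block, mv_l1: Tuple[int, int]) -> Tuple[Block, Block]:
    """Return (list-0 preliminary block, list-1 preliminary block), as in FIG. 4."""
    (x, y), (w, h) = cur_xy, size
    p0 = fetch_block(ref_l0, x + mv_l0[0], y + mv_l0[1], w, h)  # from reference image 410
    p1 = fetch_block(ref_l1, x + mv_l1[0], y + mv_l1[1], w, h)  # from reference image 430
    return p0, p1
```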

Referring again to FIG. 2, the neural network setting unit 230 may obtain the data to be input to the neural network 240.

In one embodiment, the neural network setting unit 230 may obtain the data to be input to the neural network 240 based on the reference image, the preliminary prediction block, and the quantization parameter for the reference block.

In one embodiment, the neural network setting unit 230 may input at least one of the preliminary prediction block, a POC map, or a quantization error map to the neural network 240.

The inputs to the neural network 240 are as follows. The preliminary prediction block is a block judged similar to the current block under the rule-based prediction mode. It can be used to obtain a final prediction block that is even closer to the current block.

The quantization parameter for the reference block is used to quantize and dequantize the residual data of the reference block during its encoding and decoding. The amount of error introduced by quantization and dequantization depends on the quantization parameter. The quantization parameter can therefore be regarded as an indicator of the error, or distortion, contained in the reference block reconstructed through encoding and decoding.

A value computed from the quantization parameter is input to the neural network 240. This lets the network account for the reliability of the preliminary prediction block's samples, or how strongly they should influence the final prediction block's samples, when deriving the final prediction block. As described later, the neural network 240 may be trained to reduce the difference between a training current block (or original block) and a training final prediction block. The network outputs that final block from a training quantization error map, a training preliminary prediction block, and the like. In this way the neural network 240 learns how the training quantization error map affects the training final prediction block, and it can output a final block similar to the training current block.

The POC map may contain, as its sample values, the difference between the picture order count (POC) of the current image containing the current block and the POC of the reference image (hereinafter, the POC difference). The POC indicates the output order of images, so the POC difference represents the difference in output order, that is, the temporal distance, between the current image and the reference image. Object motion can change the position or size of an object across consecutive images. The neural network 240 can learn how this temporal distance influences prediction and output a final prediction block more similar to the current block.
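Because every sample carries the same POC difference, building the POC map amounts to filling a block-sized plane with one value. The sketch below assumes the sign convention "current POC minus reference POC" and applies no normalization. The text only says "difference", so both points are assumptions.

```python
from typing import List


def make_poc_map(poc_current: int, poc_reference: int, width: int, height: int) -> List[List[int]]:
    """Return a height x width map whose samples all equal the POC difference.

    Sign convention (current - reference) is an assumption for illustration.
    """
    diff = poc_current - poc_reference
    return [[diff] * width for _ in range(height)]  # separate list per row
```

A negative value then simply marks a reference image that is output after the current image.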

In one embodiment, the neural network 240 may include one or more convolutional layers. It may process at least one of the preliminary prediction block, the POC map, or the quantization error map received from the neural network setting unit 230, and output the final prediction block.

As described later with reference to FIGS. 9 and 10, the neural network 240 may determine each sample value of the final prediction block individually by applying predetermined operations to the input data. In rule-based prediction modes, one motion vector is computed per block. In the neural-network-based prediction mode according to an embodiment, by contrast, the samples of the final prediction block are determined individually. This can be understood as the neural network 240 effectively considering a separate motion vector for each sample of the current block. The neural-network-based prediction mode can therefore yield a final prediction block closer to the current block than the rule-based prediction modes do.

Hereinafter, methods of obtaining the quantization error map input to the neural network 240 are described with reference to FIGS. 5 to 8.

FIGS. 5 to 8 are diagrams for explaining methods of obtaining a quantization error map based on the quantization parameter for a reference block according to an embodiment.

In one embodiment, the sample values of the quantization error map may be calculated from the quantization parameter for the reference block.

The quantization parameter for the reference block may be obtained from the bitstream containing the information for decoding the reference block.

In one embodiment, the sample values of the quantization error map may be quantization error values calculated from the quantization parameter.

The quantization error values may represent the amount of error that quantization and dequantization of the residual samples can introduce during the encoding and decoding of the reference block.

A large quantization error value may mean a large difference between a transform coefficient before quantization and the same coefficient after dequantization. The larger this difference, the less the reference block obtained by decoding the encoded data resembles the original block.

Errors caused by quantization and dequantization are artifacts, so neural-network-based inter prediction needs to take the quantization error values into account.

In one embodiment, the quantization error value may be calculated by Equation 1 below.

[Equation 1]

Quantization error value = (quantization step size)^2 / 12

According to Equation 1, the quantization error value is proportional to the square of the quantization step size.
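Equation 1 matches the mean squared error of rounding to a uniform grid. If the rounding error is uniformly distributed on [-step/2, step/2], its variance is step^2/12. The sketch below evaluates Equation 1 and checks it with a small Monte Carlo run. The uniform-error model and the random test coefficients are assumptions made for illustration.

```python
import random


def quantization_error_value(step: float) -> float:
    """Equation 1: expected squared error of uniform quantization with the given step."""
    return step ** 2 / 12


def empirical_rounding_mse(step: float, n: int = 100_000, seed: int = 0) -> float:
    """Quantize/dequantize random coefficients and measure the mean squared error."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        c = rng.uniform(-1000.0, 1000.0)     # synthetic transform coefficient
        rec = round(c / step) * step         # quantize, then dequantize
        total += (c - rec) ** 2
    return total / n
```

For a step of 8, Equation 1 gives 64/12 ≈ 5.33, and the empirical value lands close to it.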

The quantization step size is the value used to quantize transform coefficients: a transform coefficient is quantized by dividing it by the quantization step size. Conversely, a quantized transform coefficient is dequantized by multiplying it by the quantization step size.
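The divide-then-multiply relationship can be sketched as follows. Rounding to the nearest integer level is an assumption; practical encoders often apply rounding offsets or rate-distortion-optimized quantization instead.

```python
def quantize(coeff: float, step: float) -> int:
    """Divide by the step size and round to the nearest level (rounding rule assumed)."""
    return int(round(coeff / step))


def dequantize(level: int, step: float) -> float:
    """Multiply the level by the step size to recover an approximate coefficient."""
    return level * step
```

For example, with a step of 8 the coefficient 37 maps to level 5 and is reconstructed as 40. The error of 3 stays within half a step.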

The quantization step size can be approximated by Equation 2 below.

[Equation 2]

Quantization step size = 2^(quantization parameter / n) / quantization scale[quantization parameter % n]

In Equation 2, quantization scale[quantization parameter % n] is the scale value, among n predetermined scale values, selected by the quantization parameter. The HEVC codec defines six scale values (26214, 23302, 20560, 18396, 16384, and 14564), so n is 6 for HEVC.

By Equations 1 and 2, as the quantization parameter increases, the quantization step size increases, and so may the quantization error value.
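Equations 1 and 2 can be combined into a function from the quantization parameter to the quantization error value. The sketch assumes that "quantization parameter / n" in Equation 2 is integer division, which is the usual pairing with the "% n" table index. Under that assumption the step size is only proportional to the true step size; Equation 2 is stated as an approximation. The function names are hypothetical.

```python
HEVC_QUANT_SCALES = (26214, 23302, 20560, 18396, 16384, 14564)  # the six scales named in the text


def quantization_step_size(qp: int, scales=HEVC_QUANT_SCALES) -> float:
    """Equation 2 with n = len(scales); qp // n is assumed to be integer division."""
    n = len(scales)
    return 2 ** (qp // n) / scales[qp % n]


def quantization_error_from_qp(qp: int) -> float:
    """Equation 1 applied to the step size of Equation 2."""
    return quantization_step_size(qp) ** 2 / 12
```

Because the scale table decreases, the step size grows within each group of six QP values and doubles every six QP values. The error value therefore grows with the quantization parameter, as the text states.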

In one embodiment, the sample values of the quantization error map may instead be the quantization step sizes calculated from the quantization parameter.

Referring to FIG. 5, suppose the reference block 510 contains 16 samples and their quantization parameter is a1. Every sample of the quantization error map 530 may then have the value a2 calculated from a1.

The quantization parameter for the reference block 510 shown in FIG. 5 may be set for the reference block 510 itself. It may also be set for a higher-level unit containing it, for example the slice containing the reference block 510. In other words, the neural network setting unit 230 may obtain the sample values of the quantization error map 530 from either the quantization parameter set for the reference block 510 or the one set for its slice.

When the quantization parameter for the reference block 510 is set for a higher-level unit, such as the slice containing the reference block 510, the same quantization parameter may apply to every block in that slice. The quantization error maps of all blocks in the slice can then be derived from the slice-level quantization parameter. As a result, the bitstream needs to carry less information than when the quantization parameter is set per block or per sample.
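The FIG. 5 case (one quantization parameter for the whole reference block or its slice) reduces to filling the map with a single value. In the sketch below, `to_value` is a hypothetical callable standing for the conversion of Equation 1 (or the step size of Equation 2). A placeholder lambda is used in the usage.

```python
from typing import Callable, List


def uniform_error_map(qp: int, width: int, height: int,
                      to_value: Callable[[int], float]) -> List[List[float]]:
    """Fill a width x height map with the value derived from one block- or slice-level QP."""
    value = to_value(qp)
    return [[value] * width for _ in range(height)]
```

For example, `uniform_error_map(32, 4, 4, lambda q: q / 2)` yields a 4x4 map of 16.0. Every block in the slice would reuse the same value.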

Next, referring to FIG. 6, the quantization parameter for the reference block 510-1 may be set for each sample of the reference block 510-1.

The neural network setting unit 230 may obtain the sample values of the quantization error map 530-1 from the per-sample quantization parameters of the reference block 510-1.

The neural network setting unit 230 may calculate the value a2 of the upper-left sample 631 of the quantization error map 530-1 from the quantization parameter a1 of the upper-left sample 611 of the reference block 510-1. Likewise, it may calculate the value b2 of the sample 632 to the right of sample 631 from the quantization parameter b1 of the sample 612 to the right of sample 611.

When the quantization parameter is set per sample of the reference block 510-1, more information may have to be read from the bitstream to determine the quantization parameters of the reference block 510-1. However, the final prediction block of the current block is then obtained from a reference block 510-1 with less error. This can reduce the number of bits needed to represent the residual block between the current block and the final prediction block.
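In the FIG. 6 case the map is a sample-by-sample conversion of the reference block's quantization parameters. As in the FIG. 5 case, `to_value` is a hypothetical callable standing for the QP-to-error conversion, and a placeholder lambda is used in the usage.

```python
from typing import Callable, List


def per_sample_error_map(qp_per_sample: List[List[int]],
                         to_value: Callable[[int], float]) -> List[List[float]]:
    """Convert each sample's QP (e.g. a1 at sample 611) into a map value (a2 at sample 631)."""
    return [[to_value(qp) for qp in row] for row in qp_per_sample]
```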

Next, referring to FIG. 7, the neural network setting unit 230 may divide the quantization error map 530-2 into sub-regions 750, 760, 770, and 780, which correspond to the sub-blocks 710, 720, 730, and 740 of the reference block 510-2. The sample values in each sub-region are then calculated from the quantization parameter of the sample at a predetermined position in the corresponding sub-block.

In one embodiment, if the reference block 510-2 corresponds to a coding unit, the sub-blocks 710, 720, 730, and 740 may correspond to prediction units.

In one embodiment, the predetermined position may be the upper-left position within the sub-block.

As shown in FIG. 7, the samples of the first sub-region 750 of the quantization error map 530-2 may have the value a2. This value is calculated from the quantization parameter a1 of the upper-left sample 711 of the first sub-block 710 in the reference block 510-2.

Likewise, the samples of the second sub-region 760 of the quantization error map 530-2 may have the value e2. It is calculated from the quantization parameter e1 of the upper-left sample 721 of the second sub-block 720 in the reference block 510-2.

The samples of the third sub-region 770 of the quantization error map 530-2 may have the value c2, calculated from the quantization parameter c1 of the upper-left sample 731 of the third sub-block 730 in the reference block 510-2. The samples of the fourth sub-region 780 may have the value b2, calculated from the quantization parameter b1 of the upper-left sample 741 of the fourth sub-block 740.
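The sub-block scheme of FIG. 7 can be sketched as follows. The reference block's QPs are given as a per-sample map, and sub-blocks are (x, y, w, h) rectangles inside the block. Each sub-region is filled from the QP of one representative sample, by default the upper-left one. Three details are hypothetical: the rectangle representation, the `pick` hook for choosing the representative sample, and the `to_value` conversion.

```python
from typing import Callable, List, Tuple

SubBlock = Tuple[int, int, int, int]  # (x, y, width, height) inside the reference block


def subblock_error_map(qp_map: List[List[int]], sub_blocks: List[SubBlock],
                       to_value: Callable[[int], float],
                       pick: Callable[[int, int, int, int], Tuple[int, int]] = lambda x, y, w, h: (x, y),
                       ) -> List[List[float]]:
    """Fill each sub-region with the value derived from one representative sample's QP.

    pick returns the (column, row) of the representative sample; the default is the
    upper-left sample of the sub-block, as in FIG. 7.
    """
    H, W = len(qp_map), len(qp_map[0])
    out: List[List[float]] = [[0.0] * W for _ in range(H)]
    for x, y, w, h in sub_blocks:
        col, row = pick(x, y, w, h)
        value = to_value(qp_map[row][col])   # one conversion per sub-block
        for j in range(y, y + h):
            for i in range(x, x + w):
                out[j][i] = value
    return out
```

Only one conversion is needed per sub-block instead of one per sample, which is why this variant can build the map faster.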

As another example, the sample at the predetermined position may be the center sample of the sub-block. Consider the point where the width and height of the region are halved. The center sample may be the sample located to the lower left, upper left, lower right, or upper right of that point.

FIG. 8 illustrates the case in which the center sample is the sample to the lower left of the point where the width and height of the region are halved.

As shown in FIG. 8, the samples of the first sub-region 850 of the quantization error map 530-3 may have the value a2. This value is calculated from the quantization parameter a1 of the center sample 811 of the first sub-block 710 in the reference block 510-2.

Likewise, the samples of the second sub-region 860 of the quantization error map 530-3 may have the value e2. It is calculated from the quantization parameter e1 of the center sample 821 of the second sub-block 720 in the reference block 510-2.

The samples of the third sub-region 870 of the quantization error map 530-3 may have the value a2, calculated from the quantization parameter a1 of the center sample 831 of the third sub-block 730 in the reference block 510-2. The samples of the fourth sub-region 880 of the quantization error map 530-3 may have the value c2, calculated from the quantization parameter c1 of the center sample 841 of the fourth sub-block 740.
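The FIG. 8 variant differs from FIG. 7 only in which sample's quantization parameter is read. The helper below returns the coordinates of one of the four samples that touch the midpoint of a sub-block, with `"lower_left"` matching FIG. 8. It assumes rows grow downward and that sub-block width and height are even. The function name is hypothetical.

```python
from typing import Tuple


def center_sample(x: int, y: int, w: int, h: int, corner: str = "lower_left") -> Tuple[int, int]:
    """Return (column, row) of a sample adjacent to the midpoint of a w x h sub-block at (x, y).

    The midpoint (x + w/2, y + h/2) lies on the shared corner of four samples;
    `corner` selects which one. Even w and h are assumed.
    """
    cx, cy = x + w // 2, y + h // 2
    offsets = {
        "upper_left": (-1, -1),
        "upper_right": (0, -1),
        "lower_left": (-1, 0),    # the choice illustrated in FIG. 8
        "lower_right": (0, 0),
    }
    dx, dy = offsets[corner]
    return (cx + dx, cy + dy)
```

For a 4x4 sub-block at the origin, the FIG. 8 choice is column 1, row 2. The sub-region is then filled from `qp_map[2][1]`.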

The upper-left and center samples described with reference to FIGS. 7 and 8 are only examples. In one embodiment, the position within the sub-blocks 710, 720, 730, and 740 used to obtain the sample values of the sub-regions of the quantization error maps 530-2 and 530-3 may be varied.

The quantization parameter for a reference block may be set per sub-block or per sample. In that case, as in the embodiments of FIGS. 7 and 8, the sample values of each sub-region of the quantization error map 530-2 or 530-3 can be computed from the quantization parameter of a single sample at a predetermined position in the corresponding sub-block 710, 720, 730, or 740. This allows the quantization error maps 530-2 and 530-3 to be generated more quickly.

In one embodiment, the neural network setting unit 230 may select one of several quantization error map acquisition methods, for example the methods shown in FIGS. 5 to 8, and obtain the quantization error map by the selected method. The selection may be based on at least one of the following:

- the size of the current block
- the prediction direction of the current block
- the layer to which the current image belongs in the hierarchical structure of images
- information obtained from the bitstream (for example, a flag or an index)

As described above, the final prediction block may be obtained by inputting at least one of the preliminary prediction block, the quantization error map, or the POC map to the neural network 240. An example structure of the neural network 240 is described with reference to FIG. 9.

FIG. 9 is a diagram illustrating the structure of the neural network 240 according to an embodiment.

As shown in FIG. 9, the preliminary prediction block 902, the quantization error map 904, and the POC map 906 may be input to the first convolutional layer 910.

In one embodiment, the preliminary prediction block 902, the quantization error map 904, and the POC map 906 may each have the same size as the current block.

Because the sample values of the POC map 906 are the POC difference between the current image and the reference image, all sample values in the POC map 906 may be identical.

The label 6X5X5X32 on the first convolutional layer 910 in FIG. 9 indicates that 6 channels of input data are convolved with 32 filter kernels of size 5x5. The convolution produces 32 feature maps, one per filter kernel.

The first convolutional layer 910 processes 6 input channels so that it can handle the case where the current block is bidirectionally predicted.

In one embodiment, when the current block is bidirectionally predicted, a first motion vector for list 0 and a second motion vector for list 1 are derived for the current block. A first reference image included in list 0 and a second reference image included in list 1 may then be obtained as the reference images of the current block.

A first preliminary prediction block is obtained from the first reference block, which lies in the first reference image and is indicated by the first motion vector for list 0. A second preliminary prediction block may be obtained from the second reference block, which lies in the second reference image and is indicated by the second motion vector for list 1. That is, the following may be obtained as input data for the neural network 240:
- the first preliminary prediction block;
- the second preliminary prediction block;
- a first quantization error map containing sample values calculated from the quantization parameter for the first reference block;
- a second quantization error map containing sample values calculated from the quantization parameter for the second reference block;
- a first POC map containing the POC difference between the current image and the first reference image;
- a second POC map containing the POC difference between the current image and the second reference image.

The first convolutional layer 910 may convolve these six inputs with 32 filter kernels of size 5X5: the first and second preliminary prediction blocks, the first and second quantization error maps, and the first and second POC maps.

Unidirectional prediction (for example, list 0 prediction or list 1 prediction) may instead be applied to the current block. In that case only the first motion vector for list 0, or only the second motion vector for list 1, is derived, so only 3 channels of input data can be obtained. Because the first convolutional layer 910 shown in FIG. 9 processes 6 channels, the 3 channels of input data need to be expanded to 6 channels.

To do this, the neural network setting unit 230 may duplicate each of the three available inputs:
- the first preliminary prediction block (or the second preliminary prediction block);
- the first quantization error map (or the second quantization error map);
- the first POC map (or the second POC map).

This yields two copies of each, for a total of 6 channels of input data, which the neural network setting unit 230 may then input to the neural network 240.
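Below is a hedged sketch of how the six input channels might be assembled, including the duplication used for unidirectional prediction.

The following are assumptions rather than details given in this document:
- the channel order (prediction blocks, then quantization error maps, then POC maps, following the order in which they are listed above);
- the sign of the POC difference (current minus reference);
- the helper names `poc_map` and `build_input`, which are hypothetical.

```python
def poc_map(cur_poc, ref_poc, height, width):
    # Every sample holds the same POC difference (assumed: current minus reference).
    diff = cur_poc - ref_poc
    return [[diff] * width for _ in range(height)]

def build_input(cur_poc, preds, qerrs, ref_pocs):
    """preds, qerrs and ref_pocs hold one entry per available direction (1 or 2)."""
    if not 1 <= len(preds) <= 2 or not len(preds) == len(qerrs) == len(ref_pocs):
        raise ValueError("expected one or two consistent prediction directions")
    h, w = len(preds[0]), len(preds[0][0])
    pocs = [poc_map(cur_poc, r, h, w) for r in ref_pocs]
    if len(preds) == 1:
        # Unidirectional prediction: duplicate each of the 3 channels so the
        # first convolutional layer still receives 6 channels.
        preds, qerrs, pocs = preds * 2, qerrs * 2, pocs * 2
    return [preds[0], preds[1], qerrs[0], qerrs[1], pocs[0], pocs[1]]
```

Duplicating the inputs keeps a single 6-channel network for both prediction modes, instead of training a separate 3-channel network for unidirectional blocks.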

The feature maps generated by the first convolutional layer 910 may represent distinctive characteristics of the input data, for example its vertical characteristics, horizontal characteristics, or edge characteristics.

The convolution operation in the first convolutional layer 910 is described in detail with reference to FIG. 10.

One feature map 1030 may be generated through multiplication and addition operations between two sets of values: the weights of the 5X5 filter kernel 1010 used in the first convolutional layer 910, and the corresponding sample values in the input data 1005 (for example, the preliminary prediction block 902).

Because the first convolutional layer 910 uses 32 filter kernels, it may generate 32 feature maps.

In FIG. 10, I1 to I49 denote the samples of the input data 1005, and F1 to F25 denote the samples of the filter kernel 1010. M1 to M9 denote the samples of the feature map 1030.

During the convolution operation, the sample values I1 to I5, I8 to I12, I15 to I19, I22 to I26, and I29 to I33 of the input data 1005 are multiplied by F1 to F25 of the filter kernel 1010, respectively. A combination of the products (for example, their sum) may be assigned as the value of M1 of the feature map 1030.

If the stride of the convolution operation is 1, the sample values I2 to I6, I9 to I13, I16 to I20, I23 to I27, and I30 to I34 of the input data 1005 are next multiplied by F1 to F25 of the filter kernel 1010, respectively. A combination of these products may be assigned as the value of M2 of the feature map 1030.

The filter kernel 1010 moves across the input data 1005 according to the stride until it reaches the last sample. At each position, the convolution operation is performed between the sample values in the input data 1005 and the samples of the filter kernel 1010. In this way, a feature map 1030 of a predetermined size may be obtained.
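The FIG. 10 arithmetic can be reproduced with a minimal single-channel convolution in pure Python. This is a sketch, not the apparatus itself: `conv2d_valid` is a hypothetical name. As is usual in CNNs, the kernel is not flipped, so F1 pairs with the top-left sample of each window.

```python
def conv2d_valid(data, kernel, stride=1):
    """Slide `kernel` over `data` without padding (CNN-style cross-correlation).

    Each output sample is the sum of element-wise products between the kernel and
    the window under it, e.g. M1 = I1*F1 + ... + I5*F5 + I8*F6 + ... + I33*F25.
    """
    in_h, in_w = len(data), len(data[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            y0, x0 = oy * stride, ox * stride  # top-left corner of the current window
            acc = 0
            for dy in range(k_h):
                for dx in range(k_w):
                    acc += data[y0 + dy][x0 + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out
```

With I1 to I49 set to 1 to 49 (row-major) and an all-ones 5x5 kernel, stride 1 yields a 3x3 map whose first row is [425, 450, 475]. Here M1 = 425 is the sum of I1 to I5, I8 to I12, I15 to I19, I22 to I26, and I29 to I33.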

The convolutional layers of the neural network 240 may operate according to the convolution process described with reference to FIG. 10. However, that process is only an example, and the embodiments are not limited to it.

Referring again to FIG. 9, the feature maps of the first convolutional layer 910 may be input to the first activation layer 920.

The first activation layer 920 may give each feature map non-linear characteristics. The first activation layer 920 may include a sigmoid function, a Tanh function, a Rectified Linear Unit (ReLU) function, or the like, but the embodiments are not limited thereto.

Giving non-linear characteristics in the first activation layer 920 may mean changing some sample values of the feature maps, by applying a non-linear function, before outputting them.

The first activation layer 920 may determine whether to pass the sample values of the feature maps to the second convolutional layer 930. For example, some sample values of a feature map may be activated by the first activation layer 920 and passed to the second convolutional layer 930. Others may be deactivated and not passed on. In this way, the first activation layer 920 may emphasize the distinctive characteristics of the input data represented by the feature maps.

The feature maps 925 output from the first activation layer 920 may be input to the second convolutional layer 930. Any one of the feature maps 925 shown in FIG. 9 may be the result of the first activation layer 920 processing the feature map 1030 described with reference to FIG. 10.

The label 32X5X5X32 on the second convolutional layer 930 indicates that the 32 channels of feature maps 925 are convolved with 32 filter kernels of size 5x5. The output of the second convolutional layer 930 may be input to the second activation layer 940, which may give the input feature maps non-linear characteristics.

The feature maps 945 output from the second activation layer 940 may be input to the third convolutional layer 950. The label 32X5X5X1 on the third convolutional layer 950 indicates that the 32 feature maps 945 are convolved with one filter kernel of size 5x5 to produce one final prediction block 955.
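Below is a rough, non-authoritative sketch of the FIG. 9 layout in pure Python. It chains three 5x5 convolutions (6 to 32, 32 to 32, and 32 to 1 channels), with an activation after the first two.

The following are assumptions for illustration:
- ReLU as the activation;
- random placeholder weights and zero biases;
- stride 1;
- the `pad` argument (pad=0 gives unpadded convolutions; pad=2 keeps each layer's spatial size for 5x5 kernels);
- the names `make_network` and `forward`, which are hypothetical.

```python
import random

def relu(value):
    # ReLU: positive samples pass through, all others are blocked (set to 0).
    return value if value > 0.0 else 0.0

def conv_layer(inputs, weights, biases, pad):
    """Stride-1 multi-channel 2-D convolution (CNN-style cross-correlation).

    inputs:  C_in channels, each an H x W list of lists
    weights: C_out filters, each C_in x k x k
    pad:     number of zero samples added on every side before convolving
    """
    c_in, h, w = len(inputs), len(inputs[0]), len(inputs[0][0])
    k = len(weights[0][0])
    padded = [[[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)] for _ in range(c_in)]
    for c in range(c_in):
        for y in range(h):
            for x in range(w):
                padded[c][y + pad][x + pad] = inputs[c][y][x]
    out_h, out_w = h + 2 * pad - k + 1, w + 2 * pad - k + 1
    outputs = []
    for filt, bias in zip(weights, biases):
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):
            for x in range(out_w):
                acc = bias
                for c in range(c_in):  # sum over all input channels
                    for dy in range(k):
                        row, frow = padded[c][y + dy], filt[c][dy]
                        for dx in range(k):
                            acc += row[x + dx] * frow[dx]
                fmap[y][x] = acc
        outputs.append(fmap)
    return outputs

def random_layer(c_in, c_out, k, rng):
    # Placeholder weights; a real codec would load trained parameters.
    weights = [[[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]
                for _ in range(c_in)] for _ in range(c_out)]
    return weights, [0.0] * c_out

def make_network(in_ch=6, mid_ch=32, k=5, seed=0):
    # 6X5X5X32 -> 32X5X5X32 -> 32X5X5X1, as labelled in FIG. 9.
    rng = random.Random(seed)
    return [random_layer(in_ch, mid_ch, k, rng),
            random_layer(mid_ch, mid_ch, k, rng),
            random_layer(mid_ch, 1, k, rng)]

def forward(inputs, layers, pad):
    x = inputs
    for i, (weights, biases) in enumerate(layers):
        x = conv_layer(x, weights, biases, pad)
        if i < len(layers) - 1:  # activation after the first two convolutions only
            x = [[[relu(v) for v in row] for row in ch] for ch in x]
    return x[0]  # the single output channel is the final prediction block
```

With pad=2, a 6-channel 4x4 input gives a 4x4 output. With pad=0, each 5x5 layer trims 4 samples per dimension, so a 13x13 input shrinks to 1x1 after three layers.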

FIG. 9 shows the neural network 240 with three convolutional layers (910, 930, and 950) and two activation layers (920 and 940). This is only an example; in one embodiment, the number of convolutional layers and activation layers in the neural network 240 may vary.

In one embodiment, the neural network 240 may be implemented as a recurrent neural network (RNN). This may mean replacing the CNN structure of the neural network 240 shown in FIG. 9 with an RNN structure.

In one embodiment, the image decoding apparatus 100 and the image encoding apparatus 1900 (described later) may include at least one arithmetic logic unit (ALU) for the convolution and activation-layer operations described above.

The ALU may be implemented as a processor. For the convolution operation, the ALU may include a multiplier and an adder. The multiplier multiplies sample values of the input data (or of a feature map output from the previous layer) by sample values of the filter kernel. The adder adds the resulting products.

For the activation-layer operation, the ALU may include a multiplier and a comparator. The multiplier multiplies an input sample value by a weight used in a predetermined sigmoid, Tanh, ReLU, or similar function. The comparator compares the product with a predetermined value to determine whether to pass the input sample value to the next layer.

According to the convolution process shown in FIG. 10, applying the 5X5 filter kernel 1010 to the 7X7 input data 1005 yields a 3X3 feature map 1030.

In general, when convolution is performed on unpadded input data, the output is smaller than the input. Therefore, the input data needs to be padded so that the final prediction block is the same size as the current block.

In the embodiment shown in FIG. 10, a 5X5 kernel removes 5 - 1 = 4 samples in each dimension. To obtain a feature map 1030 of the same size as the input data 1005, the input data 1005 must therefore be padded by a distance of 2 on each of the left, right, upper, and lower sides.

In one embodiment, to obtain a final prediction block of the same size as the current block, the neural network setting unit 230 may pad at least one of the preliminary prediction block, the quantization error map, or the POC map. It may then input at least one of the resulting extended preliminary prediction block, extended quantization error map, or extended POC map to the neural network 240.

In one embodiment, each convolutional layer of the neural network 240 may pad its input (the input data or the output of the previous layer) before convolving it, so that the data keeps the same size before and after the convolution. In that case, when at least one of the extended preliminary prediction block, the extended quantization error map, or the extended POC map is input to the neural network 240, the output final prediction block may have the same size as that extended input. In one embodiment, the final prediction block may then be cropped so that its size equals the size of the current block.
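Cropping the network output back to the current block size could look like the minimal sketch below. It assumes the extension was symmetric: h columns on each side and v rows at the top and bottom. `center_crop` is a hypothetical name.

```python
def center_crop(block, h, v):
    """Drop v rows from the top and bottom and h columns from the left and right."""
    rows, cols = len(block), len(block[0])
    if 2 * v >= rows or 2 * h >= cols:
        raise ValueError("extension distance too large for this block")
    return [row[h:cols - h] for row in block[v:rows - v]]
```

For example, an 11x11 output produced from an input extended by 3 on every side crops back to the 5x5 block in its center.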

In one embodiment, the neural network setting unit 230 may calculate the extension distance for padding from three quantities: the number of convolutional layers in the neural network 240, and the size and stride of the filter kernels those layers use.

In one embodiment, let the kernel used in each convolutional layer of the neural network 240 have size k_i (i = 0, 1, ..., L-1), and let the stride of each convolutional layer be s_i (i = 0, 1, ..., L-1), where L is the number of convolutional layers. The extension distance may then be calculated according to Equation 3 below.

[Equation 3]

In Equation 3, h is the extension distance in the horizontal direction and v is the extension distance in the vertical direction. M is the horizontal size of the input data, and N is its vertical size.

When all convolutional layers of the neural network 240 use filter kernels of the same size k and the same stride s, Equation 3 may be simplified to Equation 4 below.

[Equation 4]

Based on Equation 3 or Equation 4, the neural network setting unit 230 may determine the horizontal and vertical extension distances for padding the preliminary prediction block, the quantization error map, and the POC map. It may then obtain an extended preliminary prediction block, an extended quantization error map, and an extended POC map that additionally include samples covering those extension distances.
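Equations 3 and 4 are not reproduced in this text, so the sketch below does not claim to implement them. It shows one standard way to reason about the quantity they compute.

An unpadded layer with kernel k and stride s maps a length n to floor((n - k)/s) + 1. Inverting this layer by layer gives the input length needed for the output to keep the original size: M horizontally, or N vertically. Half the difference between that input length and the original size is the per-side extension.

For stride-1 layers that all use kernel size k, the result reduces to L*(k - 1)/2. For the single 5X5 layer of FIG. 10, this gives 2.

The function names are hypothetical, and the symmetric split is an assumption.

```python
def required_input_size(out_size, kernels, strides):
    """Smallest input length whose unpadded convolution chain yields out_size samples.

    An unpadded layer maps n -> (n - k) // s + 1, so walking the layers backwards
    with n -> (n - 1) * s + k gives the input length that produces out_size.
    """
    n = out_size
    for k, s in reversed(list(zip(kernels, strides))):
        n = (n - 1) * s + k
    return n

def extension_distance(size, kernels, strides):
    """Per-side padding so that the network output keeps `size` samples in one direction."""
    total = required_input_size(size, kernels, strides) - size
    # Assumption: split the padding evenly between the two sides; an odd total
    # would need one extra sample on one side, which this sketch does not model.
    if total % 2:
        raise ValueError("odd total padding; asymmetric padding would be needed")
    return total // 2
```

Calling it once with M and once with N gives h and v. With stride 1 the result does not depend on M or N, but with larger strides it does, which is why M and N appear in Equation 3.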

In one embodiment, when both the horizontal and vertical extension distances are 1, the neural network setting unit 230 may obtain the extended preliminary prediction block by adding one sample on each side of the preliminary prediction block: left, right, top, and bottom.

In one embodiment, the neural network setting unit 230 may add samples of a predetermined value outside the boundaries of the preliminary prediction block, the quantization error map, and the POC map. This yields the extended preliminary prediction block, the extended quantization error map, and the extended POC map. The predetermined value may be 0.
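Padding with a predetermined value (0 by default) can be sketched as follows. `pad_constant` is a hypothetical helper, not taken from this document.

```python
def pad_constant(block, h, v, value=0):
    """Add h samples on the left and right and v samples on the top and bottom, all set to `value`."""
    width = len(block[0]) + 2 * h
    top = [[value] * width for _ in range(v)]
    middle = [[value] * h + list(row) + [value] * h for row in block]
    bottom = [[value] * width for _ in range(v)]
    return top + middle + bottom
```

With h = v = 1, a 2x2 block becomes a 4x4 block framed by zeros.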

In one embodiment, the preliminary prediction block is obtained from a reference block that is part of the reference image. The neural network setting unit 230 may therefore take the neighboring samples of the reference block into account when padding the preliminary prediction block, the quantization error map, and the POC map. Using the samples adjacent to the reference block, instead of a predetermined sample value, allows the spatial characteristics of the reference image to be considered when inter-predicting the current block.

FIG. 11 is a diagram illustrating a method of obtaining an extended preliminary prediction block according to an embodiment.

As shown in FIG. 11, let the horizontal extension distance be h and the vertical extension distance be v. The neural network setting unit 230 may obtain an extended preliminary prediction block that contains the samples of the preliminary prediction block together with the samples 1120 that are adjacent to the reference block 1110 in the reference image 1100 and lie within those extension distances. Accordingly, the extended preliminary prediction block may be 2h wider and 2v taller than the preliminary prediction block.

FIG. 12 is a diagram illustrating, according to an embodiment, a method of obtaining an extended preliminary prediction block when a boundary of the reference block 1110 corresponds to a boundary of the reference image 1100.

Suppose both the horizontal and vertical extension distances are determined to be 3. The neural network setting unit 230 may then obtain an extended preliminary prediction block containing the samples in the preliminary prediction block and those neighboring samples outside the boundary of the reference block 1110 that lie within an extension distance of 3.

As shown in FIG. 12, the reference image 1100 may be divided into six blocks (1210, 1220, 1230, 1240, 1250, and 1110), with the block at the top center being the reference block 1110. In that case, the neural network setting unit 230 may select the neighboring samples that lie within an extension distance of 3 from the reference block 1110 in each of these blocks:
- the left block 1210 (measured from the left boundary of the reference block 1110);
- the right block 1250 (measured from the right boundary);
- the lower block 1230 (measured from the lower boundary).

To form a rectangular extended preliminary prediction block, the neighboring samples in the lower-left block 1220 and the lower-right block 1240 may also be selected.

In one embodiment, a boundary of the reference block 1110 may coincide with a boundary of the reference image 1100. For example, as shown in FIG. 12, the upper boundary of the reference block 1110 may coincide with the upper boundary of the reference image 1100. In that case, the neighboring samples outside the upper boundary of the reference block 1110 are not part of the reference image 1100. The neural network setting unit 230 may therefore determine each of these neighboring samples 1260 from the sample in the reference image 1100 that is closest to it.

Referring to FIG. 12, consider the neighboring samples 1260 outside the upper boundary of the reference block 1110. Those in the leftmost column may be set to a, the value of the closest sample in the adjacent block 1210. Those in the rightmost column may be set to k, the value of the closest sample in the adjacent block 1250.
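A minimal sketch of the window extraction described for FIGS. 11 and 12:
- neighbors inside the reference picture are copied as-is;
- positions outside the picture take the value of the nearest picture sample, which is exactly what clamping the coordinates to the picture bounds gives.

The same clamping could be applied to a per-sample plane of quantization error values, though storing such a plane for the whole reference picture is an assumption. The second check in the test reproduces the FIG. 13 pattern, where the row above the block repeats the row beneath it. `extract_extended_block` is a hypothetical name.

```python
def extract_extended_block(picture, top, left, height, width, h, v):
    """Return the (height + 2v) x (width + 2h) window around a reference block.

    Coordinates outside the picture are clamped to the nearest valid row/column,
    so each missing neighbor takes the value of the closest picture sample.
    """
    pic_h, pic_w = len(picture), len(picture[0])
    out = []
    for y in range(top - v, top + height + v):
        cy = min(max(y, 0), pic_h - 1)  # clamp row index into the picture
        out.append([picture[cy][min(max(x, 0), pic_w - 1)]  # clamp column index
                    for x in range(left - h, left + width + h)])
    return out
```

For a 2x2 block on the top edge of a picture extended by 1, the row above the block is a copy of the block's first row, including its left and right neighbors.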

The neural network setting unit 230 may apply an 11x11 extended preliminary prediction block to the neural network 240. This block is larger than the 5x5 reference block 1110 (and the 5x5 preliminary prediction block). The network then outputs a 5x5 final prediction block, the same size as the current block.

The quantization error map input to the neural network 240 must be the same size as the extended preliminary prediction block. Therefore, when the extended preliminary prediction block is input to the neural network 240, the neural network setting unit 230 may obtain an extended quantization error map of the same size. This is described with reference to FIG. 13.

FIG. 13 is a diagram illustrating a method of obtaining an extended quantization error map 1300 according to an embodiment.

The left side of FIG. 13 shows a quantization error map 530-4 containing four samples. The right side shows the quantization error map 1300 obtained by extending it by a distance of 1.

When the quantization parameters for the four samples in the reference block are a1, b1, c1, and a1, the first to fourth samples 1301 to 1304 in the quantization error map 530-4 may have the sample values a2, b2, c2, and a2, respectively.

The extended preliminary prediction block may be formed from the samples in the preliminary prediction block and the neighboring samples adjacent to the reference block. In that case, the values of the neighboring samples outside the quantization error map 530-4 may be determined from the quantization parameters of the neighboring samples outside the boundary of the reference block.

As shown on the right side of FIG. 13, the sample 1310 to the left of the first sample 1301 may have the value e2, and the sample 1315 to the left of the third sample 1303 may have the value a2. The value e2 may be calculated from the quantization parameter of the sample to the left of the reference-block sample that corresponds to the first sample 1301. Likewise, a2 may be calculated from the quantization parameter of the sample to the left of the reference-block sample that corresponds to the third sample 1303.

In addition, the remaining neighboring samples may have the following values:
- the sample 1320 below and to the left of the third sample 1303: f2;
- the sample 1325 below the third sample 1303: c2;
- the sample 1330 below the fourth sample 1304: a2;
- the sample 1335 below and to the right of the fourth sample 1304: e2;
- the sample 1340 to the right of the fourth sample 1304: d2;
- the sample 1345 to the right of the second sample 1302: e2.

As described with reference to FIG. 12, a boundary of the reference block 1110 (for example, its upper boundary) may correspond to a boundary of the reference image 1100. In that case, the neighboring samples 1260 outside that boundary may be determined from the closest samples available to them. Likewise, the values of the neighboring samples outside the boundary of the quantization error map 530-4 may be determined from the closest available samples.

When the upper boundary of the reference block coincides with a boundary of the reference image, as shown on the right side of FIG. 13, the samples 1305, 1360, 1355, and 1350 located outside the upper boundary of the quantization error map 530-4 may have the values e2, a2, b2, and e2, respectively. That is, the sample 1305 at the upper left of the first sample 1301 is determined from its nearest sample, the left sample 1310 of the first sample 1301, and the sample 1360 above the first sample 1301 may be determined from its nearest sample, the first sample 1301 itself. Similarly, the sample 1355 above the second sample 1302 is determined from its nearest sample, the second sample 1302, and the sample 1350 at the upper right of the second sample 1302 may be determined from its nearest sample, the right sample 1345 of the second sample 1302.
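The nearest-available-sample rule can be sketched as coordinate clamping. This is a minimal sketch, assuming the per-sample quantization error values of the whole reference image are already available as a 2D list (how they are derived from the quantization parameters is not shown); `extend_qe_map` and its arguments are hypothetical names. Neighbors that lie inside the reference image keep their own values, as with samples 1310 to 1345, while positions outside the image take the value of the nearest sample inside it, as with samples 1305 to 1350.

```python
from typing import List


def extend_qe_map(qe_pic: List[List[int]], x0: int, y0: int,
                  w: int, h: int, ext: int) -> List[List[int]]:
    """Return the (h + 2*ext) x (w + 2*ext) quantization error map around a block.

    qe_pic: hypothetical per-sample quantization error map of the reference image.
    (x0, y0, w, h): position and size of the reference block.
    ext: extension distance.
    """
    pic_h, pic_w = len(qe_pic), len(qe_pic[0])
    out = []
    for j in range(h + 2 * ext):
        # Clamp rows so positions above/below the picture reuse the nearest row.
        y = min(max(y0 - ext + j, 0), pic_h - 1)
        row = []
        for i in range(w + 2 * ext):
            # Clamp columns the same way for positions left/right of the picture.
            x = min(max(x0 - ext + i, 0), pic_w - 1)
            row.append(qe_pic[y][x])
        out.append(row)
    return out
```

For a 2×2 block touching the top of a 3×3 picture, e.g. `extend_qe_map([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 1, 0, 2, 2, 1)`, the padded top row repeats the first picture row, mirroring the FIG. 13 case where the reference block lies on the upper boundary.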

In one embodiment, because the sample values in the POC map correspond to the POC difference between the current image and the reference image, when the POC map is padded according to the extension distance, every sample value in the extended POC map may equal the POC difference between the current image and the reference image.
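Because the POC map is constant, padding it reduces to allocating a larger map filled with the same value. A minimal sketch, assuming the signed difference `cur_poc - ref_poc` is stored (the text does not state the sign convention); `extend_poc_map` is a hypothetical name.

```python
from typing import List


def extend_poc_map(cur_poc: int, ref_poc: int,
                   w: int, h: int, ext: int) -> List[List[int]]:
    # Assumption: the signed POC difference is used as the map value.
    diff = cur_poc - ref_poc
    # Every sample of the extended map, padding included, holds the same value.
    return [[diff] * (w + 2 * ext) for _ in range(h + 2 * ext)]
```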

Meanwhile, in one embodiment, when the current image contains samples reconstructed before the current block, a block containing those samples may additionally be input to the neural network 240. This is described with reference to FIGS. 14 and 15.

FIG. 14 is a diagram illustrating reconstructed samples and not-yet-reconstructed samples in a current image 1400 according to an embodiment.

The current image 1400 may be reconstructed in a predetermined scan order. For example, the blocks and samples in the current image 1400 may be reconstructed in raster scan order.

As shown in FIG. 14, when the current image 1400 is reconstructed in raster scan order, the samples located to the left of and above the current block 1410 may already have been reconstructed before the current block 1410 is reconstructed.

Because the samples reconstructed before the current block 1410 are spatially adjacent to it, they can provide the neural network 240 with information useful for generating a final prediction block that closely resembles the current block 1410.

In one embodiment, when the neural network setting unit 230 inputs at least one of the extended preliminary prediction block, the extended quantization error map, or the extended POC map to the neural network 240, it may also input an extended current reconstruction block having the same size as that extended input.

The extended current reconstruction block may include, among the samples reconstructed before the current block 1410, those that fall within the extension distance.

Referring to FIG. 14, when the extension distance is 1, the neural network setting unit 230 may obtain the extended current reconstruction block using, among the samples reconstructed before the current block 1410, the samples 1420 located within an extension distance of 1 from the boundary of the current block 1410.

In FIG. 14, the samples 1420 within an extension distance of 1 to the left of and above the current block 1410 have already been reconstructed, but the samples inside the current block 1410 and the samples 1430 within an extension distance of 1 to the right of and below it have not. The neural network setting unit 230 may therefore obtain these not-yet-reconstructed samples from the extended preliminary prediction block, as described with reference to FIG. 15.

FIG. 15 is a diagram for explaining a method of obtaining an extended current reconstruction block 1500 according to an embodiment.

Referring to FIG. 15, an extended current reconstruction block 1500 may be obtained that contains the samples 1420 reconstructed before the current block 1410, together with the samples 1125 of the extended preliminary prediction block 1150 other than those corresponding to the already-reconstructed samples 1420.
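The merge of reconstructed and predicted samples can be sketched as below. This is a minimal sketch under a simplifying assumption not stated in the text: blocks of height `h` are coded in raster order on a uniform grid, so a sample is already reconstructed if it lies in an earlier block row, or in the current block row to the left of the block. Positions outside the picture fall back to the extended preliminary prediction block. `is_reconstructed` and `extended_current_recon_block` are hypothetical names.

```python
from typing import List

Block = List[List[int]]


def is_reconstructed(x: int, y: int, x0: int, y0: int, h: int,
                     pic_w: int, pic_h: int) -> bool:
    # Assumption: uniform block rows of height h coded in raster order.
    if not (0 <= x < pic_w and 0 <= y < pic_h):
        return False
    return y < y0 or (y < y0 + h and x < x0)


def extended_current_recon_block(recon: Block, ext_pred: Block, x0: int, y0: int,
                                 w: int, h: int, ext: int) -> Block:
    """Take reconstructed samples where available, else extended prediction samples."""
    pic_h, pic_w = len(recon), len(recon[0])
    out = []
    for j in range(h + 2 * ext):
        row = []
        for i in range(w + 2 * ext):
            x, y = x0 - ext + i, y0 - ext + j
            if is_reconstructed(x, y, x0, y0, h, pic_w, pic_h):
                row.append(recon[y][x])      # like samples 1420
            else:
                row.append(ext_pred[j][i])   # like samples 1125
        out.append(row)
    return out
```

With a 2×2 current block at (1, 1) in a 4×4 picture and an extension distance of 1, the top row and the left column (except the bottom-left corner) come from the reconstructed picture and every other position comes from `ext_pred`.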

In one embodiment, when there are two extended preliminary prediction blocks because the current block is bi-directionally predicted, the extended preliminary prediction block 1150 shown in FIG. 15 may contain, as its sample values, the averages of the corresponding sample values of the two extended preliminary prediction blocks.
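The averaging of the two extended preliminary prediction blocks can be sketched as a per-sample integer mean. The round-half-up rule `(a + b + 1) >> 1` is an assumption for illustration; the text only says the values are averaged. `average_blocks` is a hypothetical name.

```python
from typing import List

Block = List[List[int]]


def average_blocks(p0: Block, p1: Block) -> Block:
    # Per-sample integer average of the list-0 and list-1 blocks, rounding half up.
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]
```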

Because the neural network 240 processes the extended current reconstruction block 1500 together with the extended preliminary prediction block, the extended quantization error map, and the extended POC map described above, the spatial characteristics of the current image can also be taken into account.

In one embodiment, the neural network setting unit 230 may select one of a plurality of weight sets according to a predetermined criterion and cause the neural network 240 to operate with the selected weight set.

Each of the plurality of weight sets may include the weights used in the operations of the layers included in the neural network 240.

In one embodiment, the neural network setting unit 230 may select, from among the plurality of weight sets, the weight set to be used to obtain the final prediction block based on at least one of the size of the current block, the prediction direction of the current block, the quantization parameter for the reference block, the layer to which the current image belongs in the hierarchical structure of the images, or information obtained from the bitstream.

Referring to FIG. 16, the neural network setting unit 230 may configure the neural network 240 with the weight set, among weight sets A to C, indicated by an index obtained from the bitstream, thereby obtaining a final prediction block that more closely resembles the current block.

Although FIG. 16 illustrates a weight set being selected based on information obtained from the bitstream, this is only one embodiment. In one embodiment, the weight set to be used to obtain the final prediction block may be selected from among the plurality of weight sets according to the result of comparing the size of the current block with a predetermined threshold.

For example, if the size of the current block is 64×64 or larger, weight set C may be selected; if it is at least 16×16 but smaller than 64×64, weight set B may be selected; and if it is smaller than 16×16, weight set C may be selected.

As another example, weight set A may be selected if the current image belongs to layer 1 in the hierarchical structure of the images, weight set B if it belongs to layer 2, and weight set C if it belongs to layer 3.
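The three selection rules in the examples above can be sketched as small lookup functions. This is a minimal sketch under stated assumptions: the bitstream index is taken to be 0-based, and "size" is compared by area (width × height), which the text does not specify for non-square blocks. The function names are hypothetical, and the size rule reproduces the example exactly as written, including set C for both the largest and the smallest size classes.

```python
from typing import Tuple

WEIGHT_SETS: Tuple[str, ...] = ("A", "B", "C")


def select_by_index(idx: int) -> str:
    # Assumption: 0-based index parsed from the bitstream.
    return WEIGHT_SETS[idx]


def select_by_size(width: int, height: int) -> str:
    # Assumption: block size compared by area.
    area = width * height
    if area >= 64 * 64:
        return "C"
    if area >= 16 * 16:
        return "B"
    return "C"  # mirrors the example in the text as written


def select_by_layer(layer: int) -> str:
    # Layer 1 -> A, layer 2 -> B, layer 3 -> C, as in the example.
    return {1: "A", 2: "B", 3: "C"}[layer]
```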

Each of the plurality of weight sets may be generated as a result of training the neural network 240. For example, weight sets A, B, and C shown in FIG. 16 may be obtained by training the neural network 240 for different training objectives. Setting different training objectives may mean training the neural network 240 on different kinds of training images, or computing the loss information in different ways.

As an example, the training process of the neural network 240 shown in FIG. 23, described later, may use loss information 2306 corresponding to the difference between a training final prediction block 2305 and a training current block 2301. Because there are various ways to compute the difference between two blocks, weight set A may be generated by training the neural network 240 on loss information computed by a first method, weight set B by training it on loss information computed by a second method, and weight set C by training it on loss information computed by a third method.
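The text does not name the first, second, and third methods. Purely as an illustration, three common block-difference measures are sketched below: the sum of absolute differences, the sum of squared differences, and the maximum absolute difference. The pairing of each measure with a weight set in `LOSS_FOR_WEIGHT_SET` is hypothetical.

```python
from typing import Callable, Dict, List

Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def sse(a: Block, b: Block) -> int:
    """Sum of squared differences."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def max_abs(a: Block, b: Block) -> int:
    """Largest absolute per-sample difference."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical pairing of each weight set with the loss used to train it.
LOSS_FOR_WEIGHT_SET: Dict[str, Callable[[Block, Block], int]] = {
    "A": sad,
    "B": sse,
    "C": max_abs,
}
```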

In one embodiment, the neural network setting unit 230 may select, from among a plurality of neural networks, the neural network to be used to obtain the final prediction block, and apply input data (e.g., the preliminary prediction block) to the selected neural network to obtain the final prediction block of the current block. The plurality of neural networks may be included in the AI-based prediction decoder 132.

The plurality of neural networks may differ from one another in at least one of layer type, number of layers, filter kernel size, or stride.

In one embodiment, the neural network setting unit 230 may select, from among the plurality of neural networks, the neural network to be used to obtain the final prediction block based on at least one of the size of the current block, the prediction direction of the current block, the quantization parameter for the reference block, the layer to which the current image belongs in the hierarchical structure of the images, or information obtained from the bitstream.

In one embodiment, the neural network setting unit 230 may determine whether to apply the neural network-based prediction mode described above based on at least one of information obtained from the bitstream, the prediction direction of the current block, or whether the extended preliminary prediction block extends beyond the boundary of the reference image.

For example, when information obtained from the bitstream indicates that the neural network-based prediction mode is not applied, the preliminary prediction block obtained by the prediction block acquisition unit 220 may be passed directly to the reconstruction unit 134.

As another example, when the extended preliminary prediction block extends beyond the boundary of the reference image, the neural network setting unit 230 may determine that the neural network-based prediction mode is not applied to the current block; when the extended preliminary prediction block lies within the reference image, for example, when no boundary of the reference block coincides with a boundary of the reference image, it may determine that the neural network-based prediction mode is applied to the current block.

As a further example, when the prediction direction of the current block is bidirectional, that is, when both a motion vector for list 0 and a motion vector for list 1 are obtained for the current block, it may be determined that the neural network-based prediction mode is applied to the current block; when the prediction direction is unidirectional, it may be determined that the mode is not applied.
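The three example criteria can be sketched as a single decision function. The text presents them as alternatives ("at least one of"); combining all three as below is an assumption for illustration, as are the argument conventions: `nn_flag` is `None` when no flag is signaled, and rectangles are `(x, y, width, height)` in reference-image coordinates. `use_nn_prediction` is a hypothetical name.

```python
from typing import Optional, Tuple


def use_nn_prediction(nn_flag: Optional[bool], is_bi_pred: bool,
                      ext_rect: Tuple[int, int, int, int],
                      ref_size: Tuple[int, int]) -> bool:
    x, y, w, h = ext_rect
    ref_w, ref_h = ref_size
    # An explicit "not applied" signal in the bitstream overrides everything.
    if nn_flag is False:
        return False
    # Extended preliminary prediction block must lie inside the reference image.
    inside = x >= 0 and y >= 0 and x + w <= ref_w and y + h <= ref_h
    # Only bi-directionally predicted blocks use the neural network here.
    return is_bi_pred and inside
```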

FIG. 17 is a diagram illustrating an image decoding method performed by the image decoding apparatus 100 according to an embodiment.

In step S1710, the image decoding apparatus 100 obtains a motion vector of the current block.

In one embodiment, the image decoding apparatus 100 may obtain the motion vector of the current block according to a rule-based prediction mode. The rule-based prediction modes may include the merge mode, the skip mode, the advanced motion vector prediction (AMVP) mode, the bi-directional optical flow (BDOF) mode, the bi-prediction with CU-level weights (BCW) mode, or the decoder-side motion vector refinement (DMVR) mode. Which of these rule-based prediction modes is used to obtain the motion vector of the current block may be determined based on prediction mode information included in the bitstream.

In one embodiment, the image decoding apparatus 100 may use information included in the bitstream, for example a flag or an index, to obtain the motion vector of the current block.

In one embodiment, to obtain the motion vector of the current block, the image decoding apparatus 100 may construct a motion vector candidate list that includes, as motion vector candidates, the motion vectors of spatial and/or temporal neighboring blocks of the current block.

In step S1720, the image decoding apparatus 100 obtains a preliminary prediction block using the reference image of the current block and the motion vector of the current block.

In one embodiment, the image decoding apparatus 100 may obtain the preliminary prediction block using the reference block indicated by the motion vector of the current block in the reference image.

In one embodiment, the process of obtaining, from the reference image and using a motion vector, a preliminary prediction block similar to the current block may be referred to as motion compensation.

In one embodiment, the preliminary prediction block may be the result of applying interpolation to the reference block indicated by the motion vector of the current block in the reference image. Accordingly, the preliminary prediction block may include sub-pixels obtained by filtering integer pixels.
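Fractional-sample motion compensation can be sketched as follows. This is a minimal sketch, assuming quarter-sample motion vectors and a bilinear filter chosen only for brevity; standard codecs such as HEVC and VVC use longer separable interpolation filters. `bilinear_mc` is a hypothetical name, and reads outside the reference image repeat the nearest edge sample.

```python
from typing import List


def bilinear_mc(ref: List[List[int]], x0: int, y0: int, w: int, h: int,
                mv_x: int, mv_y: int, prec: int = 4) -> List[List[int]]:
    """Motion-compensate a w x h block at (x0, y0) with a fractional motion vector.

    mv_x, mv_y are in 1/prec-sample units (prec=4 means quarter-sample).
    """
    ref_h, ref_w = len(ref), len(ref[0])

    def px(x: int, y: int) -> int:
        # Clamp coordinates so reads outside the reference image repeat edge samples.
        return ref[min(max(y, 0), ref_h - 1)][min(max(x, 0), ref_w - 1)]

    ix, fx = divmod(mv_x, prec)  # floor division keeps fx in [0, prec) for negative MVs
    iy, fy = divmod(mv_y, prec)
    norm = prec * prec
    out = []
    for j in range(h):
        row = []
        for i in range(w):
            x, y = x0 + i + ix, y0 + j + iy
            top = px(x, y) * (prec - fx) + px(x + 1, y) * fx
            bottom = px(x, y + 1) * (prec - fx) + px(x + 1, y + 1) * fx
            val = top * (prec - fy) + bottom * fy
            row.append((val + norm // 2) // norm)  # round to the nearest integer
        out.append(row)
    return out
```

A half-sample horizontal shift (`mv_x=2`) produces sub-pixel values halfway between integer pixels, while a whole-sample shift (`mv_x=4`) simply copies shifted integer pixels.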

In step S1730, the image decoding apparatus 100 may obtain the final prediction block of the current block by applying to the neural network 240 at least one of a POC map containing the POC difference between the current image (which includes the current block) and the reference image, the preliminary prediction block, or a quantization error map.

In one embodiment, the image decoding apparatus 100 may pad at least one of the POC map, the preliminary prediction block, or the quantization error map, and apply at least one of the resulting extended POC map, extended preliminary prediction block, or extended quantization error map to the neural network 240 to obtain the final prediction block of the current block. In one embodiment, the image decoding apparatus 100 may additionally input the extended current reconstruction block to the neural network 240.

In step S1740, the image decoding apparatus 100 reconstructs the current block using the final prediction block and the residual block obtained from the bitstream.

In one embodiment, the image decoding apparatus 100 may obtain the current block by adding the sample values of the final prediction block and the sample values of the residual block.
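The sample-wise addition in step S1740 can be sketched as below. The clipping to the valid sample range and the default 10-bit depth are assumptions added for illustration (the text only describes the addition); `reconstruct` is a hypothetical name.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 10) -> Block:
    max_val = (1 << bit_depth) - 1
    # Add prediction and residual per sample, then clip to [0, 2^bit_depth - 1].
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, resid)]
```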

In one embodiment, the image decoding apparatus 100 may obtain information about quantized transform coefficients from the bitstream and obtain the residual block by applying inverse quantization and an inverse transform to the quantized transform coefficients.

The image containing the reconstructed current block may be used to decode subsequent blocks.

FIG. 18 is a diagram illustrating syntax according to an embodiment.

In one embodiment, the neural network-based prediction mode described above may be used together with the skip mode, the merge mode, or the AMVP mode.

Referring to FIG. 18, in S1810, when the skip mode is applied to the current block (inter_skip=1), merge_data() is called, and NNinter() is called in S1840 within merge_data(), so that the neural network-based prediction mode can be applied to the current block.

Similarly, in S1820, when the merge mode is applied to the current block (merge_flag=1), merge_data() is called, and NNinter() is called in S1840 within merge_data(), so that the neural network-based prediction mode can be applied to the current block.

In S1830, when the AMVP mode is applied to the current block, NNinter() is called so that the neural network-based prediction mode can be applied to the current block.
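The call structure described for FIG. 18 can be sketched as a small parsing trace. This is only an illustration of the control flow in the text: the contents of merge_data() and NNinter() are not given here, so both are placeholders, and the assumptions that merge_flag is read only when inter_skip is 0 and that `read_flag` is a callback returning a parsed flag value are hypothetical.

```python
from typing import Callable, List


def parse_inter_block(read_flag: Callable[[str], int]) -> List[str]:
    """Return the order in which syntax structures are visited."""
    trace: List[str] = []

    def nn_inter() -> None:
        # Placeholder for NNinter(); its syntax elements are not shown here.
        trace.append("NNinter")

    def merge_data() -> None:
        trace.append("merge_data")
        nn_inter()  # S1840: NNinter() is called inside merge_data()

    if read_flag("inter_skip"):       # S1810: skip mode
        merge_data()
    elif read_flag("merge_flag"):     # S1820: merge mode
        merge_data()
    else:                             # S1830: AMVP mode
        trace.append("amvp")
        nn_inter()
    return trace
```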

According to one embodiment, the motion vector of the current block obtained in the existing skip mode, merge mode, or AMVP mode is adjusted on a per-sample basis by the neural network 240, so that a final prediction block that more closely resembles the current block can be obtained.

FIG. 19 is a diagram illustrating the configuration of an image encoding apparatus 1900 according to an embodiment.

Referring to FIG. 19, the image encoding apparatus 1900 may include an encoder 1910 and a bitstream generator 1930. The encoder 1910 may include an AI-based prediction encoder 1912 and a residual data acquisition unit 1914.

In one embodiment, the encoder 1910 may correspond to the prediction encoder 2415, the transform and quantization unit 2420, the inverse quantization and inverse transform unit 2430, the deblocking filtering unit 2435, and the loop filtering unit 2440 shown in FIG. 24. In one embodiment, the bitstream generator 1930 may correspond to the entropy encoder 2425 shown in FIG. 24.

The encoder 1910 and the bitstream generator 1930 may be implemented by at least one processor, and may operate according to at least one instruction stored in at least one memory.

Although FIG. 19 shows the encoder 1910 and the bitstream generator 1930 separately, they may be implemented by a single processor. In this case, they may be implemented by a dedicated processor, or by a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). A dedicated processor may include a memory for implementing embodiments of the present disclosure, or a memory processing unit for using an external memory.

In one embodiment, the encoder 1910 and the bitstream generator 1930 may consist of a plurality of processors. In this case, they may be implemented by a combination of dedicated processors, or by a combination of software and multiple general-purpose processors such as APs, CPUs, or GPUs.

The encoder 1910 may encode the current block using the reference image of the current block. Information about the residual block and information about the motion vector may be output as the result of encoding the current block.

In one embodiment, depending on the rule-based coding mode applied to the current block (e.g., the skip mode), the encoder 1910 may not output information about the residual block.

In one embodiment, the AI-based prediction encoder 1912 may obtain the final prediction block of the current block using the reference image and the current block. The final prediction block may be passed to the residual data acquisition unit 1914.

The residual data acquisition unit 1914 may obtain a residual block corresponding to the difference between the current block and the final prediction block.

In one embodiment, the residual data acquisition unit 1914 may obtain the residual block by subtracting the sample values of the final prediction block from the sample values of the current block.
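The encoder-side subtraction and its decoder-side inverse can be sketched together. Transform and quantization, which make the real round trip lossy, are deliberately left out of this sketch, and both function names are hypothetical.

```python
from typing import List

Block = List[List[int]]


def residual_block(cur: Block, pred: Block) -> Block:
    # Encoder side: residual = current block - final prediction block, per sample.
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, pred)]


def add_back(pred: Block, resid: Block) -> Block:
    # Decoder side (before any clipping): current = prediction + residual.
    return [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, resid)]
```

Without quantization, `add_back(pred, residual_block(cur, pred))` returns `cur` exactly.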

In one embodiment, the residual data acquisition unit 1914 may obtain quantized transform coefficients by applying a transform and quantization to the samples of the residual block.

The information about the residual block and the information about the motion vector obtained by the encoder 1910 may be passed to the bitstream generator 1930.

In one embodiment, the information about the residual block may include information about the quantized transform coefficients (e.g., a flag indicating whether the quantized transform coefficients are 0).

In one embodiment, the information about the motion vector may include information indicating one or more of the motion vector candidates in the motion vector candidate list, for example a flag or an index.

In one embodiment, the information about the motion vector may include a differential motion vector between the motion vector of the current block and a predicted motion vector.
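Signaling a candidate index together with a differential motion vector can be sketched as below. The rule for picking the predictor (smallest sum of absolute differential components, as a rough proxy for bit cost) is an assumption; real encoders typically use a rate estimate. Both function names are hypothetical.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def encode_mv(mv: MV, candidates: List[MV]) -> Tuple[int, MV]:
    # Assumption: choose the predictor that minimizes |mvd_x| + |mvd_y|.
    idx = min(range(len(candidates)),
              key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]))
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])  # (candidate index, mvd = mv - mvp)


def decode_mv(idx: int, mvd: MV, candidates: List[MV]) -> MV:
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])  # mv = mvp + mvd
```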

The bitstream generator 1930 may generate a bitstream containing the encoding result for the current block.

The bitstream may be transmitted to the image decoding apparatus 100 over a network. In one embodiment, the bitstream may be stored in a data storage medium, including magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; and magneto-optical media such as floptical disks.

In one embodiment, the bitstream generator 1930 may generate the bitstream by entropy-coding the syntax elements corresponding to the encoding result for the image.

FIG. 20 is a diagram illustrating the configuration of an AI-based prediction encoder 1912 according to an embodiment.

Referring to FIG. 20, the AI-based prediction encoder 1912 may include a motion information acquisition unit 2010, a prediction block acquisition unit 2020, a neural network setting unit 2030, and a neural network 2040.

The neural network 2040 may be stored in a memory. In one embodiment, the neural network 2040 may be implemented as an AI processor.

The motion information acquisition unit 2010 may search the reference image for a block similar to the current block and obtain a motion vector pointing to the block found.
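The block search can be sketched as an integer-sample full search. The text does not specify a search algorithm or matching cost, so both the exhaustive window and the sum of absolute differences (SAD) used below are assumptions; `full_search` is a hypothetical name, and candidates that would extend outside the reference image are skipped.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]


def full_search(cur: Block, ref: Block, x0: int, y0: int,
                search_range: int) -> Tuple[int, int]:
    """Return the integer motion vector (dx, dy) with minimum SAD."""
    h, w = len(cur), len(cur[0])
    ref_h, ref_w = len(ref), len(ref[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > ref_w or y + h > ref_h:
                continue  # skip candidates outside the reference image
            cost = sum(abs(cur[j][i] - ref[y + j][x + i])
                       for j in range(h) for i in range(w))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no candidate position inside the reference image")
    return best[1]
```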

In one embodiment, to encode the motion vector of the current block, the motion information acquisition unit 2010 may construct a motion vector candidate list that includes, as motion vector candidates, the motion vectors of spatial and/or temporal neighboring blocks of the current block.

In one embodiment, the motion vector obtained by the motion information acquisition unit 2010 may include a motion vector for list 0, a motion vector for list 1, or both.

The prediction block acquisition unit 2020 may obtain a preliminary prediction block using the reference block indicated by the motion vector in the reference image.

In one embodiment, the preliminary prediction block may be obtained by applying interpolation to the reference block indicated by the motion vector in the reference image. In this case, the preliminary prediction block may include sub-pixels obtained by filtering integer pixels.

In one embodiment, the motion information acquisition unit 2010 and the prediction block acquisition unit 2020 may obtain the preliminary prediction block according to a rule-based prediction mode.

As described above, when the motion information acquisition unit 2010 obtains a motion vector for list 0, the prediction block acquisition unit 2020 may obtain the preliminary prediction block using the reference block pointed to by the list 0 motion vector in a reference image included in list 0.

In one embodiment, when the motion information acquisition unit 2010 obtains a motion vector for list 1, the prediction block acquisition unit 2020 may obtain the preliminary prediction block using the reference block pointed to by the list 1 motion vector in a reference image included in list 1.

In one embodiment, when the motion information acquisition unit 2010 obtains both a motion vector for list 0 and a motion vector for list 1, the prediction block acquisition unit 2020 may obtain a preliminary prediction block for list 0 using the reference block pointed to by the list 0 motion vector in a reference image included in list 0, and a preliminary prediction block for list 1 using the reference block pointed to by the list 1 motion vector in a reference image included in list 1.
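The three cases above (list 0 only, list 1 only, or both) can be sketched as producing one preliminary prediction block per list that carries a motion vector. For brevity the sketch assumes integer-pel motion vectors and blocks that lie inside the reference image; all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Block = List[List[int]]
MV = Tuple[int, int]  # integer-pel (dx, dy), for brevity


def fetch_block(ref: Block, mv: MV, x0: int, y0: int, w: int, h: int) -> Block:
    """Copy the w x h reference block displaced by `mv` from the current block at (x0, y0)."""
    x, y = x0 + mv[0], y0 + mv[1]
    return [row[x:x + w] for row in ref[y:y + h]]


def preliminary_blocks(
    refs: Dict[int, Block], mvs: Dict[int, Optional[MV]], x0: int, y0: int, w: int, h: int
) -> Dict[int, Block]:
    """One preliminary prediction block per reference list (0 and/or 1) that has an MV."""
    return {
        lst: fetch_block(refs[lst], mv, x0, y0, w, h)
        for lst, mv in mvs.items()
        if mv is not None
    }
```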

The neural network setting unit 2030 may obtain the data to be input to the neural network 2040.

In one embodiment, the neural network setting unit 2030 may obtain the data to be input to the neural network 2040 based on the reference image, the preliminary prediction block, and the quantization parameter for the reference block.

In one embodiment, the final prediction block of the current block may be obtained as the neural network setting unit 2030 applies at least one of the preliminary prediction block, the POC map, or the quantization error map to the neural network 2040.
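One way to picture the network input is as a stack of same-sized planes. The sketch assumes that the POC map is filled with the signed difference POC(current) - POC(reference), that a single QP covers the whole reference block, and that the quantization error map stores the HEVC/VVC-style step size 2^((QP - 4)/6) rather than some other quantization error value. Function names are hypothetical.

```python
from typing import List

Plane = List[List[float]]


def constant_plane(value: float, w: int, h: int) -> Plane:
    return [[float(value)] * w for _ in range(h)]


def qstep_from_qp(qp: int) -> float:
    # HEVC/VVC-style relation: the step size doubles every 6 QP steps and equals 1.0 at QP 4.
    return 2.0 ** ((qp - 4) / 6.0)


def build_network_input(pred: Plane, poc_cur: int, poc_ref: int, qp_ref: int) -> List[Plane]:
    """Stack [preliminary prediction block, POC map, quantization error map] as channels."""
    h, w = len(pred), len(pred[0])
    poc_map = constant_plane(poc_cur - poc_ref, w, h)      # same POC difference everywhere
    qe_map = constant_plane(qstep_from_qp(qp_ref), w, h)  # one QP for the whole reference block
    return [pred, poc_map, qe_map]
```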

Because the neural network setting unit 2030 and the neural network 2040 may be the same as the neural network setting unit 230 and the neural network 240 included in the AI prediction decoding unit 132 shown in FIG. 2, a detailed description is omitted.

In one embodiment, the neural network setting unit 2030 may select one of several different methods of obtaining the quantization error map (for example, the methods shown in FIGS. 5 to 8) and obtain the quantization error map according to the selected method.

In one embodiment, the neural network setting unit 2030 may select one of the different methods of obtaining the quantization error map (for example, the methods shown in FIGS. 5 to 8) based on a cost. A rate-distortion cost may be used when calculating the cost.
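A rate-distortion cost is conventionally written J = D + λ·R. A minimal sketch of selecting a quantization error map method by that cost, assuming the encoder has already measured the distortion and rate for each candidate method (the candidate names are placeholders):

```python
from typing import Dict, Tuple


def select_by_rd_cost(results: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Return the candidate minimizing J = D + lambda * R.

    `results` maps a candidate name to (distortion, rate_in_bits), e.g. distortion as SSE.
    """
    return min(results, key=lambda name: results[name][0] + lam * results[name][1])
```

A larger λ penalizes rate more heavily, so the cheaper-to-signal candidate wins as λ grows.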

In one embodiment, the motion information acquisition unit 2010 may change the precision of the motion vector of the current block from fractional precision to integer precision, and the prediction block acquisition unit 2020 may obtain the preliminary prediction block according to the integer-precision motion vector. In one embodiment, the reference block pointed to by the integer-precision motion vector in the reference image may be determined as the preliminary prediction block.

Changing the motion vector of the current block to integer precision may reduce the number of bits needed to represent it, which in turn may reduce the bitrate of the bitstream.
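As a rough illustration of the saving, assume each motion vector component is coded with a signed Exp-Golomb se(v) code against a zero predictor. This is an assumption for the arithmetic only: HEVC and VVC actually binarize motion vector differences with greater-than-0 and greater-than-1 flags plus an Exp-Golomb remainder, so real savings differ. The FIG. 21 vector (19/4, 27/4) then costs:

```python
def se_golomb_length(v: int) -> int:
    """Number of bits in the signed Exp-Golomb (se(v)) codeword for integer v."""
    code_num = 2 * v - 1 if v > 0 else -2 * v  # map signed value to unsigned code number
    return 2 * (code_num + 1).bit_length() - 1  # ue(v) length: 2*floor(log2(k+1)) + 1


quarter_pel_bits = se_golomb_length(19) + se_golomb_length(27)  # (19/4, 27/4) in 1/4-pel units
integer_pel_bits = se_golomb_length(5) + se_golomb_length(7)    # same MV rounded to (5, 7) in 1-pel units
```

Under these assumptions the vector drops from 22 bits to 14 bits, because each component's magnitude shrinks by about a factor of four.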

Another reason for changing the precision of the motion vector of the current block from fractional to integer precision is to deliberately give the neural network 2040 a less accurate motion vector, so that the neural network 2040 learns to derive the accurate motion on its own from the inaccurate motion vector.

The change in motion vector precision according to an embodiment may be related to the adaptive motion vector resolution (AMVR) mode included in the VVC standard. AMVR mode adaptively selects the resolution used for the motion vector and the residual motion vector.

In AMVR mode, the motion vector and the residual motion vector may generally be encoded/decoded at one of four resolutions: 1/4-pel, 1/2-pel, 1-pel, or 4-pel.
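A sketch of expressing one motion vector component at each AMVR resolution. It assumes motion vectors are stored internally in 1/16-pel units, with shifts of 2, 3, 4, and 6 for 1/4-, 1/2-, 1-, and 4-pel steps (modeled on VVC), and it uses round-half-away-from-zero; the exact rounding a codec applies may differ.

```python
# Shift from 1/16-pel storage units to each AMVR resolution (assumed, VVC-style).
AMVR_SHIFT = {"1/4-pel": 2, "1/2-pel": 3, "1-pel": 4, "4-pel": 6}


def to_amvr_units(mv_16th: int, resolution: str) -> int:
    """Express a 1/16-pel MV component in steps of `resolution`, rounding half away from zero."""
    shift = AMVR_SHIFT[resolution]
    magnitude = (abs(mv_16th) + (1 << (shift - 1))) >> shift
    return magnitude if mv_16th >= 0 else -magnitude


def from_amvr_units(units: int, resolution: str) -> int:
    """Convert a count of `resolution` steps back to 1/16-pel units."""
    return units << AMVR_SHIFT[resolution]
```

For example, 4.75 pel (76 in 1/16-pel units) becomes 19 quarter-pel steps but only 5 integer-pel steps, and the fewer steps there are, the fewer bits are needed to signal them.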

When the current block is encoded/decoded with a 1-pel or 4-pel motion vector in AMVR mode, the neural network-based prediction mode can be expected to produce a final prediction block comparable to one generated from a higher-resolution motion vector.

Also, when the current block is encoded/decoded with a 1/4-pel or 1/2-pel motion vector in AMVR mode, the neural network-based prediction mode can change the 1/4-pel or 1/2-pel resolution to 1-pel, which reduces the number of bits needed to signal the motion vector, while the processing of the preliminary prediction block by the neural network 2040 still yields a high-quality final prediction block.

A method by which the motion information acquisition unit 2010 changes the precision of the motion vector of the current block from fractional to integer precision is described with reference to FIG. 21.

FIG. 21 is a diagram illustrating a method of changing a fractional-precision motion vector into an integer-precision motion vector according to an embodiment.

In one embodiment, when the motion vector (A) of the current block has fractional precision (for example, 1/2-pel, 1/4-pel, or 1/8-pel), the motion information acquisition unit 2010 may change the motion vector (A) so that it points to an integer pixel.

In one embodiment, assume in FIG. 21 that the motion vector (A) points to coordinates (19/4, 27/4) (2110) relative to the origin (0, 0). Because (19/4, 27/4) (2110) is not an integer-pixel position, the motion information acquisition unit 2010 may adjust the motion vector (A) so that it points to an integer pixel.

The integer pixels surrounding (19/4, 27/4) (2110) are at (16/4, 28/4) (2130), (16/4, 24/4) (2120), (20/4, 28/4) (2140), and (20/4, 24/4) (2150), that is, at integer positions (4, 7), (4, 6), (5, 7), and (5, 6). In this case, the motion information acquisition unit 2010 may change the motion vector (A) to point to the upper-right coordinate (20/4, 28/4) (2140) instead of (19/4, 27/4) (2110).

In one embodiment, the motion information acquisition unit 2010 may instead change the motion vector (A) to point to the lower-left coordinate (2120), the upper-left coordinate (2130), or the lower-right coordinate (2150).

Changing the motion vector (A) from fractional to integer precision may be referred to as rounding the motion vector.
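The corner selection of FIG. 21 can be sketched as floor and ceiling operations on quarter-pel components. The function name and the quarter-pel input unit are assumptions, and "upper" follows the figure's orientation, in which the larger y value is the upper one (the opposite of the usual image convention where y grows downward).

```python
from typing import Tuple


def round_mv_to_integer(mv_q: Tuple[int, int], corner: str = "upper-right") -> Tuple[int, int]:
    """Snap a quarter-pel MV to one of the four surrounding integer-pel positions.

    `corner` is one of "upper-right", "upper-left", "lower-right", "lower-left".
    The result stays in quarter-pel units (each component is a multiple of 4).
    """
    fx, fy = (mv_q[0] // 4) * 4, (mv_q[1] // 4) * 4  # floor to the integer grid
    cx = fx if mv_q[0] % 4 == 0 else fx + 4           # ceiling to the integer grid
    cy = fy if mv_q[1] % 4 == 0 else fy + 4
    x = cx if "right" in corner else fx
    y = cy if "upper" in corner else fy
    return (x, y)
```

With the input (19, 27), i.e. (19/4, 27/4), the four choices give (20, 28), (16, 24), (16, 28), and (20, 24) in quarter-pel units, matching coordinates 2140, 2120, 2130, and 2150.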

FIG. 22 is a diagram illustrating an image encoding method performed by the image encoding apparatus 1900 according to an embodiment.

In step S2210, the image encoding apparatus 1900 obtains the motion vector of the current block using the reference image.

In one embodiment, the image encoding apparatus 1900 may obtain a motion vector pointing to a block in the reference image that is similar to the current block.

In one embodiment, the process of obtaining the motion vector of the current block using the reference image may be referred to as a motion prediction process.

In step S2220, the image encoding apparatus 1900 obtains a preliminary prediction block using the reference image of the current block and the motion vector of the current block.

In one embodiment, the image encoding apparatus 1900 may obtain the preliminary prediction block using the reference block pointed to by the motion vector of the current block in the reference image.

In one embodiment, the preliminary prediction block may correspond to the result of applying interpolation to the reference block pointed to by the motion vector of the current block in the reference image.

In step S2230, the image encoding apparatus 1900 may obtain the final prediction block of the current block by applying, to the neural network 2040, at least one of a POC map whose sample values are the POC difference between the current image (which includes the current block) and the reference image, the preliminary prediction block, or a quantization error map.

In one embodiment, the image encoding apparatus 1900 may pad at least one of the POC map, the preliminary prediction block, or the quantization error map, and obtain the final prediction block of the current block by applying at least one of the resulting extended POC map, extended preliminary prediction block, or extended quantization error map to the neural network 2040. In this case, in one embodiment, the image encoding apparatus 1900 may additionally input an extended current reconstruction block to the neural network 2040.

In step S2240, the image encoding apparatus 1900 obtains a residual block using the current block and the final prediction block.

In one embodiment, the image encoding apparatus 1900 may obtain the residual block by subtracting the sample values of the final prediction block from the sample values of the current block.

In one embodiment, the image encoding apparatus 1900 may obtain quantized transform coefficients by applying a transform and quantization to the samples of the residual block.
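A minimal sketch of step S2240 and the quantization that follows it. The transform is deliberately omitted (treated as the identity) to keep the example short, and the uniform rounding quantizer is an assumption, not the quantizer of any particular codec.

```python
from typing import List

Block = List[List[int]]


def residual_block(cur: Block, pred: Block) -> Block:
    """Residual = current samples minus final prediction samples."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def quantize(coeffs: Block, qstep: float) -> Block:
    """Uniform scalar quantization with rounding half away from zero."""
    def q(v: int) -> int:
        level = int(abs(v) / qstep + 0.5)
        return level if v >= 0 else -level

    return [[q(v) for v in row] for row in coeffs]
```

The closer the final prediction block is to the current block, the smaller the residual and the more levels quantize to zero, which is where the bitrate saving comes from.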

In step S2250, the image encoding apparatus 1900 may generate a bitstream including information about the residual block.

In one embodiment, the bitstream may further include information about the motion vector of the current block. This information may include an indicator, for example a flag or an index, that points to one or more of the motion vector candidates in the motion vector candidate list.

In one embodiment, the bitstream may not include information about the residual block. For example, when the rule-based prediction mode applied to the current block is skip mode, the bitstream may omit information about the residual block for the current block.

Hereinafter, a method of training the neural network 240 used by at least one of the image decoding apparatus 100 or the image encoding apparatus 1900 is described with reference to FIG. 23.

FIG. 23 is a diagram illustrating a method of training the neural network 240 according to an embodiment.

The current block for training 2301 shown in FIG. 23 may correspond to the current block described above. Likewise, the preliminary prediction block for training 2302, the quantization error map for training 2303, and the POC map for training 2304 may correspond to the preliminary prediction block, quantization error map, and POC map shown in FIGS. 2 and 20. The preliminary prediction block for training 2302 may correspond to a block identified in a reference image for training by the motion vector of the current block for training 2301.

In the method of training the neural network 240 according to the present disclosure, the neural network 240 is trained so that the final prediction block for training 2305 output from the neural network 240 becomes identical or similar to the current block for training 2301. To this end, loss information 2306 corresponding to the difference between the final prediction block for training 2305 and the current block for training 2301 may be used to train the neural network 240.

In more detail, referring to FIG. 23, the preliminary prediction block for training 2302, the quantization error map for training 2303, and the POC map for training 2304 are first input to the neural network 240, and the neural network 240 may output the final prediction block for training 2305. The neural network 240 may operate according to preset weights.

Loss information 2306 corresponding to the difference between the final prediction block for training 2305 and the current block for training 2301 is calculated, and the weights set in the neural network 240 may be updated according to the loss information 2306. The weights of the neural network 240 may be updated so that the loss information 2306 is reduced or minimized.

The loss information 2306 may include at least one of an L1-norm value, an L2-norm value, a structural similarity (SSIM) value, a peak signal-to-noise ratio human vision system (PSNR-HVS) value, a multiscale SSIM (MS-SSIM) value, a visual information fidelity (VIF) value, or a video multimethod assessment fusion (VMAF) value computed for the difference between the current block for training 2301 and the final prediction block for training 2305.
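The training loop described above can be illustrated with two of the listed measures (L1-norm and L2-norm) and a deliberately tiny stand-in model: a single scalar gain w with final = w · pred, updated by gradient descent on the mean squared error. The model, learning rate, and names are hypothetical; the neural network 240 has many weights, but the update principle (move the weights against the gradient of the loss) is the same.

```python
import math
from typing import List

Block = List[List[float]]


def l1_norm(a: Block, b: Block) -> float:
    """Sum of absolute sample differences."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def l2_norm(a: Block, b: Block) -> float:
    """Euclidean norm of the sample differences."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def train_scalar_gain(pred: Block, cur: Block, w: float = 1.0, lr: float = 0.01, steps: int = 1) -> float:
    """Toy model final = w * pred, trained by gradient descent on the mean squared error."""
    samples = [(p, c) for rp, rc in zip(pred, cur) for p, c in zip(rp, rc)]
    n = len(samples)
    for _ in range(steps):
        grad = sum(2.0 * (w * p - c) * p for p, c in samples) / n  # d(MSE)/dw
        w -= lr * grad
    return w
```

If the current block is exactly twice the preliminary block, repeated updates drive w toward 2.0, at which point the loss reaches zero.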

Training of the neural network 240 according to an embodiment may be performed by a training apparatus. The training apparatus may be the image decoding apparatus 100 or the image encoding apparatus 1900. Depending on the implementation, the training apparatus may be an external server, in which case the neural network 240 and the weights trained by the external server may be transmitted to the image decoding apparatus 100 and the image encoding apparatus 1900.

The image decoding apparatus 100 and the image encoding apparatus 1900 using AI according to an embodiment, and the methods performed by them, aim to obtain a final prediction block 955 that is more similar to the current block 300;1410 than the one produced by an existing rule-based prediction mode.

In addition, the image decoding apparatus 100 and the image encoding apparatus 1900 using AI according to an embodiment, and the methods performed by them, aim to reduce the bitrate of a bitstream including information about the residual block.

An image decoding method according to an embodiment may include obtaining a motion vector of a current block (300;1410) (S1710).

In one embodiment, the image decoding method may include obtaining a preliminary prediction block (902) using a reference block (415;435;510;510-1;510-2;1110) pointed to by the motion vector in a reference image (410;430;1100) (S1720).

In one embodiment, the image decoding method may include obtaining a final prediction block (955) of the current block (300;1410) by applying, to a neural network (240), at least one of a POC map (906) containing the picture order count (POC) difference between a current image (400;1400) including the current block (300;1410) and the reference image (410;430;1100), the preliminary prediction block (902), or a quantization error map (530;530-1;530-2;530-3;530-4;904) (S1730).

In one embodiment, the image decoding method may include reconstructing the current block (300;1410) using a residual block obtained from a bitstream and the final prediction block (955) (S1740).

In one embodiment, the sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904) may be calculated from a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).

In one embodiment, the sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904) may correspond to a quantization step size or a quantization error value calculated from the quantization parameter.

In one embodiment, the quantization error map (530;530-1;530-2;530-3;530-4;904) may be divided into sub-regions corresponding to sub-blocks of the reference block (415;435;510;510-1;510-2;1110), and the sample values in each sub-region of the quantization error map (530;530-1;530-2;530-3;530-4;904) may be calculated from the quantization parameter for a sample at a predetermined position within the corresponding sub-block of the reference block (415;435;510;510-1;510-2;1110).
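A sketch of this sub-region construction, assuming square sub-blocks, the top-left sample of each sub-block as the "predetermined position", and the step size 2^((QP - 4)/6) as the map value. `qp_at` is a hypothetical lookup returning the QP that was used to code the reference image at a given sample.

```python
from typing import Callable, List


def qstep_from_qp(qp: int) -> float:
    return 2.0 ** ((qp - 4) / 6.0)  # HEVC/VVC-style step-size approximation


def quantization_error_map(
    qp_at: Callable[[int, int], int], x0: int, y0: int, w: int, h: int, sub: int
) -> List[List[float]]:
    """Fill each sub x sub region with the step size derived from the QP at the
    top-left sample of the matching reference sub-block (block origin at (x0, y0))."""
    qe = [[0.0] * w for _ in range(h)]
    for sy in range(0, h, sub):
        for sx in range(0, w, sub):
            step = qstep_from_qp(qp_at(x0 + sx, y0 + sy))
            for y in range(sy, min(sy + sub, h)):
                for x in range(sx, min(sx + sub, w)):
                    qe[y][x] = step
    return qe
```

For a 4x2 reference block whose left half was coded at QP 22 and right half at QP 28, the map holds 8.0 on the left and 16.0 on the right.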

In one embodiment, obtaining the final prediction block (955) of the current block (300;1410) may include obtaining it by applying at least one of an extended (enlarged) POC map, an extended preliminary prediction block (1150), or an extended quantization error map (1300) to the neural network (240).

In one embodiment, at least one of the extended POC map, the extended preliminary prediction block (1150), or the extended quantization error map (1300) may be obtained by padding at least one of the POC map (906), the preliminary prediction block (902), or the quantization error map (530;530-1;530-2;530-3;530-4;904) according to an extension distance.

In one embodiment, the neural network (240) may include one or more convolutional layers.

In one embodiment, the image decoding method may include determining the extension distance based on the number of convolutional layers in the neural network (240) and the size and stride of the filter kernels used in those layers, and obtaining an extended preliminary prediction block (1150) that contains the samples of the preliminary prediction block (902) together with the neighboring samples, outside the boundary of the reference block (415;435;510;510-1;510-2;1110) in the reference image (410;430;1100), that lie within the extension distance.
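The text states only that the extension distance depends on the number of layers, the kernel size, and the stride. One plausible rule, assumed here and not confirmed by the source, is the one-sided reach of the network's receptive field: each layer with an odd kernel size k adds (k - 1)/2 samples, scaled by the product of the strides of the layers before it.

```python
from typing import List, Tuple


def extension_distance(layers: List[Tuple[int, int]]) -> int:
    """layers: (kernel_size, stride) for each conv layer, in order.

    Returns how many extra samples are needed on each side of the block so that
    every output sample sees real data instead of zero padding.
    """
    dist, jump = 0, 1  # jump = input-sample spacing between adjacent units at this depth
    for kernel, stride in layers:
        dist += ((kernel - 1) // 2) * jump
        jump *= stride
    return dist
```

Under this rule, three 3x3 layers with stride 1 need a 3-sample border, as does a 5x5 layer followed by a 3x3 layer.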

In one embodiment, the image decoding method may include obtaining an extended quantization error map (1300) whose sample values are calculated from the quantization parameter for the reference block (415;435;510;510-1;510-2;1110) and the quantization parameters for the neighboring samples within the extension distance in the reference image (410;430;1100).

In one embodiment, when a boundary of the reference block (415;435;510;510-1;510-2;1110) coincides with a boundary of the reference image (410;430;1100), the neighboring samples within the extension distance may be determined from the nearest available samples in the reference image (410;430;1100).
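Fetching an extended block with nearest-sample replication at the picture boundary can be sketched as coordinate clamping. The sketch assumes integer-pel positions; the same routine could be applied to a picture-sized QP or quantization-error plane to build the extended quantization error map. The function name is hypothetical.

```python
from typing import List

Plane = List[List[int]]


def extended_block(ref: Plane, x0: int, y0: int, w: int, h: int, d: int) -> Plane:
    """Return the (w + 2d) x (h + 2d) window around the w x h block at (x0, y0).

    Coordinates outside the reference image are clamped to the nearest
    edge sample, i.e. the closest available sample is repeated.
    """
    pic_h, pic_w = len(ref), len(ref[0])

    def clamp(v: int, size: int) -> int:
        return min(max(v, 0), size - 1)

    return [
        [ref[clamp(y, pic_h)][clamp(x, pic_w)] for x in range(x0 - d, x0 + w + d)]
        for y in range(y0 - d, y0 + h + d)
    ]
```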

In one embodiment, obtaining the final prediction block (955) of the current block (300;1410) may include applying an extended current reconstruction block (1500) to the neural network (240) together with at least one of the extended POC map, the extended preliminary prediction block (1150), or the extended quantization error map (1300).

In one embodiment, the extended current reconstruction block (1500) may include the neighboring samples (1420) reconstructed before the current block (300;1410) in the current image (400;1400), together with the samples (1125) of the extended preliminary prediction block (1150) other than those corresponding to the previously reconstructed neighboring samples (1420).
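A sketch of assembling the extended current reconstruction block: positions already reconstructed in the current image take the reconstructed sample, and every other position keeps the extended preliminary prediction sample. Which neighbors count as reconstructed depends on the coding order, so the sketch takes that decision as a hypothetical predicate supplied by the caller.

```python
from typing import Callable, List

Block = List[List[int]]


def extended_current_reconstruction(
    recon: Block,
    ext_pred: Block,
    x0: int,
    y0: int,
    d: int,
    is_reconstructed: Callable[[int, int], bool],
) -> Block:
    """Merge already-reconstructed neighbors with extended prediction samples.

    `ext_pred` covers the window starting at (x0 - d, y0 - d) in picture coordinates.
    The predicate must return False for positions outside the picture.
    """
    out: Block = []
    for r, pred_row in enumerate(ext_pred):
        y = y0 - d + r
        out.append([
            recon[y][x0 - d + c] if is_reconstructed(x0 - d + c, y) else p
            for c, p in enumerate(pred_row)
        ])
    return out
```

With a 1x1 block at (1, 1), d = 1, and only the row above plus the left neighbor reconstructed, the top row and the left sample come from the reconstruction while the rest come from the prediction.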

In one embodiment, the image decoding method may include selecting, from among a plurality of weight sets, the weight set to be used to obtain the final prediction block (955), based on at least one of the size of the current block (300;1410), the prediction direction of the current block (300;1410), the quantization parameter for the reference block (415;435;510;510-1;510-2;1110), the layer to which the current image (400;1400) belongs in the hierarchical structure of the image, or information obtained from the bitstream.

In one embodiment, the final prediction block (955) may be obtained from the neural network (240) operating according to the selected weight set.
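A hypothetical sketch of weight-set selection. It keys the choice on block area and QP only, with made-up thresholds, and lets an explicitly signaled choice override the derived one; prediction direction or the temporal layer could be added to the key in the same way.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]


def select_weight_set(
    weight_sets: Dict[Key, str],
    width: int,
    height: int,
    qp: int,
    signaled: Optional[Key] = None,
) -> str:
    """Pick a weight-set identifier from block size and QP, unless one is signaled."""
    if signaled is not None:  # an explicit choice parsed from the bitstream wins
        return weight_sets[signaled]
    size_class = "small" if width * height <= 256 else "large"  # threshold is hypothetical
    qp_class = "low_qp" if qp < 32 else "high_qp"                # threshold is hypothetical
    return weight_sets[(size_class, qp_class)]
```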

An image encoding method according to an embodiment may include obtaining a motion vector pointing to a reference block (415;435;510;510-1;510-2;1110) in a reference image (410;430;1100) corresponding to a current block (300;1410) (S2210).

In one embodiment, the image encoding method may include obtaining a final prediction block (955) of the current block (300;1410) by applying, to the neural network (2040), at least one of a POC map (906) containing the POC difference between a current image (400;1400) including the current block (300;1410) and the reference image (410;430;1100), a preliminary prediction block (902) obtained based on the reference block (415;435;510;510-1;510-2;1110), or a quantization error map (530;530-1;530-2;530-3;530-4;904) (S2230).

In one embodiment, the image encoding method may include obtaining a residual block using the current block (300;1410) and the final prediction block (955) (S2240).

In one embodiment, the image encoding method may include generating a bitstream including information about the residual block (S2250).

In one embodiment, the image encoding method may include changing the precision of the obtained motion vector from fractional precision to integer precision.

In one embodiment, the reference block (415;435;510;510-1;510-2;1110) pointed to by the integer-precision motion vector may be determined as the preliminary prediction block (902).

In one embodiment, obtaining the final prediction block (955) of the current block (300;1410) may include obtaining it by applying at least one of an extended POC map, an extended preliminary prediction block (1150), or an extended quantization error map (1300) to the neural network (2040).

In one embodiment, at least one of the extended POC map, the extended preliminary prediction block (1150), or the extended quantization error map (1300) may be obtained by padding at least one of the POC map (906), the preliminary prediction block (902), or the quantization error map (530;530-1;530-2;530-3;530-4;904) according to the extension distance.

In one embodiment, at least one processor of the image decoding apparatus may obtain a motion vector of the current block (300;1410).

In one embodiment, the at least one processor of the image decoding apparatus may obtain the preliminary prediction block 902 using the reference block (415; 435; 510; 510-1; 510-2; 1110) indicated by the motion vector in the reference image (410; 430; 1100).

In one embodiment, the at least one processor of the image decoding apparatus may obtain the final prediction block 955 of the current block (300; 1410) by applying to the neural network 240 at least one of the following: a POC map 906 containing the POC difference between the reference image (410; 430; 1100) and the current image (400; 1400) that includes the current block (300; 1410); the preliminary prediction block 902; or the quantization error map (530; 530-1; 530-2; 530-3; 530-4; 904).
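
The network inputs named above can be pictured as three planes of equal size. This minimal sketch rests on three assumptions:

- The POC map holds the signed difference POC(current) minus POC(reference) at every sample.
- The quantization error map holds the quantization step size.
- The step size follows the HEVC/VVC-style relation Qstep = 2^((QP - 4) / 6).

The function names are hypothetical.

```python
def qstep_from_qp(qp: int) -> float:
    # HEVC/VVC-style relation (assumed): the step doubles every 6 QP and is 1 at QP 4.
    return 2.0 ** ((qp - 4) / 6.0)


def build_network_input(pred_block, poc_cur, poc_ref, ref_qp):
    """Return [preliminary prediction, POC map, quantization error map] as channels."""
    h, w = len(pred_block), len(pred_block[0])
    # Every sample of the POC map carries the same POC difference.
    poc_map = [[poc_cur - poc_ref] * w for _ in range(h)]
    # Every sample of the quantization error map carries the reference block's Qstep.
    qerr_map = [[qstep_from_qp(ref_qp)] * w for _ in range(h)]
    return [pred_block, poc_map, qerr_map]
```

The POC map lets the network account for temporal distance. The quantization error map tells it how coarsely the reference samples were quantized.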

In one embodiment, the at least one processor of the image decoding apparatus may reconstruct the current block (300; 1410) using the residual block obtained from the bitstream and the final prediction block 955.

An image encoding apparatus according to an embodiment may include at least one memory storing at least one instruction, and at least one processor operating according to the at least one instruction.

In one embodiment, the at least one processor of the image encoding apparatus may obtain a motion vector pointing to the reference block (415; 435; 510; 510-1; 510-2; 1110) in the reference image (410; 430; 1100) corresponding to the current block (300; 1410).

In one embodiment, the at least one processor of the image encoding apparatus may obtain the final prediction block 955 of the current block (300; 1410) by applying to the neural network 2040 at least one of the following: a POC map 906 containing the POC difference between the reference image (410; 430; 1100) and the current image (400; 1400) that includes the current block (300; 1410); the preliminary prediction block 902 obtained based on the reference block (415; 435; 510; 510-1; 510-2; 1110); or the quantization error map (530; 530-1; 530-2; 530-3; 530-4; 904).

In one embodiment, the at least one processor of the image encoding apparatus may obtain a residual block using the current block (300; 1410) and the final prediction block 955.
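
The encoder-side residual and the decoder-side reconstruction described above form a simple pair. This sketch assumes 8-bit samples. It also omits transform, quantization, and entropy coding, so the reconstruction here is exact, whereas in a real codec it is lossy. The function names are hypothetical.

```python
def compute_residual(cur, pred):
    """Encoder side: residual = current block - final prediction block (signed)."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def reconstruct(residual, pred, bit_depth=8):
    """Decoder side: prediction + residual, clipped to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for r, p in zip(rr, pr)]
            for rr, pr in zip(residual, pred)]
```

The closer the final prediction block is to the current block, the smaller the residual values. Smaller residuals are what reduce the bitrate.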

The image decoding apparatus 100 and the image encoding apparatus 1900 using AI according to an embodiment, and the methods performed by them, may obtain a final prediction block 955 that is more similar to the current block (300; 1410) than a prediction obtained with existing rule-based prediction modes.

In addition, the image decoding apparatus 100 and the image encoding apparatus 1900 using AI according to an embodiment, and the methods performed by them, may reduce the bitrate of a bitstream containing information about the residual block.

The above-described embodiments of the present disclosure can be written as a program executable on a computer, and the written program can be stored in a device-readable storage medium.

A device-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory storage medium" only means that the medium is a tangible device and does not contain a signal (e.g., electromagnetic waves). The term does not distinguish between data stored semi-permanently and data stored temporarily in the storage medium. For example, a "non-transitory storage medium" may include a buffer in which data is temporarily stored.

According to one embodiment, the methods according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product is a commodity that may be traded between a seller and a buyer. It may be distributed in the form of a device-readable storage medium (e.g., compact disc read-only memory (CD-ROM)). Alternatively, it may be distributed online (e.g., downloaded or uploaded) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily created on, a device-readable storage medium. Examples include the memory of a manufacturer's server, an application store's server, or a relay server.

The technical idea of the present disclosure has been described above in detail with reference to preferred embodiments. However, it is not limited to those embodiments, and various modifications and changes may be made by those of ordinary skill in the art within the scope of the technical idea of the present disclosure.

Claims

In a method of decoding an image,
Obtaining the motion vector of the current block (300; 1410) (S1710);
Obtaining a preliminary prediction block 902 using the reference block (415;435;510;510-1;510-2;1110) indicated by the motion vector in the reference image (410;430;1100) (S1720);
Obtaining the final prediction block 955 of the current block (300; 1410) by applying to the neural network 240 at least one of the following: a POC map 906 containing the POC (Picture Order Count) difference between the reference image (410;430;1100) and the current image (400; 1400) that includes the current block (300; 1410); the preliminary prediction block 902; or the quantization error map (530;530-1;530-2;530-3;530-4;904) (S1730); and
Reconstructing the current block (300; 1410) using a residual block obtained from a bitstream and the final prediction block 955 (S1740),
wherein sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904) are calculated from a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).

According to claim 1,
Sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904)
correspond to a quantization step size or a quantization error value calculated from the quantization parameter.

According to any one of claims 1 and 2,
The quantization error map (530;530-1;530-2;530-3;530-4;904) is divided into sub-regions corresponding to the subblocks of the reference block (415;435;510;510-1;510-2;1110),
and the sample values included in each sub-region of the quantization error map (530;530-1;530-2;530-3;530-4;904) are calculated from the quantization parameter for a sample at a predetermined position in the corresponding subblock of the reference block (415;435;510;510-1;510-2;1110).
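
The subblock-wise map of the preceding claim can be sketched as follows. The sketch assumes the following:

- A hypothetical qp_plane stores, for each sample of the reference picture, the QP used to code that sample.
- Each subblock has a fixed size.
- The "predetermined position" is the top-left sample of each subblock.
- The map values are HEVC/VVC-style quantization step sizes.

None of these choices is fixed by the claim.

```python
def subblock_qerr_map(qp_plane, x0, y0, w, h, sub=4):
    """Quantization error map for a w x h reference block at (x0, y0).

    qp_plane[y][x] is the QP used to code reference-picture sample (x, y)
    (hypothetical input). Each sub x sub region takes the Qstep derived from
    the QP of its subblock's top-left sample (assumed predetermined position).
    """
    qerr = [[0.0] * w for _ in range(h)]
    for sy in range(0, h, sub):
        for sx in range(0, w, sub):
            qp = qp_plane[y0 + sy][x0 + sx]
            step = 2.0 ** ((qp - 4) / 6.0)  # HEVC/VVC-style Qstep (assumed)
            for yy in range(sy, min(sy + sub, h)):
                for xx in range(sx, min(sx + sub, w)):
                    qerr[yy][xx] = step
    return qerr
```

A reference block often straddles several coded blocks with different QPs. A per-subblock map keeps that variation visible to the network instead of averaging it away.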

According to any one of claims 1 to 3,
The step of obtaining the final prediction block 955 of the current block 300; 1410 includes
applying at least one of an extended POC map, an extended preliminary prediction block 1150, or an extended quantization error map 1300 to the neural network 240 to obtain the final prediction block 955 of the current block (300; 1410),
wherein at least one of the extended POC map, the extended preliminary prediction block 1150, or the extended quantization error map 1300 is obtained by padding at least one of the POC map 906, the preliminary prediction block 902, or the quantization error map (530;530-1;530-2;530-3;530-4;904) according to an extension distance.

According to any one of claims 1 to 4,
The neural network 240 includes one or more convolutional layers,
The method of decoding the image further comprises:
determining the extension distance based on the number of convolutional layers included in the neural network 240 and the size and stride of a filter kernel used in the convolutional layers; and
obtaining the extended preliminary prediction block 1150. This block includes the samples of the preliminary prediction block 902 together with those surrounding samples that lie outside the boundary of the reference block (415;435;510;510-1;510-2;1110) in the reference image (410;430;1100) and correspond to the extension distance.
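
One way to turn the layer count, kernel sizes, and strides into an extension distance is to compute the receptive-field reach on each side of a stack of unpadded convolutions. This is a hedged sketch, not the claimed formula. It assumes odd kernel sizes and measures the distance in input samples. The function name is hypothetical.

```python
def extension_distance(kernel_sizes, strides):
    """Samples needed on each side of a block so that every output sample of a
    stack of unpadded convolutions sees a complete receptive field.

    kernel_sizes and strides list one entry per convolutional layer; odd
    kernel sizes are assumed.
    """
    ext, jump = 0, 1
    for k, s in zip(kernel_sizes, strides):
        ext += ((k - 1) // 2) * jump  # half-kernel reach, scaled by accumulated stride
        jump *= s
    return ext
```

With three 3x3 layers at stride 1, each layer reaches one more sample outward, so the extension distance is 3.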

According to any one of claims 1 to 5,
The method of decoding the image further comprises
obtaining the extended quantization error map 1300, whose sample values are calculated from the quantization parameter for the reference block (415;435;510;510-1;510-2;1110) and from the quantization parameters for surrounding samples corresponding to the extension distance in the reference image (410;430;1100).

According to any one of claims 1 to 6,
If the boundary of the reference block (415;435;510;510-1;510-2;1110) corresponds to the boundary of the reference image (410;430;1100), the surrounding samples corresponding to the extension distance are determined from the closest available sample in the reference image (410;430;1100).
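
Gathering the extended preliminary prediction block, including the boundary rule of the preceding claim, can be sketched with coordinate clamping. Any position outside the reference picture is replaced by the closest position inside it. The clamping approach and the function name are assumptions made for illustration.

```python
def extended_block(ref_image, x0, y0, w, h, ext):
    """Reference block at (x0, y0) plus `ext` surrounding samples on each side.

    Coordinates outside the picture are clamped, so the closest available
    sample is repeated when the block touches the picture boundary.
    """
    H, W = len(ref_image), len(ref_image[0])
    rows = [min(max(y, 0), H - 1) for y in range(y0 - ext, y0 + h + ext)]
    cols = [min(max(x, 0), W - 1) for x in range(x0 - ext, x0 + w + ext)]
    return [[ref_image[y][x] for x in cols] for y in rows]
```

Applied to a per-sample QP plane instead of the reference image, the same gathering step yields the QPs from which the extended quantization error map of the earlier claim could be computed.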

According to any one of claims 1 to 7,
The step of obtaining the final prediction block 955 of the current block 300; 1410 includes
applying an extended current reconstruction block 1500 to the neural network 240 together with at least one of the extended POC map, the extended preliminary prediction block 1150, or the extended quantization error map 1300,
wherein the extended current reconstruction block 1500 includes two sets of samples. The first set is the neighboring samples 1420 in the current image (400; 1400) that were reconstructed before the current block (300; 1410). The second set is the samples 1125 of the extended preliminary prediction block 1150, excluding those samples that correspond to the neighboring samples 1420.
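
The preceding claim combines two sources sample by sample. The combination can be sketched with a hypothetical boolean mask, recon_mask, that marks which samples of the current picture are already reconstructed. The sketch assumes that the extended window lies inside the picture and that ext_pred's top-left sample sits at picture position (x0 - ext, y0 - ext). All names are illustrative.

```python
def extended_current_recon(cur_recon, recon_mask, ext_pred, x0, y0, ext):
    """Build the extended current reconstruction block.

    cur_recon / recon_mask cover the current picture; recon_mask[y][x] is True
    where a sample was reconstructed before the current block (hypothetical
    input). ext_pred is the extended preliminary prediction block.
    """
    out = []
    for r, pred_row in enumerate(ext_pred):
        y = y0 - ext + r
        row = []
        for c, p in enumerate(pred_row):
            x = x0 - ext + c
            # Use the reconstructed neighbour if it exists, else the prediction sample.
            row.append(cur_recon[y][x] if recon_mask[y][x] else p)
        out.append(row)
    return out
```

Typically only the rows above and the columns to the left of the block are already reconstructed. The remaining positions fall back to the extended prediction.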

According to any one of claims 1 to 8,
The method of decoding the image further comprises
selecting, from among a plurality of weight sets, the weight set to be used to obtain the final prediction block 955, based on at least one of the following: the size of the current block (300;1410); the prediction direction of the current block (300;1410); the quantization parameter for the reference block (415;435;510;510-1;510-2;1110); the layer to which the current image (400; 1400) belongs in a hierarchical image structure; or information obtained from the bitstream,
wherein the final prediction block 955 is obtained based on the neural network 240 operating according to the selected weight set.
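
The preceding claim does not state how the criteria map to a weight set. As a purely hypothetical sketch, each criterion can be reduced to one bit, and the bits combined into an index over 16 pre-trained weight sets. An index signaled in the bitstream, when present, overrides the derived one. The thresholds (256 samples, QP 32, temporal layer 1) and the function name are assumptions, not values from the disclosure.

```python
def select_weight_set(block_w, block_h, bi_pred, ref_qp, temporal_layer,
                      signaled_index=None):
    """Return an index into a table of 16 weight sets (hypothetical scheme)."""
    if signaled_index is not None:  # explicit choice from the bitstream wins
        return signaled_index
    size_bit = int(block_w * block_h > 256)  # hypothetical size threshold
    dir_bit = int(bi_pred)  # uni- vs bi-directional prediction
    qp_bit = int(ref_qp >= 32)  # hypothetical QP threshold
    layer_bit = int(temporal_layer > 1)  # hypothetical hierarchy threshold
    return (size_bit << 3) | (dir_bit << 2) | (qp_bit << 1) | layer_bit
```

Switching weight sets lets a single network architecture specialize to different block sizes, quantization levels, or hierarchy levels without any increase in per-block computation.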

A computer-readable recording medium having recorded thereon a program for causing a computer to perform the method of any one of claims 1 to 9.

In a method of encoding an image,
Obtaining a motion vector pointing to the reference block (415;435;510;510-1;510-2;1110) in the reference image (410;430;1100) corresponding to the current block (300;1410) (S2210) ;
A POC map 906 including the POC difference between the current image 400;1400 including the current block 300;1410 and the reference image 410;430;1100, the reference block 415;435; At least one of the preliminary prediction block 902 obtained based on 510;510-1;510-2;1110) or the quantization error map (530;530-1;530-2;530-3;530-4;904) Obtaining the final prediction block 955 of the current block 300; 1410 by applying one to the neural network 2040 (S2230);
Obtaining a residual block using the current block (300; 1410) and the final prediction block (955) (S2240); and
Generating a bitstream including information about the residual block (S2250),
wherein sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904) are calculated from a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).

According to claim 11,
The image encoding method
further comprises changing the precision of the obtained motion vector from fractional precision to integer precision,
wherein the reference block (415;435;510;510-1;510-2;1110) indicated by the integer-precision motion vector is determined as the preliminary prediction block 902.

According to any one of claims 11 and 12,
The step of obtaining the final prediction block 955 of the current block 300; 1410 includes
applying at least one of an extended POC map, an extended preliminary prediction block 1150, or an extended quantization error map 1300 to the neural network 2040 to obtain the final prediction block 955 of the current block (300; 1410),
wherein at least one of the extended POC map, the extended preliminary prediction block 1150, or the extended quantization error map 1300 is obtained by padding at least one of the POC map 906, the preliminary prediction block 902, or the quantization error map (530;530-1;530-2;530-3;530-4;904) according to an extension distance.

An image decoding apparatus comprising:
at least one memory storing at least one instruction; and
At least one processor operating according to the at least one instruction,
wherein the at least one processor is configured to:
Obtain the motion vector of the current block (300; 1410),
Obtain a preliminary prediction block 902 using the reference block (415;435;510;510-1;510-2;1110) indicated by the motion vector in the reference image (410;430;1100),
Obtain the final prediction block 955 of the current block (300; 1410) by applying to the neural network 240 at least one of the following: a POC map 906 containing the POC difference between the reference image (410;430;1100) and the current image (400; 1400) that includes the current block (300; 1410); the preliminary prediction block 902; or the quantization error map (530;530-1;530-2;530-3;530-4;904),
Reconstruct the current block (300; 1410) using a residual block obtained from a bitstream and the final prediction block 955,
wherein sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904) are calculated from a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).

An image encoding apparatus comprising:
at least one memory storing at least one instruction; and
At least one processor operating according to the at least one instruction,
wherein the at least one processor is configured to:
Obtain a motion vector pointing to the reference block (415;435;510;510-1;510-2;1110) in the reference image (410;430;1100) corresponding to the current block (300; 1410),
Obtain the final prediction block 955 of the current block (300; 1410) by applying to the neural network 2040 at least one of the following: a POC map 906 containing the POC difference between the reference image (410;430;1100) and the current image (400; 1400) that includes the current block (300; 1410); the preliminary prediction block 902 obtained based on the reference block (415;435;510;510-1;510-2;1110); or the quantization error map (530;530-1;530-2;530-3;530-4;904),
Obtain a residual block using the current block (300; 1410) and the final prediction block 955,
Generate a bitstream including information about the residual block,
wherein sample values of the quantization error map (530;530-1;530-2;530-3;530-4;904) are calculated from a quantization parameter for the reference block (415;435;510;510-1;510-2;1110).