KR20220088277A

KR20220088277A - Image processing apparatus and method for processing multi-frame thereby

Info

Publication number: KR20220088277A
Application number: KR1020210075637A
Authority: KR
Inventors: 쿠오칸 딘; 진경환; 최광표
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2020-12-18
Filing date: 2021-06-10
Publication date: 2022-06-27

Abstract

Disclosed is an image processing apparatus according to an embodiment, including a memory and a processor, wherein the processor identifies, in a previous frame, a prediction sample corresponding to a current sample of a current frame; generates a prediction frame of the current frame by changing the sample value of a collocated sample of the previous frame according to the sample value of the prediction sample; derives a weight by comparing the sample value of the current sample with the sample value of the prediction sample; applies the weight to the collocated sample of the prediction frame; and obtains a current output frame by processing the weighted prediction frame and the current frame through a neural network including a convolutional layer.

Description

Image processing apparatus and multi-frame processing method by the same

The present disclosure relates to the field of image or frame processing. More specifically, the present disclosure relates to neural-network-based processing of multiple images or multiple frames.

Various techniques exist for processing an image before it is encoded or displayed. Image processing technology refers to any technology that processes information whose input and output are images, and that manipulates and transforms an image, through enhancement, emphasis, compression, and the like, to aid human understanding or to serve secondary applications.

Image processing technologies were originally developed on an algorithmic basis, but with recent advances in artificial intelligence, a significant portion of image processing is now performed on the basis of artificial intelligence. A neural network is a representative example of an artificial intelligence model.

A neural network can be trained with training data. By processing an image with weighting values set through training, the neural network can obtain the desired processing result. However, neural-network-based image processing to date is not effective at processing temporally related multi-frames.

A technical objective of an embodiment is to process frames effectively by taking the temporal correlation of multi-frames into account.

An image processing apparatus according to an embodiment includes a memory storing one or more instructions, and a processor executing the one or more instructions stored in the memory. The processor may identify, in a previous frame, a prediction sample corresponding to a current sample of a current frame; generate a prediction frame of the current frame by changing the sample value of a collocated sample of the previous frame according to the sample value of the prediction sample; derive a weight by comparing the sample value of the current sample with the sample value of the prediction sample; apply the weight to the collocated sample of the prediction frame; and obtain a current output frame (current reconstructed frame) by processing the weighted prediction frame and the current frame through a neural network including a convolutional layer.

The processor may identify, as the prediction sample, the sample whose sample value is most similar to that of the current sample among the collocated sample of the previous frame and the neighboring samples of the collocated sample.
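As a rough illustration of this search, the following Python sketch scans the collocated sample and its neighbors in the previous frame and returns the offset of the sample closest in value to the current sample. The function name `find_prediction_sample`, the 3x3 search window (`radius=1`), the use of the absolute difference as the similarity measure, and the tie-breaking rule in favor of the collocated sample are assumptions made for the example, not details stated in the disclosure.

```python
def find_prediction_sample(current, previous, row, col, radius=1):
    """Return the (d_row, d_col) offset of the sample in `previous`, within
    `radius` of (row, col), whose value is closest to current[row][col]."""
    height, width = len(previous), len(previous[0])
    target = current[row][col]
    # Start from the collocated sample so that it wins ties (an assumption).
    best_offset = (0, 0)
    best_diff = abs(previous[row][col] - target)
    for d_row in range(-radius, radius + 1):
        for d_col in range(-radius, radius + 1):
            r, c = row + d_row, col + d_col
            if not (0 <= r < height and 0 <= c < width):
                continue  # neighbors outside the frame are skipped
            diff = abs(previous[r][c] - target)
            if diff < best_diff:
                best_offset, best_diff = (d_row, d_col), diff
    return best_offset


current = [[10, 20, 30],
           [40, 50, 60],
           [70, 80, 90]]
previous = [[50, 10, 20],
            [30, 40, 55],
            [60, 70, 80]]

# The center sample (value 50) is matched by the top-left neighbor in `previous`.
offset = find_prediction_sample(current, previous, 1, 1)
```

The returned offset plays the role of a per-sample motion vector pointing from the collocated sample to the prediction sample.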

The processor may obtain a sample value corresponding to a first filter kernel by convolving the current sample and its neighboring samples with the predetermined first filter kernel; obtain sample values corresponding to a plurality of predetermined second filter kernels by convolving the collocated sample of the previous frame and its neighboring samples with the plurality of second filter kernels; identify, among the sample values corresponding to the plurality of second filter kernels, the sample value most similar to the sample value corresponding to the first filter kernel; and determine, as the prediction sample, the sample corresponding to the identified sample value among the collocated sample of the previous frame and its neighboring samples.

In the first filter kernel, the sample corresponding to the current sample may have a preset first value, and the other samples may have a value of 0.

In each of the plurality of second filter kernels, one sample may have a preset second value and the other samples may have a value of 0, and the position of the sample having the second value may differ from one second filter kernel to another.

The preset first value and the preset second value may have opposite signs.
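The kernel-based formulation above can be sketched as follows. The sketch assumes 3x3 kernels, a first value of +1, and a second value of -1, so that adding the two convolution results yields the difference between the current sample and one candidate sample; the candidate with the smallest absolute sum is taken as the prediction sample. These concrete values, the zero padding at frame borders, and the names `one_hot_kernel`, `apply_kernel`, and `estimate_motion` are illustrative assumptions.

```python
def one_hot_kernel(k_row, k_col, value):
    """Build a 3x3 kernel that is `value` at (k_row, k_col) and 0 elsewhere."""
    kernel = [[0] * 3 for _ in range(3)]
    kernel[k_row][k_col] = value
    return kernel


def apply_kernel(frame, row, col, kernel):
    """Multiply-accumulate a 3x3 kernel over the window centered at (row, col).
    Samples outside the frame count as 0 (zero padding, an assumption)."""
    height, width = len(frame), len(frame[0])
    total = 0
    for k_row in range(3):
        for k_col in range(3):
            r, c = row + k_row - 1, col + k_col - 1
            if 0 <= r < height and 0 <= c < width:
                total += kernel[k_row][k_col] * frame[r][c]
    return total


FIRST_VALUE = 1    # assumed preset first value
SECOND_VALUE = -1  # assumed preset second value (opposite sign)

first_kernel = one_hot_kernel(1, 1, FIRST_VALUE)
# One second kernel per candidate position, keyed by its offset from the center.
second_kernels = {
    (k_row - 1, k_col - 1): one_hot_kernel(k_row, k_col, SECOND_VALUE)
    for k_row in range(3)
    for k_col in range(3)
}


def estimate_motion(current, previous, row, col):
    """Return the offset whose second-kernel response best cancels the
    first-kernel response, i.e. the most similar candidate sample."""
    current_response = apply_kernel(current, row, col, first_kernel)

    def mismatch(offset):
        previous_response = apply_kernel(previous, row, col, second_kernels[offset])
        return abs(current_response + previous_response)

    return min(second_kernels, key=mismatch)


current = [[10, 20, 30],
           [40, 50, 60],
           [70, 80, 90]]
previous = [[50, 10, 20],
            [30, 40, 55],
            [60, 70, 80]]

motion_vector = estimate_motion(current, previous, 1, 1)
```

Because each kernel has a single nonzero entry, every convolution simply selects (and scales) one sample, which lets the whole search be expressed with the multiply-accumulate operations that convolution hardware already provides.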

The processor may change the sample value of the collocated sample by convolving the collocated sample of the previous frame and its neighboring samples with a predetermined third filter kernel, wherein, in the third filter kernel, the sample corresponding to the prediction sample may have a preset third value and the other samples may have a value of 0.
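A minimal sketch of this motion-compensation step is given below, assuming a 3x3 third filter kernel, a third value of 1, and per-sample motion vectors expressed as (row, column) offsets of the prediction sample from the collocated sample. The function name `motion_compensate` and the zero padding at the borders are assumptions for illustration.

```python
def motion_compensate(previous, motion_vectors, third_value=1):
    """Build a prediction frame by convolving each 3x3 window of `previous`
    with a kernel that is `third_value` at the prediction sample's position
    and 0 elsewhere."""
    height, width = len(previous), len(previous[0])
    prediction = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            d_row, d_col = motion_vectors[row][col]
            kernel = [[0] * 3 for _ in range(3)]
            kernel[d_row + 1][d_col + 1] = third_value
            total = 0
            for k_row in range(3):
                for k_col in range(3):
                    r, c = row + k_row - 1, col + k_col - 1
                    if 0 <= r < height and 0 <= c < width:  # zero padding
                        total += kernel[k_row][k_col] * previous[r][c]
            prediction[row][col] = total
    return prediction


previous = [[50, 10, 20],
            [30, 40, 55],
            [60, 70, 80]]
# Every sample stays in place except the center, whose prediction sample
# is its top-left neighbor.
motion_vectors = [[(0, 0)] * 3 for _ in range(3)]
motion_vectors[1][1] = (-1, -1)

prediction_frame = motion_compensate(previous, motion_vectors)
```

With a third value of 1, the collocated sample is replaced by an exact copy of the prediction sample; a different third value would scale the copied sample.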

The weight may be inversely proportional to the difference between the sample value of the current sample and the sample value of the prediction sample.
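This section does not give the exact mapping from difference to weight. The sketch below uses the hypothetical form 1 / (1 + |difference|), chosen only because it decreases as the difference grows and stays finite when the two samples are equal; the function names `gating_weight` and `apply_gating` are likewise assumptions.

```python
def gating_weight(current_value, prediction_value):
    """Hypothetical weight: 1 when the samples match, smaller as they diverge."""
    return 1.0 / (1.0 + abs(current_value - prediction_value))


def apply_gating(current, prediction):
    """Scale each sample of the prediction frame by its own weight."""
    return [
        [gating_weight(c, p) * p for c, p in zip(current_row, prediction_row)]
        for current_row, prediction_row in zip(current, prediction)
    ]


current = [[50, 57]]
prediction = [[50, 60]]

# First sample: difference 0, weight 1.0. Second sample: difference 3, weight 0.25.
weighted_prediction = apply_gating(current, prediction)
```

Samples of the prediction frame that disagree strongly with the current frame, for example because of occlusion, are thereby attenuated before they reach the neural network.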

The processor may obtain a previous output frame and a previous feature map output as a result of the neural network processing the previous frame; generate a prediction output frame and a prediction feature map by changing the sample values of the collocated samples of the previous output frame and the previous feature map according to the positional relationship between the current sample and the prediction sample in the previous frame; apply the weight to the collocated samples of the prediction output frame and the prediction feature map; and input the weighted prediction output frame, the weighted prediction feature map, the weighted prediction frame, and the current frame to the neural network.

The previous output frame may include a first previous output frame output from the neural network, and a second previous output frame obtained as a result of the neural network processing a reconstructed first previous output frame, the reconstructed frame being produced by encoding and then decoding the first previous output frame.

The neural network may include a plurality of sub-neural networks, each composed of a first convolutional layer, a second convolutional layer, and a plurality of third convolutional layers. The first convolutional layer of the first sub-neural network may convolve the result of concatenating the weighted prediction output frame, the weighted prediction frame, and the current frame; the second convolutional layer of the first sub-neural network may convolve the weighted prediction feature map; and the plurality of third convolutional layers of the first sub-neural network may sequentially convolve the result of concatenating the feature map output from the first convolutional layer with the feature map output from the second convolutional layer.

The first convolutional layer of each sub-neural network other than the first sub-neural network may convolve the result of concatenating the weighted prediction frame, the current frame, and the intermediate output frame (intermediate reconstructed frame) output from the preceding sub-neural network; the second convolutional layer of each such sub-neural network may convolve the intermediate feature map output from the preceding sub-neural network; and the plurality of third convolutional layers of each such sub-neural network may sequentially convolve the result of concatenating the feature map output from the first convolutional layer with the feature map output from the second convolutional layer.
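The data flow through two chained sub-neural networks can be sketched as follows. This is a structural sketch only: frames are assumed to be single-channel, all kernels are 3x3 with constant placeholder values instead of trained weighting values, activation functions are omitted, the same parameters are reused for both sub-networks for brevity, and the way the last layer's output is split into an output frame and a feature map is an assumption. The names `conv3x3`, `constant_kernels`, and `sub_network` are hypothetical.

```python
def conv3x3(channels, kernels):
    """channels: list of 2-D lists, one per input channel.
    kernels[o][i] is the 3x3 kernel from input channel i to output channel o.
    Zero padding keeps the spatial size unchanged."""
    height, width = len(channels[0]), len(channels[0][0])
    outputs = []
    for kernels_per_input in kernels:
        out = [[0.0] * width for _ in range(height)]
        for channel, kernel in zip(channels, kernels_per_input):
            for row in range(height):
                for col in range(width):
                    for k_row in range(3):
                        for k_col in range(3):
                            r, c = row + k_row - 1, col + k_col - 1
                            if 0 <= r < height and 0 <= c < width:
                                out[row][col] += kernel[k_row][k_col] * channel[r][c]
        outputs.append(out)
    return outputs


def constant_kernels(n_out, n_in, value):
    """Placeholder weighting values; a trained network would supply real ones."""
    return [[[[value] * 3 for _ in range(3)] for _ in range(n_in)]
            for _ in range(n_out)]


def sub_network(frame_inputs, feature_inputs, params):
    from_frames = conv3x3(frame_inputs, params["conv1"])       # first conv layer
    from_features = conv3x3(feature_inputs, params["conv2"])   # second conv layer
    x = from_frames + from_features                            # channel concatenation
    for kernels in params["conv3"]:                            # third conv layers
        x = conv3x3(x, kernels)
    # Assumed split: first channel is the frame, the rest form the feature map.
    return x[0], x[1:]


def flat(value, size=4):
    return [[value] * size for _ in range(size)]


params = {
    "conv1": constant_kernels(2, 3, 0.01),
    "conv2": constant_kernels(2, 2, 0.01),
    "conv3": [constant_kernels(4, 4, 0.01), constant_kernels(3, 4, 0.01)],
}

current_frame = flat(1.0)
weighted_prediction_frame = flat(0.9)
weighted_prediction_output = flat(0.8)
weighted_prediction_features = [flat(0.5), flat(0.5)]

# First sub-network: takes the weighted prediction output frame and feature map.
intermediate_frame, intermediate_features = sub_network(
    [weighted_prediction_output, weighted_prediction_frame, current_frame],
    weighted_prediction_features, params)

# Later sub-network: takes the intermediate outputs of the preceding sub-network.
output_frame, output_features = sub_network(
    [weighted_prediction_frame, current_frame, intermediate_frame],
    intermediate_features, params)
```

The two calls differ only in their inputs: the first sub-network consumes the motion-compensated, weighted results of the previous time step, while each later sub-network consumes the intermediate frame and feature map of the sub-network before it.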

The processor may transmit, to a terminal device, a bitstream generated by encoding the current output frame.

The processor may play back the current output frame on a display.

A multi-frame processing method according to an embodiment may include: identifying, in a previous frame, a prediction sample corresponding to a current sample of a current frame; generating a prediction frame of the current frame by changing the sample value of a collocated sample of the previous frame according to the sample value of the prediction sample; deriving a weight by comparing the sample value of the current sample with the sample value of the prediction sample; applying the weight to the collocated sample of the prediction frame; and obtaining a current output frame (current reconstructed frame) by processing the weighted prediction frame and the current frame through a neural network including a convolutional layer.

The image processing apparatus according to an embodiment, and the multi-frame processing method performed by it, can improve the processing performance for the current frame by processing the current frame on the basis of the temporal correlation between the current frame and the previous frame.

However, the effects achievable by the image processing apparatus and the multi-frame processing method according to an embodiment are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present disclosure pertains.

A brief description of each drawing is provided so that the drawings cited herein may be more fully understood.
FIG. 1 is a diagram illustrating the configuration of an image processing apparatus according to an embodiment.
FIG. 2 is a diagram illustrating a process of processing a current frame according to an embodiment.
FIG. 3 is a diagram for explaining a convolution operation.
FIG. 4 is a diagram for explaining the motion prediction process shown in FIG. 2.
FIG. 5 is an exemplary diagram illustrating a convolution operation applied to a current frame for motion prediction.
FIG. 6 is an exemplary diagram illustrating a convolution operation applied to a previous frame for motion prediction.
FIG. 7 is an exemplary diagram illustrating prediction samples corresponding to samples in a current frame.
FIG. 8 is a diagram for explaining the motion compensation process shown in FIG. 2.
FIG. 9 is an exemplary diagram illustrating a process of motion-compensating a previous frame using a motion prediction result.
FIG. 10 is an exemplary diagram illustrating a process of applying a weight to a prediction frame obtained as a result of motion compensation.
FIG. 11 is a diagram for explaining a method of increasing the number of motion vectors obtained from a downsampled frame.
FIG. 12 is a diagram illustrating the structure of a first sub-neural network included in a neural network.
FIG. 13 is a diagram illustrating the structure of a last sub-neural network included in a neural network.
FIG. 14 is a diagram illustrating an application example of an image processing method according to an embodiment.
FIG. 15 is a diagram illustrating an application example of an image processing method according to another embodiment.
FIG. 16 is a diagram illustrating an application example of an image processing method according to yet another embodiment.
FIG. 17 is a flowchart of a multi-frame processing method according to an embodiment.

The present disclosure may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in the detailed description. However, this is not intended to limit the present disclosure to specific embodiments, and the disclosure should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

In describing the embodiments, detailed descriptions of related known technology are omitted where they are judged likely to obscure the subject matter unnecessarily. In addition, the ordinals used in describing the embodiments (e.g., first, second) are merely identifiers for distinguishing one component from another.

In addition, when a component is described herein as being "connected" or "coupled" to another component, the component may be directly connected or coupled to the other component; however, unless specifically stated otherwise, it should be understood that the component may also be connected or coupled via another component in between.

In addition, for components expressed herein as '~ part (unit)', 'module', and the like, two or more components may be combined into one component, or one component may be divided into two or more components according to more finely subdivided functions. Each of the components described below may additionally perform some or all of the functions of other components in addition to its own main functions, and some of the main functions of each component may, of course, be performed exclusively by another component.

In addition, in this specification, a 'frame' may be a still image. For example, a 'frame' may include a still image constituting a moving image (or video).

In addition, in this specification, a 'sample' means data that is assigned to a sampling position of a frame or a feature map and is to be processed. For example, a sample may be a pixel value in a frame in the spatial domain. A unit including at least one sample may be defined as a block.

In addition, in this specification, a 'current sample' means a specific sample included in the current frame being processed, or the sample being processed among the samples included in the current frame. A 'collocated sample' means the sample located at the same position as the current sample among the samples included in a frame other than the current frame (e.g., a previous frame, a next frame, an output frame, or a feature map).

In addition, in this specification, a 'neural network' is an example of an artificial intelligence model that mimics the neurons of the brain, and is not limited to a neural network model that uses a specific algorithm.

In addition, in this specification, a 'weighting value' is a value used in the computation of each layer of the neural network; for example, it may be used when applying an input value to a given formula. A weighting value is generally also referred to as a weight; in the present disclosure, however, to distinguish it from the weight derived in the weight derivation process described later (see 230 of FIG. 1), the weight used in the computation of the neural network is referred to as a weighting value. A weighting value is set as a result of training and may be updated with separate training data as needed.

Hereinafter, embodiments according to the technical idea of the present disclosure are described in detail in turn.

FIG. 1 is a diagram illustrating the configuration of an image processing apparatus 100 according to an embodiment.

The image processing apparatus 100 includes a memory 110 and a processor 130.

The image processing apparatus 100 may be implemented as a device having an image processing function, such as a server, a TV, a camera, a mobile phone, a tablet PC, or a notebook computer.

Although the memory 110 and the processor 130 are shown separately in FIG. 1, they may be implemented as a single hardware module (e.g., a chip).

The processor 130 may be implemented as a dedicated processor for neural-network-based image processing. Alternatively, the processor 130 may be implemented as a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). A dedicated processor may include memory for implementing an embodiment of the present disclosure, or a memory processing unit for using external memory.

The processor 130 may also consist of a plurality of processors. In this case, it may be implemented as a combination of dedicated processors, or as a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs.

The processor 130 may include at least one arithmetic logic unit (ALU) for the convolution operation described later. For the convolution operation, the ALU may include a multiplier that multiplies sample values and an adder that sums the results of the multiplications.
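In software terms, the multiplier and adder together compute one output sample of a convolution as a multiply-accumulate over a window. The sketch below separates the two stages for clarity; the function name `multiply_accumulate` and the example kernel are assumptions chosen for illustration.

```python
def multiply_accumulate(window, kernel):
    """Compute one convolution output sample from a window and a kernel
    of the same size."""
    # Multiplier stage: element-wise products of samples and kernel entries.
    products = [
        sample * weight
        for window_row, kernel_row in zip(window, kernel)
        for sample, weight in zip(window_row, kernel_row)
    ]
    # Adder stage: accumulate the products into a single value.
    accumulator = 0
    for product in products:
        accumulator += product
    return accumulator


window = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
kernel = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]

# 2 + 4 - 20 + 6 + 8 = 0
result = multiply_accumulate(window, kernel)
```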

The memory 110 may store one or more instructions for processing consecutive frames. In an embodiment, the memory 110 may store the neural network used to generate output frames. When the neural network is implemented as a dedicated hardware chip for artificial intelligence, or as part of an existing general-purpose processor (e.g., a CPU or an application processor) or a dedicated graphics processor (e.g., a GPU), the neural network may not be stored in the memory 110.

The processor 130 sequentially processes consecutive frames according to the instructions stored in the memory 110 to obtain consecutive output frames. Here, consecutive frames typically mean the frames constituting a video. However, in the present disclosure, consecutive frames need not constitute a single video. In other words, still images captured separately from one another may be processed by the image processing apparatus 100 in a predetermined order, in an arbitrary order, or in an order set by a user.

As shown in FIG. 1, the image processing apparatus 100 may sequentially process a first frame (X₁) through an n-th frame (X_n) to obtain a first output frame (Y₁) through an n-th output frame (Y_n). In FIG. 1, the subscript t in X_t denotes the order in which the frame is processed by the image processing apparatus 100.

The image processing apparatus 100 may obtain the first output frame (Y₁) through the n-th output frame (Y_n) through a pre-trained neural network. The neural network may be pre-trained to increase the resolution of a frame, remove noise, extend the dynamic range, or improve image quality.

For example, when the neural network has been trained to increase the resolution of a frame, the image processing apparatus 100 may process the first frame (X₁) through the n-th frame (X_n) with the neural network to obtain a first output frame (Y₁) through an n-th output frame (Y_n) having a higher resolution than the first frame (X₁) through the n-th frame (X_n). A neural network may be trained to increase frame resolution in various ways. As one example, loss information may be calculated by comparing a training output frame, obtained as a result of the neural network processing a training frame, with a ground-truth frame whose resolution has been increased in advance, and the neural network may be trained so as to minimize the calculated loss information. As a result of the training, the weighting values used in the layers of the neural network may be updated.

As another example, when the neural network has been trained to remove noise from a frame, the image processing apparatus 100 may process the first frame (X₁) through the n-th frame (X_n) with the neural network to obtain a first output frame (Y₁) through an n-th output frame (Y_n) containing less noise than the first frame (X₁) through the n-th frame (X_n). A neural network may be trained to reduce frame noise in various ways. As one example, loss information may be calculated by comparing a training output frame, obtained as a result of the neural network processing a training frame, with a ground-truth frame from which noise has been removed in advance, and the neural network may be trained so as to minimize the calculated loss information.
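The compare-and-minimize training pattern described in these examples can be sketched with a deliberately tiny stand-in for the neural network: a single weighting value `gain` applied to every sample. The mean squared error used as the loss information, the plain gradient-descent update, the learning rate, and the flattened one-dimensional "frames" are all assumptions for illustration; the disclosure does not specify a loss function or an optimizer here.

```python
def mse_loss(output, target):
    """Mean squared error between a training output frame and the ground truth."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


def train_gain(inputs, targets, steps=200, learning_rate=0.01):
    """Fit output = gain * input by gradient descent on the MSE loss."""
    gain = 0.0
    for _ in range(steps):
        # Derivative of the MSE loss with respect to `gain`.
        gradient = sum(2 * (gain * x - t) * x
                       for x, t in zip(inputs, targets)) / len(inputs)
        gain -= learning_rate * gradient  # update the weighting value
    return gain


training_frame = [1.0, 2.0, 3.0]      # flattened training frame
ground_truth_frame = [2.0, 4.0, 6.0]  # flattened ground-truth frame

gain = train_gain(training_frame, ground_truth_frame)
final_loss = mse_loss([gain * x for x in training_frame], ground_truth_frame)
```

In this toy case the weighting value converges toward 2, the factor that maps the training frame onto the ground-truth frame; a real network updates many weighting values in the same compare-and-minimize fashion.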

The neural network may be trained through supervised learning, unsupervised learning, or reinforcement learning.

In an embodiment of the present disclosure, when the current frame, that is, the frame currently being processed among the consecutive frames, is processed, the previous frame is used as well. As shown in FIG. 1, when it is the turn of the second frame (X₂) to be processed, the first frame (X₁) is also input to the image processing apparatus 100. As described later, the first output frame (Y₁) obtained as a result of processing the first frame (X₁), and the feature map obtained while processing the first frame (X₁), may also be input to the image processing apparatus 100 together with the second frame (X₂).
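The recurrence described above, in which each step receives the previous frame, the previous output frame, and the previous feature map, can be sketched as a simple loop. The function `process_sequence`, the `state` tuple, and the stand-in `toy_process_frame` (which merely averages the current frame with the previous output) are hypothetical; in particular, how the first frame is handled when no previous frame exists is an assumption, since this section does not specify it.

```python
def process_sequence(frames, process_frame):
    """Process frames in order, carrying the previous step's results forward."""
    outputs = []
    state = None  # (previous frame, previous output frame, previous feature map)
    for frame in frames:
        output, feature_map = process_frame(frame, state)
        outputs.append(output)
        state = (frame, output, feature_map)
    return outputs


def toy_process_frame(frame, state):
    """Stand-in for the neural network, used only to make the loop runnable."""
    if state is None:
        # Assumption: the first frame is passed through unchanged.
        return list(frame), list(frame)
    _previous_frame, previous_output, _previous_features = state
    output = [(c + p) / 2 for c, p in zip(frame, previous_output)]
    return output, output


frames = [[0, 0], [4, 8], [2, 2]]  # X_1, X_2, X_3 as flattened toy frames
outputs = process_sequence(frames, toy_process_frame)  # Y_1, Y_2, Y_3
```

Each output thus depends on the current frame and, through the carried state, on everything that was processed before it.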

The previous frame is input to the image processing apparatus 100 when the current frame is processed in order to take the temporal correlation between consecutive frames into account. By reflecting information from the previous frame, for example its sample values, in the processing of the current frame, a better result can be expected than when only the current frame is processed by the neural network.

However, if the previous frame is used as it is, an error may occur in the position of an object included in the current output frame. This is because an object common to the previous frame and the current frame, which were captured at different points in time, may be located at different positions in the two frames. In other words, when a common object is located at different points in the current frame and the previous frame, the position of the object in the previous frame is reflected in the processing of the current frame, so the position of the object in the current output frame may differ from its position in the current frame.

Also, due to the movement of an object, an object present in the previous frame may be occluded in the current frame. Occlusion refers to a situation in which all or part of an object present in the previous frame is not included in the current frame. For example, an object included in the previous frame may be hidden by another object in the current frame, or may not be captured by the camera when the current frame is captured. An object of the previous frame that is occluded in the current frame may not be helpful in processing the current frame.

Accordingly, the image processing apparatus 100 according to an embodiment of the present disclosure also uses the previous frame when processing the current frame in order to take into account the temporal correlation between the current frame and the previous frame. However, it does not use the previous frame as it is; instead, it uses a prediction frame generated from the previous frame for the processing of the current frame.

In addition, the image processing apparatus 100 may determine to what extent the samples of the prediction frame are to be used in the processing of the current frame, and may accordingly apply gating, which is described later, to the prediction frame.

Although not shown in FIG. 1, the image processing apparatus 100 may further include a display or may be connected to a separate display device. At least one of the consecutive output frames generated by the image processing apparatus 100 may be reproduced on the display or the display device. If necessary, at least one of the output frames may be post-processed and then reproduced on the display or the display device.

Depending on the implementation, the image processing apparatus 100 may encode at least one of the output frames through an image compression method using frequency transformation. An image compression method using frequency transformation may include a process of predicting an output frame to generate prediction data, a process of generating residual data corresponding to the difference between the output frame and the prediction data, a process of transforming the residual data, which is a spatial-domain component, into a frequency-domain component, a process of quantizing the residual data transformed into the frequency-domain component, and a process of entropy-encoding the quantized residual data. Such an image compression method may be implemented through one of the image compression methods using frequency transformation, such as MPEG-2, H.264 AVC (Advanced Video Coding), MPEG-4, HEVC (High Efficiency Video Coding), VC-1, VP8, VP9, and AV1 (AOMedia Video 1).

The encoded data or bitstream generated by encoding the output frame may be transmitted to an external device through a network, or may be stored in a data storage medium such as a magnetic medium (for example, a hard disk, a floppy disk, or magnetic tape), an optical recording medium (for example, a CD-ROM or a DVD), or a magneto-optical medium (for example, a floptical disk).

Hereinafter, the processing of the current frame (X_t) by the image processing apparatus 100 is described in detail with reference to FIG. 2.

The image processing apparatus 100 obtains the data to be input to the neural network 250 through a motion prediction process 210, a motion compensation process 220, a weight derivation process 230, and a gating process 240.

First, the motion prediction 210 is a process of determining motion vectors between the samples of the current frame (X_t) and the samples of the previous frame (X_t-1). A motion vector represents the relative positional relationship between the same or similar samples present in the previous frame (X_t-1) and the current frame (X_t). For example, when a specific sample is located at coordinates (a, b) in the previous frame (X_t-1) and the same sample is located at coordinates (c, d) in the current frame (X_t), the motion vector of the sample may be expressed as (c-a, d-b). As will be described later, in an embodiment of the present disclosure, a motion vector may also be expressed as a filter kernel for a convolution operation.
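The coordinate example above can be written out as a minimal Python sketch. The function name `motion_vector` and the sample coordinates are hypothetical and chosen only for illustration; they do not appear in the disclosure.

```python
def motion_vector(prev_pos, cur_pos):
    """Displacement of a sample from its position (a, b) in the previous
    frame to its position (c, d) in the current frame."""
    a, b = prev_pos
    c, d = cur_pos
    return (c - a, d - b)


# Hypothetical coordinates: the sample moves from (2, 3) to (5, 4).
print(motion_vector((2, 3), (5, 4)))  # prints (3, 1)
```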

Through the motion prediction 210, the image processing apparatus 100 identifies, in the previous frame (X_t-1), prediction samples corresponding to the samples of the current frame (X_t). Specifically, the image processing apparatus 100 searches for which sample of the previous frame (X_t-1) each sample of the current frame (X_t) is similar to, and may identify the samples found in the previous frame (X_t-1) as the prediction samples of the samples in the current frame (X_t). For example, when the current sample of the current frame (X_t) is most similar to the sample located to the right of the collocated sample, that is, the sample of the previous frame (X_t-1) located at the same position as the current sample, the image processing apparatus 100 may identify the sample to the right of the collocated sample as the prediction sample of the current sample.
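The search described above may be sketched as follows. This is only an illustrative sketch under stated assumptions: similarity is measured by the absolute difference of sample values, the search window is the collocated sample and its immediate neighbors, and candidates outside the frame are skipped. The function name `find_prediction_offsets` is hypothetical, and the embodiment described later performs this step with convolution operations rather than with direct comparison.

```python
def find_prediction_offsets(cur, prev, radius=1):
    """For each current sample, return the (dy, dx) offset, relative to the
    collocated sample, of the most similar sample in the previous frame."""
    h, w = len(cur), len(cur[0])
    offsets = [[(0, 0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = None  # (difference, offset) of the best candidate so far
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    py, px = y + dy, x + dx
                    if 0 <= py < h and 0 <= px < w:  # skip out-of-frame candidates
                        diff = abs(cur[y][x] - prev[py][px])
                        if best is None or diff < best[0]:
                            best = (diff, (dy, dx))
            offsets[y][x] = best[1]
    return offsets


prev = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
cur = [[20, 30, 31], [50, 60, 61], [80, 90, 91]]  # content shifted left by one sample
offsets = find_prediction_offsets(cur, prev)
print(offsets[0][0])  # prints (0, 1): the sample to the right of the collocated sample
```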

In an embodiment, the image processing apparatus 100 may perform the motion prediction 210 through convolution operations, which is described with reference to FIGS. 4 to 7.

The image processing apparatus 100 uses the relative positional relationship between the samples of the current frame (X_t) and the corresponding prediction samples in the previous frame (X_t-1) as motion vectors in the motion compensation process 220.

The motion compensation process 220 is a process of obtaining a prediction frame (X_{t_pred}), which is a predicted version of the current frame (X_t), by changing the sample values of the samples of the previous frame (X_t-1).

The image processing apparatus 100 may obtain the prediction frame (X_{t_pred}) by changing the sample values of the previous frame (X_t-1) according to the sample values of the prediction samples. Specifically, the sample values of the collocated samples of the previous frame (X_t-1), which are located at the same positions as the samples of the current frame (X_t), may be changed according to the sample values of the prediction samples. For example, when the prediction sample of the current sample located at the upper left of the current frame (X_t) is located to the right of the collocated sample of the previous frame (X_t-1), the sample value of the collocated sample of the previous frame (X_t-1) may be changed according to the sample value located to its right (that is, the sample value of the prediction sample).
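As a minimal sketch of this step, assume that "changed according to" means that each collocated sample simply takes the value of its prediction sample, and that the motion vectors are supplied as per-sample (dy, dx) offsets that stay inside the frame. Both are assumptions made for illustration, and the name `motion_compensate` is hypothetical.

```python
def motion_compensate(prev, offsets):
    """Build a prediction frame in which each collocated sample is replaced
    by the value of its prediction sample in the previous frame."""
    h, w = len(prev), len(prev[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = offsets[y][x]  # assumed to point inside the frame
            pred[y][x] = prev[y + dy][x + dx]
    return pred


prev = [[10, 20], [30, 40]]
# Left-column samples are predicted by the sample to their right.
offsets = [[(0, 1), (0, 0)], [(0, 1), (0, 0)]]
print(motion_compensate(prev, offsets))  # prints [[20, 20], [40, 40]]
```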

The weight derivation process 230 may be understood as a process of determining to what extent the samples of the prediction frame (X_{t_pred}) are helpful in processing the current frame (X_t), that is, the contribution of the samples in the prediction frame (X_{t_pred}) to the processing of the current frame (X_t).

Through the weight derivation process 230, the weights to be applied to the samples of the prediction frame (X_{t_pred}) are derived. A high weight is derived for a sample with a high contribution to the processing of the current frame (X_t), and a low weight is derived for a sample with a low contribution.

The weights may be based on the difference values between the sample values of the samples of the current frame (X_t) and the sample values of the corresponding prediction samples. The larger the difference value, the smaller the weight, and the smaller the difference value, the larger the weight. A large difference value means that the sample value of the prediction sample does not contribute much to the processing of the sample of the current frame (X_t), so the weight is calculated to be small.
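The text above fixes only the direction of the relationship (a larger difference gives a smaller weight), not a formula. The sketch below therefore uses a hypothetical mapping, 1 / (1 + |difference|), chosen solely because it decreases with the difference and equals 1 when the two sample values match. It is an assumption for illustration, not the mapping of the disclosure.

```python
def derive_weights(cur, pred_samples):
    """Per-sample weights that shrink as the current sample and its
    prediction sample differ more (hypothetical mapping 1 / (1 + |diff|))."""
    return [
        [1.0 / (1.0 + abs(c - p)) for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(cur, pred_samples)
    ]


# Identical samples get weight 1.0; a difference of 3 gives weight 0.25.
print(derive_weights([[10, 20]], [[10, 23]]))  # prints [[1.0, 0.25]]
```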

The gating process 240 is a process of applying the weights to the samples of the prediction frame (X_{t_pred}). The sample values of the prediction frame (X_{t_pred}) are changed according to the contribution of its samples.

In an embodiment, in the gating process 240, the sample values of the samples of the prediction frame (X_{t_pred}) may be multiplied by the weights. The sample value of a sample multiplied by a weight of 1 is maintained as it is, whereas the sample value of a sample multiplied by a weight smaller than 1 becomes smaller.
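The multiplication described above amounts to an element-wise product of the prediction frame and the weights, as the following minimal sketch shows. The function name `gate` and the sample values are hypothetical.

```python
def gate(pred, weights):
    """Multiply each sample of the prediction frame by its weight."""
    return [
        [p * w for p, w in zip(pred_row, weight_row)]
        for pred_row, weight_row in zip(pred, weights)
    ]


# A weight of 1 keeps the sample value; a weight below 1 reduces it.
print(gate([[100, 80]], [[1.0, 0.5]]))  # prints [[100.0, 40.0]]
```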

As described with reference to FIG. 1, when processing the current frame (X_t), the image processing apparatus 100 may further use the previous output frame (Y_t-1) corresponding to the previous frame (X_t-1) and the previous feature map (S_t-1) obtained during the processing of the previous frame (X_t-1).

The previous output frame (Y_t-1) and the previous feature map (S_t-1) may be output by the neural network 250 during the processing of the previous frame (X_t-1). The previous output frame (Y_t-1) may be output from the last layer of the neural network 250, and the previous feature map (S_t-1) may be output from a layer preceding the last layer of the neural network 250. Here, the layer preceding the last layer may be the layer directly connected to the last layer, or an earlier layer separated from the last layer by one or more intervening layers.

Since the previous output frame (Y_t-1) and the previous feature map (S_t-1) also have the characteristics of the previous frame (X_t-1), the above-described motion compensation process 220 and gating process 240 need to be applied to them. That is, through the motion compensation process 220, a prediction output frame (Y_{t_pred}), which is a predicted version of the current output frame (Y_t), and a prediction feature map (S_{t_pred}), which is a predicted version of the current feature map (S_t), may be obtained from the previous output frame (Y_t-1) and the previous feature map (S_t-1).

The motion compensation process 220 for the previous output frame (Y_t-1) and the previous feature map (S_t-1) is the same as the motion compensation process 220 for the previous frame (X_t-1) described above. Specifically, the prediction output frame (Y_{t_pred}) and the prediction feature map (S_{t_pred}) may be generated by changing the sample values of the collocated samples of the previous output frame (Y_t-1) and the previous feature map (S_t-1) according to the positional relationship (that is, the motion vector) between the current sample of the current frame (X_t) and the prediction sample in the previous frame (X_t-1). For example, when the prediction sample of the current sample of the current frame (X_t) is located to the right of the collocated sample of the previous frame (X_t-1), the sample values of the collocated samples of the previous output frame (Y_t-1) and the previous feature map (S_t-1) may be changed according to the sample values located to their right.

The weights obtained through the weight derivation process 230 are applied to the samples of the prediction output frame (Y_{t_pred}) and the prediction feature map (S_{t_pred}) in the gating process 240, whereby a weighted prediction output frame (Y'_{t_pred}) and a weighted prediction feature map (S'_{t_pred}) may be obtained.

FIG. 2 shows both the previous output frame (Y_t-1) and the previous feature map (S_t-1) being used when the current frame (X_t) is processed, but this is only an example, and the previous output frame (Y_t-1) and the previous feature map (S_t-1) may not be used in the processing of the current frame (X_t). That is, only the previous frame (X_t-1) may be considered when the current frame (X_t) is processed. As another example, only one of the previous output frame (Y_t-1) and the previous feature map (S_t-1) may be used when the current frame (X_t) is processed.

The weighted prediction frame (X'_{t_pred}), the weighted prediction output frame (Y'_{t_pred}), and the weighted prediction feature map (S'_{t_pred}) derived through the gating process 240, together with the current frame (X_t), are input to the neural network 250. As a result of the processing by the neural network 250, the current output frame (Y_t) corresponding to the current frame (X_t) is obtained.

The neural network 250 according to an embodiment may include a convolution layer. In the convolution layer, convolution processing is performed on input data using a filter kernel. The convolution processing in the convolution layer is described with reference to FIG. 3.

The neural network 250 may include one or more sub-neural networks 260-1 ... 260-n. FIG. 2 shows several sub-neural networks 260-1 ... 260-n included in the neural network 250, but this is only an example, and only one sub-neural network 260-1 may be included in the neural network 250. That only one sub-neural network 260-1 is included in the neural network 250 may mean that the neural network 250 consists of a fusion layer 262 and a plurality of convolution layers.

The fusion layer fuses the current frame (X_t) with the data output from the gating process 240, that is, the weighted prediction frame (X'_{t_pred}), the weighted prediction output frame (Y'_{t_pred}), and the weighted prediction feature map (S'_{t_pred}). Through the fusion process, different types of data may be integrated.

The result of integrating the weighted prediction frame (X'_{t_pred}), the weighted prediction output frame (Y'_{t_pred}), the weighted prediction feature map (S'_{t_pred}), and the current frame (X_t) by the fusion layer 262 is convolution-processed by the subsequent convolution layers 264.

As a result of the processing by the first sub-neural network 260-1, an intermediate output frame (Y_{t_int}) and an intermediate feature map (S_{t_int}) are obtained. The intermediate output frame (Y_{t_int}) is output by the last layer included in the first sub-neural network 260-1, and the intermediate feature map (S_{t_int}) is output by a layer preceding the last layer included in the first sub-neural network 260-1.

The current frame (X_t) and the weighted prediction frame (X'_{t_pred}), together with the intermediate output frame (Y_{t_int}) and the intermediate feature map (S_{t_int}) output from the first sub-neural network 260-1, are input to the second sub-neural network 260-2. As in the first sub-neural network 260-1, the current frame (X_t), the weighted prediction frame (X'_{t_pred}), the intermediate output frame (Y_{t_int}), and the intermediate feature map (S_{t_int}) are integrated in the fusion layer 262 of the second sub-neural network 260-2 and then convolution-processed. As a result of the processing by the second sub-neural network 260-2, an intermediate output frame (Y_{t_int}) and an intermediate feature map (S_{t_int}) are output, and they are input to the third sub-neural network 260-3. As with the second sub-neural network 260-2, the current frame (X_t) and the weighted prediction frame (X'_{t_pred}) may also be input to the third sub-neural network 260-3. As a result of the processing by the last sub-neural network 260-n, the current output frame (Y_t) corresponding to the current frame (X_t) is obtained.
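The data flow through the chain of sub-neural networks can be sketched as below. The sketch shows only the wiring: `toy_sub_network` is a hypothetical stand-in that averages its inputs in place of the fusion layer 262 and the convolution layers 264, and, for brevity, it returns the same values as both output frame and feature map, which a real sub-neural network would not do. Frames are represented as flat lists of equal length, which is likewise an assumption.

```python
def toy_sub_network(x_t, x_pred_w, y_in, s_in):
    """Hypothetical stand-in for one sub-neural network: fuses the four
    inputs by sample-wise averaging instead of fusion plus convolution."""
    fused = [(a + b + c + d) / 4.0 for a, b, c, d in zip(x_t, x_pred_w, y_in, s_in)]
    # A real sub-network emits the feature map from an earlier layer than
    # the output frame; here both are the same list for brevity.
    return fused, fused


def run_sub_networks(sub_networks, x_t, x_pred_w, y_pred_w, s_pred_w):
    """Feed X_t and X'_{t_pred} to every sub-network; pass the output frame
    and feature map of each sub-network on to the next one."""
    y, s = y_pred_w, s_pred_w  # the first sub-network receives Y'_{t_pred}, S'_{t_pred}
    for net in sub_networks:
        y, s = net(x_t, x_pred_w, y, s)
    return y, s  # current output frame Y_t and current feature map S_t


y_t, s_t = run_sub_networks([toy_sub_network, toy_sub_network], [4.0], [4.0], [0.0], [0.0])
print(y_t)  # prints [3.0]
```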

The current frame (X_t), and the current output frame (Y_t) and the current feature map (S_t) output from the last sub-neural network 260-n, may be used in the processing of the next frame (X_t+1).

Meanwhile, when the current frame (X_t) is the first of the consecutive frames, the previous frame (X_t-1), the previous output frame (Y_t-1), and the previous feature map (S_t-1) may be set to have a predetermined sample value (for example, 0).

Hereinafter, before the motion prediction process 210 and the motion compensation process 220 are described in detail, the convolution operation is described.

FIG. 3 is a diagram for explaining a convolution operation.

A feature map 350 is generated through multiplication and addition operations between the weight values of the filter kernel 330 used in the convolution layer and the corresponding sample values in the frame 310. The filter kernel 330 has a predetermined size (3x3 in FIG. 3).

The number of feature maps 350 varies according to the number of filter kernels 330. The number of filter kernels 330 and the number of feature maps 350 may be the same. That is, when one filter kernel 330 is used in the convolution layer, one feature map 350 may be generated, and when two filter kernels 330 are used, two feature maps 350 may be generated.

In FIG. 3, I1 to I49 marked in the frame 310 represent the samples of the frame 310, and F1 to F9 marked in the filter kernel 330 represent the weight values of the filter kernel 330. In addition, M1 to M9 marked in the feature map 350 represent the samples of the feature map 350.

FIG. 3 illustrates the frame 310 as including 49 samples, but this is only an example; when the frame 310 has a 4K resolution, it may include, for example, 3840 x 2160 samples.

In the convolution operation, each of the sample values of I1, I2, I3, I8, I9, I10, I15, I16, and I17 of the frame 310 is multiplied by the corresponding one of F1, F2, F3, F4, F5, F6, F7, F8, and F9 of the filter kernel 330, and a value obtained by combining (for example, adding) the results of the multiplications may be assigned as the value of M1 of the feature map 350. If the stride of the convolution operation is 2, each of the sample values of I3, I4, I5, I10, I11, I12, I17, I18, and I19 of the frame 310 is multiplied by the corresponding one of F1, F2, F3, F4, F5, F6, F7, F8, and F9 of the filter kernel 330, and a value obtained by combining the results of the multiplications may be assigned as the value of M2 of the feature map 350.

While the filter kernel 330 moves according to the stride until it reaches the last sample of the frame 310, the convolution operation between the sample values in the frame 310 and the weight values of the filter kernel 330 is performed, whereby a feature map 350 having a predetermined size may be obtained.
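The operation of FIG. 3 can be reproduced with the following minimal sketch, which slides a 3x3 kernel over a 7x7 frame with a stride of 2 and yields a 3x3 feature map. The function name `conv2d` is hypothetical, and the sample values 1 to 49 (standing for I1 to I49) and the all-ones kernel are chosen only so that the sums are easy to check by hand.

```python
def conv2d(frame, kernel, stride=1):
    """Multiply-and-add the kernel against each window of the frame,
    moving the window by `stride` samples (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(frame), len(frame[0])
    out = []
    for top in range(0, h - kh + 1, stride):
        row = []
        for left in range(0, w - kw + 1, stride):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += frame[top + i][left + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


frame = [[r * 7 + c + 1 for c in range(7)] for r in range(7)]  # I1..I49 as 1..49
kernel = [[1] * 3 for _ in range(3)]                            # F1..F9 all set to 1
feature_map = conv2d(frame, kernel, stride=2)
print(feature_map[0][0])  # M1 = I1+I2+I3+I8+I9+I10+I15+I16+I17 = 81
print(feature_map[0][1])  # M2 = I3+I4+I5+I10+I11+I12+I17+I18+I19 = 99
```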

FIG. 4 is a diagram for explaining the motion prediction process 210 shown in FIG. 2.

In an embodiment, the image processing apparatus 100 may perform motion prediction based on convolution operations 410 and 420 on the current frame (X_t) and the previous frame (X_t-1).

The image processing apparatus 100 may obtain a first feature map 417 by performing the convolution operation 410 on the current frame (X_t) with a predetermined first filter kernel 415, and may obtain a plurality of second feature maps 427 by performing the convolution operation 420 on the previous frame (X_t-1) with a plurality of predetermined second filter kernels 425.

The convolution operation 410 based on the first filter kernel 415 may be performed sequentially on the samples of the current frame (X_t), and the convolution operation 420 based on the plurality of second filter kernels 425 may be performed sequentially on the samples of the previous frame (X_t-1).

By performing the convolution operation 410 based on the first filter kernel 415 on the current sample of the current frame (X_t) and the neighboring samples of the current sample, the sample value of the collocated sample of the first feature map 417 may be obtained. In addition, by performing the convolution operation 420 based on the plurality of second filter kernels 425 on the collocated sample of the previous frame (X_t-1) and the neighboring samples of the collocated sample, the sample values of the collocated samples of the plurality of second feature maps 427 may be obtained.

The first filter kernel 415 and the plurality of second filter kernels 425 may have a predetermined size. For example, the first filter kernel 415 and the plurality of second filter kernels 425 may have a size of 3x3 as shown in FIG. 4, but are not limited thereto. Depending on the implementation, the first filter kernel 415 and the plurality of second filter kernels 425 may have a size such as 4x4 or 5x5.

As a result of the convolution operation 410 on the current frame (X_t) based on the first filter kernel 415, a first feature map 417 having the same size as the current frame (X_t) may be obtained. The current frame (X_t) may be padded so that the convolution operation 410 yields a first feature map 417 of the same size as the current frame (X_t). Padding means allocating samples having a predetermined sample value (for example, 0) outside at least one of the left boundary, the upper boundary, the right boundary, and the lower boundary of the current frame (X_t). Padding increases the number of samples of the current frame (X_t).
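A minimal sketch of zero padding follows, assuming that one sample of padding is added on all four sides, which is what a 3x3 kernel with a stride of 1 needs for the feature map to keep the frame size. The function name `pad_zero` is hypothetical.

```python
def pad_zero(frame, pad=1):
    """Surround the frame with `pad` rows and columns of zero-valued samples."""
    width = len(frame[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]          # rows above the frame
    for row in frame:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))      # rows below the frame
    return padded


frame = [[1, 2], [3, 4]]
padded = pad_zero(frame)
print(padded)  # prints [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
# A 3x3 kernel fits (4 - 3 + 1) = 2 positions per axis, matching the 2x2 frame.
```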

In the first filter kernel 415, the sample corresponding to the current sample may have a preset first value, and the remaining samples may have a value of 0. FIG. 4 shows the first filter kernel 415 having values of 1 and 0, but 1 is only an example of the preset first value.

The first filter kernel 415 is applied to the current sample of the current frame (X_t) and the neighboring samples of the current sample; the sample corresponding to the current sample means the sample, among the samples of the first filter kernel 415, that is multiplied by the current sample. FIG. 4 shows the center sample of the first filter kernel 415 having a weight value of 1, which assumes the case in which the current frame (X_t) is padded leftward beyond its left boundary and upward beyond its upper boundary. For example, when the convolution operation 410 is performed on the upper-left sample of the current frame (X_t), the multiplication between the upper-left sample and the center sample of the first filter kernel 415 can be performed only if the current frame (X_t) is padded in the left and upper directions. Accordingly, if the current frame (X_t) is not padded in the left and upper directions, the upper-left sample of the first filter kernel 415 has the value of 1.

Because the sample corresponding to the current sample among the samples of the first filter kernel 415 has the preset first value, the samples of the first feature map 417 are calculated as the sample values of the samples of the current frame X_t multiplied by the first value. Accordingly, if the first value is 1, the sample values of the first feature map 417 are equal to the sample values of the current frame X_t.

Depending on the implementation, the convolution operation 410 on the current frame X_t may be omitted from the motion prediction process. This is because, when the sample of the first filter kernel 415 corresponding to the current sample has a value of 1 and the other samples have a value of 0, the first feature map 417 obtained as a result of the convolution operation 410 is ultimately identical to the current frame X_t. In that case, the prediction samples 430 may be identified by comparing the current frame X_t with the second feature maps 427 obtained as a result of the convolution operation 420 on the previous frame X_t-1.

The plurality of second filter kernels 425 used in the convolution operation 420 on the previous frame X_t-1 have a value of 0 and a preset second value. The preset second value may be equal to the preset first value. For example, both the first value and the second value may be 1. When the first value and the second value are equal, the first filter kernel 415 may correspond to one of the plurality of second filter kernels 425.

Depending on the implementation, the preset second value may have a sign different from that of the preset first value. For example, if the first value is 1, the second value may be -1.

In each of the plurality of second filter kernels 425, one sample may have the preset second value and the remaining samples may have a value of 0. The position of the sample having the second value may differ from one second filter kernel 425 to another. As shown in FIG. 4, in one second filter kernel 425 the upper-left sample has the second value, and in another second filter kernel 425 the sample to the right of the upper-left sample has the second value.

The number of second filter kernels 425 may vary according to the size of the second filter kernels 425. If the size of the second filter kernels 425 is 3x3, the number of second filter kernels 425 may be nine, because the position of the sample having the second value differs for each second filter kernel 425.
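
The relationship between kernel size and kernel count can be illustrated with a short sketch. The helper name `one_hot_kernels` and the row-major ordering of the kernels are assumptions made for this example; the only property taken from the description is that each kernel holds the second value at one position and 0 elsewhere.

```python
def one_hot_kernels(size=3, second_value=1):
    """Build size*size kernels, each with `second_value` at one position and 0 elsewhere."""
    kernels = []
    for row in range(size):
        for col in range(size):
            kernel = [[0] * size for _ in range(size)]
            kernel[row][col] = second_value
            kernels.append(kernel)
    return kernels


# A 3x3 kernel size gives nine kernels, one per position of the non-zero sample.
second_filter_kernels = one_hot_kernels(3)
```

In this ordering, the first kernel has its non-zero sample at the upper left, the second has it immediately to the right of the upper left, and the fifth (center) kernel coincides with the first filter kernel when the first and second values are equal.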

The second feature maps 427 are obtained through the convolution operation 420 on the previous frame X_t-1 using the second filter kernels 425. The number of second feature maps 427 is equal to the number of second filter kernels 425.

The second filter kernels 425 may be used to extract the sample value of any one of the collocated sample of the previous frame X_t-1 and the neighboring samples of the collocated sample. For example, a second filter kernel whose upper-left sample has the second value may be used to extract the sample value of the sample located at the upper left of the collocated sample of the previous frame X_t-1, and a second filter kernel whose upper-right sample has the second value may be used to extract the sample value of the sample located at the upper right of the collocated sample of the previous frame X_t-1.

The image processing apparatus 100 may identify the prediction samples 430 of the samples in the current frame X_t by comparing the sample values of the first feature map 417 with the sample values of the second feature maps 427. The image processing apparatus 100 may determine which of the sample values at a given position of the second feature maps 427 is most similar to the sample value at the same position of the first feature map 417, and may identify the sample in the previous frame X_t-1 corresponding to the determined sample value as the prediction sample 430 of the current sample at that position.

For example, if the current sample is the center sample in the current frame X_t, the sample value most similar to the sample value of the center sample of the first feature map 417 is determined from among the sample values of the center samples of the second feature maps 427. The sample in the previous frame X_t-1 corresponding to the determined sample value may then be identified as the prediction sample 430 of the current sample. If the sample value of the center sample of the second feature map 427 generated based on the second filter kernel 425 whose upper-right sample has the second value is most similar to the sample value of the center sample of the first feature map 417, the sample located at the upper right of the center sample of the previous frame X_t-1 may be determined as the prediction sample 430 of the current sample.

Hereinafter, the motion prediction process is described by way of example with reference to FIGS. 5 to 7.

FIG. 5 is an exemplary diagram illustrating a convolution operation applied to a current frame 510 for motion prediction.

In the first filter kernel 415, the center sample corresponding to the current sample has a value of 1, and the remaining samples have a value of 0.

The current frame 510 may have samples a1, b1, c1, d1, e1, f1, g1, h1, and i1. Although FIG. 5 shows the size of the current frame 510 as 3x3, this is only for convenience of description, and the size of the current frame 510 may vary.

To generate the first feature map 417 having the same size as the current frame 510, the current frame 510 may be padded in the left, upper, right, and lower directions. Through the padding, samples p0 to p15 having a predetermined sample value may be added to the current frame 510 in the left, upper, right, and lower directions of the current frame 510.

To perform the convolution operation sequentially on the samples of the current frame 510, the stride of the convolution operation may be set to 1.

First, the sample value of the first sample (that is, the upper-left sample) of the first feature map 417 is derived through a convolution operation between the p0, p1, p2, p5, a1, b1, p7, d1, and e1 samples of the current frame 510 and the weight values of the first filter kernel 415. Because the center sample of the first filter kernel 415 has a value of 1 and the remaining samples have a value of 0, the sample value of the upper-left sample of the first feature map 417 is derived as a1.

Next, the sample value of the second sample (that is, the sample to the right of the upper-left sample) of the first feature map 417 is derived through a convolution operation between the p1, p2, p3, a1, b1, c1, d1, e1, and f1 samples of the current frame 510 and the weight values of the first filter kernel 415. Through this convolution operation, the sample value of the second sample of the first feature map 417 is derived as b1.

The convolution operation based on the samples of the current frame 510 and the first filter kernel 415 is performed until the first filter kernel 415 reaches the last sample of the current frame 510, that is, the i1 sample. When the first filter kernel 415 reaches the i1 sample, the sample value of the last sample of the first feature map 417 is derived as i1 through a convolution operation between the e1, f1, p8, h1, i1, p10, p13, p14, and p15 samples of the current frame 510 and the weight values of the first filter kernel 415.

Referring to FIG. 5, when the center sample of the first filter kernel 415 has a value of 1, the sample values of the current frame 510 and the sample values of the first feature map 417 are the same. That is, the first filter kernel 415, in which the sample corresponding to the current sample has a value of 1, is used to extract the sample values of the current frame 510.
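
The FIG. 5 walk-through can be reproduced with a small sketch. The function name `convolve_same`, the integer sample values, and the zero padding value are assumptions made for illustration; the kernel is applied without flipping, as is usual for convolution layers in neural networks.

```python
def convolve_same(frame, kernel, pad_value=0):
    """Slide a 3x3 kernel over a one-sample-padded frame with a stride of 1."""
    height, width = len(frame), len(frame[0])
    padded = [[pad_value] * (width + 2)]
    padded += [[pad_value] + list(row) + [pad_value] for row in frame]
    padded += [[pad_value] * (width + 2)]
    return [
        [
            sum(kernel[i][j] * padded[y + i][x + j]
                for i in range(3) for j in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Center sample is 1, all other samples are 0 (the first filter kernel of FIG. 5).
first_filter_kernel = [[0, 0, 0],
                       [0, 1, 0],
                       [0, 0, 0]]
# Hypothetical sample values standing in for a1 ... i1.
current_frame = [[11, 12, 13],
                 [21, 22, 23],
                 [31, 32, 33]]
first_feature_map = convolve_same(current_frame, first_filter_kernel)
```

Under these assumptions `first_feature_map` equals `current_frame`, which is why the convolution on the current frame can be skipped when the first value is 1.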

FIG. 6 is an exemplary diagram illustrating a convolution operation applied to a previous frame for motion prediction.

Each of the second filter kernels 425 may include one sample having a value of 1 and other samples having a value of 0. As described above, the position of the sample having the weight value of 1 may differ for each second filter kernel 425. Alternatively, each of the second filter kernels 425 may include one sample having a value of -1 and other samples having a value of 0.

The previous frame 530 may include samples a2, b2, c2, d2, e2, f2, g2, h2, and i2. Although FIG. 6 shows the size of the previous frame 530 as 3x3, this is only for convenience of description, and the size of the previous frame 530 may vary.

To generate the second feature maps 427 having the same size as the previous frame 530, the previous frame 530 may be padded in the left, upper, right, and lower directions. Through the padding, samples having a predetermined sample value may be added to the previous frame 530 in the left, upper, right, and lower directions of the previous frame 530.

The second feature maps 427 corresponding to the second filter kernels 425 may be obtained through convolution operations based on the previous frame 530 and each of the second filter kernels 425.

Hereinafter, to distinguish the second filter kernels 425 from one another, they are referred to, according to the position of the sample having the value of 1, as the second filter kernel (A) 425-1, the second filter kernel (B) 425-2, through the second filter kernel (I) 425-9. Likewise, to distinguish the second feature maps 427 from one another, they are referred to as the second feature map (A) 427-1, the second feature map (B) 427-2, through the second feature map (I) 427-9.

To perform the convolution operation sequentially on the samples of the previous frame 530, the stride of the convolution operation may be set to 1.

First, the second feature map (A) 427-1 may be obtained through a convolution operation based on the previous frame 530 and the second filter kernel (A) 425-1, whose upper-left sample has a value of 1. As described above, the second filter kernel (A) 425-1 may be convolved with the previous frame 530 while moving with a stride of 1. The second filter kernel (A) 425-1 is for extracting the sample values located at the upper left of the samples (a2, b2, c2, d2, e2, f2, g2, h2, and i2) of the previous frame 530. Accordingly, the second feature map (A) 427-1 has the values obtained by multiplying by 1 the sample values of the samples located at the upper left of the samples (a2, b2, c2, d2, e2, f2, g2, h2, and i2) of the previous frame 530. For example, if the current sample is a1, the collocated sample of the previous frame 530 is a2, and the sample value of the collocated sample of the second feature map (A) 427-1 is derived as the sample value of p0, which is located at the upper left of the a2 sample.

Next, the second feature map (B) 427-2 may be obtained through a convolution operation between the previous frame 530 and the second filter kernel (B) 425-2, in which the sample to the right of the upper-left sample has a value of 1. The second filter kernel (B) 425-2 is for extracting the sample values located above the samples (a2, b2, c2, d2, e2, f2, g2, h2, and i2) of the previous frame 530. Accordingly, the second feature map (B) 427-2 has the values obtained by multiplying by 1 the sample values of the samples located above the samples (a2, b2, c2, d2, e2, f2, g2, h2, and i2) of the previous frame 530. For example, if the current sample is a1, the collocated sample of the previous frame 530 is a2, and the sample value of the collocated sample of the second feature map (B) 427-2 is derived as the sample value of p1, which is located above the a2 sample.

In this way, the second feature map (A) 427-1 through the second feature map (I) 427-9 may be obtained through convolution operations between the previous frame 530 and each of the second filter kernel (A) 425-1 through the second filter kernel (I) 425-9.
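
A minimal sketch of how the nine second feature maps relate to the previous frame is given below. The function names, the integer sample values 1 to 9 standing in for a2 to i2, the zero padding value, and the A-to-I row-major kernel order are assumptions made for this example.

```python
def convolve_same(frame, kernel, pad_value=0):
    """Slide a 3x3 kernel (not flipped) over a one-sample-padded frame, stride 1."""
    height, width = len(frame), len(frame[0])
    padded = [[pad_value] * (width + 2)]
    padded += [[pad_value] + list(row) + [pad_value] for row in frame]
    padded += [[pad_value] * (width + 2)]
    return [
        [
            sum(kernel[i][j] * padded[y + i][x + j]
                for i in range(3) for j in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]


def second_feature_maps(previous_frame, second_value=1):
    """Return nine maps; map k comes from the kernel whose k-th sample is non-zero."""
    maps = []
    for k in range(9):
        kernel = [[0] * 3 for _ in range(3)]
        kernel[k // 3][k % 3] = second_value
        maps.append(convolve_same(previous_frame, kernel))
    return maps


previous_frame = [[1, 2, 3],   # a2 b2 c2
                  [4, 5, 6],   # d2 e2 f2
                  [7, 8, 9]]   # g2 h2 i2
maps = second_feature_maps(previous_frame)
```

In this sketch the upper-left sample of map (A) is the padding value (p0), while the upper-left sample of map (I) is 5, the stand-in for e2. Each map is therefore the previous frame shifted by one of nine offsets.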

FIG. 7 is an exemplary diagram illustrating prediction samples 430 corresponding to samples in a current frame.

The image processing apparatus 100 determines, for each sample in the first feature map 417, which of the samples in the second feature maps 427 it is most similar to. Here, the samples at the same position in the first feature map 417 and in the second feature maps 427 are compared. Specifically, the image processing apparatus 100 may calculate the absolute value of the difference between the sample value of the sample at a specific position in the first feature map 417 and each of the sample values of the samples at that position in the second feature maps 427, and may identify the sample value for which the calculated absolute value is smallest. The image processing apparatus 100 may determine the sample in the previous frame 530 corresponding to the sample value having the smallest absolute difference as the prediction sample.
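
The comparison described above can be sketched as a per-sample search for the smallest absolute difference. The function name `best_kernel_indices`, the sample values, the zero padding, and the tie-breaking rule (the first minimum wins) are assumptions for illustration. The candidate list for each position holds the values that the nine second feature maps would contain at that position when the first and second values are both 1.

```python
def best_kernel_indices(current_frame, previous_frame, pad_value=0):
    """For each current sample, return the index (0..8) of the best-matching kernel."""
    height, width = len(current_frame), len(current_frame[0])
    padded = [[pad_value] * (width + 2)]
    padded += [[pad_value] + list(row) + [pad_value] for row in previous_frame]
    padded += [[pad_value] * (width + 2)]
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Candidate k is the value second feature map k holds at (y, x).
            candidates = [padded[y + k // 3][x + k % 3] for k in range(9)]
            diffs = [abs(current_frame[y][x] - c) for c in candidates]
            row.append(diffs.index(min(diffs)))
        result.append(row)
    return result


previous_frame = [[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]]
# Hypothetical current frame: content moved one sample to the left.
current_frame = [[20, 30, 35],
                 [50, 60, 65],
                 [80, 90, 95]]
indices = best_kernel_indices(current_frame, previous_frame)
```

Here index 5 corresponds to the kernel whose non-zero sample lies to the right of the center, so a result of 5 means the prediction sample is the right-hand neighbor of the collocated sample.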

As described above, if the sign of the first value of the first filter kernel 415 and the sign of the second value of the second filter kernels 425 are the same, the difference between a sample value in the first feature map 417 and the sample values in the second feature maps 427 may be calculated through a subtraction operation. Conversely, if the sign of the first value differs from the sign of the second value, the difference between a sample value in the first feature map 417 and the sample values in the second feature maps 427 may be calculated through an addition operation.
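
The effect of the sign convention can be checked with a very small sketch. The function name `absolute_difference` and its `same_sign` flag are hypothetical; the point is only that, when the second value is -1, the second feature map already holds negated sample values, so adding the two map values yields the same difference.

```python
def absolute_difference(first_map_value, second_map_value, same_sign=True):
    """Absolute difference between the underlying samples of the two feature maps."""
    if same_sign:
        # Both maps hold the sample values with the same sign: subtract.
        return abs(first_map_value - second_map_value)
    # The second map holds negated sample values: add instead.
    return abs(first_map_value + second_map_value)


# Current sample 7 and candidate sample 4, with second value 1 and then -1.
same = absolute_difference(7 * 1, 4 * 1, same_sign=True)
opposite = absolute_difference(7 * 1, 4 * -1, same_sign=False)
```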

Taking the upper-left sample of the first feature map 417 shown in FIG. 5 as an example, the image processing apparatus 100 may calculate the absolute values of the differences between the sample value of the upper-left sample a1 of the first feature map 417 and the sample values of the upper-left samples of the second feature maps 427. For example, |a1-p0| may be calculated between the sample value of the upper-left sample a1 of the first feature map 417 and the sample value of the upper-left sample p0 of the second feature map (A) 427-1, and |a1-p1| may be calculated between the sample value of the upper-left sample a1 of the first feature map 417 and the sample value of the upper-left sample p1 of the second feature map (B) 427-2. If the absolute value of the difference between the sample value of the upper-left sample e2 of the second feature map (I) 427-9 and the sample value of the upper-left sample a1 of the first feature map 417 is the smallest, the e2 sample in the previous frame 530, which corresponds to the upper-left sample e2 of the second feature map (I) 427-9, may be determined as the prediction sample of the upper-left sample a1 of the current frame 510.

In this way, by comparing the sample values of the first feature map 417 with the sample values of the second feature maps 427, the prediction samples 430 corresponding to the respective samples of the current frame 510 may be identified.

In FIG. 7, the prediction samples corresponding to the samples of the current frame 510 are determined to be b2, e2, f2, e2, f2, i2, h2, e2, and i2.

Although the motion prediction process has been described with reference to FIGS. 4 to 7 with a focus on the determination of prediction samples, the motion prediction process may be understood as a process of finding motion vectors. Each of the second filter kernels 425 described above may be interpreted as a positional relationship between the current sample and the collocated sample of the previous frame 530 or one of its neighboring samples, that is, as a motion vector candidate. In other words, the motion prediction process may be a process of determining, from among several motion vector candidates (the several second filter kernels), the motion vector candidate (one of the second filter kernels) that points to the sample most similar to the current sample as the motion vector of the current sample.
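
The interpretation of each second filter kernel as a motion vector candidate can be made concrete with a short sketch. The function name, the row-major kernel index (0 for kernel (A) through 8 for kernel (I)), and the coordinate convention (positive dx to the right, positive dy downward) are assumptions made for this example.

```python
def kernel_index_to_motion_vector(kernel_index, size=3):
    """Map a one-hot kernel index to a (dx, dy) offset from the collocated sample."""
    half = size // 2
    dy = kernel_index // size - half   # row offset: negative is up
    dx = kernel_index % size - half    # column offset: negative is left
    return dx, dy


# Kernel (A): upper left; kernel (E): collocated sample; kernel (F): right neighbor.
vector_a = kernel_index_to_motion_vector(0)
vector_e = kernel_index_to_motion_vector(4)
vector_f = kernel_index_to_motion_vector(5)
```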

FIG. 8 is a diagram for describing the motion compensation process 220 shown in FIG. 2.

The motion compensation process changes the sample values of the samples of the previous frame X_t-1 located at the same positions as the samples of the current frame X_t according to the sample values of the prediction samples. A prediction frame X_{t_pred} may be obtained through the motion compensation process.

As with the motion prediction process, the image processing apparatus 100 may perform the motion compensation process through a convolution operation.

The image processing apparatus 100 may select, from among a plurality of predetermined third filter kernels 815, a third filter kernel to be used for motion compensation of each sample of the previous frame X_t-1, and may apply a convolution operation based on the selected third filter kernel to each sample of the previous frame X_t-1. Here, a corresponding third filter kernel may be selected for each sample of the previous frame X_t-1.

Each of the plurality of third filter kernels 815 may include a sample having a predetermined third value and samples having a value of 0, and the position of the sample having the third value may differ for each third filter kernel 815. The third value may be, for example, 1. Depending on the implementation, the plurality of second filter kernels 425 used in the motion prediction process may also be used in the motion compensation process.

FIG. 9 is an exemplary diagram illustrating a process of motion-compensating a previous frame using a motion prediction result.

To distinguish the third filter kernels 815 from one another, they are referred to, according to the position of the sample having the third value, as the third filter kernel (A) 815-1 through the third filter kernel (I) 815-9.

For each of the samples of the previous frame 530 located at the same positions as the samples of the current frame 510, the image processing apparatus 100 may select the third filter kernel 815 that has the third value at the position corresponding to the prediction sample.

First, the a2 sample located at the upper left of the previous frame 530 is considered. If the prediction sample of the a1 sample located at the upper left of the current frame 510 has been determined to be the b2 sample, the image processing apparatus 100 may select, for the a2 sample, the third filter kernel (F) 815-6, in which the sample to the right of the center sample has a value of 1 and the remaining samples have a value of 0. In this case, the image processing apparatus 100 may derive the upper-left sample b2 of the prediction frame 900 through multiplication and summation operations between the p0, p1, p2, p5, a2, b2, p7, d2, and e2 samples of the previous frame 530 and the values 0, 0, 0, 0, 0, 1, 0, 0, and 0 of the third filter kernel (F) 815-6. That is, through the convolution operation on the a2 sample of the previous frame 530 and its neighboring samples, the a2 sample is replaced with the b2 sample in the prediction frame 900.

Next, if the prediction sample of the b1 sample located above the center sample of the current frame 510 has been determined to be the e2 sample, the third filter kernel (H) 815-8, in which the sample below the center sample has a value of 1 and the remaining samples have a value of 0, may be selected for the b2 sample. The image processing apparatus 100 may derive the sample e2, located above the center sample of the prediction frame 900, through multiplication and summation operations between the p1, p2, p3, a2, b2, c2, d2, e2, and f2 samples of the previous frame 530 and the values 0, 0, 0, 0, 0, 0, 0, 1, and 0 of the third filter kernel (H) 815-8. That is, through the convolution operation on the b2 sample of the previous frame 530 and its neighboring samples, the b2 sample of the previous frame 530 is replaced with the e2 sample in the prediction frame 900.

By performing the convolution operation, from the first sample to the last sample of the previous frame 530, based on the third filter kernel 815 corresponding to each sample, the prediction frame 900, which is a predicted version of the current frame 510, may be generated.
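
The per-sample kernel selection of FIG. 9 can be sketched as follows. The function name `motion_compensate`, the integer values 1 to 9 standing in for a2 to i2, the zero padding, and the kernel index convention (0 for kernel (A) through 8 for kernel (I)) are assumptions for illustration. The index table encodes the prediction samples listed for FIG. 7 (b2, e2, f2, e2, f2, i2, h2, e2, i2).

```python
def motion_compensate(previous_frame, kernel_indices, third_value=1, pad_value=0):
    """Apply, at each position, the one-hot 3x3 kernel selected for that position."""
    height, width = len(previous_frame), len(previous_frame[0])
    padded = [[pad_value] * (width + 2)]
    padded += [[pad_value] + list(row) + [pad_value] for row in previous_frame]
    padded += [[pad_value] * (width + 2)]
    prediction = []
    for y in range(height):
        row = []
        for x in range(width):
            kernel = [[0] * 3 for _ in range(3)]
            k = kernel_indices[y][x]
            kernel[k // 3][k % 3] = third_value
            # Multiply-accumulate over the 3x3 neighborhood of the collocated sample.
            row.append(sum(kernel[i][j] * padded[y + i][x + j]
                           for i in range(3) for j in range(3)))
        prediction.append(row)
    return prediction


previous_frame = [[1, 2, 3],   # a2 b2 c2
                  [4, 5, 6],   # d2 e2 f2
                  [7, 8, 9]]   # g2 h2 i2
# Kernel (F)=5, (H)=7, (B)=1, (E)=4, chosen per sample from the prediction samples.
kernel_indices = [[5, 7, 7],
                  [5, 5, 7],
                  [5, 1, 4]]
prediction_frame = motion_compensate(previous_frame, kernel_indices)
```

With these stand-in values the result is `[[2, 5, 6], [5, 6, 9], [8, 5, 9]]`, that is, b2, e2, f2, e2, f2, i2, h2, e2, i2 in raster order.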

FIG. 10 is an exemplary diagram illustrating a process of applying a weight 950 to the prediction frame 900 obtained as a result of motion compensation.

The image processing apparatus 100 may calculate the weight 950 based on the difference value between the current sample in the current frame 510 and the prediction sample in the previous frame 530 (or the collocated sample in the prediction frame 900). The image processing apparatus 100 may calculate the weight 950 for each sample of the current frame 510.

As described above, the weight 950 indicates the extent to which the samples of the prediction frame 900 are helpful in processing the current frame 510.

The weight 950 may be derived based on Equation 1 below.

[Equation 1]

In Equation 1, the predetermined constant may be, for example, 16. According to Equation 1, if the sample value of the current sample and the sample value of the prediction sample are the same, the weight 950 is calculated as 1, and the weight 950 becomes smaller as the difference between the sample value of the current sample and the sample value of the prediction sample becomes larger.

The image processing apparatus 100 may obtain a weighted prediction frame 1000 by multiplying each sample of the prediction frame 900 by its corresponding weight 950.
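The weight derivation and gating steps might be sketched as below. Equation 1 is not reproduced in this text, so the Gaussian-shaped function used here is an assumption chosen only because it has the stated properties (a weight of 1 for equal sample values, smaller weights for larger differences, and a predetermined constant such as 16); the actual Equation 1 may differ. The names `derive_weights` and `gate` are hypothetical.

```python
import numpy as np

def derive_weights(current, predicted, sigma=16.0):
    """Per-sample weights from the difference between current and prediction samples.

    ASSUMPTION: exp(-(difference^2) / sigma^2) is a stand-in for Equation 1,
    which is not available here. It yields 1 for identical samples and
    decreases as the difference grows.
    """
    diff = np.asarray(current, dtype=float) - np.asarray(predicted, dtype=float)
    return np.exp(-(diff ** 2) / (sigma ** 2))

def gate(pred_frame, weights):
    """Gating: multiply each prediction sample by its corresponding weight."""
    return pred_frame * weights

current = np.array([[100.0, 120.0], [50.0, 200.0]])
pred = np.array([[100.0, 104.0], [50.0, 120.0]])
w = derive_weights(current, pred)   # weight per sample of the current frame
weighted_pred = gate(pred, w)       # weighted prediction frame
```

Samples whose prediction matches the current frame pass through unchanged, while poorly predicted samples are attenuated toward zero.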

As described above, the image processing apparatus 100 may also obtain a weighted prediction output frame and a weighted prediction feature map by applying the corresponding weight 950 to each sample of the prediction output frame and of the prediction feature map.

As described above, in an embodiment, the motion prediction process and the motion compensation process may be performed based on convolution operations. As shown in FIG. 4, the motion prediction process may be performed through one convolution operation (when the convolution operation 410 on the current frame is omitted) or two convolution operations, and as shown in FIG. 8, the motion compensation process may be performed through a single convolution operation, so the computational complexity can be significantly reduced.

Meanwhile, the motion prediction process described above may also be applied to a downsampled current frame and a downsampled previous frame. This reduces the load and complexity of the motion prediction process. Here, downsampling means processing that reduces the number of samples in a frame. Downsampling of a frame may be performed in various ways. For example, the current frame and the previous frame may be pooled to reduce the number of samples in each. The pooling may include max pooling or average pooling. Because pooling can be readily understood from the pooling layers used in the field of artificial neural networks, a detailed description is omitted. Depending on the implementation, the current frame and the previous frame may also be downsampled using any of various known downsampling algorithms.
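As a minimal sketch of pooling-based downsampling, the following uses non-overlapping windows. The function name `pool2d` and the assumption that the frame dimensions are divisible by the pooling factor are choices made for this example.

```python
import numpy as np

def pool2d(frame, factor=2, mode="average"):
    """Downsample a (H, W) frame with non-overlapping factor x factor windows."""
    h, w = frame.shape
    assert h % factor == 0 and w % factor == 0, "sketch assumes divisible sizes"
    # Split the frame into blocks: (rows of blocks, factor, columns of blocks, factor)
    blocks = frame.reshape(h // factor, factor, w // factor, factor)
    if mode == "max":
        return blocks.max(axis=(1, 3))    # max pooling
    return blocks.mean(axis=(1, 3))       # average pooling

frame = np.arange(16, dtype=float).reshape(4, 4)
small_avg = pool2d(frame, 2, "average")   # 4x4 -> 2x2
small_max = pool2d(frame, 2, "max")       # 4x4 -> 2x2
```

Either result has one quarter of the original samples, so the subsequent motion prediction operates on far fewer positions.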

When the motion prediction process is performed on the downsampled current frame and the downsampled previous frame, as many motion vectors are derived as there are samples in the downsampled current frame. Because the motion compensation process requires more motion vectors than are obtained through motion prediction on the downsampled frames, the number of motion vectors obtained in the motion prediction process must be increased.

A method of increasing the number of motion vectors obtained in the motion prediction process is described with reference to FIG. 11.

FIG. 11 is a diagram for explaining a method of increasing the number of motion vectors obtained from a downsampled frame.

Referring to FIG. 11, the size of the downsampled frame 1110 is 2x2, and the size of the frame 1130 before downsampling is 4x4. The size of the downsampled frame 1110 may vary according to the downsampling ratio.

When the motion prediction process described above is applied to the downsampled frame 1110, four motion vectors (that is, filter kernels) corresponding to the four samples included in the frame 1110 are derived. Because the size of the frame 1130 before downsampling is 4x4, 16 motion vectors are required in the motion compensation process.

As one example, the image processing apparatus 100 may group the samples of the frame 1130 before downsampling according to the number of samples in the downsampled frame 1110. The image processing apparatus 100 may then assign each of the motion vectors derived in the motion prediction process to one of the sample groups of the frame 1130 before downsampling. In doing so, the positions of the sample groups in the frame 1130 before downsampling and the positions of the samples in the downsampled frame 1110 may be taken into account.

In detail, the motion vector mv1 derived for the upper-left sample 1112 among the samples 1112, 1114, 1116, and 1118 of the downsampled frame 1110 may be assigned to the upper-left sample group 1132 among the sample groups 1132, 1134, 1136, and 1138 of the frame 1130 before downsampling. Accordingly, motion compensation based on the motion vector mv1 may be performed for the samples included in the upper-left sample group 1132 of the frame 1130. Note that motion compensation is performed on the previous frame before downsampling.

The motion vector mv2 derived for the upper-right sample 1114 among the samples 1112, 1114, 1116, and 1118 of the downsampled frame 1110 may be assigned to the upper-right sample group 1134 among the sample groups 1132, 1134, 1136, and 1138 of the frame 1130 before downsampling. Accordingly, motion compensation based on the motion vector mv2 may be performed for the samples included in the upper-right sample group 1134 of the frame 1130.
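The group-wise assignment can be sketched as a simple replication of each low-resolution motion vector over its sample group. The function name `assign_to_groups`, the (dy, dx) storage order, and the motion vector values are hypothetical and serve only to illustrate the 2x2 to 4x4 case of FIG. 11.

```python
import numpy as np

def assign_to_groups(low_res_field, factor):
    """Copy each low-resolution entry to every sample of its factor x factor group."""
    rows = np.repeat(low_res_field, factor, axis=0)   # replicate along height
    return np.repeat(rows, factor, axis=1)            # replicate along width

# Motion vectors for the 2x2 downsampled frame, stored as (dy, dx).
# Layout: [[mv1, mv2], [mv3, mv4]]; the values are made up for this example.
mv_low = np.array([[[0, 1], [1, 0]],
                   [[-1, 0], [0, -1]]])
mv_full = assign_to_groups(mv_low, 2)   # shape (4, 4, 2): one vector per sample
```

Every sample in the upper-left 2x2 group receives mv1, every sample in the upper-right group receives mv2, and so on, giving the 16 motion vectors required for the 4x4 frame.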

When a sample group contains many samples, applying the same motion vector to all samples in the group may reduce the accuracy of motion compensation.

Accordingly, as another example, for samples in a sample group that are adjacent to the boundary with a neighboring sample group, the image processing apparatus 100 may apply a motion vector obtained by combining the motion vector assigned to that sample group with the motion vector assigned to the neighboring sample group.

As yet another example, the image processing apparatus 100 may obtain the motion vectors for motion compensation of the frame 1130 before downsampling by interpolating the motion vectors obtained for the downsampled frame 1110. Bilinear interpolation, bicubic interpolation, or nearest-neighbor interpolation may be applied as the interpolation.
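A bilinear variant of this interpolation might look like the sketch below. The half-sample center alignment, the clamping at the frame border, and the name `bilinear_upsample` are assumptions of this example; the text does not specify these details, nor how fractional vectors produced by interpolation are mapped back onto filter kernels.

```python
import numpy as np

def bilinear_upsample(field, factor):
    """Bilinearly interpolate a (h, w, c) field to (h*factor, w*factor, c)."""
    field = np.asarray(field, dtype=float)
    h, w = field.shape[:2]
    # Full-resolution sample centers expressed in low-resolution coordinates
    # (half-sample alignment is an assumption), clamped to the valid range.
    ys = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    xs = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]   # vertical blend factors
    wx = (xs - x0)[None, :, None]   # horizontal blend factors
    top = field[y0][:, x0] * (1 - wx) + field[y0][:, x1] * wx
    bottom = field[y1][:, x0] * (1 - wx) + field[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# 2x2 field of 2-component vectors; only channel 0 of the right column is nonzero.
mv_low = np.zeros((2, 2, 2))
mv_low[:, 1, 0] = 4.0
mv_full = bilinear_upsample(mv_low, 2)   # shape (4, 4, 2)
```

Unlike group-wise replication, the interpolated vectors change gradually across group boundaries. The same routine could be applied to a single-channel weight map by giving it a trailing axis of length 1.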

When motion prediction is performed on the downsampled frame 1110, the number of weights derived in the weight derivation process is also smaller than the number of weights required in the gating process. The image processing apparatus 100 therefore increases the number of weights obtained through the weight derivation process. Note that the gating process is applied to the prediction frame generated through the motion compensation process from the previous frame before downsampling.

As one example, the image processing apparatus 100 may group the samples of the frame 1130 before downsampling according to the number of samples in the downsampled frame 1110. The image processing apparatus 100 may then assign each of the weights derived in the weight derivation process to one of the sample groups of the frame 1130 before downsampling. In doing so, the positions of the sample groups in the frame 1130 before downsampling and the positions of the samples in the downsampled frame 1110 may be taken into account.

In detail, the first weight derived for the upper-left sample 1112 among the samples 1112, 1114, 1116, and 1118 of the downsampled frame 1110 may be assigned to the upper-left sample group 1132 among the sample groups 1132, 1134, 1136, and 1138 of the frame 1130 before downsampling. Accordingly, the gating process based on the first weight may be performed for the samples included in the upper-left sample group 1132 of the frame 1130. Likewise, the second weight derived for the upper-right sample 1114 of the downsampled frame 1110 may be assigned to the upper-right sample group 1134 of the frame 1130 before downsampling, and the gating process based on the second weight may be performed for the samples included in the upper-right sample group 1134.

As another example, for samples in a sample group that are adjacent to the boundary with a neighboring sample group, the image processing apparatus 100 may apply a weight obtained by combining the weight assigned to that sample group with the weight assigned to the neighboring sample group.

As yet another example, the image processing apparatus 100 may obtain the weights for the gating process of the frame 1130 before downsampling by interpolating the weights obtained for the downsampled frame 1110. Bilinear interpolation, bicubic interpolation, or nearest-neighbor interpolation may be applied as the interpolation.

Although the motion prediction process and the motion compensation process have been described above as being performed based on convolution operations, this is only one embodiment. The motion prediction process and the motion compensation process may instead be performed with algorithms used in the inter prediction of known video codecs.

For example, the motion prediction process may be performed based on a block matching algorithm or an optical flow algorithm. Block matching and optical flow are algorithms that search the previous frame for the sample or block most similar to a sample or block in the current frame. Through block matching or optical flow, a motion vector between a sample or block in the current frame and the similar sample or block in the previous frame is obtained, and a prediction frame may be obtained by motion-compensating the previous frame based on the obtained motion vector. Because block matching and optical flow algorithms are well known to those skilled in the art, a detailed description is omitted in this specification.
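For illustration, an exhaustive-search block matching routine might be written as below. The sum of absolute differences (SAD) cost, the block size, the search range, and the name `block_match` are assumptions of this sketch; codecs in practice use a variety of costs and faster search strategies.

```python
import numpy as np

def block_match(cur, prev, block=4, search=2):
    """Return one (dy, dx) motion vector per block of the current frame.

    For each block in `cur`, every candidate block in `prev` within
    +/- `search` samples is compared, and the offset with the lowest
    sum of absolute differences (SAD) is kept.
    """
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block]
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the previous frame
                    sad = np.abs(target - prev[y:y + block, x:x + block]).sum()
                    if best is None or sad < best:
                        best = sad
                        mvs[by // block, bx // block] = (dy, dx)
    return mvs

rng = np.random.default_rng(0)
prev = rng.random((8, 8))
cur = np.roll(prev, shift=(1, 1), axis=(0, 1))   # content moved down and right by 1
mvs = block_match(cur, prev)
```

For the lower-right block, the best match in the previous frame lies one sample up and one sample to the left, so its motion vector is (-1, -1).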

Hereinafter, the neural network used for processing the frames is described in detail with reference to FIGS. 12 and 13.

As described above with reference to FIG. 2, the current frame (X_t), the weighted prediction frame (X'_{t_pred}), the weighted prediction output frame (Y'_{t_pred}), and the weighted prediction feature map (S'_{t_pred}) are processed by the neural network 250 to obtain the current output frame (Y_t) corresponding to the current frame (X_t).

The neural network 250 may include at least one sub-neural network, and each sub-neural network may consist of a fusion layer and a plurality of convolutional layers.

The structure of a first sub-neural network 1200 among the at least one sub-neural network is illustrated in FIG. 12.

Referring to FIG. 12, the first sub-neural network 1200 may consist of a fusion layer 1210, which includes a first convolutional layer 1214 and a second convolutional layer 1216, and a plurality of third convolutional layers 1230. In each convolutional layer, a convolution operation is performed on the input data based on filter kernels determined through training.

The fusion layer 1210 fuses the current frame (X_t) with the data output by the gating process, that is, the weighted prediction frame (X'_{t_pred}), the weighted prediction output frame (Y'_{t_pred}), and the weighted prediction feature map (S'_{t_pred}).

First, the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the weighted prediction output frame (Y'_{t_pred}) are concatenated (1212) and then input to the first convolutional layer 1214. Concatenation may mean combining the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the weighted prediction output frame (Y'_{t_pred}) in the channel direction.

The data obtained as a result of the concatenation 1212 is convolved in the first convolutional layer 1214. The notation 3x3x1 shown on the first convolutional layer 1214 indicates, as an example, that the input data is convolved using one filter kernel of size 3 x 3. As a result of the convolution, one feature map is generated by the one filter kernel.

Separately from the concatenated result of the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the weighted prediction output frame (Y'_{t_pred}) being input to the first convolutional layer 1214, the weighted prediction feature map (S'_{t_pred}) is input to the second convolutional layer 1216, where it is convolved. The notation 3x3x1 shown on the second convolutional layer 1216 indicates, as an example, that the input data is convolved using one filter kernel of size 3 x 3. As a result of the convolution, one feature map is generated by the one filter kernel.

The data output from the first convolutional layer 1214 and the data output from the second convolutional layer 1216 are concatenated (1218) and then processed sequentially by the plurality of third convolutional layers 1230.

In the fusion layer 1210, only the weighted prediction feature map (S'_{t_pred}) is handled separately and input to the second convolutional layer 1216, unlike the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the weighted prediction output frame (Y'_{t_pred}), because its domain differs from theirs. The weighted prediction feature map (S'_{t_pred}) is feature-domain data obtained while a frame is being processed, whereas the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the weighted prediction output frame (Y'_{t_pred}) are image data that is either the target of processing or the result of processing. The two kinds of data are therefore convolved separately and then concatenated. In other words, the first convolutional layer 1214 and the second convolutional layer 1216 may serve to match the domains of the current frame (X_t), the weighted prediction frame (X'_{t_pred}), the weighted prediction output frame (Y'_{t_pred}), and the weighted prediction feature map (S'_{t_pred}).

An intermediate output frame (Y_{t_int}) may be obtained as the data output from the first convolutional layer 1214 and the data output from the second convolutional layer 1216 are concatenated (1218) and then processed sequentially by the plurality of third convolutional layers 1230. As shown in FIG. 12, the intermediate output frame (Y_{t_int}) is output from the last layer 1234 of the plurality of third convolutional layers 1230, and an intermediate feature map (S_{t_int}) is output from the layer 1232 preceding the last layer 1234. Although FIG. 12 shows the last layer 1234 immediately following the preceding layer 1232, one or more convolutional layers may be located between the preceding layer 1232 and the last layer 1234.

The notation 3x3x1 shown on the third convolutional layers 1230 indicates, as an example, that the input data is convolved using one filter kernel of size 3 x 3. As a result of the convolution, one feature map or one output frame may be generated by the one filter kernel.

The intermediate feature map (S_{t_int}) and the intermediate output frame (Y_{t_int}) output from the plurality of third convolutional layers 1230 are input to the next sub-neural network.

When the neural network includes only one sub-neural network, the current output frame (Y_t) is output from the last layer 1234 of the plurality of third convolutional layers 1230, and the current feature map (S_t) is output from the layer 1232 preceding the last layer 1234.

The current output frame (Y_t) and the current feature map (S_t) may be used in processing the next frame.

The structure of the first sub-neural network 1200 shown in FIG. 12 is one example, and the number of convolutional layers, the size of the filter kernels, and the number of filter kernels included in the first sub-neural network 1200 may vary depending on the implementation.
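The data flow of the first sub-neural network 1200 can be sketched in NumPy as follows. This is a structural illustration only: the random kernels stand in for trained filter kernels, zero padding is assumed, no activation functions are included because none are described here, and the names `conv3x3`, `sub_network`, and `params` are hypothetical.

```python
import numpy as np

def conv3x3(x, kernel):
    """Apply one 3x3 filter kernel to x of shape (C, H, W); returns (1, H, W).

    Zero padding keeps the spatial size. As is usual for convolutional
    layers, the kernel is applied without flipping (cross-correlation).
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            taps = kernel[:, dy, dx][:, None, None]          # one tap per channel
            out += np.sum(padded[:, dy:dy + h, dx:dx + w] * taps, axis=0)
    return out[None]

def sub_network(x_t, x_pred_w, y_pred_w, s_pred_w, params):
    """Fusion layer followed by sequential third convolutional layers."""
    frames = np.concatenate([x_t, x_pred_w, y_pred_w], axis=0)  # concatenation 1212
    a = conv3x3(frames, params["conv1"])                        # first conv layer 1214
    b = conv3x3(s_pred_w, params["conv2"])                      # second conv layer 1216
    h = np.concatenate([a, b], axis=0)                          # concatenation 1218
    outputs = []
    for k in params["conv3"]:                                   # third conv layers 1230
        h = conv3x3(h, k)
        outputs.append(h)
    # Output frame from the last layer, feature map from the layer before it.
    return outputs[-1], outputs[-2]

rng = np.random.default_rng(0)
H, W = 4, 4
x_t, x_pred_w, y_pred_w, s_pred_w = (rng.random((1, H, W)) for _ in range(4))
params = {
    "conv1": rng.standard_normal((3, 3, 3)),     # 3 input channels (image domain)
    "conv2": rng.standard_normal((1, 3, 3)),     # 1 input channel (feature domain)
    "conv3": [rng.standard_normal((2, 3, 3)),    # takes the 2 fused maps
              rng.standard_normal((1, 3, 3)),
              rng.standard_normal((1, 3, 3))],
}
y_out, s_out = sub_network(x_t, x_pred_w, y_pred_w, s_pred_w, params)
```

Keeping the image-domain inputs and the feature-domain input in separate first-stage convolutions mirrors the domain-matching role attributed to the fusion layer.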

FIG. 13 is a diagram illustrating the structure of the last sub-neural network 1300 included in the neural network.

Like the first sub-neural network 1200, the last sub-neural network 1300 may consist of a fusion layer 1310, which includes a first convolutional layer 1314 and a second convolutional layer 1316, and a plurality of third convolutional layers 1330. In each convolutional layer, a convolution operation is performed on the input data based on filter kernels determined through training.

The fusion layer 1310 fuses the current frame (X_t), the weighted prediction frame (X'_{t_pred}), the intermediate output frame (Y_{t_int}) output from the previous sub-neural network, and the intermediate feature map (S_{t_int}) output from the previous sub-neural network.

First, the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the intermediate output frame (Y_{t_int}) are concatenated (1312) and then input to the first convolutional layer 1314.

The data obtained as a result of the concatenation 1312 is convolved in the first convolutional layer 1314. The notation 3x3x1 shown on the first convolutional layer 1314 indicates, as an example, that the input data is convolved using one filter kernel of size 3 x 3. As a result of the convolution, one feature map is generated by the one filter kernel.

Separately from the concatenated result of the current frame (X_t), the weighted prediction frame (X'_{t_pred}), and the intermediate output frame (Y_{t_int}) being input to the first convolutional layer 1314, the intermediate feature map (S_{t_int}) is input to the second convolutional layer 1316, where it is convolved. The notation 3x3x1 shown on the second convolutional layer 1316 indicates, as an example, that the input data is convolved using one filter kernel of size 3 x 3. As a result of the convolution, one feature map is generated by the one filter kernel.

As described above, the first convolutional layer 1314 and the second convolutional layer 1316 in the fusion layer 1310 may serve to match the domains of the current frame (X_t), the weighted prediction frame (X'_{t_pred}), the intermediate output frame (Y_{t_int}), and the intermediate feature map (S_{t_int}).

The data output from the first convolutional layer 1314 and the data output from the second convolutional layer 1316 are concatenated (1318) and then processed sequentially by the plurality of third convolutional layers 1330.

The current output frame (Y_t) may be obtained as the concatenated data is processed sequentially by the plurality of third convolutional layers 1330.

As shown in FIG. 13, the current output frame (Y_t) is output from the last layer 1334 of the plurality of third convolutional layers 1330, and the current feature map (S_t) is output from the layer 1332 preceding the last layer 1334. Although FIG. 13 shows the last layer 1334 immediately following the preceding layer 1332, one or more convolutional layers may be located between the preceding layer 1332 and the last layer 1334.

If the sub-neural network 1300 shown in FIG. 13 is not the last sub-neural network, an intermediate output frame (Y_{t_int}) may be output from the last layer 1334 of the plurality of third convolutional layers 1330, and an intermediate feature map (S_{t_int}) may be output from the layer 1332 preceding the last layer 1334. The intermediate output frame (Y_{t_int}) and the intermediate feature map (S_{t_int}) may then be input to the next sub-neural network.

The notation 3x3x1 shown on the third convolutional layers 1330 indicates, as an example, that the input data is convolved using one filter kernel of size 3 x 3. As a result of the convolution, one feature map or one output frame may be generated by the one filter kernel.

The structure of the sub-neural network 1300 shown in FIG. 13 is one example, and the number of convolutional layers, the size of the filter kernels, and the number of filter kernels included in the sub-neural network 1300 may vary depending on the implementation.

FIG. 14 is a diagram illustrating an application example of an image processing method according to an embodiment.

The application example shown in FIG. 14 illustrates a process of obtaining, through image processing of input frames, output frames having a resolution greater than that of the input frames.

The video neural network (VNN) 1400 of FIG. 14 corresponds to the neural network 250 shown in FIG. 2. It is assumed that the motion prediction process, the motion compensation process, the weight derivation process, and the gating process described above are performed before the frames are input to the VNN 1400.

As the first frame 1412 is processed by the VNN 1400, a first output frame 1432 having a resolution greater than that of the first frame 1412 is obtained. The first frame 1412 and the first output frame 1432 are input to the VNN 1400 together with the second frame 1414, and as a result of processing by the VNN 1400, a second output frame 1434 having a resolution greater than that of the second frame 1414 is obtained. The second frame 1414 and the second output frame 1434 are input to the VNN 1400 together with the third frame 1416, and as a result of processing by the VNN 1400, a third output frame 1436 having a resolution greater than that of the third frame 1416 is obtained.
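The frame-by-frame recurrence of FIG. 14 can be sketched as a simple loop. Only the data flow is illustrated: `process_sequence` and `vnn_stub` are hypothetical names, the stub merely upscales by sample replication and ignores the previous frame and previous output that a trained network would use, and the motion prediction, motion compensation, weight derivation, and gating steps assumed to precede the network are omitted.

```python
import numpy as np

def vnn_stub(cur, prev=None, prev_out=None, scale=2):
    """Placeholder for the VNN: nearest-neighbor upscaling of the current frame.

    A trained network would also exploit `prev` and `prev_out`;
    this stand-in ignores them.
    """
    return np.repeat(np.repeat(cur, scale, axis=0), scale, axis=1)

def process_sequence(frames, vnn):
    """Feed each frame to the network together with the previous frame and output."""
    outputs = []
    prev, prev_out = None, None          # nothing is available for the first frame
    for cur in frames:
        out = vnn(cur, prev, prev_out)   # current output frame
        outputs.append(out)
        prev, prev_out = cur, out        # carried over to the next frame
    return outputs

frames = [np.full((4, 4), float(i)) for i in range(3)]   # three 4x4 input frames
outputs = process_sequence(frames, vnn_stub)             # three 8x8 output frames
```

Each output frame is produced as soon as its input frame arrives, since only the immediately preceding frame and output are carried forward.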

The application example shown in FIG. 14 may be useful for increasing the resolution of frames received from a server or the like. By encoding low-resolution frames, the server can transmit to the image processing apparatus 100 at a low bitrate, and the image processing apparatus 100 can process the low-resolution frames obtained through decoding to obtain output frames of higher resolution.

FIG. 15 is a diagram illustrating an application example of an image processing method according to another embodiment.

The application example shown in FIG. 15 illustrates a process of obtaining, through image processing of input frames, a single output frame in which the characteristics of the input frames are combined.

As described above, it is assumed that the motion prediction process, the motion compensation process, the weight derivation process, and the gating process have been performed before the frames are input to the VNN 1500.

The first frame 1512 is input to the VNN 1500, and the result of processing the first frame 1512 by the VNN 1500 is input to the VNN 1500 together with the first frame 1512 and the second frame 1514. The result of that processing is then input to the VNN 1500 again, together with the second frame 1514 and the third frame 1516. As a result of the processing by the VNN 1500, an output frame 1530 reflecting the characteristics of all of the first frame 1512, the second frame 1514, and the third frame 1516 may be obtained.
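The data flow of FIG. 15 amounts to folding a frame sequence into one output frame. The following is a minimal sketch under stated assumptions: `fuse_stub` is a hypothetical stand-in that takes the element-wise mean of its inputs, whereas the actual VNN 1500 is a trained network, and the sample values are made up for illustration.

```python
def fuse_stub(*inputs):
    """Hypothetical stand-in for VNN 1500: element-wise mean of its inputs."""
    height, width = len(inputs[0]), len(inputs[0][0])
    return [
        [sum(img[y][x] for img in inputs) / len(inputs) for x in range(width)]
        for y in range(height)
    ]


def fuse_sequence(frames):
    """Fold a frame sequence into one output frame, following FIG. 15."""
    result = fuse_stub(frames[0])  # the first frame is processed alone
    for prev_frame, frame in zip(frames, frames[1:]):
        # previous result + previous frame + current frame
        result = fuse_stub(result, prev_frame, frame)
    return result


frame_1 = [[200.0, 255.0]]
frame_2 = [[120.0, 180.0]]
frame_3 = [[40.0, 90.0]]
output_frame = fuse_sequence([frame_1, frame_2, frame_3])
```

Only the final `result` is kept, so every input frame contributes to the single output frame through the chain of intermediate results.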

The application example shown in FIG. 15 may be useful for improving the dynamic range of frames. For example, if one frame was captured with a long exposure time and another frame was captured with a short exposure time, an output frame having a high dynamic range may be obtained by incorporating the characteristics of both frames.

FIG. 16 is a diagram illustrating an application example of an image processing method according to yet another embodiment.

The application example shown in FIG. 16 considers the case where the image processing apparatus 100 operates as a server or an image provider. In general, a server encodes an image and transmits it to a terminal device, and the terminal device decodes the bitstream received from the server to restore the image. To compensate for the loss that occurs in the encoding/decoding process, the image processing apparatus 100 may, when processing frames, also use a frame that has been encoded by the encoder 120 and then decoded by the decoder 140.

Specifically, the image processing apparatus 100 processes the first frame 1612 with the VNN 1600 to obtain a first output frame A (not shown). A first bitstream is generated by encoding the first output frame A, and the first output frame A is restored by decoding the first bitstream. The image processing apparatus 100 processes the restored first output frame A with the VNN 1600 to obtain a first output frame B (not shown).

The first frame 1612, the first output frame A, and the first output frame B are input to the VNN 1600 together with the second frame 1614. In the embodiments described above, one output frame is input to the VNN together with the next frame, whereas in the application example shown in FIG. 16, two output frames are input to the VNN 1600 together with the next frame. The motion compensation process and the gating process may be applied to both output frames before they are input to the VNN 1600.

The second frame 1614, the first frame 1612, the first output frame A, and the first output frame B are processed by the VNN 1600 to obtain a second output frame A. A second bitstream is generated by encoding the second output frame A, and the second output frame A is restored by decoding the second bitstream. The image processing apparatus 100 processes the restored second output frame A and the first output frame B with the VNN 1600 to obtain a second output frame B (not shown). Although not shown, the image processing apparatus 100 may also process the first output frame A restored through decoding, together with the second output frame A and the first output frame B, with the VNN 1600 to obtain the second output frame B (not shown).

The second frame 1614, the second output frame A, and the second output frame B are input to the VNN 1600 together with the third frame 1616. The third frame 1616, the second frame 1614, the second output frame A, and the second output frame B may be processed by the VNN 1600 to obtain a third output frame A.
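The two-output data flow of FIG. 16 can be traced with a small loop. This is a sketch under several assumptions: `vnn_stub` (element-wise mean) and `codec_round_trip` (quantization to multiples of 8) are hypothetical stand-ins for the VNN 1600 and for the encoder 120 / decoder 140 pair, frames are flattened to one-dimensional lists, and the version of output frame A carried to the next step is taken to be the one before encoding. Motion compensation and gating are omitted.

```python
def vnn_stub(*inputs):
    """Hypothetical stand-in for VNN 1600: element-wise mean of its inputs."""
    return [sum(values) / len(values) for values in zip(*inputs)]


def codec_round_trip(frame, step=8):
    """Hypothetical lossy codec: quantize each sample to a multiple of step."""
    return [round(value / step) * step for value in frame]


def process_sequence(frames):
    """Return one (output_a, output_b) pair per input frame."""
    results = []
    prev_frame = prev_a = prev_b = None
    for frame in frames:
        if prev_frame is None:
            # First frame: no history is available yet.
            output_a = vnn_stub(frame)
            restored_a = codec_round_trip(output_a)
            output_b = vnn_stub(restored_a)
        else:
            # Current frame + previous frame + both previous output frames.
            output_a = vnn_stub(frame, prev_frame, prev_a, prev_b)
            restored_a = codec_round_trip(output_a)
            output_b = vnn_stub(restored_a, prev_b)
        results.append((output_a, output_b))
        prev_frame, prev_a, prev_b = frame, output_a, output_b
    return results


results = process_sequence([[10.0, 22.0], [14.0, 28.0], [18.0, 36.0]])
```

Output frame B is always derived from a frame that has passed through the codec, which is how the loss introduced by encoding and decoding becomes visible to the network at the next step.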

FIG. 17 is a flowchart of a multi-frame processing method according to an embodiment.

In operation S1710, the image processing apparatus 100 identifies, in a previous frame, a prediction sample corresponding to a current sample of a current frame. Motion prediction may be performed on the current frame and the previous frame to identify the prediction sample. As described above, a convolution operation may be performed on the current frame and the previous frame to identify the prediction sample.
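Operation S1710 can be illustrated with a direct search over the collocated sample and its neighbors. The sketch below is one possible reading, not the claimed implementation: the function name `best_match_offset`, the 3x3 search window, the kernel values of +1 and -1, the skipping of out-of-frame candidates, and the first-found tie-breaking are all assumptions.

```python
def best_match_offset(current, previous, y, x):
    """Return the (dy, dx) offset of the most similar sample in `previous`.

    Candidates are the collocated sample and its in-bounds 8 neighbors.
    """
    height, width = len(previous), len(previous[0])
    best_offset, best_diff = (0, 0), None
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            py, px = y + dy, x + dx
            if not (0 <= py < height and 0 <= px < width):
                continue
            # Equivalent to adding the response of a one-hot kernel (+1) on
            # the current frame to that of a one-hot kernel (-1) on the
            # previous frame, then taking the magnitude.
            diff = abs(current[y][x] - previous[py][px])
            if best_diff is None or diff < best_diff:
                best_offset, best_diff = (dy, dx), diff
    return best_offset


previous = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
current = [[0, 0, 0], [0, 62, 0], [0, 0, 0]]
offset = best_match_offset(current, previous, 1, 1)
```

Here the current sample value 62 is closest to the value 60 one column to the right of the collocated sample, so the returned offset plays the role of a motion vector.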

In operation S1720, the image processing apparatus 100 generates a prediction frame of the current frame by changing the sample value of the collocated sample of the previous frame according to the sample value of the prediction sample. The prediction frame may be generated through a convolution operation on the previous frame based on filter kernels corresponding to motion vectors.
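Operation S1720 can be sketched as copying, for each position, the value of the prediction sample into the collocated position. The example assumes that motion is expressed as a hypothetical per-sample `(dy, dx)` offset table and that every offset points inside the frame; `motion_compensate` is an illustrative name.

```python
def motion_compensate(previous, offsets):
    """Build the prediction frame from `previous` and per-sample offsets.

    offsets[y][x] is the (dy, dx) displacement from the collocated sample to
    the prediction sample; it must point inside the frame.
    """
    height, width = len(previous), len(previous[0])
    prediction = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = offsets[y][x]
            # Same result as convolving the 3x3 neighborhood with a kernel
            # that is 1 at (dy, dx) and 0 elsewhere.
            row.append(previous[y + dy][x + dx])
        prediction.append(row)
    return prediction


previous = [[10, 20], [30, 40]]
offsets = [[(0, 1), (0, 0)], [(-1, 0), (0, -1)]]
prediction = motion_compensate(previous, offsets)
```

A one-hot filter kernel whose nonzero entry sits at the prediction sample's position selects exactly one neighbor, which is why the convolution reduces to an indexed copy in this sketch.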

In operation S1730, the image processing apparatus 100 derives a weight by comparing the sample value of the current sample with the sample value of the prediction sample. The image processing apparatus 100 may determine a smaller weight as the difference between the sample value of the current sample and the sample value of the prediction sample increases, and a larger weight as the difference decreases.
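A weight that shrinks as the sample difference grows can be written in many ways. The function below is only one illustrative choice: the formula and the `scale` parameter are assumptions, since the text requires only that a larger difference yield a smaller weight.

```python
def derive_weight(current_value, prediction_value, scale=16.0):
    """Map a sample difference to a weight in (0, 1].

    The formula and `scale` are illustrative assumptions; the source only
    requires a larger difference to give a smaller weight.
    """
    difference = abs(current_value - prediction_value)
    return 1.0 / (1.0 + difference / scale)


weights = [derive_weight(100, p) for p in (100, 116, 148)]
```

With this choice, identical samples receive a weight of 1, and the weight halves when the difference equals `scale`.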

In operation S1740, the image processing apparatus 100 applies the weight to the collocated sample of the prediction frame. The image processing apparatus 100 may multiply the collocated sample of the prediction frame by the weight.
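The gating of operation S1740 is an element-wise multiplication. A minimal sketch, assuming the prediction frame and the weights are stored as equally sized nested lists and using the hypothetical name `gate`:

```python
def gate(prediction, weights):
    """Multiply each prediction-frame sample by its weight."""
    return [
        [value * weight for value, weight in zip(pred_row, weight_row)]
        for pred_row, weight_row in zip(prediction, weights)
    ]


prediction = [[20, 20], [10, 30]]
weights = [[1.0, 0.5], [0.25, 0.0]]
gated = gate(prediction, weights)
```

Samples whose prediction is unreliable (weight near 0) are suppressed before they reach the neural network, while reliable samples pass through nearly unchanged.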

In operation S1750, the image processing apparatus 100 obtains a current output frame by processing the weighted prediction frame and the current frame through a neural network including a convolutional layer.
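As a rough illustration of how a convolutional layer might consume the two inputs of operation S1750, the sketch below stacks the weighted prediction frame and the current frame as two channels and applies a single 3x3 convolution with zero padding. The function `conv3x3` and the hand-picked, untrained kernels are assumptions; the actual network has trained parameters and more layers.

```python
def conv3x3(channels, kernels, bias=0.0):
    """Single-output 3x3 convolution over stacked channels, zero padding.

    channels: list of 2-D frames of equal size; kernels: one 3x3 kernel per
    channel. As in most deep-learning libraries, this is cross-correlation
    (the kernel is not flipped).
    """
    height, width = len(channels[0]), len(channels[0][0])
    output = [[bias] * width for _ in range(height)]
    for channel, kernel in zip(channels, kernels):
        for y in range(height):
            for x in range(width):
                for ky in range(3):
                    for kx in range(3):
                        sy, sx = y + ky - 1, x + kx - 1
                        if 0 <= sy < height and 0 <= sx < width:
                            output[y][x] += kernel[ky][kx] * channel[sy][sx]
    return output


current = [[10.0, 20.0], [30.0, 40.0]]
gated_prediction = [[20.0, 10.0], [2.0, 0.0]]
# Untrained, hand-picked kernels: each passes half of the center sample.
center_half = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
features = conv3x3([gated_prediction, current], [center_half, center_half])
```

With these kernels the layer simply averages the two inputs per sample; a trained layer would instead learn kernels that blend spatial neighbors of both frames.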

Meanwhile, the embodiments of the present disclosure described above may be written as a program executable on a computer, and the written program may be stored in a machine-readable storage medium.

The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory storage medium" means only that the medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where data is stored temporarily. For example, a "non-transitory storage medium" may include a buffer in which data is temporarily stored.

According to an embodiment, the method according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded as a commodity between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily created in, a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

Although the technical idea of the present disclosure has been described in detail above with reference to preferred embodiments, the technical idea of the present disclosure is not limited to those embodiments, and various modifications and changes may be made by those of ordinary skill in the art within the scope of the technical idea of the present disclosure.

Claims

1. An image processing apparatus comprising:
a memory storing one or more instructions; and
a processor configured to execute the one or more instructions stored in the memory,
wherein the processor is configured to:
identify, in a previous frame, a prediction sample corresponding to a current sample of a current frame;
generate a prediction frame of the current frame by changing a sample value of a collocated sample of the previous frame according to a sample value of the prediction sample;
derive a weight by comparing the sample value of the current sample with the sample value of the prediction sample;
apply the weight to the collocated sample of the prediction frame; and
obtain a current reconstructed frame by processing the weighted prediction frame and the current frame through a neural network including a convolutional layer.

2. The image processing apparatus of claim 1,
wherein the processor is configured to
identify, as the prediction sample, the sample having the sample value most similar to the sample value of the current sample among the collocated sample of the previous frame and neighboring samples of the collocated sample.

3. The image processing apparatus of claim 2,
wherein the processor is configured to:
convolve the current sample and neighboring samples of the current sample with a predetermined first filter kernel to obtain a sample value corresponding to the first filter kernel;
convolve the collocated sample of the previous frame and the neighboring samples of the collocated sample with a plurality of predetermined second filter kernels to obtain sample values corresponding to the plurality of second filter kernels;
identify, from among the sample values corresponding to the plurality of second filter kernels, the sample value most similar to the sample value corresponding to the first filter kernel; and
determine, as the prediction sample, the sample corresponding to the identified sample value among the collocated sample of the previous frame and the neighboring samples of the collocated sample.

4. The image processing apparatus of claim 3,
wherein, in the first filter kernel,
the sample corresponding to the current sample has a preset first value, and the other samples have a value of 0.

5. The image processing apparatus of claim 4,
wherein, in each of the plurality of second filter kernels,
one sample has a preset second value and the other samples have a value of 0, and
the position of the sample having the second value differs among the plurality of second filter kernels.

6. The image processing apparatus of claim 5,
wherein the preset first value and the preset second value have opposite signs.

7. The image processing apparatus of claim 3,
wherein the processor is configured to
change the sample value of the collocated sample by convolving the collocated sample of the previous frame and the neighboring samples of the collocated sample with a predetermined third filter kernel, and
wherein, in the third filter kernel, the sample corresponding to the prediction sample has a preset third value, and the other samples have a value of 0.

8. The image processing apparatus of claim 1,
wherein the weight is inversely proportional to the difference between the sample value of the current sample and the sample value of the prediction sample.

9. The image processing apparatus of claim 1,
wherein the processor is configured to:
obtain a previous output frame and a previous feature map that are output as a result of processing the previous frame by the neural network;
generate a prediction output frame and a prediction feature map by changing sample values of collocated samples of the previous output frame and the previous feature map according to the positional relationship between the current sample and the prediction sample in the previous frame;
apply the weight to the collocated samples of the prediction output frame and the prediction feature map; and
input the weighted prediction output frame, the weighted prediction feature map, the weighted prediction frame, and the current frame to the neural network.

10. The image processing apparatus of claim 9,
wherein the previous output frame comprises
a first previous output frame output from the neural network, and a second previous output frame obtained by processing, with the neural network, the first previous output frame restored through encoding and decoding of the first previous output frame.

11. The image processing apparatus of claim 9,
wherein the neural network includes a plurality of sub-neural networks each including a first convolutional layer, a second convolutional layer, and a plurality of third convolutional layers,
the first convolutional layer of a first sub-neural network convolves a result of concatenating the weighted prediction output frame, the weighted prediction frame, and the current frame,
the second convolutional layer of the first sub-neural network convolves the weighted prediction feature map, and
the plurality of third convolutional layers of the first sub-neural network sequentially convolve a result of concatenating the feature map output from the first convolutional layer and the feature map output from the second convolutional layer.

12. The image processing apparatus of claim 11,
wherein the first convolutional layer of a sub-neural network other than the first sub-neural network convolves a result of concatenating the weighted prediction frame, the current frame, and an intermediate reconstructed frame output from a previous sub-neural network,
the second convolutional layer of the sub-neural network other than the first sub-neural network convolves an intermediate feature map output from the previous sub-neural network, and
the plurality of third convolutional layers of the sub-neural network other than the first sub-neural network sequentially convolve a result of concatenating the feature map output from the first convolutional layer and the feature map output from the second convolutional layer.

13. The image processing apparatus of claim 1,
wherein the processor is configured to
transmit, to a terminal device, a bitstream generated by encoding the current output frame.

14. The image processing apparatus of claim 1,
wherein the processor is configured to
reproduce the current output frame through a display.

15. A multi-frame processing method performed by an image processing apparatus, the method comprising:
identifying, in a previous frame, a prediction sample corresponding to a current sample of a current frame;
generating a prediction frame of the current frame by changing a sample value of a collocated sample of the previous frame according to a sample value of the prediction sample;
deriving a weight by comparing the sample value of the current sample with the sample value of the prediction sample;
applying the weight to the collocated sample of the prediction frame; and
obtaining a current reconstructed frame by processing the weighted prediction frame and the current frame through a neural network including a convolutional layer.

16. A computer-readable recording medium having recorded thereon a program for executing the method of claim 15 in combination with hardware.