KR20230034820A - Method and system for processing image based on weighted multi-kernel

Info

Publication number: KR20230034820A
Application number: KR1020210149019A
Authority: KR
Inventors: 김대식; 조우영; 손상혁
Original assignee: 삼성전자주식회사; 한국과학기술원
Priority date: 2021-09-03
Filing date: 2021-11-02
Publication date: 2023-03-10

Abstract

The present invention relates to a method for processing a plurality of images, which comprises the steps of: obtaining input data including a plurality of images; providing the input data to a first machine learning model; providing an output of the first machine learning model to a second machine learning model and a third machine learning model; generating a first feature map corresponding to a plurality of kernels on the basis of an output of the second machine learning model; generating a second feature map corresponding to a plurality of weights on the basis of an output of the third machine learning model; generating a predicted kernel on the basis of a weighted sum of the plurality of kernels; and generating output data on the basis of the input data and the predicted kernel. Accordingly, a higher-quality image can be generated.

Description

Image processing method and system based on weighted multi-kernel

The technical idea of the present disclosure relates to image processing, and more particularly, to an image processing method and system based on a weighted multi-kernel.

Machine learning models are being used for image processing. For example, a machine learning model can be used to remove noise from an image or to convert a low-resolution image into a high-resolution image. Accordingly, an image that has low quality because of limited performance, such as that of a smartphone camera module, or because of a limited environment, such as low illumination, may be converted into a high-quality image by using a machine learning model.

The technical idea of the present disclosure provides an image processing method and system for generating a high-quality image from a low-quality image based on a weighted multi-kernel.

According to one aspect of the technical idea of the present disclosure, a method for processing a plurality of images may include: obtaining input data including a plurality of images; providing the input data to a first machine learning model; providing an output of the first machine learning model to a second machine learning model and a third machine learning model; generating a first feature map corresponding to a plurality of kernels based on an output of the second machine learning model; generating a second feature map corresponding to a plurality of weights based on an output of the third machine learning model; generating a predicted kernel based on a weighted sum of the plurality of kernels; and generating output data based on the input data and the predicted kernel.

A system according to one aspect of the technical idea of the present disclosure may include at least one processor and a non-transitory storage medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform image processing. The image processing may include: obtaining input data including a plurality of images; providing the input data to a first machine learning model; providing an output of the first machine learning model to a second machine learning model and a third machine learning model; generating a first feature map corresponding to a plurality of kernels based on an output of the second machine learning model; generating a second feature map corresponding to a plurality of weights based on an output of the third machine learning model; generating a predicted kernel based on a weighted sum of the plurality of kernels; and generating output data based on the input data and the predicted kernel.

A non-transitory computer-readable storage medium according to one aspect of the technical idea of the present disclosure may include instructions that, when executed by at least one processor, cause the at least one processor to perform image processing. The image processing may include: obtaining input data including a plurality of images; providing the input data to a first machine learning model; providing an output of the first machine learning model to a second machine learning model and a third machine learning model; generating a first feature map corresponding to a plurality of kernels based on an output of the second machine learning model; generating a second feature map corresponding to a plurality of weights based on an output of the third machine learning model; generating a predicted kernel based on a weighted sum of the plurality of kernels; and generating output data based on the input data and the predicted kernel.

According to the method and system of an exemplary embodiment of the present disclosure, using multiple kernels allows various applicable ranges to be considered, while reflecting the relative importance of the kernels allows a higher-quality image to be generated.

In addition, according to the method and system of an exemplary embodiment of the present disclosure, an accurately aligned image can be generated from multiple images, and a high-quality super-resolution image can be generated from the accurately aligned image.

In addition, according to the method and system of an exemplary embodiment of the present disclosure, a high-quality image can be generated from an image captured with limited performance or in a limited environment, which can increase the usefulness of applications that utilize high-quality images.

Effects obtainable from the exemplary embodiments of the present disclosure are not limited to those mentioned above, and other effects not mentioned may be clearly derived and understood from the following description by those of ordinary skill in the art to which the exemplary embodiments belong. That is, unintended effects of practicing the exemplary embodiments may also be derived from the exemplary embodiments by those of ordinary skill in the art.

FIG. 1 is a block diagram illustrating image processing according to an exemplary embodiment of the present disclosure.
FIG. 2 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure.
FIG. 3 is a diagram illustrating examples of feature maps according to an exemplary embodiment of the present disclosure.
FIG. 4 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure.
FIG. 5 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure.
FIG. 6 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure.
FIG. 7 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure.
FIG. 8 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure.
FIG. 9 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure.
FIG. 10 is a block diagram illustrating a computer system 100 according to an exemplary embodiment of the present disclosure.
FIG. 11 is a block diagram illustrating a device 110 according to an exemplary embodiment of the present disclosure.

FIG. 1 is a block diagram illustrating image processing 10 according to an exemplary embodiment of the present disclosure. The image processing 10 may generate output data DOUT including an image by processing input data DIN including an image. As shown in FIG. 1, the image processing 10 may include a first model 11, a second model 12, a third model 13, a first post-processing 14, a second post-processing 15, a kernel generation 16, and a reconstruction 17.

In some embodiments, the image processing 10 of FIG. 1 may be performed by a computing system, as described below with reference to FIGS. 10 and 11. For example, each of the blocks shown in FIG. 1 may correspond to hardware, software, or a combination of hardware and software included in the computing system. The hardware may include at least one of a programmable component such as a central processing unit (CPU), a digital signal processor (DSP), or a graphics processing unit (GPU); a reconfigurable component such as a field programmable gate array (FPGA); and a component providing a fixed function, such as an intellectual property (IP) block. The software may include at least one of a series of instructions executable by a programmable component and code convertible into a series of instructions by a compiler or the like, and may be stored in a non-transitory storage medium.

Referring to FIG. 1, the first model 11 is a machine learning model that may receive the input data DIN and generate a first output OUT1. In some embodiments, the input data DIN may include a plurality of images, and the first model 11 may be trained to extract features from the plurality of images included in the input data DIN. Accordingly, the first output OUT1 generated by the first model 11 may correspond to a feature map of the plurality of images included in the input data DIN. An example of the first model 11 will be described later with reference to FIG. 2.

In this specification, a machine learning model may have any trainable structure. For example, a machine learning model may include an artificial neural network, a decision tree, a support vector machine, a Bayesian network, and/or a genetic algorithm. Hereinafter, the machine learning model is described mainly with reference to an artificial neural network, but it is noted that exemplary embodiments of the present disclosure are not limited thereto. As non-limiting examples, the artificial neural network may include a CNN (Convolution Neural Network), an R-CNN (Region with Convolution Neural Network), an RPN (Region Proposal Network), an RNN (Recurrent Neural Network), an S-DNN (Stacking-based Deep Neural Network), an S-SDNN (State-Space Dynamic Neural Network), a Deconvolution Network, a DBN (Deep Belief Network), an RBM (Restricted Boltzmann Machine), a Fully Convolutional Network, an LSTM (Long Short-Term Memory) Network, a Classification Network, and the like. In this specification, a machine learning model may be referred to simply as a model.

The second model 12 may receive the first output OUT1 from the first model 11 and generate a second output OUT2. The second model 12 may be trained to predict the kernel K that is to be operated on with the input data DIN in the reconstruction 17 described later. For example, as described later with reference to FIG. 4, the second model 12 may be trained to output a plurality of kernels of different sizes, and accordingly the second output OUT2 may correspond to the plurality of kernels.

The third model 13 may receive the first output OUT1 from the first model 11 and generate a third output OUT3. The third model 13 may be trained to predict the kernel K that is to be operated on with the input data DIN in the reconstruction 17 described later. For example, as described later with reference to FIG. 4, the third model 13 may be trained to output a plurality of weights respectively corresponding to the plurality of kernels of different sizes, and accordingly the third output OUT3 may correspond to the plurality of weights.

Unlike what is shown in FIG. 1, if the kernel were predicted based only on the second output OUT2 of the second model 12, the multiple kernels would be applied identically regardless of position within the image. For example, an image may include a high-frequency region and a low-frequency region, and applying the multiple kernels identically to both regions may limit the quality of the final image. In contrast, the image processing 10 of FIG. 1 may include not only a signal processing path including the second model 12 for kernel prediction (which may be referred to herein as a first branch) but also a signal processing path including the third model 13 for reflecting the importance of the multiple kernels (which may be referred to herein as a second branch). This may make it possible to generate, from the input data DIN, output data DOUT including a higher-quality image.

The first post-processing 14 may generate a first feature map FM1 from the second output OUT2. For example, the first post-processing 14 may generate the first feature map FM1 by reshaping the second output OUT2 so that the plurality of kernels can be extracted in the kernel generation 16 described later. An example of the first feature map FM1 generated by the first post-processing 14 will be described later with reference to FIG. 3.

The second post-processing 15 may generate a second feature map FM2 from the third output OUT3. For example, the second post-processing 15 may generate the second feature map FM2 by reshaping the third output OUT3 so that the plurality of weights can be extracted in the kernel generation 16 described later. An example of the second feature map FM2 generated by the second post-processing 15 will be described later with reference to FIG. 3.

The kernel generation 16 may generate the kernel K from the first feature map FM1 and the second feature map FM2. As described above, the first feature map FM1 may correspond to the plurality of kernels, and the second feature map FM2 may correspond to the plurality of weights. The kernel generation 16 may identify the importance of each of the plurality of kernels based on the plurality of weights, and may generate the kernel K from the plurality of kernels based on the identified importance. An example of the kernel generation 16 will be described later with reference to FIG. 4.

The reconstruction 17 may generate the output data DOUT from the input data DIN and the kernel K. As described above, the input data DIN may include a plurality of images, and each of the plurality of images may be operated on with the kernel K. The output data DOUT may include a plurality of images generated by operating on each of the plurality of images with the kernel K. An image included in the output data DOUT may have higher quality than an image included in the input data DIN. For example, an image included in the output data DOUT may correspond to an image included in the input data DIN from which noise has been removed, or to an image included in the input data DIN that has been aligned.
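As an illustration of the reconstruction step, the following minimal sketch applies a separate kernel to every pixel of a single-channel image. The source states only that each image is operated on with the kernel K; the local weighted sum, the zero treatment of out-of-bounds neighbours, and the function name `apply_per_pixel_kernels` are assumptions made for this example.

```python
def apply_per_pixel_kernels(image, kernels):
    """Apply a distinct square kernel at every pixel of a 2-D image.

    image:   H x W list of lists of numbers.
    kernels: H x W list of lists; kernels[y][x] is a k x k list of lists
             (k odd) used for the pixel at row y, column x.
    Neighbours outside the image are treated as zero (an assumption).
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = kernels[y][x]
            radius = len(kernel) // 2
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[dy + radius][dx + radius] * image[yy][xx]
            output[y][x] = acc
    return output


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    kernels = [[identity, identity], [identity, identity]]
    print(apply_per_pixel_kernels(image, kernels))
```

With an identity kernel at every pixel, the output reproduces the input, which is a quick sanity check for the indexing.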

FIG. 2 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure. Specifically, FIG. 2 shows examples of the first model 11, the second model 12, and the third model 13 of FIG. 1.

Referring to FIG. 2, input data DIN including a plurality of images may be provided to the first model 21. For example, as shown in FIG. 2, the input data DIN may include T*C images having a resolution of W*H. T may be the number of image frames, each corresponding to one scene, and C may be the number of channels of an image frame. For example, C may be 3 when the image frame has an RGB format, and C may be 4 when the image frame has a Bayer format. In the input data DIN, the plurality of images may be aligned so that their pixels overlap, as shown in FIG. 2.

In some embodiments, the plurality of images included in the input data DIN may correspond to image frames generated by repeatedly photographing the same subject. For example, to generate a high-quality image despite limited performance, as with a smartphone camera module, the same subject may be photographed repeatedly, and a high-quality image frame may be generated based on the resulting image frames. In some embodiments, the images included in the input data DIN may be images divided from a large source image. That is, to reduce the complexity of image processing, a large source image may be divided into a plurality of images, and the divided images may be provided to the first model 21.

In some embodiments, the first model 21 may be based on U-Net. U-Net may be an end-to-end fully convolutional network. As shown in FIG. 2, the U-Net may include convolution layers, bilinear upsampling layers, average layers, and attention layers. The trained first model 21 may generate a first output OUT1 including features of the plurality of images included in the input data DIN, and, as shown in FIG. 2, the first output OUT1 may be provided in common to the second model 22 and the third model 23.

The second model 22 may generate a second output OUT2 from the first output OUT1. As shown in FIG. 2, the second model 22 may include a plurality of convolution layers. Likewise, the third model 23 may generate a third output OUT3 from the first output OUT1 and, as shown in FIG. 2, may include a plurality of convolution layers. In some embodiments, the second model 22, which generates the second output OUT2 corresponding to the plurality of kernels from the first output OUT1, may include more convolution layers than the third model 23, which generates the third output OUT3 corresponding to the plurality of weights from the first output OUT1. As described above with reference to FIG. 1, the second output OUT2 and the third output OUT3 may each be post-processed, and each may be reshaped in the post-processing.

FIG. 3 is a diagram illustrating examples of feature maps according to an exemplary embodiment of the present disclosure. Specifically, FIG. 3 shows examples of the first feature map FM1 and the second feature map FM2 of FIG. 1. As described above with reference to FIG. 1, the first feature map FM1 may be generated by post-processing the second output OUT2 of the second model 12 in the first branch, and the second feature map FM2 may be generated by post-processing the third output OUT3 of the third model 13 in the second branch. Hereinafter, FIG. 3 will be described with reference to FIG. 1.

Referring to FIG. 3, the second output OUT2 of the second model 12 may be reshaped in the first post-processing 14 of FIG. 1 so that the plurality of kernels can be extracted. For example, as shown in FIG. 3, the first feature map FM1 may include np*3TC slices, each having a resolution of W*H like the images included in the input data DIN. Here, n may be an integer greater than zero, 3 may indicate that the final image has three channels (that is, RGB), T may be the number of image frames, and C may be the number of channels. p may be the sum of the sizes of the plurality of kernels. For example, as described later with reference to FIG. 4, when a total of four kernels have sizes of 1, 3, 5, and 7, respectively, p may be 16 (p = 1+3+5+7). Accordingly, kernels of different sizes may be extracted in the channel direction, that is, along the horizontal axis in FIG. 3.

Referring to FIG. 3, the third output OUT3 of the third model 13 may be reshaped in the second post-processing 15 of FIG. 1 so that the plurality of weights can be extracted. For example, as shown in FIG. 3, the second feature map FM2 may include 3TM slices, each having a resolution of W*H like the images included in the input data DIN. Here, 3 may indicate that the final image has three channels (that is, RGB), T may be the number of image frames, and M may be the number of kernels. For example, as described later with reference to FIG. 4, when a total of four kernels are used, M may be 4. Accordingly, the weights corresponding to each of the plurality of kernels may be extracted in the channel direction, that is, along the horizontal axis in FIG. 3.
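The slice counts of the two feature maps can be checked with a short calculation. The sketch below assumes that "np*3TC" denotes the product n·p·3·T·C and that "3TM" denotes 3·T·M. The function names and the example values n = 1 and T = 8 are hypothetical; the kernel sizes 1, 3, 5, 7 and C = 4 (Bayer) come from the description.

```python
def kernel_feature_slices(n, kernel_sizes, num_frames, num_channels, out_channels=3):
    """Number of W*H slices in FM1, read as n * p * 3 * T * C (assumed reading)."""
    p = sum(kernel_sizes)  # p is the sum of the kernel sizes
    return n * p * out_channels * num_frames * num_channels


def weight_feature_slices(kernel_sizes, num_frames, out_channels=3):
    """Number of W*H slices in FM2, read as 3 * T * M with M kernels."""
    return out_channels * num_frames * len(kernel_sizes)


if __name__ == "__main__":
    sizes = (1, 3, 5, 7)  # p = 16, M = 4
    print(kernel_feature_slices(n=1, kernel_sizes=sizes, num_frames=8, num_channels=4))
    print(weight_feature_slices(kernel_sizes=sizes, num_frames=8))
```

Under these assumptions the weight branch produces far fewer slices than the kernel branch, since it carries one scalar per kernel rather than every kernel coefficient.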

FIG. 4 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure. Specifically, FIG. 4 shows, as an example of the kernel generation 16 of FIG. 1, how a kernel corresponding to one pixel is generated. As described above with reference to FIG. 1, the kernel K may be generated from the first feature map FM1 and the second feature map FM2 in the kernel generation 16.

As described above with reference to FIG. 3, the first feature map FM1 may include np*3TC slices having a resolution of W*H. In FIG. 4, a total of four kernels having sizes of 1, 3, 5, and 7, respectively, that is, first to fourth kernels K₁ to K₄, may be generated, and accordingly p may be 16. In addition, in the example of FIG. 4, the first feature map FM1 and the second feature map FM2 may be generated from input data DIN having a Bayer format (that is, C = 4), and accordingly, as shown in FIG. 4, each of the first to fourth kernels K₁ to K₄ may include four tensors in the channel direction.

In the kernel generation 16, the first feature map FM1 may be divided in the channel direction into T feature maps FM1₁ to FM1_T. As shown in FIG. 4, the first to fourth kernels K₁ to K₄ may be extracted from the portion of one divided feature map FM1₁ that corresponds to one pixel. The first to fourth kernels K₁ to K₄ may be included in one kernel group, and three kernel groups corresponding to the three channels (that is, RGB) of the final image may be generated. The kernel generated from an input image may be expressed as in [Equation 1] below.

In [Equation 1], S is a kernel group, i is an image index, B_k is a model for predicting the plurality of kernels, that is, a model including the first model 11 and the second model 12 of FIG. 1, and x and y represent pixel coordinates.
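The extraction of the first to fourth kernels from the portion of a divided feature map that corresponds to one pixel can be sketched as slicing a length-p coefficient vector. The source does not specify how coefficients are ordered, so the consecutive, smallest-first layout and the function name `split_kernel_coefficients` are assumptions made for illustration.

```python
def split_kernel_coefficients(coefficients, kernel_sizes=(1, 3, 5, 7)):
    """Split a per-pixel vector of p coefficients into kernels of the given sizes.

    Assumes the coefficients of each kernel are stored consecutively along the
    channel direction, in the order given by kernel_sizes.
    """
    if len(coefficients) != sum(kernel_sizes):
        raise ValueError("expected %d coefficients, got %d"
                         % (sum(kernel_sizes), len(coefficients)))
    kernels, start = [], 0
    for size in kernel_sizes:
        kernels.append(list(coefficients[start:start + size]))
        start += size
    return kernels


if __name__ == "__main__":
    # 16 placeholder coefficients standing in for one pixel of FM1.
    for kernel in split_kernel_coefficients(list(range(16))):
        print(len(kernel), kernel)
```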

As described above with reference to FIG. 3, the second feature map FM2 may include 3TM slices, each having a resolution of W*H. As shown in FIG. 4, four kernels may be generated, and thus M may be 4. Similar to the first feature map FM1, in kernel generation 16, the second feature map FM2 may be divided in the channel direction into T feature maps FM2₁ to FM2_T. As shown in FIG. 4, first to fourth weights w₁ to w₄ may be extracted from the portion of one divided feature map FM2₁ that corresponds to one pixel. The first to fourth weights w₁ to w₄ may be included in one weight group, and three weight groups corresponding to the three channels (i.e., RGB) of the final image may be generated. The weight generated from the input image may be expressed as in [Equation 2] below.

In [Equation 2], S is a kernel group, i is an image index, B_w is a model for predicting the plurality of weights, that is, a model including the first model 11 and the third model 13 of FIG. 1, and x and y represent pixel coordinates.
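The division of the second feature map FM2 and the per-pixel extraction of weights may be sketched as below. The sketch assumes a channel ordering of frame, then weight group, then weight index; the source does not state the ordering, and the name `split_weight_map` is hypothetical.

```python
import numpy as np


def split_weight_map(fm2, T, M, groups=3):
    """Reshape FM2 from (groups*T*M, H, W) to (T, groups, M, H, W).

    Assumption: channels are ordered frame-major, then by weight group,
    then by weight index.
    """
    channels, H, W = fm2.shape
    if channels != groups * T * M:
        raise ValueError("FM2 must have 3TM slices")
    return fm2.reshape(T, groups, M, H, W)


T, M, H, W = 5, 4, 8, 6  # illustrative sizes; M = 4 as in FIG. 4
fm2 = np.random.default_rng(0).normal(size=(3 * T * M, H, W))
per_frame = split_weight_map(fm2, T, M)

# w1..w4 for the first frame, the first weight group, and pixel (y=2, x=3).
w = per_frame[0, 0, :, 2, 3]
```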

In kernel generation 16, a weighted sum of the plurality of kernels may be calculated based on the plurality of weights. For example, as shown in FIG. 4, the first kernel K₁ may be multiplied by the first weight w₁, the second kernel K₂ may be multiplied by the second weight w₂, the third kernel K₃ may be multiplied by the third weight w₃, and the fourth kernel K₄ may be multiplied by the fourth weight w₄.

In some embodiments, a softmax function may be applied to the weights extracted from the second feature map FM2. For example, the softmax function may be applied to the first to fourth weights w₁ to w₄ extracted from the feature map FM2₁, and the weights to which the softmax function has been applied may be multiplied by the kernels. Through the softmax function, important weights may be further emphasized. The weight to which the softmax function has been applied may be expressed as in [Equation 3] below.

In [Equation 3], j may be an index of a weight (or of one kernel among the plurality of kernels).
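Applying the softmax function over the weight index j may be sketched as follows. This uses the standard softmax definition with the usual max-subtraction for numerical stability; the exact notation of [Equation 3] is not reproduced here, and the example weight values are arbitrary.

```python
import numpy as np


def softmax(w, axis=0):
    """Standard softmax over the weight index j."""
    w = np.asarray(w, dtype=float)
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = w - w.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


# Raw weights w1..w4 extracted at one pixel (arbitrary example values).
weights = softmax([1.0, 2.0, 3.0, 4.0])
```

The resulting weights are positive and sum to one, and the largest raw weight receives a disproportionately large share, which is one way to read the statement that important weights are further emphasized.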

The products of the kernels and the weights may be summed, and accordingly, the final kernel may correspond to the weighted sum of the kernels. In order to sum products of different sizes, zero padding may be applied to the products of smaller sizes. For example, as shown by dotted lines in FIG. 4, zeros may be added to the remaining products w₁K₁, w₂K₂, and w₃K₃ so that they correspond to the product w₄K₄ having the largest size. The final kernel generated based on the weighted sum may be expressed as in [Equation 4] below.
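The zero padding and summation described above may be sketched as follows. The sketch assumes square two-dimensional kernels with odd sizes that are padded symmetrically so that their centers coincide; the helper names `pad_to` and `weighted_kernel_sum` and the example values are hypothetical.

```python
import numpy as np


def pad_to(kernel, size):
    """Zero-pad a square, odd-sized kernel symmetrically to (size, size)."""
    pad = (size - kernel.shape[0]) // 2
    return np.pad(kernel, pad)  # constant mode pads with zeros by default


def weighted_kernel_sum(kernels, weights):
    """Sum w_j * K_j after padding every kernel to the largest size."""
    size = max(k.shape[0] for k in kernels)
    return sum(w * pad_to(k, size) for k, w in zip(kernels, weights))


# Four kernels of sizes 1, 3, 5 and 7 filled with ones, for illustration.
kernels = [np.ones((s, s)) for s in (1, 3, 5, 7)]
weights = [0.1, 0.2, 0.3, 0.4]  # e.g. weights after softmax
K = weighted_kernel_sum(kernels, weights)
```

With these inputs the center of `K` receives contributions from all four products, while the outermost ring receives only the contribution of the largest kernel.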

FIG. 5 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure. Specifically, FIG. 5 shows an example of the reconstruction 17 of FIG. 1. As described above with reference to FIG. 1, the output data DOUT may be generated by performing an operation on the kernel K generated through kernel generation 16 and the input data DIN.

Referring to FIG. 5, a convolution of the input data DIN and the kernel K may be performed, and the output data DOUT may be generated as a result of the convolution. As shown in FIG. 5, the input data DIN may include a plurality of images, and the output data DOUT may also include a plurality of images. As described above with reference to the drawings, the kernel K may be generated based on a plurality of kernels of different sizes and a plurality of weights corresponding to the plurality of kernels. Accordingly, the kernel K may be better suited to the images included in the input data DIN and may produce output data DOUT including images of higher quality. In some embodiments, the images included in the output data DOUT may have less noise than the images included in the input data DIN. Also, in some embodiments, when the images included in the input data DIN are generated by photographing the same subject, the images included in the output data DOUT may be better aligned with each other. In some embodiments, as described below with reference to FIG. 6, the output data DOUT including high-quality images may be used to generate a high-resolution image. In this specification, the kernel K may be referred to as a predicted kernel.
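Because the kernels may be extracted per pixel, the operation of FIG. 5 may be sketched as a filter whose coefficients vary from pixel to pixel. The sketch below handles one single-channel image, pads the border with zeros, and does not flip the kernel (that is, it computes a correlation, as is common in neural network libraries); these choices and the name `apply_per_pixel_kernels` are assumptions, not details taken from the source.

```python
import numpy as np


def apply_per_pixel_kernels(image, kernels):
    """Filter an (H, W) image with a separate (k, k) kernel at each pixel.

    kernels has shape (H, W, k, k) with odd k. The border is zero-padded.
    """
    H, W = image.shape
    k = kernels.shape[-1]
    padded = np.pad(image, k // 2)
    out = np.empty((H, W), dtype=float)
    for y in range(H):
        for x in range(W):
            # Neighborhood centered on (y, x) in the original image.
            patch = padded[y:y + k, x:x + k]
            out[y, x] = np.sum(patch * kernels[y, x])
    return out


image = np.arange(12, dtype=float).reshape(3, 4)

# A kernel that is 1 at its center leaves the image unchanged.
identity = np.zeros((3, 4, 3, 3))
identity[:, :, 1, 1] = 1.0
same = apply_per_pixel_kernels(image, identity)

# A 3x3 box kernel averages each neighborhood.
box = np.full((3, 4, 3, 3), 1.0 / 9.0)
smoothed = apply_per_pixel_kernels(image, box)
```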

FIG. 6 is a diagram illustrating image processing according to an exemplary embodiment of the present disclosure. Specifically, FIG. 6 illustrates an operation of generating a high-resolution image IMG based on the output data DOUT of FIG. 1.

As high-resolution displays such as ultra-high definition (UHD) displays become more widespread, super-resolution (SR) imaging, which converts a low-resolution (LR) image such as a full-high definition (FHD) image into a high-resolution (HR) image, may be used. A method based on a machine learning model, such as deep learning, may be used for super-resolution imaging, and the output data DOUT of FIG. 1 may be used as an input for super-resolution imaging.

When a deep learning model is used for super-resolution, the high complexity of a deep network may require many resources, and the performance of the network may not necessarily be proportional to its depth. To address this problem, residual learning may be used. Residual learning may refer to adding a low-resolution image to a high-resolution image and learning the difference between the two images. In order to train a deep network more stably, the network may be divided into a plurality of residual blocks, and filter parameters may be optimized more easily by connecting each of the plurality of residual blocks through a skip connection. In some embodiments, the fourth model 60 of FIG. 6 may include a plurality of residual blocks.
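The role of a skip connection may be sketched as follows. For brevity, the sketch uses small dense layers with a ReLU in place of convolution layers, which is a simplification and not the structure of the fourth model 60; the names `residual_block` and `residual_stack` are hypothetical.

```python
import numpy as np


def residual_block(x, w1, w2):
    """Return x + F(x), where F is a two-layer transform with a ReLU."""
    hidden = np.maximum(x @ w1, 0.0)
    # Skip connection: the block only has to learn the residual F(x).
    return x + hidden @ w2


def residual_stack(x, params):
    """Chain several residual blocks, each given as a (w1, w2) pair."""
    for w1, w2 in params:
        x = residual_block(x, w1, w2)
    return x


x = np.ones((2, 4))
# With all-zero parameters every block reduces to the identity mapping.
zero_params = [(np.zeros((4, 8)), np.zeros((8, 4)))] * 3
y = residual_stack(x, zero_params)
```

Because each block defaults to the identity when its residual is zero, stacking many blocks does not by itself degrade the signal, which is one common explanation of why such networks may be easier to optimize.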

Referring to FIG. 6, the fourth model 60 may receive the output data DOUT and may generate a fourth output OUT4, and accordingly, the quality of the high-resolution image IMG may be further improved. As described above with reference to the drawings, the output data DOUT may include images of higher quality than the images included in the input data DIN, and accordingly, a better fourth output OUT4 may be generated by the fourth model 60. As shown in FIG. 6, the input data DIN may be processed by a convolution layer and an upsampling layer, and the processing result may be summed with the fourth output OUT4. The summation result may be processed by a convolution layer, and accordingly, the high-resolution image IMG may be generated. In this specification, the high-resolution image IMG may be referred to as a super-resolution image.

FIG. 7 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. As shown in FIG. 7, the image processing method may include a plurality of steps S10 to S70. Hereinafter, FIG. 7 will be described with reference to FIG. 1.

Referring to FIG. 7, the input data DIN may be obtained in step S10. For example, a plurality of images may be generated by repeatedly photographing the same subject, and the input data DIN including the plurality of images may be generated. In the input data DIN, the plurality of images may be aligned so that their pixels overlap.

In step S20, the input data DIN may be provided to the first model 11. For example, the first model 11 may be based on U-Net and, by processing the input data DIN, may generate a first output OUT1 including features of the plurality of images included in the input data DIN.

In step S30, the output of the first model 11 may be provided to the second model 12 and the third model 13. For example, the second model 12 may be trained to extract a plurality of kernels, and the third model 13 may be trained to extract a plurality of weights respectively corresponding to the plurality of kernels. The second model 12 and the third model 13 may both receive the output of the first model 11, that is, the first output OUT1, and may generate a second output OUT2 and a third output OUT3, respectively.

In step S40, the first feature map FM1 may be generated. For example, the second output OUT2 generated by the second model 12 in step S30 may be reorganized so that the plurality of kernels can be extracted, and accordingly, the first feature map FM1 may be generated. In some embodiments, the first feature map FM1 may be reorganized based on the number of image frames included in the input data DIN, the number of channels of an image, and the sizes of the plurality of kernels. For example, as described above with reference to FIG. 3, the first feature map FM1 may include np*3TC slices, each having a resolution of W*H like the images included in the input data DIN.

In step S50, the second feature map FM2 may be generated. For example, the third output OUT3 generated by the third model 13 in step S30 may be reorganized so that the plurality of weights can be extracted, and accordingly, the second feature map FM2 may be generated. In some embodiments, the second feature map FM2 may be reorganized based on the number of image frames included in the input data DIN and the number of kernels. For example, the second feature map FM2 may include 3TM slices, each having a resolution of W*H like the images included in the input data DIN.

In step S60, the predicted kernel may be generated. For example, the kernel may be generated based on the first feature map FM1 generated in step S40 and the second feature map FM2 generated in step S50. As described above with reference to the drawings, the predicted kernel may be generated as a weighted sum of the plurality of kernels according to their importance, and accordingly may be better suited to the input data DIN. An example of step S60 will be described below with reference to FIG. 8.

In step S70, the output data DOUT may be generated. The predicted kernel generated in step S60 may be applied to the input data DIN, and accordingly, the output data DOUT may be generated. For example, as described above with reference to FIG. 5, a convolution between the input data DIN including a plurality of images and the kernel may be performed, and accordingly, the output data DOUT including a plurality of images may be generated.

FIG. 8 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. Specifically, the flowchart of FIG. 8 shows an example of step S60 of FIG. 7. As described above with reference to FIG. 7, the predicted kernel may be generated in step S60' of FIG. 8. As shown in FIG. 8, step S60' may include a plurality of steps S62, S64, and S66.

Referring to FIG. 8, a plurality of kernels may be extracted from the first feature map FM1 in step S62. For example, a plurality of kernels having different sizes may be extracted from the first feature map FM1. As described above with reference to FIG. 4, the first feature map FM1 may be reorganized based on the sum of the sizes of the plurality of kernels (i.e., p), and accordingly, the plurality of kernels may be easily extracted from the first feature map FM1.

In step S64, a plurality of weights may be extracted from the second feature map FM2. For example, a plurality of weights respectively corresponding to the plurality of kernels extracted in step S62 may be extracted from the second feature map FM2. As described above with reference to FIG. 4, the second feature map FM2 may be reorganized based on the number of kernels (i.e., M), and accordingly, the plurality of weights may be easily extracted from the second feature map FM2. In some embodiments, a softmax function may be applied to the weights extracted from the second feature map FM2 in order to emphasize the importance of the kernels, and the weights to which the softmax function has been applied may be used in step S66 described below.

In step S66, a weighted sum of the plurality of kernels may be calculated. The weighted sum of the plurality of kernels extracted in step S62 may be calculated based on the weights extracted in step S64. For example, the product of each weight and its corresponding kernel may be generated, and the resulting products may be summed. Because the kernels have different sizes, zero padding may be applied to the products corresponding to the smaller kernels, as described above with reference to FIG. 4.

FIG. 9 is a flowchart illustrating an image processing method according to an exemplary embodiment of the present disclosure. Specifically, the flowchart of FIG. 9 shows step S80 and step S90, which may follow step S70 of FIG. 7. In some embodiments, as described above with reference to FIG. 6, the output data DOUT generated using the kernel predicted based on the weighted sum of multiple kernels may be used for super-resolution imaging. Hereinafter, FIG. 9 will be described with reference to FIG. 6.

Referring to FIG. 9, the output data DOUT may be provided to the fourth model 60 in step S80. As described above with reference to FIG. 6, the fourth model 60 may be trained based on residual learning and may include a plurality of residual blocks. The trained fourth model 60 may generate the fourth output OUT4 from the output data DOUT.

In step S90, the high-resolution image IMG may be generated. For example, the input data DIN may be processed by a convolution layer and an upsampling layer, and the processing result may be summed with the fourth output OUT4 generated in step S80. The summation result may be processed again by a convolution layer, and accordingly, the high-resolution image IMG may be generated. Owing to the improved quality of the output data DOUT, the high-resolution image IMG may also have improved quality.
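The summation in step S90 may be sketched as follows. The sketch omits the convolution layers, uses nearest-neighbor upsampling as a stand-in for the upsampling layer, and assumes that the fourth output OUT4 already has the upsampled resolution so that the two branches can be added; none of these details are stated in the source, and the function names are hypothetical.

```python
import numpy as np


def upsample_nearest(features, scale):
    """Nearest-neighbor upsampling of an (H, W) array by an integer factor."""
    return np.repeat(np.repeat(features, scale, axis=0), scale, axis=1)


def fuse_branches(din_features, out4, scale):
    """Add the upsampled DIN branch to the fourth output OUT4."""
    upsampled = upsample_nearest(din_features, scale)
    if upsampled.shape != out4.shape:
        raise ValueError("both branches must have the same resolution")
    return upsampled + out4


din_features = np.array([[1.0, 2.0], [3.0, 4.0]])  # stand-in for processed DIN
out4 = np.zeros((4, 4))                            # stand-in for OUT4
fused = fuse_branches(din_features, out4, scale=2)
```

In this arrangement the DIN branch carries the coarse image content, so the fourth model only needs to supply the detail that the upsampled input lacks.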

In some embodiments, the loss function used to train the fourth model 60 of FIG. 9 and the first to third models 11 to 13 of FIG. 1 may be defined based on the high-resolution image IMG. For example, an L₁ loss may be used, and the L₁ loss may be defined as in [Equation 5] below.

In [Equation 5], H_SR may correspond to all of the models described above.

In addition, a structural similarity index measure (SSIM) loss may be used, and the SSIM loss may be defined as in [Equation 6] below based on the dependence of the mean and the standard deviation on a pixel p in a region P.

The models may be trained in consideration of both the loss of [Equation 5] and the loss of [Equation 6], and accordingly, the loss function of [Equation 7] below may be defined.

In [Equation 7], λ_SR and λ_SSIM may be coefficients determined based on a learning scheme that balances consistency and visual quality.
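A combined loss of this kind may be sketched as follows. The exact forms of [Equation 5] to [Equation 7] are not reproduced in this text, so the sketch makes several assumptions: the L₁ loss is a mean absolute error, the SSIM is computed over a single window covering the whole image with the commonly used constants, the SSIM loss is taken as 1 - SSIM, and the two terms are combined as a weighted sum. The default coefficient values are placeholders.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error between the predicted and reference images."""
    return float(np.mean(np.abs(pred - target)))


def ssim_global(x, y, data_range=1.0):
    """SSIM computed over the whole image as a single window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(numerator / denominator)


def total_loss(pred, target, lam_sr=1.0, lam_ssim=1.0):
    """Weighted sum of the L1 term and the (1 - SSIM) term."""
    return lam_sr * l1_loss(pred, target) + lam_ssim * (1.0 - ssim_global(pred, target))


rng = np.random.default_rng(0)
target = rng.random((16, 16))
noisy = np.clip(target + 0.1 * rng.normal(size=target.shape), 0.0, 1.0)
loss_same = total_loss(target, target)
loss_noisy = total_loss(noisy, target)
```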

FIG. 10 is a block diagram illustrating a computer system 100 according to an exemplary embodiment of the present disclosure. In some embodiments, the computer system 100 of FIG. 10 may perform the image processing or the training of models described above with reference to the drawings, and may be referred to as an image processing system, a training system, or the like.

The computer system 100 may refer to any system including a general-purpose or special-purpose computing system. For example, the computer system 100 may include a personal computer, a server computer, a laptop computer, a home appliance, and the like. As shown in FIG. 10, the computer system 100 may include at least one processor 101, a memory 102, a storage system 103, a network adapter 104, an input/output interface 105, and a display 106.

The at least one processor 101 may execute a program module including computer system executable instructions. The program module may include routines, programs, objects, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types. The memory 102 may include a computer system readable medium in the form of volatile memory, such as random access memory (RAM). The at least one processor 101 may access the memory 102 and may execute instructions loaded into the memory 102. The storage system 103 may store information in a non-volatile manner and, in some embodiments, may include at least one program product including a program module configured to perform the image processing and/or the training of models described above with reference to the drawings. As a non-limiting example, a program may include an operating system, at least one application, other program modules, and program data.

The network adapter 104 may provide a connection to a local area network (LAN), a wide area network (WAN), and/or a public network (e.g., the Internet). The input/output interface 105 may provide a communication channel with peripheral devices such as a keyboard, a pointing device, and an audio system. The display 106 may output various information for the user to check.

In some embodiments, the image processing and/or the training of models described above with reference to the drawings may be implemented as a computer program product. The computer program product may include a non-transitory computer-readable medium (or storage medium) including computer-readable program instructions for causing the at least one processor 101 to perform the image processing and/or the training of models. As non-limiting examples, the computer-readable instructions may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in at least one programming language.

The computer-readable medium may be any tangible medium capable of non-transitorily holding and storing instructions executed by the at least one processor 101 or any instruction-executable device. The computer-readable medium may be, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any combination thereof. For example, the computer-readable medium may be a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, static random access memory (SRAM), a CD, a DVD, a memory stick, a floppy disk, a mechanically encoded device such as a punch card, or any combination thereof.

FIG. 11 is a block diagram illustrating a device 110 according to an exemplary embodiment of the present disclosure. In some embodiments, image processing according to an exemplary embodiment of the present disclosure may be executed in the device 110. Accordingly, the device 110 may generate a high-quality output image from an input image and may execute various applications that utilize the high-quality output image.

Referring to FIG. 11, the device 110 may include at least one processor 111, a memory 113, an artificial intelligence (AI) accelerator 115, and a hardware accelerator 117, and the at least one processor 111, the memory 113, the AI accelerator 115, and the hardware accelerator 117 may communicate with each other through a bus 119. In some embodiments, the at least one processor 111, the memory 113, the AI accelerator 115, and the hardware accelerator 117 may be included in one semiconductor chip. Also, in some embodiments, at least two of the at least one processor 111, the memory 113, the AI accelerator 115, and the hardware accelerator 117 may be respectively included in two or more semiconductor chips mounted on a board.

The at least one processor 111 may execute instructions. For example, by executing instructions stored in the memory 113, the at least one processor 111 may execute an operating system or may execute applications that run on the operating system. In some embodiments, by executing instructions, the at least one processor 111 may assign a task to the AI accelerator 115 and/or the hardware accelerator 117, and may obtain a result of performing the task from the AI accelerator 115 and/or the hardware accelerator 117. In some embodiments, the at least one processor 111 may be an application specific instruction set processor (ASIP) customized for a specific purpose and may support a dedicated instruction set.

The memory 113 may have any structure for storing data. For example, the memory 113 may include a volatile memory device such as dynamic random access memory (DRAM) or static random access memory (SRAM), or may include a non-volatile memory device such as flash memory or resistive random access memory (RRAM). The at least one processor 111, the AI accelerator 115, and the hardware accelerator 117 may store data (e.g., IN, IMG_I, IMG_O, and OUT of FIG. 2) in the memory 113 or read data (e.g., IN, IMG_I, IMG_O, and OUT of FIG. 2) from the memory 113 through the bus 119.

The AI accelerator 115 may refer to hardware designed for AI applications. In some embodiments, the AI accelerator 115 may include a neural processing unit (NPU) for implementing a neuromorphic structure, may generate output data by processing input data provided from the at least one processor 111 and/or the hardware accelerator 117, and may provide the output data to the at least one processor 111 and/or the hardware accelerator 117. In some embodiments, the AI accelerator 115 may be programmable and may be programmed by the at least one processor 111 and/or the hardware accelerator 117.

The hardware accelerator 117 may refer to hardware designed to perform a specific task at high speed. For example, the hardware accelerator 117 may be designed to perform data conversion such as demodulation, modulation, encoding, and decoding at high speed. The hardware accelerator 117 may be programmable and may be programmed by the at least one processor 111 and/or the hardware accelerator 117.

In some embodiments, the AI accelerator 115 may execute the models described above with reference to the drawings. The AI accelerator 115 may generate an output containing useful information by processing images, feature maps, and the like. In addition, in some embodiments, at least some of the models executed by the AI accelerator 115 may instead be executed by the at least one processor 111 and/or the hardware accelerator 117.

Exemplary embodiments have been disclosed above in the drawings and the specification. Although the embodiments have been described using specific terms, those terms are used only to explain the technical idea of the present disclosure and are not used to limit their meaning or the scope of the present disclosure as set forth in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom.

Claims

A method for processing a plurality of images, the method comprising:
obtaining input data including the plurality of images;
providing the input data to a first machine learning model;
providing the output of the first machine learning model to a second machine learning model and a third machine learning model;
generating a first feature map corresponding to a plurality of kernels based on the output of the second machine learning model;
generating a second feature map corresponding to a plurality of weights based on the output of the third machine learning model;
generating a predicted kernel based on a weighted sum of the plurality of kernels; and
generating output data based on the input data and the predicted kernel.

The method of claim 1,
wherein generating the predicted kernel comprises:
extracting the plurality of kernels of different sizes from the first feature map;
extracting the plurality of weights corresponding to the plurality of kernels from the second feature map; and
calculating a weighted sum of the plurality of kernels based on the plurality of weights.

The method of claim 2,
wherein calculating the weighted sum comprises adding zeros to the product of a weight and a second kernel, different from a first kernel among the plurality of kernels, so that the product corresponds to the first kernel, the first kernel having the largest size among the plurality of kernels.
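
As an illustration of the zero-padded weighted sum recited above, the following is a minimal sketch rather than the implementation of the disclosure. It assumes square kernels with odd side lengths stored as nested Python lists, and it assumes the zeros are added symmetrically around each smaller kernel so that it stays centered; the claim does not state where the zeros are placed. The names `pad_kernel` and `predict_kernel` are hypothetical.

```python
def pad_kernel(kernel, size):
    """Center a square kernel inside a size x size grid of zeros.

    Assumption: zeros are added symmetrically (centered padding).
    """
    k = len(kernel)
    offset = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for r in range(k):
        for c in range(k):
            padded[r + offset][c + offset] = kernel[r][c]
    return padded


def predict_kernel(kernels, weights):
    """Weighted sum of kernels of different sizes.

    Each kernel is first multiplied by its weight, then the product is
    zero-padded to the size of the largest kernel, then all are summed.
    """
    size = max(len(k) for k in kernels)
    predicted = [[0.0] * size for _ in range(size)]
    for kernel, weight in zip(kernels, weights):
        scaled = [[weight * v for v in row] for row in kernel]
        padded = pad_kernel(scaled, size)
        for r in range(size):
            for c in range(size):
                predicted[r][c] += padded[r][c]
    return predicted
```

In this sketch a 1x1 kernel and a 3x3 kernel combine into a 3x3 predicted kernel, with the 1x1 contribution landing only at the center position.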

The method of claim 2,
wherein generating the first feature map comprises reorganizing an output of the second machine learning model based on the number of the plurality of images, the number of channels of each of the plurality of images, and sizes of the plurality of kernels.

The method of claim 2,
wherein generating the second feature map comprises reorganizing an output of the first machine learning model based on the number of the plurality of images and the number of kernels.

The method of claim 5,
wherein extracting the plurality of weights comprises applying a softmax function to the extracted weights.
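
For reference, a softmax maps raw scores to positive values that sum to one, which makes them usable as mixing weights for the kernels. The sketch below is illustrative only and assumes the extracted weights for one set of kernels are given as a flat list of floats; the function name `softmax` and the max-subtraction step (a common numerical-stability measure) are not taken from the disclosure.

```python
import math


def softmax(values):
    """Normalize raw scores into positive weights that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```

Larger raw scores receive larger weights, so the kernel with the highest score contributes most to the weighted sum.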

The method of claim 1,
wherein the output data includes a plurality of output images, and
generating the output data comprises generating each of the plurality of output images by performing a convolution operation on each of the plurality of images with the predicted kernel.
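
The per-image convolution recited above might be sketched as follows. This is a simplified illustration under several assumptions that the claim does not state: each image is a single-channel 2D list, one predicted kernel is shared by all pixels and all images, the border is zero-padded so the output keeps the input size, and "convolution" is taken in the deep-learning sense (cross-correlation, without flipping the kernel). The names `convolve_same` and `apply_predicted_kernel` are hypothetical.

```python
def convolve_same(image, kernel):
    """Apply a square, odd-sized kernel to a 2D image with zero padding."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - half, x + j - half
                    # Pixels outside the image are treated as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def apply_predicted_kernel(images, kernel):
    """Produce one output image per input image using the same kernel."""
    return [convolve_same(image, kernel) for image in images]
```

A variant in which a separate kernel is predicted for each pixel would index the kernel by `(y, x)` inside the loop instead of sharing one kernel.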

The method of claim 1, further comprising:
providing the output data to a fourth machine learning model comprising a plurality of residual blocks; and
generating, based on the output of the fourth machine learning model, a super-resolution image corresponding to the input data.

A system comprising:
at least one processor; and
a non-transitory storage medium storing instructions that, when executed by the at least one processor, cause the at least one processor to perform image processing,
wherein the image processing comprises:
obtaining input data including a plurality of images;
providing the input data to a first machine learning model;
providing the output of the first machine learning model to a second machine learning model and a third machine learning model;
generating a first feature map corresponding to a plurality of kernels based on the output of the second machine learning model;
generating a second feature map corresponding to a plurality of weights based on the output of the third machine learning model;
generating a predicted kernel based on a weighted sum of the plurality of kernels; and
generating output data based on the input data and the predicted kernel.

A non-transitory computer-readable storage medium containing instructions,
wherein the instructions, when executed by at least one processor, cause the at least one processor to perform image processing, and
the image processing comprises:
obtaining input data including a plurality of images;
providing the input data to a first machine learning model;
providing the output of the first machine learning model to a second machine learning model and a third machine learning model;
generating a first feature map corresponding to a plurality of kernels based on the output of the second machine learning model;
generating a second feature map corresponding to a plurality of weights based on the output of the third machine learning model;
generating a predicted kernel based on a weighted sum of the plurality of kernels; and
generating output data based on the input data and the predicted kernel.