KR102197089B1

KR102197089B1 - Multi-task learning architecture for removing non-uniform blur of single image based on deep learning

Info

Publication number: KR102197089B1
Application number: KR1020200121985A
Authority: KR
Inventors: 정형주; 손광훈; 장현성; 하남구; 권구용; 이민석
Original assignee: 엘아이지넥스원 주식회사 (LIG Nex1 Co., Ltd.); 연세대학교 산학협력단 (Yonsei University Industry-Academic Cooperation Foundation)
Priority date: 2020-09-22
Filing date: 2020-09-22
Publication date: 2020-12-30

Abstract

The present embodiments provide a deblurring method and apparatus that remove non-uniform blur from a single image through a multi-task learning network model, in which a deblurring network and a motion prediction network, each designed with an encoder-decoder structure, share an encoder and are trained simultaneously.

Description

Deep learning-based multi-task learning architecture for removing non-uniform blur from a single image {MULTI-TASK LEARNING ARCHITECTURE FOR REMOVING NON-UNIFORM BLUR OF SINGLE IMAGE BASED ON DEEP LEARNING}

The present invention relates to a method and apparatus for removing non-uniform blur from a single image using a deep learning-based multi-task learning architecture.

The content of this section merely provides background information on the present embodiments and does not constitute prior art.

A camera sensor accumulates light to capture information about a scene, which requires a sufficiently long exposure time. As a result, movement of the camera or of objects during the exposure produces motion blur in the photograph. Motion blur reduces the visibility of the photograph and degrades the performance of various object recognition applications.
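A common way to model (and synthesize) this kind of blur, assumed here rather than stated in the document, is to treat the blurred frame as the average of M sharp frames S[0], ..., S[M-1] captured while the scene moves during the exposure. The 1-D sketch below moves a single bright point one pixel per sub-frame and averages the frames; the helper names are hypothetical.

```python
def synthesize_blur(sharp_frames):
    """Average M sharp frames from one exposure into one blurred frame.

    Assumed model: B = (1/M) * sum_i S[i]; each frame is a list of pixel values.
    """
    m = len(sharp_frames)
    if m == 0:
        raise ValueError("need at least one sharp frame")
    width = len(sharp_frames[0])
    return [sum(frame[x] for frame in sharp_frames) / m for x in range(width)]


def shift(signal, dx):
    """Shift a 1-D signal right by dx pixels, padding with zeros (simulated motion)."""
    n = len(signal)
    return [signal[x - dx] if 0 <= x - dx < n else 0.0 for x in range(n)]


# A bright point moving one pixel per sub-frame during the exposure.
scene = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
frames = [shift(scene, t) for t in range(3)]   # S[0], S[1], S[2]
blurred = synthesize_blur(frames)
print(blurred)   # the point's energy is smeared over three pixels
```

The total brightness is preserved while it is spread along the motion path, which is why the blur varies with the local motion and is non-uniform when different regions move differently.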

Restoring an image degraded by motion blur yields a visually improved image and is an essential preprocessing step in many object detection applications.

Korean Registered Patent Publication No. 10-0860967 (2008.09.24.)
Korean Registered Patent Publication No. 10-1152525 (2012.05.25.)
Korean Registered Patent Publication No. 10-1350460 (2014.01.03.)
Korean Registered Patent Publication No. 10-1844332 (2018.03.27.)

Embodiments of the present invention mainly aim to remove non-uniform blur from a single image through a multi-task learning network model in which a deblurring network and a motion prediction network, each designed with an encoder-decoder structure, share an encoder and are trained simultaneously.

Other objects of the present invention that are not explicitly stated may additionally be considered within the scope that can easily be inferred from the following detailed description and its effects.

According to one aspect of the present embodiments, there is provided a deblurring method performed by a computing device, the method including receiving a single blurred image and removing non-uniform blur from the received image through a multi-task learning network model, wherein the deblurring network and the motion prediction network of the multi-task learning network model share an encoder.

The encoder may be a motion-based feature extraction module employing convolutional layers.

The encoder may extract features through convolutional layers and increase the number of feature channels through residual blocks having a skip structure.

The deblurring network includes the encoder and a first decoder, and the first decoder may restore an image from the features extracted by the encoder.

The first decoder may increase the resolution through a deconvolutional layer followed by a first convolutional layer. The first convolutional layer reduces the number of feature channels so that the decoder is symmetric with the encoder and estimates the deblurred image. Skip connections between the symmetric encoder and first decoder preserve the spatial structure of the features.

The motion prediction network includes the encoder and a second decoder, and the second decoder may estimate optical flow from the features extracted by the encoder.

The second decoder may estimate the optical flow through a deconvolutional layer followed by a second convolutional layer. The second convolutional layer reduces the number of feature channels so that the decoder is symmetric with the encoder and estimates a forward optical flow and a backward optical flow. Skip connections between the symmetric encoder and second decoder preserve the spatial structure of the features.

The motion prediction network may be trained with pseudo ground-truth optical flow.

The pseudo ground-truth optical flow may be computed by an optical flow estimation model that combines a spatial pyramid formulation with deep learning.

The optical flow estimation model may either stack the image pair to be compared and estimate the flow through convolutional layers, or pass each image of the pair through its own convolutional layers and compare the resulting features through a correlation layer that takes both inputs.

The deblurring network and the motion prediction network may be trained simultaneously.

As described above, according to embodiments of the present invention, non-uniform blur can be removed from a single image through a multi-task learning network model in which a deblurring network and a motion prediction network, each designed with an encoder-decoder structure, share an encoder and are trained simultaneously.

Even if not explicitly mentioned herein, the effects described in the following specification that are expected from the technical features of the present invention, and their potential effects, are treated as if described in the specification of the present invention.

FIGS. 1 and 2 are diagrams illustrating existing deblurring models.
FIG. 3 is a block diagram illustrating a deblurring apparatus according to an embodiment of the present invention.
FIGS. 4 and 5 are diagrams illustrating the multi-task learning network model of the deblurring apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an input image, motion information, and a restored image processed by the multi-task learning network model of the deblurring apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a deblurring method according to another embodiment of the present invention.

Hereinafter, in describing the present invention, detailed descriptions of related known functions are omitted when they are obvious to those skilled in the art and may unnecessarily obscure the gist of the invention. Some embodiments of the present invention are described in detail with reference to the exemplary drawings.

The present invention can be applied to target detection, military infrastructure surveillance systems, and the like.

FIG. 1 illustrates an existing kernel-based deblurring model, and FIG. 2 illustrates an existing deblurring model that does not use a blur kernel.

In the conventional method shown in FIG. 1, a blur kernel is first estimated, and a deblurred image is then obtained through non-blind deblurring. Finding the blur kernel for every pixel of an image is an ill-posed problem. To make it tractable, various assumptions are imposed on the blur kernel, and various physical models are used to constrain the motion of the camera and objects during kernel estimation. Because these assumptions and models are hand-crafted, accurate kernel estimation is difficult.
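Why kernel estimation is ill-posed can be seen in a 1-D toy under the standard convolutional blur model B = k ⊛ S (assumed here for illustration): two different (sharp signal, kernel) pairs explain the same blurred observation exactly, so the observation alone cannot decide between them without extra priors. The function name is hypothetical.

```python
def convolve_same(signal, kernel):
    """1-D convolution with zero padding; output has the same length as the input."""
    half = len(kernel) // 2
    out = []
    for x in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = x + half - j            # kernel is flipped, as in true convolution
            if 0 <= idx < len(signal):
                acc += k * signal[idx]
        out.append(acc)
    return out


# Pair A: a sharp point blurred by a 3-tap box kernel.
sharp_a, kernel_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1 / 3, 1 / 3, 1 / 3]
blurred = convolve_same(sharp_a, kernel_a)

# Pair B: the blurred signal itself as the "sharp" image, with an identity kernel.
sharp_b, kernel_b = blurred, [0.0, 1.0, 0.0]

print(convolve_same(sharp_b, kernel_b) == blurred)   # True: both pairs fit the observation
```

Kernel-based methods break this ambiguity with hand-crafted priors on the kernel and the image, which is exactly the step the text identifies as the source of inaccuracy.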

The conventional method shown in FIG. 2 skips kernel estimation and instead uses deep learning to learn a high-dimensional nonlinear mapping directly from the blurred image to the enhanced image. The network learns motion information only implicitly, through the mapping between blurred images and ground-truth images; no training signal directly forces the learned features to encode motion information semantically.

The deblurring apparatus according to the present embodiment not only learns motion information through the mapping between inputs and ground truth, but also learns it explicitly through a multi-task learning structure that additionally trains a motion estimation network. With the same network, this achieves better deblurring performance than existing deep learning methods without any loss in runtime. Unlike existing kernel-based methods, it neither needs to estimate accurate blur information (that is, motion information) at test time nor requires complex iterative computation.

FIG. 3 is a block diagram illustrating a deblurring apparatus according to an embodiment of the present invention.

The deblurring apparatus 110 includes at least one processor 120, a computer-readable storage medium 130, and a communication bus 170.

The processor 120 may control the apparatus to operate as the deblurring apparatus 110. For example, the processor 120 may execute one or more programs stored in the computer-readable storage medium 130. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 120, may be configured to cause the deblurring apparatus 110 to perform operations according to an exemplary embodiment.

The computer-readable storage medium 130 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. The program 140 stored in the computer-readable storage medium 130 includes a set of instructions executable by the processor 120. In one embodiment, the computer-readable storage medium 130 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, other types of storage media that can be accessed by the deblurring apparatus 110 and can store the desired information, or a suitable combination thereof.

The communication bus 170 interconnects the various components of the deblurring apparatus 110, including the processor 120 and the computer-readable storage medium 130.

The deblurring apparatus 110 may also include one or more input/output interfaces 150, which provide interfaces for one or more input/output devices, and one or more communication interfaces 160. The input/output interface 150 and the communication interface 160 are connected to the communication bus 170. Input/output devices (not shown) may be connected to other components of the deblurring apparatus 110 through the input/output interface 150.

FIGS. 4 and 5 are diagrams illustrating the multi-task learning network model of the deblurring apparatus according to an embodiment of the present invention.

The multi-task learning network model is a network of connected layers with an encoder-decoder structure, including hidden layers. A layer may have parameters, which include a set of learnable filters; the parameters consist of the weights and/or biases between nodes.
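As a concrete reading of "weights and biases", the helper below counts the learnable parameters of a 2-D convolutional layer: k × k × C_in × C_out filter weights plus one bias per output channel. The 3 × 3 kernel size in the example is an assumption; the document does not state kernel sizes.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a 2-D conv layer: one k x k x C_in filter per output channel."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# First encoder layer, assuming a 3x3 kernel mapping an RGB image to 64 channels.
print(conv2d_param_count(3, 64, 3))   # 3*3*3*64 + 64 = 1792
```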

The multi-task learning network model includes a deblurring network and a motion prediction network. Both networks use an encoder-decoder structure with strided convolutions so that they can draw on a wide region of the image, and they share some of their network parameters.
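How strided convolutions widen the region each output depends on can be made concrete with the standard receptive-field recurrence r ← r + (k − 1)·j, j ← j·s, where j is the cumulative stride. The two layer lists below are hypothetical stacks with the same number of 3 × 3 layers (kernel sizes are assumed); the second inserts stride-2 layers as in the encoder described here.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride). Returns the receptive field in input pixels."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1) input-space steps
        jump *= stride              # later layers step over `jump` input pixels at a time
    return rf


plain = [(3, 1)] * 9
strided = [(3, 1)] * 3 + [(3, 2), (3, 1), (3, 1)] + [(3, 2), (3, 1), (3, 1)]
print(receptive_field(plain), receptive_field(strided))   # 19 37
```

With the same depth, downsampling roughly doubles the receptive field here, which matters for large motion blur that spans many pixels.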

The deblurring network first extracts 64-channel features from the single input image through three convolutional layers. Two residual blocks containing strided convolutions then each halve the spatial size while doubling the number of feature channels.
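The shape bookkeeping of this encoder stage can be traced in a few lines. This sketch only tracks (channels, height, width); it assumes stride-1 "same" padding for the first three convolutions and that each strided block exactly halves even spatial sizes.

```python
def trace_encoder(height, width, in_channels=3):
    """Track (channels, height, width) through the front of the encoder; no real computation."""
    shapes = [("input", (in_channels, height, width))]
    c, h, w = 64, height, width               # three conv layers -> 64-channel features
    shapes.append(("3 x conv", (c, h, w)))
    for i in range(2):                        # two residual blocks with strided convolution
        c, h, w = c * 2, h // 2, w // 2       # halve spatial size, double channels
        shapes.append((f"strided res block {i + 1}", (c, h, w)))
    return shapes


for name, shape in trace_encoder(256, 256):
    print(f"{name:20s} {shape}")
```

The last entry, 256 channels at 1/4 of the input size, is the feature map the following residual blocks operate on.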

Residual blocks mitigate the vanishing gradient problem during training, which allows the network to learn better features. Each residual block consists of three convolutional layers, the first of which has a stride of 2. The output of the block is the sum of the output of the last convolutional layer and the output of the first layer. The features, now reduced to 1/4 of the input size with 256 channels, then pass through three more residual blocks in which every convolutional layer has a stride of 1.
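The residual block wiring can be sketched in NumPy as follows. The 3 × 3 kernels, zero "same" padding, and ReLU placement are assumptions not given in the document; the point is that the skip adds the output of the first (strided) layer rather than the block input, so no extra projection is needed when the block changes resolution and channel count.

```python
import numpy as np


def conv2d(x, weight, bias, stride=1):
    """Naive 2-D convolution (cross-correlation, as in CNN libraries) with zero padding k//2.

    x: (C_in, H, W); weight: (C_out, C_in, k, k); bias: (C_out,).
    """
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (x.shape[1] + 2 * pad - k) // stride + 1
    w_out = (x.shape[2] + 2 * pad - k) // stride + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, params, first_stride=2):
    """Three conv layers; the output adds the last conv's result to the first conv's result."""
    (w1, b1), (w2, b2), (w3, b3) = params
    y1 = relu(conv2d(x, w1, b1, stride=first_stride))   # downsamples, changes channel count
    y2 = relu(conv2d(y1, w2, b2))
    y3 = conv2d(y2, w3, b3)
    return relu(y3 + y1)   # skip from the first layer's output: shapes already match


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))   # 4-channel 8x8 feature map
params = [(rng.normal(scale=0.1, size=(8, 4, 3, 3)), np.zeros(8)),
          (rng.normal(scale=0.1, size=(8, 8, 3, 3)), np.zeros(8)),
          (rng.normal(scale=0.1, size=(8, 8, 3, 3)), np.zeros(8))]
y = residual_block(x, params)
print(y.shape)   # (8, 4, 4): half the resolution, twice the channels
```

Setting the first stride to 1 gives the shape-preserving variant used in the three later blocks.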

Finally, to estimate an enhanced image with the same resolution as the input, two deconvolutional layers each double the resolution. One convolutional layer follows each deconvolutional layer and halves the number of feature channels so that the decoder mirrors the encoder. The last convolutional layer directly estimates the three-channel enhanced image. Skip connections between the symmetric encoder and decoder features preserve the spatial structure of the earlier features.
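A shape-only sketch of this decoder, starting from the 256-channel, 1/4-resolution encoder output. It assumes the skip connections are element-wise additions (the document only says "skip connection"), so each decoder stage must land exactly on the shape of the encoder stage it mirrors; the assertion inside checks that symmetry. The same function with four output channels describes the flow decoder.

```python
def trace_decoder(encoder_shapes, out_channels=3):
    """encoder_shapes: (C, H, W) at full, 1/2 and 1/4 resolution, finest first."""
    c, h, w = encoder_shapes[-1]
    trace = [("encoder output", (c, h, w))]
    for skip in reversed(encoder_shapes[:-1]):
        h, w = h * 2, w * 2   # deconvolution doubles the resolution
        c = c // 2            # the following convolution halves the channels
        assert (c, h, w) == skip, "decoder stage must mirror its encoder stage"
        trace.append(("deconv + conv + skip", (c, h, w)))
    trace.append(("final conv", (out_channels, h, w)))
    return trace


enc = [(64, 256, 256), (128, 128, 128), (256, 64, 64)]
for name, shape in trace_decoder(enc):
    print(f"{name:22s} {shape}")
```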

A rectified linear unit (ReLU) layer is used as the nonlinear activation to increase the nonlinearity of the network. In addition, batch normalization layers may be used to address internal covariate shift in the convolutional features.
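For reference, a batch normalization layer in training mode normalizes each channel using the mean and variance over the batch and spatial dimensions, then applies a learnable scale γ and shift β. The NumPy sketch below shows that forward computation followed by ReLU; the running statistics used at inference time are omitted.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W). Normalize each channel with batch statistics, then scale and shift."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


def relu(x):
    return np.maximum(x, 0.0)


rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 6, 6))   # shifted, scaled activations
normalized = batch_norm_train(features, gamma=np.ones(4), beta=np.zeros(4))
out = relu(normalized)
```

Whatever the incoming statistics, each channel of `normalized` has approximately zero mean and unit variance before γ and β are applied.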

The other network in the multi-task structure, the motion prediction network (optical flow estimation network), shares the encoder of the deblurring network. The only additional components for optical flow estimation are the deconvolutional and convolutional layers of its decoder, which can be designed similarly to the decoder of the deblurring network.

The last convolutional layer of the motion prediction network directly estimates a four-channel optical flow. Its two two-channel halves represent the forward optical flow and the backward optical flow, respectively.
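The wiring of the two tasks can be sketched with deliberately tiny stand-ins: a single 1 × 1 convolution plays the role of the shared encoder and of each decoder. These are hypothetical simplifications, not the architecture above. The sketch shows the two properties the text describes: both heads read the same encoder features, and the four-channel flow output splits into forward and backward two-channel flows.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


class MultiTaskModel:
    def __init__(self, rng, feat=16):
        self.encoder = rng.normal(size=(feat, 3))       # shared by both tasks
        self.deblur_head = rng.normal(size=(3, feat))   # stand-in first decoder: 3-channel image
        self.flow_head = rng.normal(size=(4, feat))     # stand-in second decoder: 4-channel flow

    def forward(self, blurred):
        features = np.maximum(conv1x1(blurred, self.encoder), 0.0)   # computed once, used twice
        sharp = conv1x1(features, self.deblur_head)
        flow = conv1x1(features, self.flow_head)
        flow_forward, flow_backward = flow[:2], flow[2:]              # (u, v) per pixel each
        return sharp, flow_forward, flow_backward


rng = np.random.default_rng(0)
model = MultiTaskModel(rng)
sharp, f_fwd, f_bwd = model.forward(rng.random((3, 32, 32)))
print(sharp.shape, f_fwd.shape, f_bwd.shape)
```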

During training, the multi-task learning network model may feed the output of the first decoder of the deblurring network back, through a first feedback path, as input to the encoder, which is the motion-based feature extraction module. Conversion modules convert data values to match the output and input formats; on the first feedback path, a first conversion module converts the data values.

During training, the multi-task learning network model may feed the output of the second decoder of the motion prediction network back, through a second feedback path, as input to the encoder, which is the motion-based feature extraction module. On the second feedback path, a second conversion module converts the data values to match the output and input formats.

During training, the multi-task learning network model may feed the output of the first decoder of the deblurring network, through a third feedback path, as input to the second decoder of the motion prediction network. On the third feedback path, a third conversion module converts the data values to match the output and input formats.

During training, the multi-task learning network model may feed the output of the second decoder of the motion prediction network, through a fourth feedback path, as input to the first decoder of the deblurring network. On the fourth feedback path, a fourth conversion module converts the data values to match the output and input formats.

The multi-task learning network model is trained to find the parameters that minimize a loss function.

The total loss function is the sum of L_deblur and L_flow, the cost functions for training the deblurring network and the motion prediction network, respectively. Following recent CNN-based super-resolution methods, L_deblur can be designed as the sum of a perceptual loss and an adversarial loss, as in Equation 1. λ may be set to 0.01, although other values may be used depending on the design requirements.
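A scalar sketch of how the loss terms combine. Equation 1 itself does not survive in this text; placing λ on the adversarial term, L_deblur = L_VGG + λ·L_adv, follows the usual perceptual-plus-adversarial formulation and is an assumption here.

```python
def total_loss(l_vgg, l_adv, l_flow, lam=0.01):
    """Assumed form: L = L_deblur + L_flow, with L_deblur = L_VGG + lam * L_adv."""
    l_deblur = l_vgg + lam * l_adv
    return l_deblur + l_flow


print(total_loss(l_vgg=0.8, l_adv=2.0, l_flow=0.5))   # about 1.32
```

A small λ keeps the adversarial term from dominating the perceptual term, which is the usual motivation for values on the order of 0.01.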

L_VGG is computed from the features of a VGG network pre-trained for image classification on the ImageNet database. L_VGG compares the enhanced image with the ground-truth image, as in Equation 2.
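Equation 2 is likewise not reproduced here. The usual form of a VGG perceptual loss, assumed in this sketch, is the mean squared difference between the feature maps Φ(·) of the enhanced image and of the ground truth. Φ is passed in as a function; `toy_phi` (horizontal image gradients) is a hypothetical stand-in so the example runs without a pretrained VGG.

```python
import numpy as np


def perceptual_loss(enhanced, ground_truth, phi):
    """Assumed form: mean over all feature elements of (phi(enhanced) - phi(ground_truth))**2."""
    fe, fg = phi(enhanced), phi(ground_truth)
    return float(np.mean((fe - fg) ** 2))


def toy_phi(image):
    """Hypothetical stand-in for a VGG feature extractor: horizontal image gradients."""
    return np.diff(image, axis=-1)


gt = np.random.default_rng(0).random((3, 16, 16))
print(perceptual_loss(gt, gt, toy_phi))                  # 0.0 for identical images
print(perceptual_loss(np.zeros_like(gt), gt, toy_phi))   # positive: the structure differs
```

Comparing features instead of raw pixels penalizes differences in structure such as edges, which is why the text credits this loss with visually sharper results.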

Φ denotes the feature extraction network of the VGG network. A cost function based on features with such semantic properties yields visually improved enhanced images. L_adv is a cost function that makes the enhanced image indistinguishable from the ground-truth image. Such adversarial training requires a discriminator network that classifies whether an input image is a real sharp image or an enhanced image. Following the Generative Adversarial Network (GAN) model, the discriminator network can be designed as shown in Table 1.

Here, Conv, LReLU, BN, and FC denote a convolutional layer, leaky ReLU, batch normalization, and a fully connected layer, respectively. The discriminator in Table 1 takes either the output image of the deblurring network or the ground-truth image as input and is trained with the cost function of Equation 3.

D and G denote the discriminator and the deblurring network, respectively. For L_adv in L_deblur to make the enhanced image indistinguishable from the ground-truth image, the cost function of Equation 4 must be minimized.
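Equations 3 and 4 are not reproduced here either. Under the standard GAN formulation, assumed in this sketch, the discriminator minimizes a binary cross-entropy that labels ground-truth images real and deblurred outputs G(B) fake, while the deblurring network's adversarial term −log D(G(B)) (the common non-saturating choice) is minimized when D is fooled. D's outputs are given directly as probabilities.

```python
import math


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Assumed Eq. 3: -mean(log D(S)) - mean(log(1 - D(G(B)))) over batches of probabilities."""
    real_term = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def adversarial_loss(d_fake, eps=1e-12):
    """Assumed Eq. 4 (non-saturating form): -mean(log D(G(B)))."""
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)


# Small when D rates the deblurred output as real, large when D spots it as fake.
print(adversarial_loss([0.9]), adversarial_loss([0.1]))
```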

To train the motion prediction network (optical flow estimation network) from a single blurred image, the ground-truth flow must first be known. In general, however, such ground truth does not exist.

The deblurring apparatus according to the present embodiment therefore applies an optical flow estimation model to the two images S[0] and S[M-1] to derive pseudo ground-truth flows for the forward and backward directions. The optical flow estimation model either stacks the image pair to be compared and estimates the flow through convolutional layers, or passes each image of the pair through its own convolutional layers and compares the resulting features through a correlation layer that takes both inputs.
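The correlation layer compares two feature maps by taking, for each pixel of the first map and each displacement within a search range, the channel-averaged product with the displaced pixel of the second map, in the style of correlation-based flow networks such as FlowNetC. The search radius and averaging below are assumptions. The example moves a random feature map one pixel to the right and recovers that displacement from the strongest average correlation.

```python
import numpy as np


def correlation(f1, f2, max_disp=2):
    """f1, f2: (C, H, W). Returns (D, D, H, W) with D = 2*max_disp + 1, where
    out[dy+m, dx+m, y, x] = mean_c f1[c, y, x] * f2[c, y+dy, x+dx] (zero outside the image)."""
    _, h, w = f1.shape
    m = max_disp
    f2p = np.pad(f2, ((0, 0), (m, m), (m, m)))
    d = 2 * m + 1
    out = np.zeros((d, d, h, w))
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            shifted = f2p[:, m + dy:m + dy + h, m + dx:m + dx + w]   # f2 displaced by (dy, dx)
            out[dy + m, dx + m] = (f1 * shifted).mean(axis=0)
    return out


rng = np.random.default_rng(0)
f1 = rng.normal(size=(64, 10, 10))
f2 = np.roll(f1, shift=1, axis=2)                    # content moves one pixel to the right
corr = correlation(f1, f2)
score = corr[:, :, 2:-2, 2:-2].mean(axis=(2, 3))     # average over interior pixels
dy, dx = np.unravel_index(score.argmax(), score.shape)
print(dy - 2, dx - 2)   # 0 1: no vertical motion, one pixel to the right
```

Stacking the two images and letting convolutions find the match, the other option in the text, avoids this explicit search but must learn the comparison implicitly.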

FIG. 6 shows an example of an input image for training the multi-task learning network, together with the ground-truth images for deblurring and flow estimation.

L_flow is expressed as Equation 5.

F_forward and F_backward denote the forward and backward optical flows estimated by the motion prediction network (optical flow estimation network).
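Equation 5 is not reproduced in this text. One plausible form, assumed here, sums the average end-point error (EPE) of the estimated forward and backward flows against their pseudo ground truths. Flows are arrays of shape (2, H, W) holding (u, v) at each pixel; the function names are hypothetical.

```python
import numpy as np


def end_point_error(flow, target):
    """Mean Euclidean distance between flow vectors; flow, target: (2, H, W)."""
    return float(np.mean(np.sqrt(np.sum((flow - target) ** 2, axis=0))))


def flow_loss(f_forward, f_backward, pseudo_forward, pseudo_backward):
    """Assumed L_flow: EPE(forward) + EPE(backward) against the pseudo ground truth."""
    return (end_point_error(f_forward, pseudo_forward)
            + end_point_error(f_backward, pseudo_backward))


zero = np.zeros((2, 4, 4))
target_fwd = np.stack([np.full((4, 4), 3.0), np.full((4, 4), 4.0)])   # every vector is (3, 4)
print(flow_loss(zero, zero, target_fwd, zero))   # 5.0: forward EPE is |(3, 4)| = 5, backward is 0
```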

FIG. 7 is a flowchart illustrating a deblurring method according to another embodiment of the present invention. The deblurring method may be performed by the deblurring apparatus or a computing device.

In step S210, the processor receives a single blurred image.

In step S220, the processor removes non-uniform blur from the received blurred image through the multi-task learning network model.
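Steps S210 and S220 can be sketched as a short inference routine. The assumption here (inferred from the statement that no motion estimation is needed at test time, not spelled out in the document) is that only the shared encoder and the first decoder run at inference, so the flow decoder adds no runtime cost. The callables are hypothetical stand-ins for trained sub-networks.

```python
from typing import Callable, List

Image = List[float]   # placeholder image type for this sketch


def deblur(blurred: Image,
           encoder: Callable[[Image], Image],
           deblur_decoder: Callable[[Image], Image]) -> Image:
    """S210: receive a single blurred image. S220: run the shared encoder and first decoder."""
    features = encoder(blurred)
    return deblur_decoder(features)


calls = []


def toy_encoder(x):
    calls.append("encoder")
    return [2.0 * v for v in x]


def toy_deblur_decoder(f):
    calls.append("deblur_decoder")
    return [0.5 * v for v in f]


restored = deblur([0.2, 0.4], toy_encoder, toy_deblur_decoder)
print(restored, calls)   # [0.2, 0.4] ['encoder', 'deblur_decoder']
```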

The deblurring network and the motion prediction network of the multi-task learning network model share the encoder and are trained simultaneously.
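Simultaneous training means both losses back-propagate into the shared encoder in the same update. The scalar toy below uses hypothetical one-parameter "networks" with squared-error losses (not the document's architecture) to make the gradient bookkeeping explicit: each head receives only its own task's gradient, while the shared encoder weight receives the sum of both.

```python
def forward_backward(p, x, y_deblur, y_flow):
    """Toy model: h = w_e*x, deblur = w_d*h, flow = w_f*h; squared-error losses."""
    h = p["encoder"] * x                  # shared feature
    r_d = p["deblur"] * h - y_deblur      # deblurring residual
    r_f = p["flow"] * h - y_flow          # flow residual
    loss = r_d ** 2 + r_f ** 2            # total = L_deblur + L_flow
    grads = {
        "deblur": 2 * r_d * h,            # only the deblurring loss reaches this head
        "flow": 2 * r_f * h,              # only the flow loss reaches this head
        # the shared encoder receives the sum of both tasks' gradients
        "encoder": (2 * r_d * p["deblur"] + 2 * r_f * p["flow"]) * x,
    }
    return loss, grads


def train(p, x, y_deblur, y_flow, lr=0.05, steps=200):
    for _ in range(steps):
        _, g = forward_backward(p, x, y_deblur, y_flow)
        p = {k: p[k] - lr * g[k] for k in p}   # one simultaneous update of all parameters
    return p


init = {"encoder": 0.5, "deblur": 0.5, "flow": 0.2}
trained = train(init, x=1.0, y_deblur=1.0, y_flow=-0.5)
print(forward_backward(init, 1.0, 1.0, -0.5)[0], forward_backward(trained, 1.0, 1.0, -0.5)[0])
```

Because the flow loss also shapes the encoder, the features the deblurring head sees are pushed to carry motion information, which is the rationale the text gives for the multi-task design.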

The encoder is a motion-based feature extraction module employing convolutional layers. It extracts features through convolutional layers and increases the number of feature channels through residual blocks having a skip structure.

The deblurring network includes the encoder and a first decoder, and the first decoder restores an image from the features extracted by the encoder. The first decoder increases the resolution through a deconvolutional layer followed by a first convolutional layer, which reduces the number of feature channels so that the decoder is symmetric with the encoder and estimates the deblurred image. Skip connections between the symmetric encoder and first decoder preserve the spatial structure of the features.

The motion prediction network includes the encoder and a second decoder, and the second decoder estimates optical flow from the features extracted by the encoder. The second decoder estimates the optical flow through a deconvolutional layer followed by a second convolutional layer, which reduces the number of feature channels so that the decoder is symmetric with the encoder and estimates the forward and backward optical flows. Skip connections between the symmetric encoder and second decoder preserve the spatial structure of the features.

The motion prediction network is trained with pseudo ground-truth optical flow. The pseudo ground truth is computed by an optical flow estimation model that combines a spatial pyramid formulation with deep learning. This model either stacks the image pair to be compared and estimates the flow through convolutional layers, or passes each image of the pair through its own convolutional layers and compares the resulting features through a correlation layer that takes both inputs.

The deblurring apparatus may be implemented in logic circuits by hardware, firmware, software, or a combination thereof, or by using a general-purpose or special-purpose computer. The apparatus may be implemented using a hardwired device, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or the like. The apparatus may also be implemented as a system on chip (SoC) including one or more processors and controllers.

The blur improvement apparatus may be installed, in the form of software, hardware, or a combination thereof, on a computing device or server provided with hardware elements. The computing device or server may be any of various devices that include all or some of the following: a communication device, such as a communication modem, for communicating with various devices or wired/wireless communication networks; a memory for storing data used to execute a program; and a microprocessor for executing the program to perform computations and issue commands.

Although FIG. 7 describes the processes as being executed sequentially, this is merely illustrative. Those skilled in the art may apply various modifications and variations without departing from the essential characteristics of the embodiments of the present invention, for example by changing the order shown in FIG. 7, executing one or more processes in parallel, or adding other processes.

The operations according to the present embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. A computer-readable medium is any medium that participates in providing instructions to a processor for execution, and it may include program instructions, data files, data structures, or a combination thereof; examples include magnetic media, optical recording media, and memory. The computer program may also be distributed over networked computer systems so that computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiments can be readily inferred by programmers skilled in the art to which the embodiments belong.

The present embodiments are intended to illustrate the technical idea of the present embodiments, and the scope of that technical idea is not limited by them. The scope of protection of the present embodiments should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiments.

Claims

A blur improvement method performed by a blur improvement device, the method comprising:
receiving a single blurred image; and
removing non-uniform blur from the input single blurred image through a multi-task learning network model,
wherein a blur improvement network and a motion prediction network of the multi-task learning network model share an encoder, and
wherein the blur improvement network and the motion prediction network are trained simultaneously.

The method of claim 1,
wherein the encoder is a motion-based feature extraction module to which a convolution layer is applied.

The method of claim 2,
wherein the encoder extracts a feature representation through the convolution layer and increases the number of channels of the feature representation through a residual block having a skip structure.

The method of claim 1,
wherein the blur improvement network includes the encoder and a first decoder, and the first decoder restores an image from the features extracted by the encoder.

The method of claim 4,
wherein the first decoder increases the resolution through a deconvolution layer, adds a first convolution layer after the deconvolution layer, reduces the number of channels of the feature representation in the first convolution layer so as to be symmetric with the encoder, and estimates a deblurred image with the first convolution layer, and
wherein the spatial structure of the feature representation is maintained through skip connections between the symmetric encoder and first decoder.

The method of claim 1,
wherein the motion prediction network includes the encoder and a second decoder, and the second decoder estimates optical flow from the features extracted by the encoder.

The method of claim 6,
wherein the second decoder estimates the optical flow through a deconvolution layer, adds a second convolution layer after the deconvolution layer, reduces the number of channels of the feature representation in the second convolution layer so as to be symmetric with the encoder, and estimates a forward optical flow and a backward optical flow with the second convolution layer, and
wherein the spatial structure of the feature representation is maintained through skip connections between the symmetric encoder and second decoder.

The method of claim 6,
wherein the motion prediction network uses pseudo ground truth for the optical flow during training.

The method of claim 8,
wherein the pseudo ground truth for the optical flow is calculated by an optical flow estimation model that combines a spatial pyramid formulation with deep learning.

The method of claim 9,
wherein the optical flow estimation model either stacks the image pairs to be compared and then performs estimation through a convolution layer, or passes each image of a pair through its own convolution layers and then compares the features through a correlation layer that uses both inputs.

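The claims above require the blur improvement network and the motion prediction network to be trained simultaneously, with the flow branch supervised by pseudo ground truth. One common way to realize such joint training is to minimize a weighted sum of the two task losses, so that gradients from both tasks update the shared encoder. The sketch below assumes an L1 image loss, an average endpoint-error flow loss, and a hypothetical weight `flow_weight = 0.1`; none of these choices is specified in the text above, and the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Flow = Sequence[Tuple[float, float]]  # one (u, v) vector per pixel, flattened


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between the deblurred and sharp images (flattened)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def epe_loss(pred: Flow, pseudo_gt: Flow) -> float:
    """Average endpoint error between predicted flow and pseudo ground-truth flow."""
    return sum(math.hypot(pu - gu, pv - gv)
               for (pu, pv), (gu, gv) in zip(pred, pseudo_gt)) / len(pred)


def multitask_loss(deblurred: Sequence[float], sharp: Sequence[float],
                   fwd: Flow, bwd: Flow, fwd_gt: Flow, bwd_gt: Flow,
                   flow_weight: float = 0.1) -> float:
    """Hypothetical joint objective: deblurring loss + weighted forward/backward flow losses."""
    flow_term = epe_loss(fwd, fwd_gt) + epe_loss(bwd, bwd_gt)
    return l1_loss(deblurred, sharp) + flow_weight * flow_term


loss = multitask_loss(
    deblurred=[0.5, 0.5], sharp=[0.0, 1.0],  # L1 term = 0.5
    fwd=[(3.0, 4.0)], fwd_gt=[(0.0, 0.0)],   # endpoint error = 5.0
    bwd=[(1.0, 1.0)], bwd_gt=[(1.0, 1.0)],   # endpoint error = 0.0
)
print(loss)  # 1.0
```

In an actual training loop these terms would be computed on tensors by a deep learning framework and back-propagated through both decoders into the shared encoder in a single optimization step.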