KR102418000B1

KR102418000B1 - Method for performing stereo matching by using color image and monochrome image and device using the same

Info

Publication number: KR102418000B1
Application number: KR1020220032396A
Authority: KR
Inventors: 전해곤; 임성훈
Original assignee: Gwangju Institute of Science and Technology (광주과학기술원)
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-07-07

Abstract

According to the present invention, a method for performing stereo matching by using a color image and a monochrome image includes the steps of: (a) allowing a learning device to generate a denoised monochrome image for learning and a denoised color image for learning when at least one color/monochrome stereo pair for learning is acquired; (b) allowing the learning device to generate a final color reconstruction image for learning; and (c) allowing the learning device to learn at least one parameter included in a disparity network, an occlusion network, a denoising network, and a colorization network by using a summation loss. Accordingly, it is possible to obtain a color reconstruction image with high sharpness.

Description

Method for performing stereo matching using a color image and a monochrome image, and an apparatus using the same

The present invention relates to a method for performing stereo matching using a color image and a monochrome image, and to an apparatus using the same. More particularly, it relates to a method, and an apparatus using the same, in which a disparity network and an occlusion network respectively generate a disparity map according to the matching cost between the color image and the monochrome image and an occlusion map according to the difference between the color image and the monochrome image, and a sharp, color-restored image is then obtained by using the disparity map, the occlusion map, and a denoised color image and a denoised monochrome image generated by removing noise through a denoising network.

As the computing power available on mobile devices increases, companies that develop smartphones and other imaging devices, such as Huawei, LG, Samsung, and Apple, are actively researching multi-camera setups that improve the quality of captured images by aligning and merging multiple images captured simultaneously by multiple cameras. Interestingly, some commercial multi-camera setups use asymmetric camera configurations that combine cameras with different fields of view (FoV) or different spectral properties.

Among these asymmetric configurations, stereo cameras with an RGB-monochrome setup, consisting of a color camera and a monochrome camera, have been put to practical use, for example in Huawei's P9, P10, and P20 series.

In a stereo camera with this RGB-monochrome setup, the color camera has a Bayer color filter on its image sensor that, depending on wavelength, passes only one of the three primary colors (red, green, or blue) at each pixel. This is effective for acquiring color information about the captured scene, but because the filter blocks much of the incoming light, image noise can instead be amplified under low-light conditions. A monochrome camera, by contrast, receives all of the light arriving at each pixel, so it can provide better light efficiency and a sharper image than a color camera.

FIG. 1 schematically shows the characteristics of a stereo camera with a conventional RGB-monochrome setup: FIG. 1(a) shows the spectral sensitivity of the color camera and the monochrome camera, and FIG. 1(b) shows images captured by the color camera and the monochrome camera, respectively.

As FIG. 1(a) shows, in the same spectral region (that is, at the same wavelength), the monochrome camera provides higher spectral sensitivity than the color camera. Comparing the images in FIG. 1(b) likewise shows that, because of the difference in light efficiency between the two cameras, the color image is considerably noisier than the monochrome image. An RGB-monochrome setup can therefore provide better image quality when information from both the color image and the monochrome image is used.

However, when asymmetric images such as a color image and a monochrome image are used as a stereo pair, accurate matching between them is difficult because of the non-linear difference in spectral sensitivity between the two images. That is, because of the spectral sensitivity difference between the color camera and the monochrome camera and the spatially varying illumination, the image luminance recorded by the color camera differs from that recorded by the monochrome camera, which degrades stereo matching performance.

In addition, when stereo matching is performed between asymmetric images such as a color image and a monochrome image, occluded regions between the images or inaccurate matches can cause color-bleeding errors, in which colors smear across object boundaries.

Furthermore, if stereo matching is performed iteratively to improve accuracy between asymmetric images, generating the output can take a long time.

Accordingly, an improved approach that solves the above problems is needed.

An object of the present invention is to solve all of the above problems.

Another object of the present invention is to obtain a color-restored image with high sharpness by using a color image and a monochrome image as a stereo pair and performing stereo matching that combines the color information of the color image with the luminance information of the monochrome image, while additionally generating a disparity map and an occlusion map.

Still another object of the present invention is to use an end-to-end convolutional neural network (CNN) architecture to generate a highly accurate disparity map and a sharp color-restored image with relatively short processing time.

Still another object of the present invention is to restore a sharp, noise-free color image by adding, with reference to the disparity map, the color information of the denoised color image to the denoised monochrome image.

Still another object of the present invention is to use the occlusion map to prevent the color bleeding that may occur due to inaccurate stereo matching.

To achieve the objects of the present invention described above and to realize the characteristic effects of the present invention described below, the characteristic configurations of the present invention are as follows.

According to one aspect of the present invention, there is disclosed a method for performing stereo matching using a color image and a monochrome image, the method including: (a) when at least one training color/monochrome stereo pair is acquired, the pair including a training color image corresponding to a color camera of a stereo camera and a training monochrome image corresponding to a monochrome camera of the stereo camera, a learning device (i) inputting the training color/monochrome stereo pair into a disparity network to cause the disparity network to generate a training disparity map corresponding to the training monochrome image according to a matching cost between the training color image and the training monochrome image, (ii) inputting the training color/monochrome stereo pair into an occlusion network to cause the occlusion network to generate a training occlusion map according to a difference between the training color image and the training monochrome image, and (iii) inputting the training color/monochrome stereo pair into a denoising network to cause the denoising network to generate a training denoised monochrome image and a training denoised color image by removing noise from the training monochrome image and the training color image, respectively; (b) the learning device (i) (i-1) warping the training denoised color image with reference to the training disparity map to generate a training warped denoised color image, and (i-2) concatenating the training warped denoised color image and the training denoised monochrome image to generate a training initial color-restored image, and (ii) inputting the training initial color-restored image, the training occlusion map, and the training monochrome image into a colorization network to cause the colorization network to restore the colors of the training monochrome image according to the color information of the training initial color-restored image, with reference to the occlusion map, thereby generating a training final color-restored image; and (c) the learning device training at least one parameter of the disparity network, the occlusion network, the denoising network, and the colorization network by using a summation loss obtained by weighting and summing a first loss generated with reference to the training disparity map and a disparity ground-truth (GT) map, a second loss generated with reference to the training occlusion map and an occlusion GT map, a third loss generated with reference to the training denoised monochrome image and the training denoised color image and their corresponding denoised GT images, and a fourth loss generated with reference to the restored chrominance values of the training final color-restored image and the corresponding GT chrominance values.
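Step (c) states only that four losses are weighted and summed. A minimal sketch of such a summation loss follows, assuming L1 losses for the disparity, denoising, and chrominance terms, binary cross-entropy for the occlusion term, and equal default weights. None of these choices is specified in the text, and the dictionary keys and function names are hypothetical.

```python
import numpy as np


def l1_loss(a, b):
    """Mean absolute error (assumed loss form)."""
    return float(np.mean(np.abs(a - b)))


def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy between predicted occlusion probabilities and {0, 1} labels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def summation_loss(pred, gt, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four losses of step (c).

    pred and gt are dicts with hypothetical keys:
      "disparity" (H, W), "occlusion" (H, W), "denoised_mono" (H, W),
      "denoised_color" (H, W, 3), "chroma" (H, W, 2) holding U/V values.
    Loss forms and default weights are illustrative assumptions.
    """
    first = l1_loss(pred["disparity"], gt["disparity"])
    second = bce_loss(pred["occlusion"], gt["occlusion"])
    third = (l1_loss(pred["denoised_mono"], gt["denoised_mono"])
             + l1_loss(pred["denoised_color"], gt["denoised_color"]))
    fourth = l1_loss(pred["chroma"], gt["chroma"])
    w1, w2, w3, w4 = weights
    return w1 * first + w2 * second + w3 * third + w4 * fourth
```

In an actual training loop the same structure would be written with differentiable tensors (for example in PyTorch) so that one backward pass through the summation loss updates all four networks jointly, which is what makes the pipeline end-to-end.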

As an example, in step (a), the learning device inputs the training monochrome image and the training color image into the disparity network to cause the disparity network to (i) encode the training monochrome image through a 1_1 sub-encoder included in an encoder of the disparity network to generate a training 1_1 feature map for the training monochrome image, and encode the training color image through a 1_2 sub-encoder included in the encoder to generate a training 1_2 feature map for the training color image, (ii) compute the matching cost by performing a correlation operation on the training 1_1 feature map and the training 1_2 feature map through a correlation layer included in the encoder, thereby generating a training correlation feature map corresponding to the training color/monochrome stereo pair, (iii) encode the training correlation feature map through a second sub-encoder included in the encoder to generate a training second feature map, and (iv) decode the training second feature map through a decoder of the disparity network to generate the training disparity map.

As an example, the learning device causes the disparity network to generate the training correlation feature map by computing, as the correlation operation, the inner product of a training first feature vector, corresponding to at least one first patch included in at least one first region on the training 1_1 feature map, and a training second feature vector, corresponding to at least one second patch included in at least one second region on the training 1_2 feature map that corresponds to the first region.
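The inner-product correlation can be sketched as a one-dimensional cost volume over horizontal displacements, similar in spirit to the correlation layer of DispNet. The sketch assumes rectified images, the monochrome view as the reference, a 1x1 patch (one feature vector per pixel), and that a reference pixel at column x matches the color-view pixel at column x - d. The function name and `max_disp` are illustrative, not taken from the text.

```python
import numpy as np


def correlation_1d(feat_ref, feat_tgt, max_disp):
    """Horizontal correlation (matching-cost) volume between two feature maps.

    feat_ref: (C, H, W) features of the reference view (assumed: monochrome).
    feat_tgt: (C, H, W) features of the other view (assumed: color).
    Returns (max_disp + 1, H, W); entry [d, y, x] is the channel-averaged inner
    product of feat_ref[:, y, x] and feat_tgt[:, y, x - d]. Positions where
    x - d falls outside the image stay 0.
    """
    _, h, w = feat_ref.shape
    volume = np.zeros((max_disp + 1, h, w), dtype=feat_ref.dtype)
    for d in range(max_disp + 1):
        if d == 0:
            volume[0] = (feat_ref * feat_tgt).mean(axis=0)
        else:
            volume[d, :, d:] = (feat_ref[:, :, d:] * feat_tgt[:, :, :-d]).mean(axis=0)
    return volume
```

With unit-norm feature vectors, each pixel's correct displacement gives the largest inner product, so `volume.argmax(axis=0)` already acts as a crude winner-take-all disparity estimate; in the described network, the second sub-encoder and decoder learn a far more robust mapping from this volume to disparity.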

As an example, the learning device causes the disparity network to (i) concatenate a given training first intermediate feature map, generated by a given encoding layer among the at least one encoding layer in the 1_1 sub-encoder, the 1_2 sub-encoder, and the second sub-encoder that generates training first intermediate feature maps (which include the training 1_1 feature map, the training 1_2 feature map, and the training second feature map), with a given training second intermediate feature map, generated by the decoding layer that corresponds to the given encoding layer among the at least one decoding layer in the decoder that generates training second intermediate feature maps corresponding to the training first intermediate feature maps, thereby generating at least one training concatenated feature map, and to input each training concatenated feature map into the layer that follows the decoding layer which produced the training second intermediate feature map used in that concatenation.
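The skip connections described above, in which encoder intermediate feature maps are concatenated with decoder feature maps of matching resolution and passed to the next decoding layer, follow the familiar U-Net pattern. The toy forward pass below only illustrates the shape bookkeeping: the "layers" are untrained random 1x1 convolutions, and the pooling, upsampling, and channel widths are assumptions rather than the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, out_ch):
    """Stand-in for a learned layer: random (untrained) 1x1 convolution + ReLU."""
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)


def down(x):
    """2x2 average pooling with stride 2 (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def up(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def encoder_decoder_with_skips(x, widths=(16, 32, 64)):
    """x: (C, H, W) with H and W divisible by 2 ** len(widths)."""
    skips = []
    for ch in widths:                      # encoding layers
        x = conv1x1(x, ch)
        skips.append(x)                    # first intermediate feature map, kept for the decoder
        x = down(x)
    for ch, skip in zip(reversed(widths), reversed(skips)):   # decoding layers
        x = conv1x1(up(x), ch)             # second intermediate feature map, same size as `skip`
        x = np.concatenate([x, skip], axis=0)   # channel-wise concatenation
        # the concatenated map (2 * ch channels) feeds the next decoding layer
    return conv1x1(x, 1)                   # e.g. a one-channel disparity map
```

Concatenating rather than adding the skip features lets the following decoding layer decide how much fine spatial detail from the encoder to reuse, which matters for sharp disparity edges.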

As an example, the learning device (i) generates a training difference map by computing, for each pixel, the difference between the pixel value on a training warped color image, generated by warping the training color image with reference to the training disparity map, and the pixel value on the training monochrome image, and (ii) inputs the training difference map and the training second feature map used to generate the training disparity map into the occlusion network, to cause the occlusion network to generate, with reference to features extracted from the training difference map and features of the training second feature map, the training occlusion map, which marks occlusion between the training warped color image and the training monochrome image as a binary value for each pixel.
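A sketch of the warping and difference-map computation follows. It assumes rectified images, a disparity defined in the monochrome (reference) view so that reference pixel (y, x) samples the color image at (y, x - d), linear interpolation along x, and BT.601 luma weights to compare the three-channel warped color image with the single-channel monochrome image; the text does not say how this channel mismatch is handled. Function names are hypothetical.

```python
import numpy as np

BT601_LUMA = np.array([0.299, 0.587, 0.114])


def warp_to_reference(img_src, disparity):
    """Backward-warp img_src (H, W, C) into the reference (monochrome) view.

    Assumes rectified images and a reference-view disparity map (H, W) such that
    reference pixel (y, x) corresponds to source pixel (y, x - d). Samples are
    linearly interpolated along x; samples outside the source image become 0.
    """
    h, w = disparity.shape
    xs = np.arange(w)[None, :] - disparity            # sampling columns, (H, W)
    valid = ((xs >= 0) & (xs <= w - 1))[..., None]
    x0 = np.floor(xs).astype(int)
    frac = (xs - x0)[..., None]
    x0c = np.clip(x0, 0, w - 1)
    x1c = np.clip(x0 + 1, 0, w - 1)
    rows = np.arange(h)[:, None]
    out = (1.0 - frac) * img_src[rows, x0c] + frac * img_src[rows, x1c]
    return np.where(valid, out, 0.0)


def difference_map(warped_rgb, mono):
    """Per-pixel |luma(warped color) - mono|, using BT.601 luma (an assumption)."""
    return np.abs(warped_rgb @ BT601_LUMA - mono)
```

Where the disparity is correct and the pixel is visible in both views, the difference is small; large values flag occlusions or mismatches, which is the cue the occlusion network combines with the disparity features.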

As an example, the learning device causes the occlusion network to (i) encode the training difference map through an encoder of the occlusion network to generate a training third feature map, and (ii) concatenate the training third feature map with the training second feature map and then decode the result through a decoder of the occlusion network to generate the training occlusion map.

As an example, in step (a), the learning device causes the denoising network to perform at least one convolution operation, at least one batch normalization operation, and at least one ReLU operation on each of the training monochrome image and the training color image, thereby obtaining for each a training residual image, that is, a prediction of what remains after the corresponding latent noise-free image is removed from the input, and to remove the noise from the training monochrome image and the training color image with reference to their respective training residual images, thereby generating the training denoised monochrome image and the training denoised color image.
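The denoising step is a residual-learning formulation (the network predicts the noise, which is then subtracted), similar to DnCNN. Below is a minimal NumPy forward pass, assuming untrained weights supplied by the caller, batch normalization approximated by per-feature-map statistics without learned scale and shift, and hypothetical function names. The same function is applied separately to the monochrome image (1 input channel) and the color image (3 input channels, with a matching first-layer weight shape).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3x3(x, w):
    """'Same' 3x3 convolution (cross-correlation). x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    patches = sliding_window_view(xp, (3, 3), axis=(1, 2))   # (C_in, H, W, 3, 3)
    return np.einsum("chwij,ocij->ohw", patches, w)


def batch_norm(x, eps=1e-5):
    """Per-channel normalisation with the map's own statistics (no learned scale/shift)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def predict_residual(noisy, weights):
    """conv+ReLU, then (conv+BN+ReLU) per middle weight, then a conv that outputs the residual."""
    x = np.maximum(conv3x3(noisy, weights[0]), 0.0)
    for w in weights[1:-1]:
        x = np.maximum(batch_norm(conv3x3(x, w)), 0.0)
    return conv3x3(x, weights[-1])


def denoise(noisy, weights):
    """Residual learning: the estimated noise is subtracted from the noisy input."""
    return noisy - predict_residual(noisy, weights)
```

Predicting the residual rather than the clean image is typically easier to learn, because the network only needs to model the noise, which the background text notes is especially strong in the low-light color image.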

As an example, in step (b), the learning device causes the colorization network to (i) correct color-bleeding errors in at least one pixel feature value extracted from the training initial color-restored image, with reference to occlusion-region information in the training occlusion map, thereby generating corrected pixel feature values, and (ii) restore the final color information of each pixel on the training monochrome image, which serves as the reference image, by using at least some of the corrected pixel feature values as color seeds for the respective regions delimited by at least one edge on the training monochrome image, thereby generating the training final color-restored image.
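The colorization network learns this seed-and-propagate behavior; the text does not describe it as an explicit algorithm. As a non-learned illustration only, the sketch below applies the idea to a single scanline: chrominance at occluded pixels is discarded, the row is split into segments at luminance edges of the monochrome image, and each segment takes the mean chrominance of its remaining seeds. The edge threshold and function name are assumptions.

```python
import numpy as np


def propagate_chroma_row(luma, chroma, occluded, edge_thresh=0.1):
    """Non-learned analogue of occlusion-aware, edge-bounded color propagation.

    luma: (W,) monochrome intensities (the reference image row).
    chroma: (W, 2) U/V values from the initial color-restored image.
    occluded: (W,) bool, True where the occlusion map marks the warped color as unreliable.
    Each segment between luminance edges gets the mean chroma of its non-occluded
    pixels (its color seeds); segments without seeds keep their original chroma.
    """
    out = chroma.astype(float)
    boundaries = np.flatnonzero(np.abs(np.diff(luma)) > edge_thresh) + 1
    for seg in np.split(np.arange(luma.size), boundaries):
        seeds = seg[~occluded[seg]]
        if seeds.size:
            out[seg] = chroma[seeds].mean(axis=0)
    return out
```

For example, a pixel just left of an edge whose chroma was bled in from the right-hand object is corrected back to its own segment's color once the occlusion map excludes it as a seed.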

As an example, in step (b), the learning device causes the colorization network to (i) concatenate the training initial color-restored image, the training occlusion map, and the training monochrome image and then encode the result through an encoder of the colorization network to generate a training fourth feature map, and (ii) decode the training fourth feature map through a decoder of the colorization network to generate the training final color-restored image.

As an example, in step (b), the learning device concatenates the training warped denoised color image and the training denoised monochrome image in the channel direction to generate the training initial color-restored image, in which the color information of the training warped denoised color image forms the U and V channels (the chrominance channels) and the luminance information of the training denoised monochrome image forms the Y channel (the luminance channel).
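A sketch of building the initial color-restored image, assuming the warped denoised color image is RGB in [0, 1] and is converted to YUV with the BT.601 analog matrix (the text names the Y, U, and V channels but no specific conversion). The constant and function names are hypothetical.

```python
import numpy as np

# BT.601 RGB -> YUV (analog form); assumed, since the text names no color matrix.
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.14713, -0.28886, 0.436],
    [0.615, -0.51499, -0.10001],
])


def initial_color_restoration(warped_denoised_rgb, denoised_mono):
    """Y from the denoised monochrome image, U/V from the warped denoised color image.

    warped_denoised_rgb: (H, W, 3); denoised_mono: (H, W). Returns (H, W, 3) as Y, U, V.
    """
    yuv = warped_denoised_rgb @ RGB_TO_YUV.T
    return np.concatenate([denoised_mono[..., None], yuv[..., 1:]], axis=-1)
```

Taking Y from the monochrome camera keeps its higher light efficiency and sharpness, while the color camera contributes only chrominance, where human vision is less sensitive to residual noise and slight misalignment.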

According to another aspect of the present invention, there is disclosed a method for performing stereo matching using a color image and a monochrome image, the method including: (a) on condition that a learning device, having acquired at least one training color/monochrome stereo pair including a training color image corresponding to a color camera of a stereo camera and a training monochrome image corresponding to a monochrome camera of the stereo camera, has performed (1) a process of (i) inputting the training color/monochrome stereo pair into a disparity network to cause the disparity network to generate a training disparity map corresponding to the training monochrome image according to a matching cost between the training color image and the training monochrome image, (ii) inputting the training color/monochrome stereo pair into an occlusion network to cause the occlusion network to generate a training occlusion map according to a difference between the training color image and the training monochrome image, and (iii) inputting the training color/monochrome stereo pair into a denoising network to cause the denoising network to generate a training denoised monochrome image and a training denoised color image by removing noise from the training monochrome image and the training color image, respectively, (2) a process of (i) (i-1) warping the training denoised color image with reference to the training disparity map to generate a training warped denoised color image, and (i-2) concatenating the training warped denoised color image and the training denoised monochrome image to generate a training initial color-restored image, and (ii) inputting the training initial color-restored image, the training occlusion map, and the training monochrome image into a colorization network to cause the colorization network to restore the colors of the training monochrome image according to the color information of the training initial color-restored image, with reference to the occlusion map, thereby generating a training final color-restored image, and (3) a process of training at least one parameter of the disparity network, the occlusion network, the denoising network, and the colorization network by using a summation loss obtained by weighting and summing a first loss generated with reference to the training disparity map and a disparity ground-truth (GT) map, a second loss generated with reference to the training occlusion map and an occlusion GT map, a third loss generated with reference to the training denoised monochrome image and the training denoised color image and their corresponding denoised GT images, and a fourth loss generated with reference to the restored chrominance values of the training final color-restored image and the corresponding GT chrominance values, a test device acquiring at least one test color/monochrome stereo pair including a test color image corresponding to the color camera of the stereo camera and a test monochrome image corresponding to the monochrome camera of the stereo camera; (b) the test device (i) inputting the test color/monochrome stereo pair into the disparity network to cause the disparity network to generate a test disparity map corresponding to the test monochrome image according to a matching cost between the test color image and the test monochrome image, (ii) inputting the test color/monochrome stereo pair into the occlusion network to cause the occlusion network to generate a test occlusion map according to a difference between the test color image and the test monochrome image, and (iii) inputting the test color/monochrome stereo pair into the denoising network to cause the denoising network to generate a test denoised monochrome image and a test denoised color image by removing noise from the test monochrome image and the test color image, respectively; and (c) the test device (i) (i-1) warping the test denoised color image with reference to the test disparity map to generate a test warped denoised color image, and (i-2) concatenating the test warped denoised color image and the test denoised monochrome image to generate a test initial color-restored image, and (ii) inputting the test initial color-restored image, the test occlusion map, and the test monochrome image into the colorization network to cause the colorization network to restore the colors of the test monochrome image according to the color information of the test initial color-restored image, with reference to the test occlusion map, thereby generating a test final color-restored image.

As an example, there is disclosed a method in which, in step (b), the test apparatus inputs the monochrome image for testing and the color image for testing into the disparity network, causing the disparity network to (i) encode the monochrome image for testing through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for testing corresponding to the monochrome image for testing, and encode the color image for testing through a first_2 sub-encoder included in the encoder of the disparity network to generate a first_2 feature map for testing corresponding to the color image for testing, (ii) perform a correlation operation on the first_1 feature map for testing and the first_2 feature map for testing through a correlation layer included in the encoder of the disparity network, thereby computing the matching cost and generating a correlation feature map for testing corresponding to the color and monochrome stereo pair for testing, (iii) encode the correlation feature map for testing through a second sub-encoder included in the encoder of the disparity network to generate a second feature map for testing, and (iv) decode the second feature map for testing through a decoder of the disparity network to generate the disparity map for testing.

As an example, there is disclosed a method in which the test apparatus causes the disparity network to generate the correlation feature map for testing by computing, as the correlation operation, the dot product of a first feature vector for testing, corresponding to at least one first patch included in at least one first region on the first_1 feature map for testing, and a second feature vector for testing, corresponding to at least one second patch included in at least one second region on the first_2 feature map for testing that corresponds to the first region.
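The dot-product correlation described above resembles the 1-D correlation layer used in correlation-based disparity networks. The sketch below assumes single-pixel patches, horizontal-only displacements from 0 up to a hypothetical `max_disp`, and normalization by the channel count; none of these settings are given in the source. The result is a cost volume with one slice per candidate disparity.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]


def correlation_1d(f_mono: FeatureMap, f_color: FeatureMap, max_disp: int) -> FeatureMap:
    """Return a cost volume indexed [d][row][col] for d = 0..max_disp.

    Entry [d][y][x] is the dot product of the feature vector at (y, x) in the
    monochrome feature map and the vector at (y, x - d) in the color feature
    map, divided by the number of channels. Single-pixel patches are assumed;
    columns with x - d < 0 have no partner and keep a score of 0.
    """
    c = len(f_mono)
    h, w = len(f_mono[0]), len(f_mono[0][0])
    volume = [[[0.0] * w for _ in range(h)] for _ in range(max_disp + 1)]
    for d in range(max_disp + 1):
        for y in range(h):
            for x in range(d, w):
                dot = sum(f_mono[k][y][x] * f_color[k][y][x - d] for k in range(c))
                volume[d][y][x] = dot / c
    return volume


# Two-channel, one-row example in which the color features are shifted by one pixel.
mono = [[[1.0, 0.0, 1.0]], [[0.0, 1.0, 0.0]]]
color = [[[0.0, 1.0, 0.0]], [[1.0, 0.0, 1.0]]]
vol = correlation_1d(mono, color, max_disp=1)
print(vol[0][0], vol[1][0])  # [0.0, 0.0, 0.0] [0.0, 0.5, 0.5]
```

The high scores appear in the d = 1 slice, which is the shift used to build the example; the second sub-encoder and decoder would then turn such a volume into a dense disparity map.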

As an example, there is disclosed a method in which the test apparatus (i) generates the difference map for testing by computing, for each pixel, the difference between the pixel value of a warped color image for testing, obtained by warping the color image for testing with reference to the disparity map for testing, and the pixel value of the monochrome image for testing, and (ii) inputs the difference map for testing and the second feature map for testing, which was used to generate the disparity map for testing, into the occlusion network, causing the occlusion network to refer to the features extracted from the difference map for testing and the features of the second feature map for testing and generate the occlusion map for testing, which marks the occlusion between the warped color image for testing and the monochrome image for testing as a binary value for each pixel.
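The difference map can be sketched as a per-pixel absolute difference. Because the warped color image has three channels and the monochrome image has one, this sketch assumes each color pixel is first reduced to its luminance with the ITU-R BT.601 weights; the source only speaks of "the difference between the pixel values", so that reduction, like the function names, is an assumption.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[float, float, float]]]
GrayImage = List[List[float]]


def luminance(rgb: Tuple[float, float, float]) -> float:
    """ITU-R BT.601 luma weights (assumed reduction from RGB to one channel)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def difference_map(warped_color: RGBImage, mono: GrayImage) -> GrayImage:
    """Per-pixel |luminance(warped color) - monochrome| fed to the occlusion network."""
    return [
        [abs(luminance(c) - m) for c, m in zip(c_row, m_row)]
        for c_row, m_row in zip(warped_color, mono)
    ]


warped = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]
mono = [[1.0, 0.5]]
d = difference_map(warped, mono)
# first entry is (close to) 0.0 for the matching pixel, second entry is 0.5
print(d)
```

Large values mark pixels where the warped color view disagrees with the monochrome view, which is typical of occluded or mismatched regions; the occlusion network combines these cues with the disparity network's second feature map rather than thresholding them directly.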

According to another aspect of the present invention, there is disclosed a learning apparatus for performing stereo matching using a color image and a monochrome image, comprising: a memory storing instructions for performing stereo matching using a color image and a monochrome image; and a processor that performs operations for stereo matching using a color image and a monochrome image according to the instructions stored in the memory, wherein the processor performs: (I) a process of, when at least one color and monochrome stereo pair for training is acquired, including a color image for training corresponding to a color camera of a stereo camera and a monochrome image for training corresponding to a monochrome camera of the stereo camera, (i) inputting the color and monochrome stereo pair for training into a disparity network, causing the disparity network to generate a disparity map for training corresponding to the monochrome image for training according to a matching cost between the color image for training and the monochrome image for training, (ii) inputting the color and monochrome stereo pair for training into an occlusion network, causing the occlusion network to generate an occlusion map for training according to a difference between the color image for training and the monochrome image for training, and (iii) inputting the color and monochrome stereo pair for training into a denoising network, causing the denoising network to generate a denoised monochrome image for training and a denoised color image for training by removing noise from the monochrome image for training and the color image for training, respectively; (II) a process of (i) (i-1) warping the denoised color image for training with reference to the disparity map for training to generate a warped denoised color image for training, and (i-2) concatenating the warped denoised color image for training and the denoised monochrome image for training to generate an initial color restored image for training, and (ii) inputting the initial color restored image for training, the occlusion map for training, and the monochrome image for training into a colorization network, causing the colorization network to restore the color of the monochrome image for training according to the color information of the initial color restored image for training, with reference to the occlusion map for training, thereby generating a final color restored image for training; and (III) a process of learning at least one parameter included in the disparity network, the occlusion network, the denoising network, and the colorization network using a summation loss generated by assigning a weight to each of a first loss generated with reference to the disparity map for training and a disparity ground truth (GT) map, a second loss generated with reference to the occlusion map for training and an occlusion GT map, a third loss generated with reference to the denoised monochrome image for training, the denoised color image for training, and their corresponding denoised GT images, and a fourth loss generated with reference to the restored chrominance values of the final color restored image for training and the corresponding GT chrominance values.
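The summation loss in process (III) is a weighted sum of four terms. The source does not specify the form of each term or the weight values, so the sketch below assumes mean absolute error (L1) for the disparity, denoising, and chrominance terms, binary cross-entropy for the occlusion term, and hypothetical default weights of 1.0. Images are flattened to plain lists for brevity.

```python
import math
from typing import Sequence, Tuple


def l1(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error (assumed form of the regression losses)."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)


def bce(prob: Sequence[float], gt: Sequence[float], eps: float = 1e-7) -> float:
    """Binary cross-entropy for per-pixel occlusion probabilities (assumed)."""
    total = 0.0
    for p, g in zip(prob, gt):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total / len(gt)


def summation_loss(disp, disp_gt, occ_prob, occ_gt,
                   den_mono, den_mono_gt, den_color, den_color_gt,
                   chroma, chroma_gt,
                   weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    w1, w2, w3, w4 = weights                                          # hypothetical weights
    loss1 = l1(disp, disp_gt)                                         # disparity vs. disparity GT
    loss2 = bce(occ_prob, occ_gt)                                     # occlusion vs. occlusion GT
    loss3 = l1(den_mono, den_mono_gt) + l1(den_color, den_color_gt)  # denoised vs. denoised GT
    loss4 = l1(chroma, chroma_gt)                                     # restored vs. GT chrominance
    return w1 * loss1 + w2 * loss2 + w3 * loss3 + w4 * loss4


total = summation_loss(
    disp=[1.0, 2.0], disp_gt=[1.0, 3.0],
    occ_prob=[0.0, 1.0], occ_gt=[0.0, 1.0],
    den_mono=[0.5], den_mono_gt=[0.5],
    den_color=[0.2], den_color_gt=[0.2],
    chroma=[0.1, -0.1], chroma_gt=[0.1, -0.1],
    weights=(2.0, 1.0, 1.0, 1.0),
)
print(round(total, 4))  # 1.0
```

Because all four networks sit in one graph, a single backward pass through this scalar would update the disparity, occlusion, denoising, and colorization parameters jointly, which is what makes the structure end-to-end.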

As an example, there is disclosed a learning apparatus in which, in process (I), the processor inputs the monochrome image for training and the color image for training into the disparity network, causing the disparity network to (i) encode the monochrome image for training through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for training corresponding to the monochrome image for training, and encode the color image for training through a first_2 sub-encoder included in the encoder of the disparity network to generate a first_2 feature map for training corresponding to the color image for training, (ii) perform a correlation operation on the first_1 feature map for training and the first_2 feature map for training through a correlation layer included in the encoder of the disparity network, thereby computing the matching cost and generating a correlation feature map for training corresponding to the color and monochrome stereo pair for training, (iii) encode the correlation feature map for training through a second sub-encoder included in the encoder of the disparity network to generate a second feature map for training, and (iv) decode the second feature map for training through a decoder of the disparity network to generate the disparity map for training.

As an example, there is disclosed a learning apparatus in which the processor causes the disparity network to generate the correlation feature map for training by computing, as the correlation operation, the dot product of a first feature vector for training, corresponding to at least one first patch included in at least one first region on the first_1 feature map for training, and a second feature vector for training, corresponding to at least one second patch included in at least one second region on the first_2 feature map for training that corresponds to the first region.

As an example, there is disclosed a learning apparatus in which the processor causes the disparity network to concatenate a predetermined first intermediate feature map for training with a predetermined second intermediate feature map for training to generate at least one concatenated feature map for training, and to input each concatenated feature map for training into the layer that follows the predetermined decoding layer whose second intermediate feature map for training was used to generate it. Here, the predetermined first intermediate feature map for training is generated by a predetermined encoding layer, which is at least part of the at least one encoding layer in the first_1 sub-encoder, the first_2 sub-encoder, and the second sub-encoder that generates the first intermediate feature maps for training (the first intermediate feature maps for training include the first_1 feature map for training, the first_2 feature map for training, and the second feature map for training); and the predetermined second intermediate feature map for training is generated by the predetermined decoding layer corresponding to that encoding layer, among the at least one decoding layer in the decoder that generates the second intermediate feature maps for training corresponding to the first intermediate feature maps for training.
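The concatenations described above are skip connections in the style of U-Net. The sketch below shows only the wiring: a feature map is a list of channels, the stand-in layers built by the hypothetical `make_layer` keep the spatial size fixed (real encoding and decoding layers change resolution), and the hypothetical `skip_pairs` dictionary maps a decoding-layer index to the encoding-layer index whose output is concatenated with it before being passed to the next decoding layer.

```python
from typing import Callable, Dict, List

Channel = List[float]            # one channel, spatial dimensions flattened
FeatureMap = List[Channel]       # a list of channels
Layer = Callable[[FeatureMap], FeatureMap]


def make_layer(out_channels: int) -> Layer:
    """Hypothetical stand-in for a conv layer: every output channel is the
    per-pixel mean of the input channels, so only channel counts change."""
    def layer(fm: FeatureMap) -> FeatureMap:
        n = len(fm[0])
        mean = [sum(ch[i] for ch in fm) / len(fm) for i in range(n)]
        return [list(mean) for _ in range(out_channels)]
    return layer


def identity(fm: FeatureMap) -> FeatureMap:
    return fm


def run_with_skips(x: FeatureMap, encoder: List[Layer], decoder: List[Layer],
                   skip_pairs: Dict[int, int]) -> FeatureMap:
    """skip_pairs[j] = i: concatenate encoder layer i's output with decoder
    layer j's output and feed the result to decoder layer j + 1."""
    enc_outputs: List[FeatureMap] = []
    for layer in encoder:
        x = layer(x)
        enc_outputs.append(x)                   # first intermediate feature maps
    for j, layer in enumerate(decoder):
        x = layer(x)                            # second intermediate feature map
        if j in skip_pairs:
            if j == len(decoder) - 1:
                raise ValueError("a skip needs a following decoding layer")
            x = enc_outputs[skip_pairs[j]] + x  # channel-wise concatenation
    return x


enc = [make_layer(4), make_layer(8)]
dec = [make_layer(4), identity]
out = run_with_skips([[1.0, 2.0]], enc, dec, skip_pairs={0: 0})
print(len(out))  # 8: 4 encoder channels + 4 decoder channels
```

Passing early encoder features straight to the decoder lets the disparity decoder recover fine spatial detail that would otherwise be lost after repeated downsampling.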

As an example, there is disclosed a learning apparatus in which the processor (i) generates the difference map for training by computing, for each pixel, the difference between the pixel value of a warped color image for training, obtained by warping the color image for training with reference to the disparity map for training, and the pixel value of the monochrome image for training, and (ii) inputs the difference map for training and the second feature map for training, which was used to generate the disparity map for training, into the occlusion network, causing the occlusion network to refer to the features extracted from the difference map for training and the features of the second feature map for training and generate the occlusion map for training, which marks the occlusion between the warped color image for training and the monochrome image for training as a binary value for each pixel.

As an example, there is disclosed a learning apparatus in which the processor causes the occlusion network to (i) encode the difference map for training through an encoder of the occlusion network to generate a third feature map for training, and (ii) concatenate the third feature map for training and the second feature map for training through a decoder of the occlusion network and then decode the result to generate the occlusion map for training.
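Since the occlusion map marks each pixel with a binary value, the decoder output has to be binarized at some point. A minimal sketch, assuming the decoder emits one logit per pixel, that 1 denotes an occluded pixel, and that a sigmoid followed by a 0.5 threshold is used; the source states none of these details, and the function names are illustrative.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def to_binary_occlusion(logits: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Map per-pixel decoder logits to a binary occlusion map (1 = occluded, assumed)."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]


print(to_binary_occlusion([[-2.0, 0.0, 3.0], [-1000.0, 1000.0, -0.1]]))
# [[0, 1, 1], [0, 1, 0]]
```

During training, the continuous probabilities rather than the thresholded values would normally be compared with the occlusion GT map, since the threshold itself has no useful gradient.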

As an example, there is disclosed a learning apparatus in which, in process (I), the processor causes the denoising network to apply at least one convolution operation, at least one batch normalization operation, and at least one ReLU operation to each of the monochrome image for training and the color image for training to obtain, for each of them, a residual image for training that predicts what remains after the corresponding latent noiseless image is removed from that image (that is, the noise), and then to remove the noise from the monochrome image for training and the color image for training with reference to their respective residual images for training, thereby generating the denoised monochrome image for training and the denoised color image for training.
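The denoising network follows a residual-learning scheme: it predicts the residual (the noise component) and subtracts it from its input. The sketch below shows only that final subtraction. `box_residual` is a hypothetical stand-in for the trained convolution / batch-normalization / ReLU stack, estimating the residual as the difference between each pixel and a horizontal three-tap mean; the same step would be applied to the monochrome image and to each channel of the color image.

```python
from typing import Callable, List

Image = List[List[float]]


def denoise_with_residual(noisy: Image, predict_residual: Callable[[Image], Image]) -> Image:
    """Denoised image = input - predicted residual (the estimated noise)."""
    residual = predict_residual(noisy)
    return [[n - r for n, r in zip(n_row, r_row)] for n_row, r_row in zip(noisy, residual)]


def box_residual(img: Image) -> Image:
    """Hypothetical stand-in for the trained network: residual = pixel minus
    the mean of itself and its left/right neighbours (edges replicated)."""
    out = []
    for row in img:
        w = len(row)
        res = []
        for x in range(w):
            left, right = row[max(x - 1, 0)], row[min(x + 1, w - 1)]
            res.append(row[x] - (left + row[x] + right) / 3.0)
        out.append(res)
    return out


print(denoise_with_residual([[1.0, 4.0, 1.0]], box_residual))  # [[2.0, 2.0, 2.0]]
```

Predicting the residual rather than the clean image is the design choice the paragraph describes; the network then only has to model the noise, while the image content passes through the subtraction unchanged.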

As an example, there is disclosed a learning apparatus in which, in process (II), the processor causes the colorization network to (i) correct color bleeding errors in at least one pixel feature value extracted from the initial color restored image for training, with reference to the occlusion region information of the occlusion map for training, to generate corrected pixel feature values, and (ii) restore the final color information for each pixel of the monochrome image for training, which is used as a reference image, by using at least some of the corrected pixel feature values as color seeds for each region of the monochrome image for training delimited by at least one edge, thereby generating the final color restored image for training.
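The role of the color seeds can be illustrated with a hand-crafted analogue; it is not the learned colorization network. Under the assumptions in this sketch, chrominance values at occluded pixels are discarded as unreliable (the source of color bleeding), the monochrome image is split into regions whose neighboring pixels differ by at most a hypothetical tolerance `tol`, and each region takes the mean of its remaining seeds. Regions with no valid seed fall back to a neutral chrominance of 0.

```python
from collections import deque
from typing import List

Image = List[List[float]]


def propagate_chroma(chroma: Image, occlusion: List[List[int]], mono: Image,
                     tol: float = 0.1) -> Image:
    """Hand-crafted analogue of seed-based colorization (not the learned network).

    chroma:    one chrominance channel (U or V) of the initial color restored image
    occlusion: 1 where the pixel is occluded (its chroma is treated as unreliable)
    mono:      reference monochrome image used to find edge-bounded regions
    """
    h, w = len(mono), len(mono[0])
    visited = [[False] * w for _ in range(h)]
    out = [[0.0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if visited[sy][sx]:
                continue
            # Flood-fill one region: 4-neighbours whose intensities differ by <= tol.
            region, queue = [], deque([(sy, sx)])
            visited[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not visited[ny][nx]
                            and abs(mono[ny][nx] - mono[y][x]) <= tol):
                        visited[ny][nx] = True
                        queue.append((ny, nx))
            # Non-occluded pixels act as color seeds; occluded ones are ignored.
            seeds = [chroma[y][x] for y, x in region if occlusion[y][x] == 0]
            value = sum(seeds) / len(seeds) if seeds else 0.0  # 0.0 = neutral chroma
            for y, x in region:
                out[y][x] = value
    return out


mono = [[0.2, 0.2, 0.9, 0.9]]
chroma = [[0.25, 0.75, -0.5, 0.8]]   # 0.8 is a bled color at an occluded pixel
occ = [[0, 0, 0, 1]]
print(propagate_chroma(chroma, occ, mono))  # [[0.5, 0.5, -0.5, -0.5]]
```

The occluded pixel receives its region's seed color instead of the bled value, which is the effect the occlusion map is meant to enable; the real network learns both the region boundaries and the propagation rather than using a fixed tolerance.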

As an example, there is disclosed a learning apparatus in which, in process (II), the processor causes the colorization network to (i) concatenate the initial color restored image for training, the occlusion map for training, and the monochrome image for training using an encoder of the colorization network and then encode the result to generate a fourth feature map for training, and (ii) decode the fourth feature map for training using a decoder of the colorization network to generate the final color restored image for training.
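The concatenation at the input of the colorization network's encoder can be sketched as channel stacking. Assuming the initial color restored image has three channels (Y, U, V) and the occlusion map and the monochrome image have one channel each, the encoder receives five channels; the function name and the list-of-channels representation are illustrative.

```python
from typing import List

Channel = List[List[float]]  # one H x W channel


def colorization_input(initial_yuv: List[Channel], occlusion: Channel,
                       mono: Channel) -> List[Channel]:
    """Stack [Y, U, V, occlusion, mono] along the channel axis (5 channels)."""
    if len(initial_yuv) != 3:
        raise ValueError("expected three channels (Y, U, V)")
    h, w = len(mono), len(mono[0])
    for ch in list(initial_yuv) + [occlusion]:
        if len(ch) != h or any(len(row) != w for row in ch):
            raise ValueError("all inputs must match the monochrome image size")
    return list(initial_yuv) + [occlusion, mono]


y, u, v = [[0.5, 0.6]], [[0.0, 0.1]], [[0.0, -0.1]]
stacked = colorization_input([y, u, v], occlusion=[[0.0, 1.0]], mono=[[0.5, 0.6]])
print(len(stacked))  # 5
```

Feeding the occlusion map alongside the color hints lets the encoder learn where those hints should be trusted, and the monochrome channel supplies the full-resolution edges that guide the decoder.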

As an example, there is disclosed a learning apparatus in which, in process (II), the processor concatenates the warped denoised color image for training and the denoised monochrome image for training in the channel direction to generate the initial color restored image for training, in which the color information of the warped denoised color image for training forms the U and V channels (the chrominance channels) and the luminance information of the denoised monochrome image for training forms the Y channel (the luminance channel).
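The channel-wise concatenation can be sketched as follows, assuming the warped denoised color image is in RGB and is converted with the analog BT.601 YUV equations (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B - Y), V = 0.877(R - Y)); the source does not say which YUV conversion is used. The Y channel is taken from the denoised monochrome image, and only U and V come from the color image.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def rgb_to_yuv(r: float, g: float, b: float) -> Pixel:
    """Analog BT.601 RGB -> YUV (assumed; the source names no conversion)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)


def initial_color_restored(warped_rgb: List[List[Pixel]],
                           denoised_mono: List[List[float]]) -> List[List[Pixel]]:
    """Per pixel: Y from the denoised monochrome image, U and V from the warped
    denoised color image, i.e. a 3-channel YUV image."""
    out = []
    for c_row, m_row in zip(warped_rgb, denoised_mono):
        row = []
        for (r, g, b), y_mono in zip(c_row, m_row):
            _, u, v = rgb_to_yuv(r, g, b)   # the color camera's own luma is discarded
            row.append((y_mono, u, v))
        out.append(row)
    return out


img = initial_color_restored([[(1.0, 0.0, 0.0)]], [[0.8]])
print(img)  # Y = 0.8 from the monochrome image; U ~ -0.147, V ~ 0.615 from red
```

Taking luminance from the monochrome camera keeps its higher light efficiency and sharper detail, while the color camera contributes only chrominance, which tolerates the residual errors of warping better than luminance does.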

According to another aspect of the present invention, there is disclosed a test apparatus for performing stereo matching using a color image and a monochrome image, comprising: a memory storing instructions for performing stereo matching using a color image and a monochrome image; and a processor that performs operations for stereo matching using a color image and a monochrome image according to the instructions stored in the memory, wherein the processor performs: (I) a process of acquiring at least one color and monochrome stereo pair for testing, including a color image for testing corresponding to the color camera of the stereo camera and a monochrome image for testing corresponding to the monochrome camera of the stereo camera, on condition that a learning apparatus, upon acquiring at least one color and monochrome stereo pair for training that includes a color image for training corresponding to a color camera of a stereo camera and a monochrome image for training corresponding to a monochrome camera of the stereo camera, has performed (1) a process of (i) inputting the color and monochrome stereo pair for training into a disparity network, causing the disparity network to generate a disparity map for training corresponding to the monochrome image for training according to a matching cost between the color image for training and the monochrome image for training, (ii) inputting the color and monochrome stereo pair for training into an occlusion network, causing the occlusion network to generate an occlusion map for training according to a difference between the color image for training and the monochrome image for training, and (iii) inputting the color and monochrome stereo pair for training into a denoising network, causing the denoising network to generate a denoised monochrome image for training and a denoised color image for training by removing noise from the monochrome image for training and the color image for training, respectively, (2) a process of (i) (i-1) warping the denoised color image for training with reference to the disparity map for training to generate a warped denoised color image for training, and (i-2) concatenating the warped denoised color image for training and the denoised monochrome image for training to generate an initial color restored image for training, and (ii) inputting the initial color restored image for training, the occlusion map for training, and the monochrome image for training into a colorization network, causing the colorization network to restore the color of the monochrome image for training according to the color information of the initial color restored image for training, with reference to the occlusion map for training, thereby generating a final color restored image for training, and (3) a process of learning at least one parameter included in the disparity network, the occlusion network, the denoising network, and the colorization network using a summation loss generated by assigning a weight to each of a first loss generated with reference to the disparity map for training and a disparity ground truth (GT) map, a second loss generated with reference to the occlusion map for training and an occlusion GT map, a third loss generated with reference to the denoised monochrome image for training, the denoised color image for training, and their corresponding denoised GT images, and a fourth loss generated with reference to the restored chrominance values of the final color restored image for training and the corresponding GT chrominance values; (II) a process of (i) inputting the color and monochrome stereo pair for testing into the disparity network, causing the disparity network to generate a disparity map for testing corresponding to the monochrome image for testing according to a matching cost between the color image for testing and the monochrome image for testing, (ii) inputting the color and monochrome stereo pair for testing into the occlusion network, causing the occlusion network to generate an occlusion map for testing according to a difference between the color image for testing and the monochrome image for testing, and (iii) inputting the color and monochrome stereo pair for testing into the denoising network, causing the denoising network to generate a denoised monochrome image for testing and a denoised color image for testing by removing noise from the monochrome image for testing and the color image for testing, respectively; and (III) a process of (i) (i-1) warping the denoised color image for testing with reference to the disparity map for testing to generate a warped denoised color image for testing, and (i-2) concatenating the warped denoised color image for testing and the denoised monochrome image for testing to generate an initial color restored image for testing, and (ii) inputting the initial color restored image for testing, the occlusion map for testing, and the monochrome image for testing into the colorization network, causing the colorization network to restore the color of the monochrome image for testing according to the color information of the initial color restored image for testing, with reference to the occlusion map for testing, thereby generating a final color restored image for testing.

As an example, there is disclosed a test apparatus in which, in process (II), the processor inputs the monochrome image for testing and the color image for testing into the disparity network, causing the disparity network to (i) encode the monochrome image for testing through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for testing corresponding to the monochrome image for testing, and encode the color image for testing through a first_2 sub-encoder included in the encoder of the disparity network to generate a first_2 feature map for testing corresponding to the color image for testing, (ii) perform a correlation operation on the first_1 feature map for testing and the first_2 feature map for testing through a correlation layer included in the encoder of the disparity network, thereby computing the matching cost and generating a correlation feature map for testing corresponding to the color and monochrome stereo pair for testing, (iii) encode the correlation feature map for testing through a second sub-encoder included in the encoder of the disparity network to generate a second feature map for testing, and (iv) decode the second feature map for testing through a decoder of the disparity network to generate the disparity map for testing.

As an example, there is disclosed a test apparatus in which the processor causes the disparity network to generate the correlation feature map for testing by computing, as the correlation operation, the dot product of a first feature vector for testing, corresponding to at least one first patch included in at least one first region on the first_1 feature map for testing, and a second feature vector for testing, corresponding to at least one second patch included in at least one second region on the first_2 feature map for testing that corresponds to the first region.

As an example, there is disclosed a test apparatus in which the processor (i) generates the difference map for testing by computing, for each pixel, the difference between the pixel value of a warped color image for testing, obtained by warping the color image for testing with reference to the disparity map for testing, and the pixel value of the monochrome image for testing, and (ii) inputs the difference map for testing and the second feature map for testing, which was used to generate the disparity map for testing, into the occlusion network, causing the occlusion network to refer to the features extracted from the difference map for testing and the features of the second feature map for testing and generate the occlusion map for testing, which marks the occlusion between the warped color image for testing and the monochrome image for testing as a binary value for each pixel.

In addition, a computer-readable recording medium storing a computer program for executing the method of the present invention is further provided.

The present invention uses an end-to-end convolutional neural network (CNN). The network includes a disparity network, an occlusion network, and a colorization network, each composed of an encoder and a decoder, together with a denoising network. This structure generates both a highly accurate disparity map and a sharp color-restored image within a relatively short processing time.

In addition, the present invention restores a sharp, noise-free color image. It does so by adding the color information of the denoised color image to the denoised monochrome image, with reference to the disparity map.

In addition, the present invention uses the occlusion map to prevent the color bleeding that inaccurate stereo matching can cause.

In addition, the present invention improves the accuracy of the occlusion map by additionally using information from the disparity map when generating the occlusion map.

The following drawings, attached for use in describing the embodiments of the present invention, show only some of the embodiments of the present invention. A person having ordinary skill in the art to which the present invention pertains (hereinafter, "a person skilled in the art") may obtain other drawings based on these drawings without inventive work.
FIG. 1 schematically shows the difference in light efficiency between a color image and a monochrome image;
FIG. 2 schematically shows a learning apparatus for training a Color and Monochrome Stereo Network (CMSNet) that performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention;
FIG. 3 schematically shows the structure of the color and monochrome stereo network that performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention;
FIG. 4 schematically shows a method of generating the training color image and the training monochrome image used to train the network according to an embodiment of the present invention;
FIG. 5 schematically shows the structure of a disparity network that generates a disparity map from a stereo pair consisting of a color image and a monochrome image according to an embodiment of the present invention;
FIG. 6 schematically shows the structure of an occlusion network that generates an occlusion map from a stereo pair according to an embodiment of the present invention;
FIG. 7 schematically shows how using an occlusion map corrects color bleeding errors according to an embodiment of the present invention;
FIG. 8 schematically shows the structure of a denoising network that removes noise from a monochrome image and a color image according to an embodiment of the present invention;
FIG. 9 schematically compares disparity maps generated from a stereo pair without denoising and from a denoised stereo pair according to an embodiment of the present invention;
FIG. 10 schematically shows the structure of a colorization network that generates a color-restored image using a stereo pair, a disparity map, and an occlusion map according to an embodiment of the present invention;
FIG. 11 schematically shows a test apparatus for testing the color and monochrome stereo network that performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention;
FIG. 12 schematically compares a disparity map generated according to an embodiment of the present invention with disparity maps generated by various other networks;
FIG. 13 schematically compares a color-restored image generated according to an embodiment of the present invention with color-restored images generated by various other networks;
FIG. 14 schematically compares a color image used as input with the color-restored image generated as output according to an embodiment of the present invention; and
FIG. 15 schematically shows a disparity map generated from a stereo pair consisting of a color image and a near-infrared (NIR) image according to an embodiment of the present invention.

The following detailed description of the present invention refers to the accompanying drawings. These drawings show, by way of illustration, specific embodiments in which the present invention may be practiced, in order to clarify the objects, technical solutions, and advantages of the present invention. The embodiments are described in sufficient detail to enable a person skilled in the art to practice the present invention.

Throughout the detailed description and claims of the present invention, the word "comprise" and its variations are not intended to exclude other technical features, additions, components, or steps. Other objects, advantages, and characteristics of the present invention will become apparent to a person skilled in the art, partly from this description and partly from practice of the present invention. The following examples and drawings are provided by way of illustration and are not intended to limit the present invention.

Moreover, the present invention covers all possible combinations of the embodiments shown in this specification. The various embodiments of the present invention are different from one another, but they need not be mutually exclusive. For example, a specific shape, structure, or characteristic described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the present invention. Likewise, the position or arrangement of individual components within each disclosed embodiment may be changed without departing from that spirit and scope. Accordingly, the following detailed description is not to be taken in a limiting sense. The scope of the present invention, if properly described, is limited only by the appended claims, together with the full scope of equivalents to which those claims are entitled. Like reference numerals in the drawings refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings, so that a person skilled in the art can easily practice the present invention.

FIG. 2 schematically shows a learning apparatus 1000 for training a Color and Monochrome Stereo Network (CMSNet) that performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention.

Referring to FIG. 2, the learning apparatus 1000 may include a memory 1001 and a processor 1002. The memory 1001 stores instructions for training the color and monochrome stereo network, which performs stereo matching using a color image and a monochrome image. The processor 1002 trains that network according to the instructions stored in the memory 1001.

Specifically, the learning apparatus 1000 may typically achieve the desired system performance by combining a computing device with computer software. The computing device may be, for example, a device that includes a computer processor, memory, storage, input and output devices, and other components of a conventional computing device; an electronic communication device such as a router or a switch; or an electronic information storage system such as network-attached storage (NAS) or a storage area network (SAN). The computer software consists of instructions that cause the computing device to function in a particular manner.

In addition, the processor of the computing device may include hardware components such as a micro processing unit (MPU) or a central processing unit (CPU), a cache memory, and a data bus. The computing device may further include software components such as an operating system and applications that serve specific purposes.

However, this does not exclude the case in which the computing device includes an integrated processor, in which a medium, a processor, and a memory for implementing the present invention are integrated.

With reference to FIGS. 3 to 10, the following describes how the learning apparatus 1000 configured as described above trains a network that performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention.

First, FIG. 3 schematically shows the structure of the color and monochrome stereo network that performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention. As shown in FIG. 3, the color and monochrome stereo network consists of a disparity network 100, an occlusion network 200, a denoising network 300, and a colorization network 400.

Referring to FIG. 3, the learning apparatus 1000 first acquires at least one training color and monochrome stereo pair. Each pair includes a training color image corresponding to the color camera of a stereo camera and a training monochrome image corresponding to the monochrome camera of the same stereo camera.

Here, the stereo camera consists of two cameras separated by a preset interval: one may be a color camera and the other a monochrome camera. Accordingly, the learning apparatus 1000 can acquire a training color and monochrome stereo pair consisting of a monochrome image and a color image captured by such a stereo camera.

In addition, the learning apparatus 1000 may need more training color and monochrome stereo pairs than are available. In that case, it may perform data augmentation on an existing stereo dataset consisting only of color images. This yields training color images corresponding to images acquired by a color camera and training monochrome images corresponding to images acquired by a monochrome camera. The data augmentation may be performed with reference to the respective characteristics of the color camera and the monochrome camera.

As an example, a method of generating the training monochrome image and the training color image is described below with reference to FIG. 4.

First, as shown in (a) of FIG. 4, a training monochrome image is generated from the left color image of an existing color stereo pair. The left color image is decolorized, random gamma mapping is applied, and signal-dependent Gaussian noise is added. The result corresponds to the left image of the stereo pair according to the present invention. Here, gamma mapping is a correction that transforms the light-intensity signal of the image. Adding normally distributed noise, such as Gaussian noise, reproduces the general noise that arises in images.

That is, to generate the training monochrome image, the left color image of the existing stereo pair is first decolorized by a conventional RGB-to-grayscale conversion. Random gamma mapping (e.g., gamma values in the range [0.6, 0.9]) is then applied to reproduce the light efficiency of the monochrome camera. Finally, signal-dependent Gaussian noise with a preset standard deviation of {0.005, 0.015} is randomly added to produce the training monochrome image.
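A minimal NumPy sketch of this monochrome synthesis follows. The following are assumptions:

- the BT.601 grayscale weights;
- the form of the signal-dependent noise (standard deviation proportional to the square root of intensity);
- reading {0.005, 0.015} as the range from which the standard deviation is drawn.

`synthesize_mono` is a hypothetical name. A gamma below 1 brightens intensities in (0, 1), which is consistent with imitating the higher light efficiency of a monochrome sensor.

```python
import numpy as np

def synthesize_mono(rgb, rng):
    """Turn an (H, W, 3) RGB image in [0, 1] into a noisy (H, W) 'monochrome' image."""
    # 1) decolorize: conventional RGB-to-grayscale (BT.601 luma weights, assumed)
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # 2) random gamma mapping with gamma drawn from [0.6, 0.9]
    gray = np.power(gray, rng.uniform(0.6, 0.9))
    # 3) signal-dependent Gaussian noise: std = sigma * sqrt(intensity) (assumed model)
    sigma = rng.uniform(0.005, 0.015)
    noise = rng.normal(0.0, 1.0, gray.shape) * sigma * np.sqrt(gray)
    return np.clip(gray + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
mono = synthesize_mono(rng.uniform(size=(4, 6, 3)), rng)
print(mono.shape)  # (4, 6)
```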

Next, referring to (b) of FIG. 4, a raw image is generated from the right color image of the existing color stereo pair. Signal-dependent Gaussian noise is then added, the image is demosaiced, and its contrast, hue, and saturation are randomly adjusted. The result is a training color image that corresponds to the right image of the stereo pair according to the present invention. It reproduces a lower light efficiency and resolution than the training monochrome image.

Specifically, to generate the training color image, the pixels of each color channel of the right color image are first sampled on a fixed grid with a stride of 2. The sampled pixels are then collected on a rectangular grid of 2×2 pixels to create a grayscale RAW image with a BGGR Bayer pattern. Because a color image has lower resolution than a monochrome image as a result of demosaicing, a conventional demosaicing technique is used to convert the raw image back into a color image. Color augmentation is then applied to simulate various spectral sensitivities. It covers saturation (e.g., range [0, 1]), contrast (e.g., range [0, 2]), and hue (e.g., range [-1, 1]). Finally, random signal-dependent noise with a standard deviation of {0.03, 0.05} is added. This standard deviation is relatively higher than the one used for the training monochrome image.
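The color-side pipeline can be sketched in the same way. This is an illustrative sketch only, with the following simplifications:

- `naive_demosaic` is a crude stand-in for a conventional demosaicing algorithm. It keeps one RGB value per 2×2 Bayer cell, which halves the effective resolution.
- Hue adjustment is omitted.
- Saturation and contrast are implemented as simple linear blends.
- The steps follow the order of the detailed paragraph above (augmentation, then noise).
- {0.03, 0.05} is again read as a range.

All function names are hypothetical.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])

def to_bggr_raw(rgb):
    """Sample an (H, W, 3) RGB image (even H, W) on a stride-2 grid into a BGGR RAW mosaic."""
    raw = np.empty(rgb.shape[:2], dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 2]  # B
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G
    raw[1::2, 1::2] = rgb[1::2, 1::2, 0]  # R
    return raw

def naive_demosaic(raw):
    """Crude demosaic: one RGB value per 2x2 BGGR cell, replicated back to full size."""
    r = raw[1::2, 1::2]
    g = 0.5 * (raw[0::2, 1::2] + raw[1::2, 0::2])
    b = raw[0::2, 0::2]
    cell = np.stack([r, g, b], axis=-1)
    return cell.repeat(2, axis=0).repeat(2, axis=1)

def synthesize_color(rgb, rng):
    img = naive_demosaic(to_bggr_raw(rgb))
    # saturation factor in [0, 1]: linear blend toward per-pixel luma
    luma = (img @ LUMA)[..., None]
    img = luma + rng.uniform(0.0, 1.0) * (img - luma)
    # contrast factor in [0, 2]: scale deviations from the global mean
    mean = img.mean()
    img = np.clip(mean + rng.uniform(0.0, 2.0) * (img - mean), 0.0, 1.0)
    # stronger signal-dependent noise than on the monochrome side, std in [0.03, 0.05]
    sigma = rng.uniform(0.03, 0.05)
    img = img + rng.normal(0.0, 1.0, img.shape) * sigma * np.sqrt(img)
    return np.clip(img, 0.0, 1.0)
```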

Meanwhile, the stereo camera configuration above has been described with the monochrome camera on the left and the color camera on the right. This is only one example. The positions of the monochrome camera and the color camera may be swapped. In that case, the roles of the two color images in the existing color stereo pair (one used to generate the training monochrome image, the other the training color image) are swapped accordingly.

Referring again to FIG. 3, the learning apparatus 1000 inputs the training color and monochrome stereo pair into the disparity network 100. The disparity network 100 then generates a training disparity map corresponding to the training monochrome image, according to the matching cost between the training color image and the training monochrome image of the pair.

Next, the learning apparatus 1000 inputs the training color and monochrome stereo pair into the occlusion network 200. The occlusion network 200 then generates a training occlusion map according to the difference between the training color image and the training monochrome image.

In addition, the learning apparatus 1000 inputs the training color and monochrome stereo pair into the denoising network 300. The denoising network 300 removes noise from the training monochrome image and the training color image, respectively, generating a training denoised monochrome image and a training denoised color image.

Next, the learning apparatus 1000 warps the training denoised color image with reference to the training disparity map to generate a training warped denoised color image. It concatenates the training warped denoised color image with the training denoised monochrome image to generate a training initial color-restored image.

The learning apparatus 1000 then inputs the training initial color-restored image, the training occlusion map, and the training monochrome image into the colorization network 400. With reference to the occlusion map, the colorization network 400 restores the colors of the training monochrome image according to the color information of the training initial color-restored image, generating a training final color-restored image.

Here, the warping may be performed through differentiable bilinear interpolation. The learning apparatus 1000 concatenates the training warped denoised color image and the training denoised monochrome image along the channel dimension. In the resulting training initial color-restored image, the color information of the training warped denoised color image forms the U and V channels (the chrominance channels). The luminance information of the training denoised monochrome image forms the Y channel (the luminance channel).
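The channel-wise composition of the training initial color-restored image can be illustrated as below. The sketch assumes the warped denoised color image is in RGB and uses the analog BT.601 RGB-to-YUV matrix, because the source does not specify the conversion constants. `initial_color_restoration` is a hypothetical name.

```python
import numpy as np

# Analog BT.601 RGB -> YUV matrix (assumed; the source gives no constants)
RGB2YUV = np.array([[0.299, 0.587, 0.114],
                    [-0.14713, -0.28886, 0.436],
                    [0.615, -0.51499, -0.10001]])

def initial_color_restoration(warped_color_rgb, denoised_mono):
    """Stack Y from the denoised monochrome image with U, V from the warped color image.

    warped_color_rgb: (H, W, 3) warped denoised color image in RGB.
    denoised_mono:    (H, W) denoised monochrome image.
    Returns an (H, W, 3) array ordered Y, U, V.
    """
    uv = (warped_color_rgb @ RGB2YUV.T)[..., 1:]   # chrominance from the color view
    y = denoised_mono[..., None]                   # luminance from the monochrome view
    return np.concatenate([y, uv], axis=-1)
```

Taking Y from the monochrome view keeps its higher resolution and light efficiency. Only the chrominance is borrowed from the warped color view.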

Next, the learning apparatus 1000 trains at least one parameter included in the disparity network 100, the occlusion network 200, the denoising network 300, and the colorization network 400. It uses a summation loss obtained by weighting and summing the following four losses:

- a first loss, generated with reference to the training disparity map and a disparity ground truth (GT) map;
- a second loss, generated with reference to the training occlusion map and an occlusion GT map;
- a third loss, generated with reference to the training denoised monochrome image, the training denoised color image, and their corresponding denoised GT images;
- a fourth loss, generated with reference to the restored chrominance values of the training final color-restored image and the corresponding GT chrominance values.
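The weighted summation can be sketched as follows. The source does not state the form of each loss or the weights. The following are all hypothetical placeholders:

- the L1 and binary cross-entropy choices;
- the dictionary keys;
- the default weights of 1.0.

```python
import numpy as np

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def bce(p, t, eps=1e-7):
    # binary cross-entropy for a per-pixel occlusion probability p against binary GT t
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def total_loss(pred, gt, weights=(1.0, 1.0, 1.0, 1.0)):
    """pred/gt: dicts with hypothetical keys 'disp', 'occ', 'mono', 'color', 'uv'."""
    l_disp = l1(pred['disp'], gt['disp'])                                   # first loss
    l_occ = bce(pred['occ'], gt['occ'])                                     # second loss
    l_den = l1(pred['mono'], gt['mono']) + l1(pred['color'], gt['color'])  # third loss
    l_uv = l1(pred['uv'], gt['uv'])                                         # fourth loss
    w1, w2, w3, w4 = weights
    return w1 * l_disp + w2 * l_occ + w3 * l_den + w4 * l_uv
```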

With reference to FIGS. 5 to 10, the following describes in detail how stereo matching is performed on a color image and a monochrome image using the disparity network 100, the occlusion network 200, the denoising network 300, and the colorization network 400 described above.

First, FIG. 5 schematically shows the structure of the disparity network 100, which generates a disparity map from a stereo pair consisting of a color image and a monochrome image according to an embodiment of the present invention.

Referring to FIG. 5, the learning apparatus 1000 first inputs the training monochrome image and the training color image into the disparity network 100. The disparity network 100 encodes the training monochrome image through a first_1 sub-encoder included in its encoder, generating a training first_1 feature map for the training monochrome image. It also encodes the training color image through a first_2 sub-encoder included in its encoder, generating a training first_2 feature map for the training color image.

As an example, the learning apparatus 1000 may apply 7×7 and 5×5 convolution operations to the training monochrome image and the training color image through the first_1 and first_2 sub-encoders, respectively. The resulting first_1 and first_2 feature maps are smaller in spatial size and have more channels than the input images. Each of the 7×7 and 5×5 convolution operations may consist of one or more convolutions with different strides. Zero padding may be used in the convolutions so that feature maps of a preset size are output.
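Whether a convolution shrinks or enlarges a feature map follows from the standard output-size formulas for convolution and transposed convolution. The helper below applies them. The 256-pixel input, the strides, and the paddings are illustrative assumptions, not values from the patent. With padding (k − 1)/2 and stride 1, a k×k convolution preserves the spatial size, which is one way zero padding yields a preset output size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (deconvolution) along one axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Hypothetical 256-pixel input; strides and paddings are illustrative only.
h = 256
h = conv_out(h, kernel=7, stride=2, padding=3)   # 7x7, stride 2 -> 128
h = conv_out(h, kernel=5, stride=2, padding=2)   # 5x5, stride 2 -> 64
print(h)  # 64
```

The same arithmetic, run in reverse through `deconv_out`, describes how the decoder's 3×3 deconvolutions enlarge the feature map again.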

Then, the learning apparatus 1000 causes the disparity network 100 to apply a correlation operation to the training first_1 feature map and the training first_2 feature map. This is done through a correlation layer included in the encoder of the disparity network 100. The correlation operation computes the matching cost and produces a training correlation feature map corresponding to the training color and monochrome stereo pair.

Here, the learning apparatus 1000 causes the disparity network 100 to generate the training correlation feature map by computing a dot product as the correlation operation. The dot product is taken between two vectors. The first is a first training feature vector corresponding to at least one first patch in at least one first region of the training first_1 feature map. The second is a second training feature vector corresponding to at least one second patch in at least one second region of the training first_2 feature map, where that second region corresponds to the first region. The first and second training feature vectors may include gradient information. This gradient information can represent edges in the image that arise from changes in image intensity or color within the first patch and the second patch, respectively. However, the present invention is not limited thereto.
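As one concrete reading of this patch-wise dot product, the sketch below builds a 1-D correlation cost volume in the style of the correlation layers used in learned stereo networks. It assumes the following:

- rectified inputs;
- single-pixel patches;
- horizontal displacements only;
- normalization by the channel count.

The function name and the `max_disp` parameter are hypothetical.

```python
import numpy as np

def correlation_1d(f_left, f_right, max_disp):
    """Horizontal correlation cost volume between two (C, H, W) feature maps.

    out[d, y, x] = <f_left[:, y, x], f_right[:, y, x - d]> / C,
    and zero where x - d falls outside the image.
    f_left plays the role of the first_1 (monochrome) map,
    f_right the first_2 (color) map.
    """
    c, h, w = f_left.shape
    out = np.zeros((max_disp + 1, h, w), dtype=f_left.dtype)
    for d in range(max_disp + 1):
        if d == 0:
            out[0] = np.sum(f_left * f_right, axis=0) / c
        else:
            # left pixel x is compared with right pixel x - d
            out[d, :, d:] = np.sum(f_left[:, :, d:] * f_right[:, :, :-d], axis=0) / c
    return out
```

Each slice `out[d]` scores how well every left pixel matches its candidate at disparity d. Stacking these slices lets later convolutions reason over matching costs.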

Then, the learning apparatus 1000 causes the disparity network 100 to encode the training correlation feature map through a second sub-encoder included in its encoder, generating a training second feature map. As an example, the learning apparatus 1000 may apply at least one 3×3 convolution operation to the training correlation feature map through the second sub-encoder. This 3×3 convolution operation may consist of one or more convolutions with different strides.

Next, the learning apparatus 1000 causes the disparity network 100 to decode the training second feature map through the decoder of the disparity network 100, generating the training disparity map. The generated disparity map can represent depth information for the training monochrome image, which is the left image of the training color and monochrome stereo pair.

As an example, the learning apparatus 1000 may apply at least one 3×3 deconvolution operation to the training second feature map through the decoder. This produces a training disparity map that is larger in spatial size than the training second feature map. At the final stage of the decoder, a convolution with a single 3×3 filter (one output channel) may produce a single-channel training disparity map.
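For intuition on how disparity encodes depth, the standard rectified pinhole stereo model can be used. This is a textbook relation, not stated in the source. Depth Z follows from three quantities: disparity d, focal length f in pixels, and the baseline B, i.e. the preset interval between the two cameras. The values in the example are illustrative only.

```latex
Z = \frac{f\,B}{d}, \qquad \text{e.g.}\quad Z = \frac{700 \times 0.1\ \text{m}}{35} = 2\ \text{m}
```

Because Z is inversely proportional to d, a one-pixel disparity error matters far more for distant objects (small d) than for near ones.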

Meanwhile, to preserve both high-level information and precise local information about the objects in the image, the learning apparatus 1000 may connect the intermediate feature maps generated in one or more mutually corresponding layers of the encoder and the decoder of the disparity network 100.

Accordingly, the learning apparatus 1000 may cause the disparity network 100 to generate at least one concatenation feature map for learning by concatenating (i) a predetermined first intermediate feature map for learning, generated in a predetermined encoding layer that is at least part of the at least one encoding layer that is included in the 1_1 sub-encoder, the 1_2 sub-encoder, and the second sub-encoder and generates the first intermediate feature maps for learning, with (ii) a predetermined second intermediate feature map for learning, generated in the predetermined decoding layer corresponding to that encoding layer, among the at least one decoding layer that is included in the decoder and generates the second intermediate feature maps for learning corresponding to the first intermediate feature maps for learning. Here, the first intermediate feature maps for learning include the 1_1 feature map for learning, the 1_2 feature map for learning, and the second feature map for learning. As an example, as shown in FIG. 5, the learning apparatus 1000 may concatenate the first intermediate feature map for learning with the second intermediate feature map for learning, apply a 3*3 convolution operation, then a bilinear interpolation operation, and then a 3*3 deconvolution operation.
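Two of the primitives in the FIG. 5 sequence, channel-wise concatenation and bilinear interpolation, can be sketched as follows. This is a minimal sketch under stated assumptions: the `align_corners=True` sampling convention and the helper names are hypothetical, and the surrounding 3*3 convolution and deconvolution are omitted.

```python
from typing import List

Map2D = List[List[float]]
Tensor3 = List[Map2D]  # [channels][height][width]


def bilinear_resize(img: Map2D, out_h: int, out_w: int) -> Map2D:
    """Bilinear resize of one 2-D map using the align_corners=True convention (assumed)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx  # interpolate along x
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)             # then along y
        out.append(row)
    return out


def concat_channels(a: Tensor3, b: Tensor3) -> Tensor3:
    """Channel-wise concatenation of two feature maps with equal height and width."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("spatial sizes must match before concatenation")
    return a + b
```

Concatenation only stacks channels, so an encoder-side and a decoder-side feature map must share the same spatial size before a skip connection can join them.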

To train the disparity network 100 configured as described above, the learning apparatus 1000 may generate a first loss with reference to the disparity map for learning and a disparity GT map. Specifically, to train the disparity network 100, a reverse Huber (berHu) loss may be generated as the first loss, as in Equation 1.

[Equation 1]

Here, in Equation 1, the two disparity terms denote the disparity map for learning and the disparity GT map, respectively; the next term denotes the number of pixels in the image; and the index term denotes the pixel index of the image. In addition, |·| denotes the absolute-value operation, and the remaining term is a variable set to a specified value.
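The reverse Huber loss can be sketched as follows. This is a minimal sketch assuming the commonly used berHu formulation (an L1 penalty for small residuals and a scaled L2 penalty beyond a threshold c) with c set to one fifth of the largest absolute residual; the threshold actually specified in Equation 1 is not recoverable from the text above, so `c_ratio` is an assumption. The disparity maps are flattened into lists.

```python
from typing import Sequence


def berhu_loss(pred: Sequence[float], gt: Sequence[float], c_ratio: float = 0.2) -> float:
    """Mean reverse Huber (berHu) loss over N pixels.

    Per pixel i with residual r_i = |pred_i - gt_i|:
      r_i                  if r_i <= c
      (r_i^2 + c^2) / (2c) otherwise
    where c = c_ratio * max_i r_i (assumed threshold rule).
    """
    residuals = [abs(p - g) for p, g in zip(pred, gt)]
    c = c_ratio * max(residuals)
    if c == 0.0:  # perfect prediction: every residual is zero
        return 0.0
    total = 0.0
    for r in residuals:
        total += r if r <= c else (r * r + c * c) / (2.0 * c)
    return total / len(residuals)
```

The two branches meet at r = c (both give c), so the loss is continuous while penalizing large disparity errors more strongly than a plain L1 loss.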

Next, FIG. 6 schematically shows the structure of the occlusion network 200, which generates an occlusion map from the color and monochrome stereo pair for learning according to an embodiment of the present invention.

To generate the occlusion map, the learning apparatus 1000 may generate a difference map for learning by computing, for each pixel, the difference between the pixel value of a warped color image for learning, obtained by warping the color image for learning with reference to the disparity map for learning, and the pixel value of the monochrome image for learning. The learning apparatus 1000 may then input the difference map for learning and the second feature map for learning, which was used to generate the disparity map for learning, into the occlusion network 200, causing the occlusion network 200 to generate an occlusion map for learning that marks, as a binary value for each pixel, the occlusion between the warped color image for learning and the monochrome image for learning, with reference to the features extracted from the difference map for learning and the features of the second feature map for learning.
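The warping and difference steps can be sketched per image row. This sketch rests on several assumptions not stated in the text: the color image is reduced to luminance with BT.601 weights so that it can be compared with the one-channel monochrome image; since the monochrome image is the left view and the color image the right view, a left-view pixel x is sampled from the color image at x - d(x); sub-pixel positions use linear interpolation; and out-of-range positions are clamped to the border. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def luminance(px: RGB) -> float:
    """BT.601 luma weights (assumed way of comparing color with monochrome)."""
    r, g, b = px
    return 0.299 * r + 0.587 * g + 0.114 * b


def warp_row(row: Sequence[float], disp: Sequence[float]) -> List[float]:
    """Backward-warp one row: out[x] = row[x - disp[x]], linearly interpolated, border-clamped."""
    w = len(row)
    out = []
    for x, d in enumerate(disp):
        src = min(max(x - d, 0.0), w - 1.0)
        x0 = int(src)
        x1 = min(x0 + 1, w - 1)
        f = src - x0
        out.append(row[x0] * (1 - f) + row[x1] * f)
    return out


def difference_map(mono: List[List[float]], color: List[List[RGB]],
                   disp: List[List[float]]) -> List[List[float]]:
    """Per-pixel |warped color luminance - monochrome value| for H x W images."""
    diff = []
    for mono_row, color_row, disp_row in zip(mono, color, disp):
        warped = warp_row([luminance(px) for px in color_row], disp_row)
        diff.append([abs(wv - mv) for wv, mv in zip(warped, mono_row)])
    return diff
```

Where the disparity is correct and the pixel is visible in both views, the difference is small; large values hint at occlusion or mismatches, which is the texture-similarity cue the occlusion network receives.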

Accordingly, as shown in FIG. 6, the learning apparatus 1000 first causes the occlusion network 200 to encode the difference map for learning through the encoder of the occlusion network 200 to generate a third feature map for learning. For example, as shown in FIG. 6, the learning apparatus 1000 may apply at least one 3*3 convolution operation to the difference map for learning through the encoder.

Next, the learning apparatus 1000 may cause the decoder of the occlusion network 200 to concatenate the third feature map for learning with the second feature map for learning and then decode the result to generate the occlusion map for learning. For example, as shown in FIG. 6, the learning apparatus 1000 may concatenate the third and second feature maps for learning through the concatenation layer of the decoder and then apply at least one 3*3 deconvolution operation and at least one 3*3 convolution operation. In this case, the final stage of the decoder may produce a two-channel occlusion map for learning through a convolution with a 3*3 filter having two output channels.
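One plausible way to turn the two-channel decoder output into a per-pixel occlusion probability and a binary occlusion map is a softmax over the two channels followed by a 0.5 threshold. This is a sketch under assumptions: treating channel 1 as "occluded" and thresholding at 0.5 are illustrative choices, not details given in the text.

```python
import math
from typing import List, Tuple

Map2D = List[List[float]]


def occlusion_from_logits(logits: List[Map2D]) -> Tuple[Map2D, List[List[int]]]:
    """logits: [2][H][W] raw outputs of the final two-channel 3*3 convolution.

    Channel 1 is assumed to mean 'occluded'. Returns (probability map, binary map).
    """
    h, w = len(logits[0]), len(logits[0][0])
    prob = [[0.0] * w for _ in range(h)]
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = logits[0][i][j], logits[1][i][j]
            m = max(a, b)  # subtract the max for a numerically stable softmax
            ea, eb = math.exp(a - m), math.exp(b - m)
            p = eb / (ea + eb)
            prob[i][j] = p
            mask[i][j] = 1 if p > 0.5 else 0
    return prob, mask
```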

Here, the difference map for learning is used as the input of the occlusion network 200 so that a feature map capturing texture similarities is generated as the third feature map for learning. In addition, the occlusion network 200 uses the disparity information of the second feature map for learning together with the texture-similarity information of the third feature map for learning, in order to improve the accuracy of the resulting occlusion map for learning.

Meanwhile, to train the occlusion network 200 configured as described above, the learning apparatus 1000 may generate a second loss with reference to the occlusion map for learning and an occlusion GT map. Specifically, binary cross-entropy may be used as the loss function for the second loss, as in Equation 2.

[Equation 2]

Here, in Equation 2, one term denotes a binary mask (1 for an occluded region, 0 for a non-occluded region), and the other denotes the predicted probability of the occluded region.
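The binary cross-entropy of Equation 2 can be sketched as below. Averaging over all pixels and clamping probabilities by a small epsilon are assumptions added for illustration and numerical safety; the text does not state the normalization.

```python
import math
from typing import List


def binary_cross_entropy(mask: List[List[int]], prob: List[List[float]],
                         eps: float = 1e-7) -> float:
    """Mean BCE between a binary occlusion mask (1 = occluded) and predicted
    occlusion probabilities, averaged over all pixels (assumed normalization)."""
    total, n = 0.0, 0
    for m_row, p_row in zip(mask, prob):
        for y, p in zip(m_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            n += 1
    return total / n
```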

When colors are restored using an occlusion map generated in this way, a more accurate color reconstruction image can be generated than in the related art.

That is, referring to FIG. 7, given the occlusion GT map of FIG. 7(a), in which occlusion is present, restoring the colors without reflecting occlusion information, as in FIG. 7(b), produces color bleeding errors in parts of the occluded area. In contrast, when an occlusion map is generated according to the present invention, as in FIG. 7(c), and the colors are restored by reflecting the occlusion information of that map, as in FIG. 7(d), the color bleeding is corrected.

Next, FIG. 8 schematically shows the structure of the denoising network 300, which removes noise from a monochrome image and a color image according to an embodiment of the present invention.

Referring to FIG. 8, the learning apparatus 1000 may cause the denoising network 300 to apply at least one convolution operation, at least one batch normalization operation, and at least one ReLU operation to each of the monochrome image for learning and the color image for learning to obtain residual images for learning, and to remove noise from the monochrome image for learning and the color image for learning with reference to the corresponding residual images for learning, thereby generating a denoised monochrome image for learning and a denoised color image for learning. Here, each residual image for learning may be a prediction of what remains after the latent noiseless image for learning is removed from the corresponding monochrome or color image for learning, that is, a prediction of the noise itself.
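Because the residual image is a prediction of the noise, the denoised image follows from a per-channel subtraction. A minimal sketch (the helper name is hypothetical; images are [channels][height][width] nested lists, with one channel for monochrome and three for color):

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def denoise_with_residual(noisy: Tensor3, residual: Tensor3) -> Tensor3:
    """Denoised image = noisy input - predicted residual (noise), channel by channel."""
    return [
        [[n - r for n, r in zip(n_row, r_row)] for n_row, r_row in zip(n_ch, r_ch)]
        for n_ch, r_ch in zip(noisy, residual)
    ]
```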

As shown in FIG. 8, the denoising network 300 may consist of 17 convolutional layers, with batch normalization and ReLU (Rectified Linear Unit) activation applied to all layers except the first and the last, to generate the residual image for learning. Here, the advantage of using batch normalization is that the residual image for learning follows a Gaussian distribution, which suits the Gaussian normalization step of batch normalization. In addition, each convolutional layer may be initialized with orthogonal regularization, which can be effective in suppressing image noise while preserving image detail. In the last convolutional layer of the denoising network 300, a convolution with one output channel is performed when the denoising target is a monochrome image, and a convolution with three output channels is performed when the denoising target is a color image.
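The 17-layer layout can be written down as a layer configuration. This sketch reads the description the way DnCNN-style denoisers are usually built (first layer: convolution plus ReLU without batch normalization; middle layers: convolution, batch normalization, ReLU; last layer: plain convolution). That reading, the 3*3 kernels, and the width of 64 feature channels are assumptions; only the depth of 17 and the 1- or 3-channel output come from the text.

```python
from typing import Dict, List, Optional


def denoiser_layers(out_channels: int, in_channels: Optional[int] = None,
                    depth: int = 17, width: int = 64) -> List[Dict[str, object]]:
    """Hypothetical layer configuration for the residual denoiser.

    out_channels is 1 for a monochrome target and 3 for a color target.
    """
    if in_channels is None:
        in_channels = out_channels
    layers = []
    for k in range(depth):
        first, last = k == 0, k == depth - 1
        layers.append({
            "in": in_channels if first else width,
            "out": out_channels if last else width,
            "kernel": 3,                          # assumed kernel size
            "batch_norm": not (first or last),    # BN only on the middle layers
            "relu": not last,                     # last layer predicts the residual directly
        })
    return layers


for name, ch in (("monochrome", 1), ("color", 3)):
    print(name, "final layer:", denoiser_layers(ch)[-1])
```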

To train the denoising network 300 configured as described above, the learning apparatus 1000 may generate a third loss with reference to the denoised monochrome image for learning, the denoised color image for learning, and the corresponding denoised GT images. Specifically, the learning apparatus 1000 may train the denoising network 300 by minimizing, through the third loss, the Euclidean distance between the denoised monochrome and color images for learning and the corresponding denoised GT images, as in Equation 3.

[Equation 3]

Here, in Equation 3, the terms denote, in order: the denoised GT image; the color image for learning and the monochrome image for learning before noise removal; the residual image for learning corresponding to each of the monochrome image for learning and the color image for learning; the L₂ norm; the index of the image channel; and the number of color channels, which is 1 when the monochrome image for learning is the input of the denoising network 300.
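The third loss can be sketched as a squared Euclidean distance, summed over the C channels, between the residual-corrected input and the denoised GT image. The use of the squared norm and the absence of any averaging constant are assumptions, since the exact normalization of Equation 3 is not recoverable here.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]; C = 1 (mono) or 3 (color)


def denoising_loss(noisy: Tensor3, residual: Tensor3, clean: Tensor3) -> float:
    """Sum over channels c of ||(noisy_c - residual_c) - clean_c||_2^2 (assumed form)."""
    total = 0.0
    for y_c, r_c, x_c in zip(noisy, residual, clean):
        for y_row, r_row, x_row in zip(y_c, r_c, x_c):
            for y, r, x in zip(y_row, r_row, x_row):
                total += ((y - r) - x) ** 2
    return total
```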

Meanwhile, FIG. 9 compares a disparity map generated from a denoised stereo pair with a disparity map generated from a stereo pair without noise removal, according to an embodiment of the present invention. That is, given the disparity GT map of FIG. 9(a), the disparity map generated from the denoised stereo pair in FIG. 9(b) is missing disparity information in some regions compared with the disparity map generated from the non-denoised stereo pair in FIG. 9(c). Therefore, because a stereo pair consisting of the denoised color image for learning and the denoised monochrome image for learning is difficult to match accurately and yields a less accurate disparity map than a stereo pair without noise removal, the color image for learning and the monochrome image for learning without noise removal are used to generate the disparity map for learning.

Next, FIG. 10 schematically shows the structure of the colorization network 400, which generates a color reconstruction image using the stereo pair, the disparity map, and the occlusion map according to an embodiment of the present invention.

To generate the final color reconstruction image, the learning apparatus 1000 may cause the colorization network 400 to (i) correct color bleeding errors in at least one pixel feature value extracted from the initial color reconstruction image for learning, with reference to the occlusion region information of the occlusion map for learning, thereby generating corrected pixel feature values, and (ii) restore the final color information for each pixel of the monochrome image for learning, which is used as the reference image, by using at least some of the corrected pixel feature values as color seeds for restoring the color of each region delimited by at least one edge in the monochrome image for learning, thereby generating the final color reconstruction image for learning.

Accordingly, as shown in FIG. 10, the learning apparatus 1000 may cause the colorization network 400 to concatenate the initial color reconstruction image for learning, the occlusion map for learning, and the monochrome image for learning and encode the result using the encoder of the colorization network 400 to generate a fourth feature map for learning, and to decode the fourth feature map for learning using the decoder of the colorization network 400 to generate the final color reconstruction image for learning. Here, referring to FIG. 10, instead of concatenating the initial color reconstruction image for learning, the occlusion map for learning, and the monochrome image for learning directly, the colorization network 400 may first apply a convolution to each input so that all inputs have the same number of channels, and then perform the concatenation.

FIG. 10 shows the structure of the colorization network 400 in simplified form for ease of illustration; the colorization network 400 may consist of first to fourth convolution blocks, each performing convolution, convolution, and batch normalization in sequence, and fifth to eighth convolution blocks, each performing deconvolution, convolution, convolution, convolution, and upsampling. In addition, every convolutional layer except the layer that performs the prediction is followed by a ReLU activation layer. With this structure, the feature maps generated by the encoder of the colorization network 400 are progressively halved in spatial size while their feature dimension is doubled. Conversely, the feature maps generated by the decoder of the colorization network 400 recover spatial resolution while their feature dimension is halved. In addition, similarly to the disparity network 100, the colorization network 400 uses skip connections between corresponding layers of the encoder and the decoder so that the decoder of the colorization network 400 can recover spatial resolution more accurately.
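The halving and doubling pattern of the four encoder blocks and four decoder blocks can be traced with a small shape calculator. This is an illustrative sketch only: the 64 base channels after the channel-equalizing input convolutions and the assumption that every block changes resolution by exactly a factor of two are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def colorization_shapes(h: int, w: int, base_channels: int = 64,
                        blocks: int = 4) -> Tuple[List[Shape], List[Shape]]:
    """Trace feature-map shapes: encoder blocks halve H, W and double C;
    decoder blocks double H, W and halve C."""
    c = base_channels
    enc: List[Shape] = []
    for _ in range(blocks):
        h, w, c = h // 2, w // 2, c * 2
        enc.append((c, h, w))
    dec: List[Shape] = []
    for _ in range(blocks):
        h, w, c = h * 2, w * 2, c // 2
        dec.append((c, h, w))
    return enc, dec


enc, dec = colorization_shapes(256, 256)
print(enc)  # [(128, 128, 128), (256, 64, 64), (512, 32, 32), (1024, 16, 16)]
print(dec)  # [(512, 32, 32), (256, 64, 64), (128, 128, 128), (64, 256, 256)]
```

Each decoder output has the same spatial size as an encoder output, which is what allows the skip connections to concatenate them.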

For example, referring to FIG. 10, the learning apparatus 1000 may, through the encoder of the colorization network 400, apply a 3*3 convolution operation to each of the initial color reconstruction image for learning, the occlusion map for learning, and the monochrome image for learning, concatenate the results, and then apply at least one 3*3 convolution operation, at least one batch normalization operation, and at least one bilinear interpolation operation to generate the fourth feature map for learning. Then, through the decoder of the colorization network 400, the learning apparatus 1000 may apply to the fourth feature map for learning at least one 4*4 deconvolution operation, a concatenation with the intermediate feature map output from the corresponding encoder layer, at least one 3*3 convolution operation, and at least one batch normalization operation.
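One reason a 4*4 deconvolution is convenient for upsampling is visible from the transposed-convolution output-size formula (the same formula documented for PyTorch's `ConvTranspose2d` with dilation 1). A minimal sketch; stride 2 and padding 1 are assumed values, not values given in the text.

```python
def deconv_output_size(size: int, kernel: int = 4, stride: int = 2,
                       padding: int = 1, output_padding: int = 0) -> int:
    """Output length along one axis of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


print(deconv_output_size(16))                              # 32: 4*4 kernel doubles exactly
print(deconv_output_size(16, kernel=3))                    # 31: 3*3 kernel falls one short
print(deconv_output_size(16, kernel=3, output_padding=1))  # 32
```

With a 4*4 kernel, stride 2, and padding 1, the output is exactly twice the input, so the upsampled map lines up with the encoder feature map it is concatenated with.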

To train the colorization network 400 configured as described above, the learning apparatus 1000 may generate a fourth loss with reference to the restored chrominance values for learning of the final color reconstruction image for learning and the corresponding GT chrominance values. Specifically, the learning apparatus 1000 may train the colorization network 400 by reducing, through the fourth loss, the Euclidean distance between the restored chrominance values for learning and the GT chrominance values, as in Equation 4.

[Equation 4]

Here, the learning apparatus 1000 may generate the GT chrominance values as follows: it obtains the Y channel of the YCbCr color space from a decolorized image produced by decolorizing the color image located on the left of the original stereo pair used earlier to generate the monochrome image for learning; it obtains the CbCr channels by converting to YCbCr the color image located on the right of the original stereo pair used to generate the color image for learning; and it concatenates the Y channel with the CbCr channels, converts the result back to RGB, and derives the GT chrominance values from it.
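The per-pixel composition of the GT image (Y from the decolorized left image, CbCr from the right color image, then back to RGB) can be sketched as follows. The full-range BT.601 (JFIF) conversion constants are an assumption; the text does not say which YCbCr variant is used. Pixel values are on a 0 to 255 scale, and the helper names are hypothetical.

```python
from typing import Tuple

Triple = Tuple[float, float, float]


def rgb_to_ycbcr(r: float, g: float, b: float) -> Triple:
    """Full-range BT.601 (JFIF) RGB -> YCbCr (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y: float, cb: float, cr: float) -> Triple:
    """Approximate inverse of rgb_to_ycbcr."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


def gt_pixel(decolorized_left_px: Triple, right_px: Triple) -> Triple:
    """GT pixel: Y channel of the decolorized left image + CbCr of the right color image."""
    y, _, _ = rgb_to_ycbcr(*decolorized_left_px)
    _, cb, cr = rgb_to_ycbcr(*right_px)
    return ycbcr_to_rgb(y, cb, cr)
```

Keeping Y from the left view ties the GT brightness to the monochrome reference image, while CbCr supplies the color that the network must learn to transfer.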

In conclusion, the learning apparatus 1000 may train at least one parameter included in the disparity network 100, the occlusion network 200, the denoising network 300, and the colorization network 400 using a summation loss generated by weighting the first loss, the second loss, the third loss, and the fourth loss generated as described above, as in Equation 5.

[Equation 5]

Here, the weighting coefficients in Equation 5 are hyper-parameters, which may be set to values such as 1, 100, and 0.1, respectively.
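The summation loss is a weighted sum of the four losses. Because the text does not make recoverable which coefficient multiplies which loss, this sketch simply takes one explicit weight per loss; the weights and loss values in the usage line are hypothetical.

```python
from typing import Sequence


def summation_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the disparity, occlusion, denoising, and colorization losses."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss")
    return sum(w * l for w, l in zip(weights, losses))


# Hypothetical mapping of weights to (first, second, third, fourth) losses.
total = summation_loss([0.5, 0.01, 2.0, 0.3], [1.0, 1.0, 100.0, 0.1])
```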

With the color and monochrome stereo network, which performs stereo matching using a color image and a monochrome image, trained as described above, the method by which a test device tests the color and monochrome stereo network is described below.

First, FIG. 11 schematically shows a test device for testing the color and monochrome stereo network, which performs stereo matching using a color image and a monochrome image, according to an embodiment of the present invention.

Referring to FIG. 11, the test device 2000 may include a memory 2001 that stores instructions for testing the color and monochrome stereo network, which performs stereo matching using a color image and a monochrome image, and a processor 2002 that tests the color and monochrome stereo network according to the instructions stored in the memory 2001.

Specifically, the test device 2000 may typically achieve the desired system performance using a combination of a computing device (e.g., a device that may include a computer processor, memory, storage, input and output devices, and other components of a conventional computing device; an electronic communication device such as a router or a switch; or an electronic information storage system such as network-attached storage (NAS) or a storage area network (SAN)) and computer software (i.e., instructions that cause the computing device to function in a particular manner).

A method of testing the network that performs stereo matching using a color image and a monochrome image, using the test device 2000 configured as described above according to an embodiment of the present invention, is as follows. Detailed descriptions of parts that can be readily understood from the description with reference to FIGS. 3 to 10 are omitted below.

First, with the disparity network 100, the occlusion network 200, the denoising network 300, and the colorization network 400 trained as described above, the test device 2000 may acquire at least one color and monochrome stereo pair for testing, which includes a color image for testing corresponding to the color camera of a stereo camera and a monochrome image for testing corresponding to the monochrome camera of the stereo camera.

Next, the test device 2000 may input the color and monochrome stereo pair for testing into the disparity network 100, causing the disparity network 100 to generate a disparity map for testing corresponding to the monochrome image for testing, according to the matching cost between the color image for testing and the monochrome image for testing of the color and monochrome stereo pair for testing.

Specifically, the test device 2000 may input the monochrome image for testing and the color image for testing into the disparity network 100, causing the disparity network 100 to: encode the monochrome image for testing through the 1_1 sub-encoder included in the encoder of the disparity network 100 to generate a 1_1 feature map for testing of the monochrome image for testing; encode the color image for testing through the 1_2 sub-encoder included in the encoder to generate a 1_2 feature map for testing of the color image for testing; compute the matching cost by applying a correlation operation to the 1_1 feature map for testing and the 1_2 feature map for testing through the correlation layer included in the encoder, thereby generating a correlation feature map for testing corresponding to the color and monochrome stereo pair for testing; encode the correlation feature map for testing through the second sub-encoder included in the encoder to generate a second feature map for testing; and decode the second feature map for testing through the decoder of the disparity network 100 to generate the disparity map for testing.

Here, as the correlation operation, the test device 2000 may cause the disparity network 100 to generate the correlation feature map for testing by computing the inner product of a first feature vector for testing, corresponding to at least one first patch included in at least one first region of the 1_1 feature map for testing, and a second feature vector for testing, corresponding to at least one second patch included in at least one second region of the 1_2 feature map for testing that corresponds to the first region.
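The inner-product correlation can be sketched as a cost volume over candidate disparities. This sketch assumes single-pixel patches, a purely horizontal search (rectified stereo), the monochrome (left) feature at x compared with the color (right) feature at x - d, and normalization by the number of channels; all of these, and the function name, are illustrative assumptions.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # [channels][height][width]


def correlation_volume(f_mono: Tensor3, f_color: Tensor3, max_disp: int) -> Tensor3:
    """cost[d][y][x] = (1/C) * sum_c f_mono[c][y][x] * f_color[c][y][x - d].

    Positions where x - d falls outside the image keep a cost of 0.
    """
    c_n, h, w = len(f_mono), len(f_mono[0]), len(f_mono[0][0])
    cost = [[[0.0] * w for _ in range(h)] for _ in range(max_disp + 1)]
    for d in range(max_disp + 1):
        for y in range(h):
            for x in range(d, w):
                dot = 0.0
                for c in range(c_n):  # inner product of the two feature vectors
                    dot += f_mono[c][y][x] * f_color[c][y][x - d]
                cost[d][y][x] = dot / c_n
    return cost
```

Each disparity hypothesis d yields one channel of the correlation feature map, so high values mark positions where the monochrome and color features agree at that shift.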

In addition, the test device 2000 may cause the disparity network 100 to generate at least one concatenation feature map for testing by concatenating (i) a predetermined first intermediate feature map for testing, generated in a predetermined encoding layer that is at least part of the at least one encoding layer that is included in the 1_1 sub-encoder, the 1_2 sub-encoder, and the second sub-encoder and generates the first intermediate feature maps for testing, with (ii) a predetermined second intermediate feature map for testing, generated in the predetermined decoding layer corresponding to that encoding layer, among the at least one decoding layer that is included in the decoder and generates the second intermediate feature maps for testing corresponding to the first intermediate feature maps for testing, and to input each concatenation feature map for testing into the layer that follows the predetermined decoding layer whose second intermediate feature map for testing was used to generate that concatenation feature map. Here, the first intermediate feature maps for testing may include the 1_1 feature map for testing, the 1_2 feature map for testing, and the second feature map for testing.

Next, the test device 2000 may input the color and monochrome stereo pair for testing into the occlusion network 200, causing the occlusion network 200 to generate an occlusion map for testing according to the difference between the color image for testing and the monochrome image for testing.

Specifically, the test device 2000 may generate a difference map for testing by computing, for each pixel, the difference between the pixel value of a warped color image for testing, obtained by warping the color image for testing with reference to the disparity map for testing, and the pixel value of the monochrome image for testing. The test device 2000 may then input the difference map for testing, together with the second feature map for testing that was used to generate the disparity map for testing, into the occlusion network 200, causing the occlusion network 200 to refer to the features extracted from the difference map for testing and the features of the second feature map for testing and to generate an occlusion map for testing that marks occlusion between the warped color image for testing and the monochrome image for testing with a binary value for each pixel.
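The difference map used as input to the occlusion network can be sketched as follows. This sketch assumes rectified cameras in which monochrome pixel (y, x) corresponds to color pixel (y, x - d), linear interpolation for sub-pixel disparities, black for samples that fall outside the image, and BT.601 luma weights for comparing a color pixel with a monochrome pixel. None of these details is specified above, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def warp_to_mono_view(color: List[List[RGB]], disp: List[List[float]]) -> List[List[RGB]]:
    """Backward-warp the color image into the monochrome view.

    Assumption: monochrome pixel (y, x) sees color pixel (y, x - disp[y][x]).
    Fractional positions are linearly interpolated; out-of-range samples are black.
    """
    h, w = len(color), len(color[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            xs = x - disp[y][x]
            if xs < 0 or xs > w - 1:
                row.append((0.0, 0.0, 0.0))
                continue
            x0 = int(math.floor(xs))
            x1 = min(x0 + 1, w - 1)
            t = xs - x0
            a, b = color[y][x0], color[y][x1]
            row.append(tuple((1 - t) * ca + t * cb for ca, cb in zip(a, b)))
        out.append(row)
    return out


def luma(p: RGB) -> float:
    # BT.601 luma weights: an assumed way to compare a color pixel with a mono pixel.
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def difference_map(warped: List[List[RGB]], mono: List[List[float]]) -> List[List[float]]:
    """Per-pixel absolute difference between warped color luma and monochrome."""
    return [[abs(luma(pc) - pm) for pc, pm in zip(rc, rm)]
            for rc, rm in zip(warped, mono)]
```

Pixels that are visible only in the monochrome view, or where the disparity is wrong, tend to produce large values in such a map, which is why it is a useful cue for the occlusion network.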

That is, the test device 2000 may cause the occlusion network 200 to encode the difference map for testing through the encoder of the occlusion network 200 to generate a third feature map for testing, and then to concatenate the third feature map for testing with the second feature map for testing and decode the result through the decoder of the occlusion network 200 to generate the occlusion map for testing.

In addition, the test device 2000 may input the color and monochrome stereo pair for testing into the denoising network 300, causing the denoising network 300 to remove noise from the color image for testing and from the monochrome image for testing, thereby generating a denoised color image for testing and a denoised monochrome image for testing.

Specifically, the test device 2000 may cause the denoising network 300 to apply at least one convolution operation, at least one batch normalization operation, and at least one ReLU operation to each of the monochrome image for testing and the color image for testing to obtain a residual image for testing for each, and then to remove noise from the monochrome image for testing and the color image for testing with reference to their respective residual images for testing, thereby generating the denoised monochrome image for testing and the denoised color image for testing.
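The residual formulation above (as in DnCNN) subtracts a predicted residual from the noisy input. In the sketch below, the learned stack of convolution, batch normalization, and ReLU operations is replaced by a hypothetical hand-written predictor, `box_mean_residual`, which treats each pixel's deviation from its 3 x 3 neighborhood mean as noise. Only the final subtraction step corresponds to the description; the predictor is a stand-in for illustration.

```python
from typing import Callable, List

Image = List[List[float]]  # one channel, H x W


def box_mean_residual(img: Image) -> Image:
    """Hypothetical stand-in for the learned conv/BN/ReLU stack: estimate the
    residual as each pixel minus the mean of its in-bounds 3 x 3 neighborhood."""
    h, w = len(img), len(img[0])
    res = []
    for y in range(h):
        row = []
        for x in range(w):
            nb = [img[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(img[y][x] - sum(nb) / len(nb))
        res.append(row)
    return res


def denoise(noisy: Image, predict_residual: Callable[[Image], Image]) -> Image:
    """Residual learning: denoised image = noisy image - predicted residual."""
    residual = predict_residual(noisy)
    return [[p - r for p, r in zip(rp, rr)] for rp, rr in zip(noisy, residual)]
```

The same `denoise` step would be applied separately to the monochrome image and to each channel of the color image.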

Next, the test device 2000 may warp the denoised color image for testing with reference to the disparity map for testing to generate a warped denoised color image for testing, concatenate the warped denoised color image for testing with the denoised monochrome image for testing to generate an initial color restored image for testing, and input the initial color restored image for testing, the occlusion map for testing, and the monochrome image for testing into the colorization network 400, causing the colorization network 400 to refer to the occlusion map for testing and restore the color of the monochrome image for testing according to the color information of the initial color restored image for testing, thereby generating a final color restored image for testing. Here, the test device 2000 concatenates the warped denoised color image for testing and the denoised monochrome image for testing along the channel direction, so that in the resulting initial color restored image for testing the chrominance channels (the U and V channels) are formed from the color information of the warped denoised color image for testing and the luminance channel (the Y channel) is formed from the luminance information of the denoised monochrome image for testing.
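The channel-wise composition of the initial color restored image can be illustrated as below. The sketch assumes the analog BT.601 YUV convention (Y = 0.299R + 0.587G + 0.114B, U = 0.492(B - Y), V = 0.877(R - Y)); the description does not state which YUV variant is used, and the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
YUV = Tuple[float, float, float]


def rgb_to_uv(p: RGB) -> Tuple[float, float]:
    """Chrominance of an RGB pixel under an assumed BT.601-style YUV convention."""
    r, g, b = p
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return 0.492 * (b - y), 0.877 * (r - y)


def initial_color_restoration(warped_color: List[List[RGB]],
                              mono: List[List[float]]) -> List[List[YUV]]:
    """Channel-wise concatenation: Y from the denoised monochrome image,
    U and V from the warped denoised color image."""
    out = []
    for row_c, row_m in zip(warped_color, mono):
        out.append([(m,) + rgb_to_uv(c) for c, m in zip(row_c, row_m)])
    return out
```

Taking Y from the monochrome camera, which has no color filter array, is what lets the restored image inherit the lower noise of the monochrome image, as discussed with FIG. 14.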

Specifically, the test device 2000 may cause the colorization network 400 to refer to the occlusion-region information of the occlusion map for testing to correct color-bleeding errors in at least one pixel feature value extracted from the initial color restored image for testing, thereby generating corrected pixel feature values, and to use at least some of the corrected pixel feature values as color seeds for restoring the color of each region delimited by at least one edge in the monochrome image for testing, which serves as the reference image, thereby restoring the final color information of each pixel of the monochrome image for testing and generating the final color restored image for testing.
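The idea of color seeds confined by edges can be illustrated with a classical, non-learned analogy. In the sketch below, which is not the colorization network itself, edge pixels of the monochrome reference image split the image into 4-connected regions, non-occluded pixels act as seeds, and each occluded pixel, whose chrominance may have bled in from the wrong surface, takes the mean chrominance of the seeds in its region. The edge mask is assumed to be given, and `propagate_seeds` is a hypothetical name.

```python
from collections import deque
from typing import List, Tuple

UV = Tuple[float, float]


def propagate_seeds(uv: List[List[UV]], occluded: List[List[bool]],
                    edges: List[List[bool]]) -> List[List[UV]]:
    """Replace the chrominance of occluded pixels with the mean chrominance of
    the non-occluded seeds in the same edge-bounded region. Edge pixels and
    regions without any seed are left unchanged."""
    h, w = len(uv), len(uv[0])
    out = [row[:] for row in uv]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or edges[sy][sx]:
                continue
            # Collect one edge-bounded region with a 4-connected flood fill.
            region = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and not edges[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Seeds are the non-occluded pixels; occluded chroma is not trusted.
            seeds = [uv[y][x] for y, x in region if not occluded[y][x]]
            if not seeds:
                continue
            mean = (sum(s[0] for s in seeds) / len(seeds),
                    sum(s[1] for s in seeds) / len(seeds))
            for y, x in region:
                if occluded[y][x]:
                    out[y][x] = mean
    return out
```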

Accordingly, the test device 2000 may cause the colorization network 400 to concatenate the initial color restored image for testing, the occlusion map for testing, and the monochrome image for testing and encode the result with the encoder of the colorization network 400 to generate a fourth feature map for testing, and then to decode the fourth feature map for testing with the decoder of the colorization network 400 to generate the final color restored image for testing.

Meanwhile, FIGS. 12 to 15 show the performance of the Color and Monochrome Stereo Network (CMSNet), which performs stereo matching using a color image and a monochrome image according to an embodiment of the present invention.

First, FIG. 12 schematically compares a disparity map generated according to an embodiment of the present invention with disparity maps generated by various other methods. The methods used for comparison include ANCC (Heo et al. 2011), JDMCC (Heo et al. 2013), DASC (Kim et al. 2015), CCNG (Holloway et al. 2015), ITER (Jeon et al. 2016), and DMC (Zhi et al. 2018), which perform multi-spectral/cross-channel stereo matching. The following variants of the Color and Monochrome Stereo Network, which differ from the configuration described above, were also compared: using a stereo pair consisting only of RGB images as input (CMSNet RGB Pair); using a concatenation layer instead of the correlation layer in the encoder of the disparity network 100 (CMSNet Feat. concat); and first denoising the input color and monochrome stereo pair through the denoising network 300 and only then generating the disparity map (CMSNet denoise).

In conclusion, experiments with these configurations on the FlyingThings3D, Middlebury, Monkaa, and KITTI datasets show that the disparity map generated with the configuration of the present invention is the closest to the ground-truth (GT) disparity map.

FIG. 13 schematically compares a color restored image generated according to an embodiment of the present invention with color restored images generated by various other methods. The methods used for comparison include the single-image denoising methods BM3D (Dabov et al. 2007), non-local means (Buades et al. 2005), DnCNN (Zhang et al. 2017a), and WAVG (Im et al. 2019a), and the multi-image denoising method ITER (Jeon et al. 2016). The following variants of the Color and Monochrome Stereo Network were also compared: omitting the colorization network 400 so that the initial color restored image is output as the final color restored image (CMSNet Init. Map), and generating the final color restored image without the occlusion map produced by the occlusion network 200 (CMSNet w/o occ).

Experiments with these configurations on the FlyingThings3D, Middlebury, and Monkaa datasets show that the final color restored image generated with the configuration of the present invention is the closest to the GT color image.

Next, FIG. 14 schematically compares the color images used as input with the color restored images generated as output according to an embodiment of the present invention, using real data captured under low-light conditions at night. FIGS. 14(a) and 14(b) show monochrome images and the corresponding color images of different objects captured in different situations; FIGS. 14(d) and 14(e) show the final color restored images and the disparity maps generated by inputting these monochrome and color images into the Color and Monochrome Stereo Network of the present invention; and FIG. 14(c) compares partial regions of the color images and of the final color restored images to show the restoration effect.

As FIG. 14 shows, the Color and Monochrome Stereo Network of the present invention clearly reproduces depth discontinuities and fine structures such as the statue and plants in the third-column images and the golf club in the fourth-column image. In particular, as the first- and second-column images show, the cameras' automatic exposure does little to recover regions that are underexposed due to insufficient light, whereas fusing the color and monochrome stereo pair with the disparity map using the network configuration of the present invention yields an image brighter than the input color image. Furthermore, using the monochrome image to form the luminance channel of the color restored image helps reduce noise and improve resolution; for example, the text in the fourth-column image restored with the network configuration of the present invention is noticeably sharper.

In addition, FIG. 15 schematically shows an example of a disparity map generated from a stereo pair consisting of an RGB image and a near-infrared (NIR) image according to an embodiment of the present invention.

FIG. 15 shows that the Color and Monochrome Stereo Network of the present invention, trained with the monochrome and color training images generated through the data augmentation described above, can effectively generate disparity maps even for stereo pairs consisting of an RGB image and a near-infrared image, without any additional fine-tuning.

The embodiments according to the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the present invention pertains may make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set forth below but also everything modified equally or equivalently to those claims falls within the scope of the spirit of the present invention.

1000: learning device
1001: memory
1002: processor
2000: test device
2001: memory
2002: processor
100: disparity network
200: occlusion network
300: denoising network
400: colorization network

Claims

1. A method of performing stereo matching using a color image and a monochrome image, the method comprising:
(a) when at least one color and monochrome stereo pair for training is obtained, including a color image for training corresponding to a color camera of a stereo camera and a monochrome image for training corresponding to a monochrome camera of the stereo camera, a learning device (i) inputting the color and monochrome stereo pair for training into a disparity network, causing the disparity network to generate a disparity map for training, corresponding to the monochrome image for training, according to a matching cost between the color image for training and the monochrome image for training, (ii) inputting the color and monochrome stereo pair for training into an occlusion network, causing the occlusion network to generate an occlusion map for training according to a difference between the color image for training and the monochrome image for training, and (iii) inputting the color and monochrome stereo pair for training into a denoising network, causing the denoising network to generate a denoised monochrome image for training and a denoised color image for training by removing noise from the monochrome image for training and the color image for training, respectively;
(b) the learning device (i) (i-1) warping the denoised color image for training with reference to the disparity map for training to generate a warped denoised color image for training, and (i-2) concatenating the warped denoised color image for training and the denoised monochrome image for training to generate an initial color restored image for training, and (ii) inputting the initial color restored image for training, the occlusion map for training, and the monochrome image for training into a colorization network, causing the colorization network to refer to the occlusion map for training and restore the color of the monochrome image for training according to color information of the initial color restored image for training, thereby generating a final color restored image for training; and
(c) the learning device training at least one parameter included in the disparity network, the occlusion network, the denoising network, and the colorization network using a summation loss generated by weighting a first loss generated with reference to the disparity map for training and a disparity ground-truth (GT) map, a second loss generated with reference to the occlusion map for training and an occlusion GT map, a third loss generated with reference to the denoised monochrome image for training, the denoised color image for training, and the corresponding denoised GT images, and a fourth loss generated with reference to restored chrominance values of the final color restored image for training and the corresponding GT chrominance values.
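Step (c) of claim 1 can be written compactly as a weighted sum. In the expression below, the weights λ1 to λ4 and the loss symbols are notation introduced here for illustration only; the claim states that weights are applied but does not specify their values or the form of each individual loss (for example, an L1 or cross-entropy term).

```latex
\mathcal{L}_{\mathrm{sum}}
  = \lambda_1\,\mathcal{L}_1\!\left(D,\ D^{\mathrm{GT}}\right)
  + \lambda_2\,\mathcal{L}_2\!\left(O,\ O^{\mathrm{GT}}\right)
  + \lambda_3\,\mathcal{L}_3\!\left(\hat{I}_{\mathrm{mono}},\ \hat{I}_{\mathrm{color}};\ I^{\mathrm{GT}}_{\mathrm{dn}}\right)
  + \lambda_4\,\mathcal{L}_4\!\left(\hat{C},\ C^{\mathrm{GT}}\right)
```

Here D and O are the disparity and occlusion maps for training, the hatted I terms are the denoised monochrome and color images for training, the GT-denoised term stands for the corresponding denoised GT images, and the hatted C and GT C are the restored and GT chrominance (U, V) values.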

2. The method of claim 1, wherein
in step (a),
the learning device inputs the monochrome image for training and the color image for training into the disparity network, causing the disparity network to (i) encode the monochrome image for training through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for training, and encode the color image for training through a first_2 sub-encoder included in the encoder to generate a first_2 feature map for training, (ii) perform a correlation operation on the first_1 feature map for training and the first_2 feature map for training through a correlation layer included in the encoder to calculate the matching cost, thereby generating a correlation feature map for training corresponding to the color and monochrome stereo pair for training, (iii) encode the correlation feature map for training through a second sub-encoder included in the encoder to generate a second feature map for training, and (iv) decode the second feature map for training through a decoder of the disparity network to generate the disparity map for training.

3. The method of claim 2, wherein
the learning device causes the disparity network to generate the correlation feature map for training by performing, as the correlation operation, a dot product between a first feature vector for training, corresponding to at least one first patch included in at least one first region of the first_1 feature map for training, and a second feature vector for training, corresponding to at least one second patch included in at least one second region of the first_2 feature map for training that corresponds to the first region.

4. The method of claim 2, wherein
the learning device causes the disparity network to concatenate a predetermined first intermediate feature map for training, generated by a predetermined encoding layer that is at least part of at least one encoding layer included in the first_1 sub-encoder, the first_2 sub-encoder, and the second sub-encoder and generating first intermediate feature maps for training (the first intermediate feature maps for training including the first_1 feature map for training, the first_2 feature map for training, and the second feature map for training), with a predetermined second intermediate feature map for training generated by a predetermined decoding layer corresponding to the predetermined encoding layer, among at least one decoding layer included in the decoder and generating second intermediate feature maps for training corresponding to the first intermediate feature maps for training, thereby generating at least one concatenated feature map for training, and to input each concatenated feature map for training into a layer subsequent to the predetermined decoding layer corresponding to the predetermined second intermediate feature map for training used to generate that concatenated feature map for training.

5. The method of claim 2, wherein
the learning device (i) generates a difference map for training by calculating the difference between a pixel value of each pixel of a warped color image for training, generated by warping the color image for training with reference to the disparity map for training, and a pixel value of each pixel of the monochrome image for training, and (ii) inputs the second feature map for training used to generate the disparity map for training and the difference map for training into the occlusion network, causing the occlusion network to generate, with reference to features extracted from the difference map for training and features of the second feature map for training, the occlusion map for training, which represents occlusion between the warped color image for training and the monochrome image for training as a binary value for each pixel.

6. The method of claim 5, wherein
the learning device causes the occlusion network to (i) encode the difference map for training through an encoder of the occlusion network to generate a third feature map for training, and (ii) concatenate the third feature map for training and the second feature map for training and then decode the result through a decoder of the occlusion network to generate the occlusion map for training.

7. The method of claim 1, wherein
in step (a),
the learning device causes the denoising network to (i) perform at least one convolution operation, at least one batch normalization operation, and at least one ReLU operation on each of the monochrome image for training and the color image for training to obtain residual images for training, each residual image being a prediction of what remains when the latent noise-free image for training is removed from the corresponding monochrome image for training or color image for training, and (ii) remove noise from the monochrome image for training and the color image for training with reference to their respective residual images for training, thereby generating the denoised monochrome image for training and the denoised color image for training.

8. The method of claim 1, wherein
in step (b),
the learning device causes the colorization network to (i) correct a color-bleeding error in at least one pixel feature value extracted from the initial color restored image for training with reference to occlusion-region information of the occlusion map for training, thereby generating corrected pixel feature values, and (ii) restore final color information of each pixel of the monochrome image for training, which is used as a reference image, by using at least some of the corrected pixel feature values as color seeds for restoring the color of each region delimited by at least one edge in the monochrome image for training, thereby generating the final color restored image for training.

9. The method of claim 1, wherein
in step (b),
the learning device causes the colorization network to (i) concatenate the initial color restored image for training, the occlusion map for training, and the monochrome image for training and encode the result using an encoder of the colorization network to generate a fourth feature map for training, and (ii) decode the fourth feature map for training using a decoder of the colorization network to generate the final color restored image for training.

10. The method of claim 1, wherein
in step (b),
the learning device concatenates the warped denoised color image for training and the denoised monochrome image for training along the channel direction to generate the initial color restored image for training, in which the color information of the warped denoised color image for training forms the U and V channels, which are chrominance channels, and the luminance information of the denoised monochrome image for training forms the Y channel, which is a luminance channel.

A method of performing stereo matching using a color image and a monochrome image, the method comprising:
(a) When at least one learning color and monochrome stereo pair including a learning color image corresponding to the color camera constituting the stereo camera and a learning monochrome image corresponding to the monochrome camera constituting the stereo camera is obtained, the learning device , (1) (i) input the training color and monochrome stereo pair to a disparity network, causing the disparity network to generate the training color image and the training monochrome image of the training color and monochrome stereo pair generate a learning disparity map corresponding to the learning monochrome image according to a matching cost between to generate an occlusion map for learning according to the difference between the color image for learning and the monochrome image for learning, and (iii) the color and monochrome stereo pair for learning to a denoising network (2) a process for causing the denoising network to generate a denoised monochrome image for training and a denoised color image for training in which noise in each of the training color image and the training monochrome image has been removed by input, (2) ( i) (i-1) warping the denoised color image for training with reference to the disparity map for training to generate a warped denoised color image for training, (i-2) for the training The warped denoised color image and the denoised monochrome image for learning are concatenated to generate an initial color restored image for learning, (ii) the initial color restored image for learning, the occlusion map for learning, and the learning for Monochrome image colorization network ( colorization network), the colorization network refers to the occlusion map and restores the color of the learning monochrome image according to the color information of the initial color restored image for training to generate a final color restored image for training , and (3) a first loss generated by referring to the disparity map for training and a disparity GT (ground 
truth) map, a second loss generated by referring to the occlusion map for learning and the occlusion GT map, and the D for training The third loss generated by referring to the noised monochrome image, the denoised color image for training, and the corresponding denoised GT image, and the restored chrominance value for learning of the final color restored image for training and corresponding values At least one included in the disparity network, the occlusion network, the denoising network, and the colorization network using the summation loss generated by assigning weights to the fourth loss generated by referring to the GT chrominance value In a state in which the process of learning the parameters is performed, the test device includes a test color image corresponding to the color camera constituting the stereo camera and a test monochrome image corresponding to the monochrome camera constituting the stereo camera obtaining at least one test color and monochrome stereo pair;
(b) the test device (i) inputs the test color and monochrome stereo pair to the disparity network, so that the disparity network causes the test color image and the test color image of the test color and monochrome stereo pair; generate a test disparity map corresponding to the test monochrome image according to a matching cost between the test monochrome images, (ii) input the test color and monochrome stereo pair into the occlusion network, and cause the occlusion network to generate a test occlusion map according to a difference between the test color image and the test monochrome image, and (iii) input the test color and monochrome stereo pair into the denoising network. causing the denoising network to generate a denoised monochrome image for testing and a denoised color image for testing from which noises are removed from each of the color image for testing and the monochrome image for testing; and
(c) the test device (i) (i-1) warping the denoised color image for testing with reference to the disparity map for testing to generate a warped denoised color image for testing and (i-2) concatenating the warped denoised color image for testing and the denoised monochrome image for testing to generate an initial color restored image for testing, and (ii) inputting the initial color restored image for testing, the occlusion map for testing, and the monochrome image for testing into the colorization network, to thereby cause the colorization network to restore colors of the monochrome image for testing according to color information of the initial color restored image for testing with reference to the occlusion map for testing, thus generating a final color restored image for testing.
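
Step (c)(i-1) warps the denoised color image into the monochrome camera's view using the disparity map. The claims do not specify the warping scheme, so the following is only a minimal sketch under stated assumptions: rectified images, backward warping in which monochrome pixel (y, x) samples color pixel (y, x - d), and linear interpolation along x for sub-pixel disparities. The function name `warp_to_mono_view` is hypothetical.

```python
import math
from typing import List, Optional, Tuple

Pixel = Tuple[float, ...]  # one 3-channel color sample, e.g. (R, G, B)


def warp_to_mono_view(color: List[List[Pixel]],
                      disparity: List[List[float]]) -> List[List[Optional[Pixel]]]:
    """Backward-warp a color image into the monochrome camera's view.

    Assumption: monochrome pixel (y, x) matches color pixel (y, x - d);
    flip the sign if the cameras are mounted the other way around.
    """
    h, w = len(color), len(color[0])
    warped: List[List[Optional[Pixel]]] = []
    for y in range(h):
        row: List[Optional[Pixel]] = []
        for x in range(w):
            xs = x - disparity[y][x]      # source column in the color image
            x0 = math.floor(xs)
            a = xs - x0                   # fractional part, in [0, 1)
            if 0 <= x0 < w and (a == 0 or x0 + 1 < w):
                left = color[y][x0]
                right = color[y][x0 + 1] if a > 0 else left
                # Linear interpolation between the two neighboring columns.
                row.append(tuple((1 - a) * lv + a * rv for lv, rv in zip(left, right)))
            else:
                row.append(None)          # no color sample: source lies outside the image
        warped.append(row)
    return warped
```

Pixels that map outside the color image come back as `None`; these are the same regions in which the occlusion map is expected to mark missing color information.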

12. The method of claim 11,
In step (b),
The test device inputs the monochrome image for testing and the color image for testing into the disparity network, to thereby cause the disparity network to (i) encode the monochrome image for testing through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for testing, and encode the color image for testing through a first_2 sub-encoder included in the encoder to generate a first_2 feature map for testing, (ii) calculate the matching cost by applying a correlation operation to the first_1 feature map for testing and the first_2 feature map for testing through a correlation layer included in the encoder, thereby generating a correlation feature map for testing corresponding to the color and monochrome stereo pair for testing, (iii) encode the correlation feature map for testing through a second sub-encoder included in the encoder to generate a second feature map for testing, and (iv) decode the second feature map for testing through a decoder of the disparity network to generate the disparity map for testing.

13. The method of claim 12,
The test device causes the disparity network to perform, as the correlation operation, a dot product between a first feature vector for testing corresponding to at least one first patch included in at least one first region on the first_1 feature map for testing and a second feature vector for testing corresponding to at least one second patch included in at least one second region on the first_2 feature map for testing, the second region corresponding to the first region, thereby generating the correlation feature map for testing.
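
The correlation operation can be pictured as a 1-D cost volume in which each entry is the dot product between a monochrome feature vector and a horizontally shifted color feature vector. This is a minimal sketch assuming 1x1 patches and horizontal shifts only (as in DispNet-style correlation layers); the claims allow larger patches and regions, and the names below are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [y][x][channel]


def correlation_volume(f_mono: FeatureMap, f_color: FeatureMap,
                       max_disp: int) -> List[List[List[float]]]:
    """cost[y][x][d] = dot(f_mono[y][x], f_color[y][x - d]) for d = 0..max_disp.

    Simplifications (assumptions): 1x1 patches, horizontal shifts only
    (rectified stereo), and a cost of 0.0 where x - d leaves the image.
    """
    h, w = len(f_mono), len(f_mono[0])
    volume = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = []
            for d in range(max_disp + 1):
                if x - d < 0:
                    costs.append(0.0)
                else:
                    costs.append(sum(a * b for a, b in zip(f_mono[y][x], f_color[y][x - d])))
            row.append(costs)
        volume.append(row)
    return volume
```

In the claimed network this volume is passed on to the second sub-encoder rather than read out directly, but taking `max(range(len(c)), key=c.__getitem__)` over one pixel's cost list `c` gives a winner-take-all disparity that is handy for sanity checks.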

14. The method of claim 12,
The test device (i) generates a difference map for testing by calculating, for each pixel, the difference between the pixel value of that pixel on a warped color image for testing, generated by warping the color image for testing with reference to the disparity map for testing, and the pixel value of the corresponding pixel on the monochrome image for testing, and (ii) inputs the second feature map for testing, used to generate the disparity map for testing, and the difference map for testing into the occlusion network, to thereby cause the occlusion network to generate the occlusion map for testing, which expresses the occlusion between the warped color image for testing and the monochrome image for testing as a binary value for each pixel, with reference to features extracted from the difference map for testing and features of the second feature map for testing.
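
The difference map compares a warped color image with a single-channel monochrome image, so some color-to-intensity conversion is implied. The sketch below assumes intensities normalized to [0, 1] and uses ITU-R BT.601 luma for that conversion; both are assumptions, as are the function names. The `naive_occlusion` threshold is a hypothetical hand-crafted baseline included only to show what a per-pixel binary occlusion map looks like; in the claims the binary map comes from the trained occlusion network, not from a threshold.

```python
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


def luma(rgb: RGB) -> float:
    """ITU-R BT.601 luma, used so a color sample can be compared with a mono one."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def difference_map(warped_color: List[List[Optional[RGB]]],
                   mono: List[List[float]],
                   invalid_value: float = 1.0) -> List[List[float]]:
    """Per-pixel |luma(warped color) - mono|; pixels with no color sample get invalid_value."""
    return [[invalid_value if c is None else abs(luma(c) - m)
             for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(warped_color, mono)]


def naive_occlusion(diff: List[List[float]], threshold: float = 0.1) -> List[List[int]]:
    """Non-learned baseline: mark a pixel occluded (1) when its difference is large."""
    return [[1 if v > threshold else 0 for v in row] for row in diff]
```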

15. A learning device for performing stereo matching using a color image and a monochrome image, the learning device comprising:
a memory storing instructions for performing stereo matching using a color image and a monochrome image; and
a processor that performs operations for performing stereo matching using a color image and a monochrome image according to the instructions stored in the memory,
wherein:
The processor performs (I) a process of, upon obtaining at least one color and monochrome stereo pair for training including a color image for training corresponding to a color camera of a stereo camera and a monochrome image for training corresponding to a monochrome camera of the stereo camera, (i) inputting the color and monochrome stereo pair for training into a disparity network, to thereby cause the disparity network to generate a disparity map for training corresponding to the monochrome image for training according to a matching cost between the color image for training and the monochrome image for training, (ii) inputting the color and monochrome stereo pair for training into an occlusion network, to thereby cause the occlusion network to generate an occlusion map for training according to a difference between the color image for training and the monochrome image for training, and (iii) inputting the color and monochrome stereo pair for training into a denoising network, to thereby cause the denoising network to generate a denoised monochrome image for training and a denoised color image for training by removing noise from the monochrome image for training and the color image for training, respectively, (II) a process of (i) (i-1) warping the denoised color image for training with reference to the disparity map for training to generate a warped denoised color image for training and (i-2) concatenating the warped denoised color image for training and the denoised monochrome image for training to generate an initial color restored image for training, and (ii) inputting the initial color restored image for training, the occlusion map for training, and the monochrome image for training into a colorization network, to thereby cause the colorization network to restore colors of the monochrome image for training according to color information of the initial color restored image for training with reference to the occlusion map for training, thus generating a final color restored image for training, and (III) a process of training at least one parameter included in the disparity network, the occlusion network, the denoising network, and the colorization network by using a summation loss generated by weighting (i) a first loss generated with reference to the disparity map for training and a disparity GT (ground truth) map, (ii) a second loss generated with reference to the occlusion map for training and an occlusion GT map, (iii) a third loss generated with reference to the denoised monochrome image for training, the denoised color image for training, and a corresponding denoised GT image, and (iv) a fourth loss generated with reference to restored chrominance values for training of the final color restored image for training and corresponding GT chrominance values.
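
Process (III) combines four losses into one weighted sum. The claims do not name the individual loss functions or the weight values, so the choices below are assumptions for illustration only: L1 for disparity and chrominance, binary cross-entropy for the binary occlusion map, and mean squared error for denoising, with placeholder weights. Inputs are flattened lists of values.

```python
import math
from typing import Sequence, Tuple


def l1_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def mse_loss(pred: Sequence[float], gt: Sequence[float]) -> float:
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)


def bce_loss(prob: Sequence[float], label: Sequence[int], eps: float = 1e-7) -> float:
    total = 0.0
    for p, t in zip(prob, label):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(prob)


def summation_loss(first: float, second: float, third: float, fourth: float,
                   weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the disparity, occlusion, denoising, and chrominance losses."""
    w1, w2, w3, w4 = weights
    return w1 * first + w2 * second + w3 * third + w4 * fourth
```

For example, the third loss could cover both denoised outputs at once as `mse_loss(den_mono + den_color, gt_mono + gt_color)`, where all four names are hypothetical flattened lists.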

16. The learning device of claim 15,
In the process (I),
The processor inputs the monochrome image for training and the color image for training into the disparity network, to thereby cause the disparity network to (i) encode the monochrome image for training through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for training, and encode the color image for training through a first_2 sub-encoder included in the encoder to generate a first_2 feature map for training, (ii) calculate the matching cost by applying a correlation operation to the first_1 feature map for training and the first_2 feature map for training through a correlation layer included in the encoder, thereby generating a correlation feature map for training corresponding to the color and monochrome stereo pair for training, (iii) encode the correlation feature map for training through a second sub-encoder included in the encoder to generate a second feature map for training, and (iv) decode the second feature map for training through a decoder of the disparity network to generate the disparity map for training.

17. The learning device of claim 16,
The processor causes the disparity network to perform, as the correlation operation, a dot product between a first feature vector for training corresponding to at least one first patch included in at least one first region on the first_1 feature map for training and a second feature vector for training corresponding to at least one second patch included in at least one second region on the first_2 feature map for training, the second region corresponding to the first region, thereby generating the correlation feature map for training.

18. The learning device of claim 16,
The processor causes the disparity network to (i) concatenate at least one predetermined first intermediate feature map for training, among first intermediate feature maps for training generated in at least one encoding layer included in the first_1 sub-encoder, the first_2 sub-encoder, and the second sub-encoder (the first intermediate feature maps for training including the first_1 feature map for training, the first_2 feature map for training, and the second feature map for training), with a predetermined second intermediate feature map for training generated in a predetermined decoding layer corresponding to the predetermined encoding layer in which that first intermediate feature map was generated, the predetermined decoding layer being among at least one decoding layer included in the decoder to generate second intermediate feature maps for training corresponding to the first intermediate feature maps for training, thereby generating at least one concatenated feature map for training, and (ii) input each concatenated feature map for training into the layer following the predetermined decoding layer that generated the predetermined second intermediate feature map for training used to produce that concatenated feature map for training.
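
The concatenation described above is a U-Net-style skip connection: an encoder feature map is stacked channel-wise onto the output of the matching decoding layer, and the stacked map feeds the next decoding layer. The sketch below shows only that dataflow; the layer callables and the `skip_map` pairing of decoding layers to encoder outputs are hypothetical placeholders, not the patent's architecture.

```python
from typing import Callable, Dict, List

FeatureMap = List[List[List[float]]]  # indexed as [channel][y][x]


def concat_channels(first: FeatureMap, second: FeatureMap) -> FeatureMap:
    """Concatenate two feature maps along the channel axis (spatial sizes must match)."""
    if (len(first[0]), len(first[0][0])) != (len(second[0]), len(second[0][0])):
        raise ValueError("skip connection requires matching spatial sizes")
    return first + second


def decode_with_skips(x: FeatureMap,
                      decoder_layers: List[Callable[[FeatureMap], FeatureMap]],
                      encoder_feats: List[FeatureMap],
                      skip_map: Dict[int, int]) -> FeatureMap:
    """Run the decoder; after decoding layer k, concatenate encoder_feats[skip_map[k]].

    The concatenated map is what decoding layer k + 1 receives as input.
    skip_map is a hypothetical encoder/decoder pairing chosen for illustration.
    """
    for k, layer in enumerate(decoder_layers):
        x = layer(x)  # second intermediate feature map of decoding layer k
        if k in skip_map:
            x = concat_channels(encoder_feats[skip_map[k]], x)
    return x
```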

19. The learning device of claim 16,
The processor (i) generates a difference map for training by calculating, for each pixel, the difference between the pixel value of that pixel on a warped color image for training, generated by warping the color image for training with reference to the disparity map for training, and the pixel value of the corresponding pixel on the monochrome image for training, and (ii) inputs the second feature map for training, used to generate the disparity map for training, and the difference map for training into the occlusion network, to thereby cause the occlusion network to generate the occlusion map for training, which expresses the occlusion between the warped color image for training and the monochrome image for training as a binary value for each pixel, with reference to features extracted from the difference map for training and features of the second feature map for training.

20. The learning device of claim 19,
The processor causes the occlusion network to (i) encode the difference map for training through an encoder of the occlusion network to generate a third feature map for training, and (ii) concatenate the third feature map for training with the second feature map for training and then decode the result through a decoder of the occlusion network to generate the occlusion map for training.

21. The learning device of claim 15,
In the process (I),
The processor causes the denoising network to (i) apply at least one convolution operation, at least one batch normalization operation, and at least one ReLU operation to each of the monochrome image for training and the color image for training, thereby predicting, for each of them, a residual image for training, that is, the output obtained by removing the corresponding latent noise-free image for training from that image, and (ii) remove the noise from the monochrome image for training and the color image for training with reference to their respective residual images for training, thereby generating the denoised monochrome image for training and the denoised color image for training.
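
This is residual learning in the style of DnCNN: the convolution, batch normalization, and ReLU stack predicts the noise (the residual), and the denoised image is the input minus that prediction. The sketch below is a single-channel, pure-Python illustration of those building blocks; layer counts, kernel weights, and batch-normalization statistics are not given in the claims, so all values and names here are assumptions.

```python
from typing import Callable, List

Image = List[List[float]]


def conv3x3(img: Image, kernel: List[List[float]], bias: float = 0.0) -> Image:
    """Single-channel 3x3 convolution (CNN-style cross-correlation) with zero padding."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            s = bias
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += kernel[dy + 1][dx + 1] * img[yy][xx]
            row.append(s)
        out.append(row)
    return out


def batch_norm(img: Image, mean: float, var: float,
               gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5) -> Image:
    """Inference-mode batch normalization with fixed running statistics."""
    scale = gamma / (var + eps) ** 0.5
    return [[(v - mean) * scale + beta for v in row] for row in img]


def relu(img: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in img]


def denoise(noisy: Image, predict_residual: Callable[[Image], Image]) -> Image:
    """Residual learning: the network predicts the noise, which is then subtracted."""
    residual = predict_residual(noisy)
    return [[n - r for n, r in zip(n_row, r_row)] for n_row, r_row in zip(noisy, residual)]
```

A toy predictor could chain the blocks as `lambda im: conv3x3(relu(batch_norm(conv3x3(im, k1), m, v)), k2)` with hypothetical kernels `k1`, `k2` and statistics `m`, `v`; a real network would stack many such layers and learn the weights.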

22. The learning device of claim 15,
In the process (II),
The processor causes the colorization network to (i) correct a color bleeding error in at least one pixel feature value extracted from the initial color restored image for training, with reference to occlusion region information of the occlusion map for training, thereby generating at least one corrected pixel feature value, and (ii) use at least part of the corrected pixel feature values as a color seed for each region divided by at least one edge on the monochrome image for training, which serves as a reference image, to restore the color of each such region, thereby restoring final color information for each pixel on the monochrome image for training and generating the final color restored image for training.
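
The idea of edge-bounded regions with color seeds can be illustrated without a neural network: split the monochrome image into regions separated by edges, take non-occluded pixels that have a color sample as seeds, and give each region the mean chroma of its seeds. This is a hand-crafted stand-in written under assumptions (4-connectivity, mean chroma, neutral chroma for edge pixels and unseeded regions); the claimed colorization network learns this behavior rather than applying a fixed rule, and the function name is hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

UV = Tuple[float, float]


def fill_regions_from_seeds(edges: List[List[bool]],
                            chroma: List[List[Optional[UV]]],
                            occluded: List[List[int]]) -> List[List[UV]]:
    """Give every edge-bounded region the mean chroma of its non-occluded seeds.

    Edge pixels and regions without any usable seed stay at neutral (0, 0).
    """
    h, w = len(edges), len(edges[0])
    out: List[List[UV]] = [[(0.0, 0.0)] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or edges[sy][sx]:
                continue
            # Breadth-first search collects one 4-connected region bounded by edges.
            region: List[Tuple[int, int]] = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and not edges[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Seeds: pixels in this region with a color sample that is not occluded.
            seeds = [chroma[y][x] for y, x in region
                     if not occluded[y][x] and chroma[y][x] is not None]
            if seeds:
                mean = (sum(u for u, _ in seeds) / len(seeds),
                        sum(v for _, v in seeds) / len(seeds))
                for y, x in region:
                    out[y][x] = mean
    return out
```

Excluding occluded pixels from the seeds is what keeps a wrongly warped color from bleeding across the region, which loosely mirrors the occlusion-guided correction in step (i).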

23. The learning device of claim 15,
In the process (II),
The processor causes the colorization network to (i) concatenate the initial color restored image for training, the occlusion map for training, and the monochrome image for training and encode the result using an encoder of the colorization network to generate a fourth feature map for training, and (ii) decode the fourth feature map for training using a decoder of the colorization network to generate the final color restored image for training.

24. The learning device of claim 15,
In the process (II),
The processor concatenates the warped denoised color image for training and the denoised monochrome image for training in the channel direction to generate the initial color restored image for training, in which the color information of the warped denoised color image for training constitutes the U and V channels, which are chrominance channels, and the luminance information of the denoised monochrome image for training constitutes the Y channel, which is the luminance channel.
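
The channel-wise concatenation takes luminance from the monochrome camera and chrominance from the warped color camera. The sketch below assumes the warped color image is stored as RGB and derives U and V with the BT.601 analog YUV formulas (U = 0.492(B - Y'), V = 0.877(R - Y'), where Y' is the luma of the color sample); the color space, the neutral chroma used where no color sample exists, and the function name are assumptions.

```python
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


def initial_color_restored_yuv(warped_rgb: List[List[Optional[RGB]]],
                               mono: List[List[float]]) -> List[List[List[float]]]:
    """Return [Y, U, V] planes: Y from the mono image, U/V from the warped color image."""
    y_plane, u_plane, v_plane = [], [], []
    for c_row, m_row in zip(warped_rgb, mono):
        ys, us, vs = [], [], []
        for c, m in zip(c_row, m_row):
            if c is None:
                u = v = 0.0  # no color sample: assume neutral chroma
            else:
                r, g, b = c
                luma = 0.299 * r + 0.587 * g + 0.114 * b
                u, v = 0.492 * (b - luma), 0.877 * (r - luma)  # BT.601 analog YUV
            ys.append(m)  # luminance comes from the monochrome camera
            us.append(u)
            vs.append(v)
        y_plane.append(ys)
        u_plane.append(us)
        v_plane.append(vs)
    return [y_plane, u_plane, v_plane]  # channel-first concatenation
```

Taking Y from the monochrome sensor rather than from the color sample is the point of the design: the monochrome camera has no color filter array, so it typically provides the cleaner, sharper luminance.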

25. A test device for performing stereo matching using a color image and a monochrome image, the test device comprising:
a memory storing instructions for performing stereo matching using a color image and a monochrome image; and
a processor that performs operations for performing stereo matching using a color image and a monochrome image according to the instructions stored in the memory,
wherein:
The processor performs (I) on condition that a learning device has performed, upon obtaining at least one color and monochrome stereo pair for training including a color image for training corresponding to a color camera of a stereo camera and a monochrome image for training corresponding to a monochrome camera of the stereo camera, (1) a process of (i) inputting the color and monochrome stereo pair for training into a disparity network, to thereby cause the disparity network to generate a disparity map for training corresponding to the monochrome image for training according to a matching cost between the color image for training and the monochrome image for training, (ii) inputting the color and monochrome stereo pair for training into an occlusion network, to thereby cause the occlusion network to generate an occlusion map for training according to a difference between the color image for training and the monochrome image for training, and (iii) inputting the color and monochrome stereo pair for training into a denoising network, to thereby cause the denoising network to generate a denoised monochrome image for training and a denoised color image for training by removing noise from the monochrome image for training and the color image for training, respectively, (2) a process of (i) (i-1) warping the denoised color image for training with reference to the disparity map for training to generate a warped denoised color image for training and (i-2) concatenating the warped denoised color image for training and the denoised monochrome image for training to generate an initial color restored image for training, and (ii) inputting the initial color restored image for training, the occlusion map for training, and the monochrome image for training into a colorization network, to thereby cause the colorization network to restore colors of the monochrome image for training according to color information of the initial color restored image for training with reference to the occlusion map for training, thus generating a final color restored image for training, and (3) a process of training at least one parameter included in the disparity network, the occlusion network, the denoising network, and the colorization network by using a summation loss generated by weighting (i) a first loss generated with reference to the disparity map for training and a disparity GT (ground truth) map, (ii) a second loss generated with reference to the occlusion map for training and an occlusion GT map, (iii) a third loss generated with reference to the denoised monochrome image for training, the denoised color image for training, and a corresponding denoised GT image, and (iv) a fourth loss generated with reference to restored chrominance values for training of the final color restored image for training and corresponding GT chrominance values, a process of obtaining at least one color and monochrome stereo pair for testing including a color image for testing corresponding to the color camera of the stereo camera and a monochrome image for testing corresponding to the monochrome camera of the stereo camera, (II) a process of (i) inputting the color and monochrome stereo pair for testing into the disparity network, to thereby cause the disparity network to generate a disparity map for testing corresponding to the monochrome image for testing according to a matching cost between the color image for testing and the monochrome image for testing, (ii) inputting the color and monochrome stereo pair for testing into the occlusion network, to thereby cause the occlusion network to generate an occlusion map for testing according to a difference between the color image for testing and the monochrome image for testing, and (iii) inputting the color and monochrome stereo pair for testing into the denoising network, to thereby cause the denoising network to generate a denoised monochrome image for testing and a denoised color image for testing by removing noise from the monochrome image for testing and the color image for testing, respectively, and (III) a process of (i) (i-1) warping the denoised color image for testing with reference to the disparity map for testing to generate a warped denoised color image for testing and (i-2) concatenating the warped denoised color image for testing and the denoised monochrome image for testing to generate an initial color restored image for testing, and (ii) inputting the initial color restored image for testing, the occlusion map for testing, and the monochrome image for testing into the colorization network, to thereby cause the colorization network to restore colors of the monochrome image for testing according to color information of the initial color restored image for testing with reference to the occlusion map for testing, thus generating a final color restored image for testing.

26. The test device of claim 25,
In the process (II),
The processor inputs the monochrome image for testing and the color image for testing into the disparity network, to thereby cause the disparity network to (i) encode the monochrome image for testing through a first_1 sub-encoder included in an encoder of the disparity network to generate a first_1 feature map for testing, and encode the color image for testing through a first_2 sub-encoder included in the encoder to generate a first_2 feature map for testing, (ii) calculate the matching cost by applying a correlation operation to the first_1 feature map for testing and the first_2 feature map for testing through a correlation layer included in the encoder, thereby generating a correlation feature map for testing corresponding to the color and monochrome stereo pair for testing, (iii) encode the correlation feature map for testing through a second sub-encoder included in the encoder to generate a second feature map for testing, and (iv) decode the second feature map for testing through a decoder of the disparity network to generate the disparity map for testing.

27. The test device of claim 26,
The processor causes the disparity network to perform, as the correlation operation, a dot product between a first feature vector for testing corresponding to at least one first patch included in at least one first region on the first_1 feature map for testing and a second feature vector for testing corresponding to at least one second patch included in at least one second region on the first_2 feature map for testing, the second region corresponding to the first region, thereby generating the correlation feature map for testing.

28. The test device of claim 26,
The processor (i) generates a difference map for testing by calculating, for each pixel, the difference between the pixel value of that pixel on a warped color image for testing, generated by warping the color image for testing with reference to the disparity map for testing, and the pixel value of the corresponding pixel on the monochrome image for testing, and (ii) inputs the second feature map for testing, used to generate the disparity map for testing, and the difference map for testing into the occlusion network, to thereby cause the occlusion network to generate the occlusion map for testing, which expresses the occlusion between the warped color image for testing and the monochrome image for testing as a binary value for each pixel, with reference to features extracted from the difference map for testing and features of the second feature map for testing.