KR20210027653A

KR20210027653A - Self shooting image quality improvement method using deep learning

Info

Publication number: KR20210027653A
Application number: KR1020190107170A
Authority: KR
Inventors: 정승원; 루위청; 김동욱
Original assignee: 동국대학교 산학협력단
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2021-03-11
Also published as: KR102338877B1

Abstract

The present invention relates to a method for improving the image quality of a self-photographed image using deep learning, which provides an image in which both the user and the background are rendered in high quality. The method comprises the steps of: (a) separating a foreground region and a background region from an input image through an attention network; (b) extracting brightness compensation values for the foreground region and the background region through the attention network; and (c) restoring a high-quality RGB image by applying the brightness compensation values to the foreground region and the background region, respectively, through an ISP network.

Description

Method for improving the image quality of self-photographed images using deep learning {SELF SHOOTING IMAGE QUALITY IMPROVEMENT METHOD USING DEEP LEARNING}

The present invention relates to a method for improving the image quality of self-photographed images using deep learning.

Owing to the physical limitations of cameras, images captured with inexpensive cameras or smartphone cameras in particular are vulnerable to degradation such as blur and noise, and they do not fully capture the colors of the actual scene. Research aimed at overcoming these limitations through image quality improvement software algorithms is ongoing.

Research applying deep learning to image quality improvement has recently been very active. There have been various attempts to produce a high-quality image from a low-quality image using a deep neural network. Most existing techniques use RGB images that have already passed through the camera ISP as the format of both the input and the output images.

Recently, studies have been published that apply deep learning to image quality improvement in the RAW image domain, that is, on images that have passed only through the sensor's Bayer pattern and not through the camera ISP (e.g., "Learning to See in the Dark", CVPR 2018; "Towards Real Scene Super-Resolution with RAW Images", CVPR 2019).

Training such deep learning networks requires a high-quality database (hereinafter, DB). Most existing RAW-domain deep learning techniques use a DB composed of directly captured images. Because of the time and space constraints of image capture, such a DB contains few types of scenes, and most of them are background images.

However, actual camera users mostly photograph people and frequently take self-camera (hereinafter, selfie) images, so a deep learning-based image quality improvement technique specialized for selfie images is needed.

The present invention was devised to solve the problem described above. To raise the quality of images taken as selfies, it improves the image quality of a selfie captured in a low-light environment and provides an image in which both the user, who is the foreground, and the background are rendered in high quality.

The method for improving the image quality of a self-photographed image using deep learning according to the present invention includes: (a) separating a foreground region and a background region from an input image through an attention network; (b) extracting brightness compensation values for the foreground region and the background region through the attention network; and (c) restoring a high-quality RGB image by applying the brightness compensation values to the foreground region and the background region, respectively, through an ISP network.

According to another embodiment of the present invention, step (c) may include: generating a foreground image and a background image by applying the brightness compensation values to the foreground region and the background region, respectively, through the ISP network; and restoring a high-quality RGB image using the foreground image and the background image.

According to another embodiment of the present invention, in step (a), the foreground region and the background region may be extracted from the input image through a segmentation network.

According to another embodiment of the present invention, in step (b), the brightness compensation values for the foreground region and the background region may be extracted through an amplification ratio estimation network.

According to another embodiment of the present invention, the method may further include, before step (a): generating a database by means of a deep learning network that generates a RAW image from a foreground RGB image and a background RGB image; and training the attention network and the ISP network using the database.

According to another embodiment of the present invention, generating the database may include: generating a foreground RAW image and a background RAW image by applying the deep learning network to the foreground RGB image and the background RGB image, respectively; and generating a RAW image by compositing the foreground RAW image and the background RAW image, and constructing a database that includes the RAW image.

According to an embodiment of the present invention, the image quality of a selfie captured in a low-light environment is improved to generate an image in which both the user, who is the foreground, and the background are rendered in high quality, so that selfie images can be provided with further improved quality.

FIG. 1 is a flowchart illustrating a method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention.
FIGS. 3 and 4 are diagrams illustrating a training method for the method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an attention network according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an ISP network according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to a specific embodiment, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

In describing the present invention, a detailed description of related known technology is omitted when it is judged that it could unnecessarily obscure the subject matter of the invention. In addition, ordinal numbers used in this specification (for example, first, second, and so on) are merely identifiers for distinguishing one component from another.

Throughout the specification, when one component is described as being "connected" or "coupled" to another component, the one component may be directly connected or coupled to the other component, but, unless specifically stated to the contrary, it should be understood that it may also be connected or coupled via a further intermediate component. Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated to the contrary. Terms such as "unit" and "module" used in the specification mean a unit that processes at least one function or operation, which may be implemented as one or more pieces of hardware or software, or as a combination of hardware and software.

The method for improving the image quality of a self-photographed image using deep learning according to the present invention is not limited to low-light images, but its technical effect is likely to be greatest for images captured in low light. Because the method is specialized for selfie images, the input image contains one or more people, and the method is more effective when the person region (hereinafter, the foreground) occupies a large part of the image. The input format of the proposed technique is a RAW image (Bayer RAW RGB image) that has not passed through the camera ISP, and the output format is an RGB image.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention, and FIG. 2 is a diagram illustrating the same method.

The method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention is described below with reference to FIGS. 1 and 2.

In the system for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention, two deep learning networks, an attention network and an ISP network, are used to generate a high-quality image.

First, according to an embodiment of the present invention, a foreground region and a background region are separated from the input image through the attention network (S110), and brightness compensation values for the foreground region and the background region are extracted (S120).

Here, the attention network extracts the foreground and background regions from the image and extracts the brightness compensation values to be applied to the foreground and background regions in order to improve image quality.

In addition, the attention network outputs two RAW images in which the foreground region and the background region have been multiplied by different brightness compensation values.

Thereafter, a high-quality RGB image is restored by applying the brightness compensation values to the foreground region and the background region, respectively, through the ISP network (S130).

The ISP network is a network that receives two RAW images as input and generates a single high-quality RGB image.

That is, a foreground image and a background image may be generated by applying the brightness compensation values to the foreground region and the background region, respectively, through the ISP network, and a high-quality RGB image may be restored using the foreground image and the background image.

Here, the ISP network may internally perform the demosaicing, denoising, tone mapping, and similar operations that a conventional camera ISP performs.

According to the present invention, the system is conceptually divided into two networks, the attention network and the ISP network. In that it receives a single RAW image as input and produces a single RGB image, its input and output are the same as those of a conventional ISP, but the way it generates the output image differs.
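The data flow of steps S110 to S130 can be sketched as follows. This is only a minimal illustration of the one-RAW-in, one-RGB-out structure: `attention_stub`, `isp_stub`, and `enhance_selfie` are hypothetical names, the threshold-based mask and the constant gains are placeholders for what the trained attention network would predict, and the ISP stand-in performs no real demosaicing, denoising, or tone mapping.

```python
import numpy as np

def attention_stub(raw):
    """Hypothetical stand-in for the attention network.

    Returns a binary foreground mask and one gain per region. A trained
    network would predict these; here they come from fixed placeholder rules.
    """
    mask = (raw > np.median(raw)).astype(raw.dtype)  # placeholder segmentation
    fg_gain, bg_gain = 2.0, 4.0                      # illustrative constants
    return mask, fg_gain, bg_gain

def isp_stub(fg_raw, bg_raw):
    """Hypothetical stand-in for the ISP network: two RAW frames in, one RGB out.

    The merged frame is clipped to [0, 1] and copied into three channels.
    """
    merged = np.clip(fg_raw + bg_raw, 0.0, 1.0)
    return np.stack([merged, merged, merged], axis=-1)

def enhance_selfie(raw):
    """One RAW frame in, one RGB frame out, following S110, S120, S130."""
    mask, fg_gain, bg_gain = attention_stub(raw)  # S110 and S120
    fg_raw = raw * mask * fg_gain                 # compensated foreground frame
    bg_raw = raw * (1.0 - mask) * bg_gain         # compensated background frame
    return isp_stub(fg_raw, bg_raw)               # S130
```

Seen from outside, `enhance_selfie` has the same signature as a conventional ISP; the region-wise gains exist only as intermediate values.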

FIGS. 3 and 4 are diagrams illustrating a training method for the method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention.

Training the attention network and the ISP network requires a DB composed of a sufficient number of images.

In particular, the images must contain a variety of backgrounds and a variety of people, and a low-quality RAW image (input) and a high-quality RGB image (output) are needed together, so it is difficult to build a sufficient database (DB) through direct capture. In an embodiment of the present invention, the DB is therefore generated using the approach shown in FIGS. 3 and 4.

FIG. 3 shows a network (hereinafter, r2rNet) that takes an RGB image as input and produces a RAW image.

Since a conventional camera ISP takes a RAW image as input and produces an RGB image, r2rNet is a network that operates in the opposite direction.

For non-selfie images, DBs exist in which high-quality RAW and RGB images are available in pairs, so r2rNet can be trained. r2rNet is a neural network that takes RGB as input and produces RAW as output.
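To make the input and output shapes of an RGB-to-RAW mapping concrete, the sketch below samples an RGB image onto a single-channel Bayer mosaic. It is not r2rNet, which is a trained network; it is a fixed, non-learned stand-in, and both the function name `rgb_to_bayer_rggb` and the RGGB layout are assumptions made for illustration.

```python
import numpy as np

def rgb_to_bayer_rggb(rgb):
    """Sample an H x W x 3 RGB image onto an H x W Bayer mosaic (RGGB assumed).

    Each output pixel keeps only the color channel that the assumed sensor
    layout would measure at that position.
    """
    h, w, _ = rgb.shape
    raw = np.empty((h, w), dtype=rgb.dtype)
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites on red rows
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites on blue rows
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return raw
```

A learned network such as r2rNet would additionally have to model what this sampling ignores, for example sensor noise and the inverse of tone mapping.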

In fields such as person image segmentation (human body segmentation, facial region segmentation), various DBs exist that provide precisely segmented person regions. The DB required for the present invention can therefore be generated by the approach shown in FIG. 4.

That is, a RAW image is generated with r2rNet for each background RGB image and each foreground RGB image (an image containing only the person region) to construct a database (DB), and the attention network and the ISP network can be trained using this database.

More specifically, RAW images captured under various illuminance levels can be simulated by multiplying the foreground RAW image and the background RAW image each by various brightness correction values (real numbers between 0 and 1). In low-light environments in particular, depending on the camera settings, it often happens that the background is captured brightly while the face is dark or, conversely, that the face is bright while the background is barely visible. Such images can be generated by compositing in the RAW blending step.
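The RAW blending step described above can be sketched as follows, assuming RAW values normalized to [0, 1] and a foreground mask taken from a segmentation DB. The function name `blend_raw` and its parameter names are hypothetical. The source only states that the correction values lie between 0 and 1; excluding 0 is an assumption made here so that a reciprocal can be taken later.

```python
import numpy as np

def blend_raw(fg_raw, bg_raw, mask, fg_ratio, bg_ratio):
    """Darken foreground and background RAW frames independently, then composite.

    fg_ratio and bg_ratio are the brightness correction values; they also serve
    as ground-truth labels when training an amplification ratio estimator.
    """
    for ratio in (fg_ratio, bg_ratio):
        if not 0.0 < ratio <= 1.0:  # assumption: 0 is excluded
            raise ValueError("brightness correction values must lie in (0, 1]")
    return mask * (fg_ratio * fg_raw) + (1.0 - mask) * (bg_ratio * bg_raw)
```

Calling `blend_raw(fg, bg, mask, 0.2, 0.8)` simulates a bright background with a dark face, and swapping the two ratios simulates the opposite case.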

The RAW image generated through this compositing can be input to the system for improving the image quality of a self-photographed image using deep learning of the present invention.

In addition, the foreground RGB image and the background RGB image used for the RAW image synthesis of FIG. 4 may be precisely composited in the RGB domain (object composition), and the resulting composite image can be used as the target image (ground-truth image) for training the ISP network.
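As a rough illustration of how such a target image could be formed, the sketch below composites the two RGB images with the same mask used for RAW blending. Plain alpha blending is a simplification chosen here; the source calls for precise object composition without specifying the technique, and `composite_rgb` is a hypothetical name.

```python
import numpy as np

def composite_rgb(fg_rgb, bg_rgb, mask):
    """Alpha-composite a foreground RGB image over a background RGB image.

    mask is H x W with values in [0, 1]; it is broadcast across the three
    color channels. No brightness reduction is applied, so the result is
    the well-exposed target for the darkened RAW input.
    """
    alpha = mask[..., None]
    return alpha * fg_rgb + (1.0 - alpha) * bg_rgb
```

Because the RAW input and the RGB target share one mask, each training pair stays pixel-aligned.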

That is, according to an embodiment of the present invention, a database (DB) is generated by means of a deep learning network that generates a RAW image from a foreground RGB image and a background RGB image. The deep learning network is applied to the foreground RGB image and the background RGB image to generate a foreground RAW image and a background RAW image, respectively; the foreground RAW image and the background RAW image are composited to generate a RAW image; a database (DB) including the RAW image is constructed; and the attention network and the ISP network are trained using the database (DB).

The database (DB) generation approach shown in FIGS. 3 and 4 generates a DB through synthesis when the DB needed to train the networks of the method is missing or insufficient. Therefore, when a sufficient DB obtained through direct capture is available (a DB composed of low-quality RAW images containing a person and a background, together with high-quality RGB images of exactly the same background and foreground), the method may be configured to use that DB instead.

FIG. 5 is a diagram illustrating an attention network according to an embodiment of the present invention.

A RAW image containing a foreground and a background is input to the attention network, and brightness correction values corresponding to the foreground and the background are output.

The attention network extracts the foreground and background regions through an internal segmentation network.

With the database (DB) generated by the approach shown in FIGS. 3 and 4, the exact ground-truth segmentation is known whenever a foreground is composited onto a background, so the segmentation network can be trained to estimate it.

Taking the foreground and background estimation result of the segmentation network and the RAW image as input, the amplification ratio estimation network can extract the brightness correction values corresponding to the foreground and the background.

When the RAW image has been synthesized by the approach shown in FIGS. 3 and 4, the brightness correction values applied to the foreground and the background in the RAW blending step are the ground truth. The amplification ratio estimation network can therefore be trained to estimate brightness correction values as close as possible to the ground truth.
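Since the synthetic DB supplies both the true mask and the true brightness correction values, the two sub-networks of the attention network can be supervised directly. The sketch below shows one possible pair of training objectives. The source does not name any loss function, so the binary cross-entropy for the mask, the mean absolute error for the ratios, and the name `attention_losses` are all assumptions.

```python
import numpy as np

def attention_losses(pred_mask, true_mask, pred_ratios, true_ratios, eps=1e-7):
    """Return (segmentation loss, ratio loss) for one training sample.

    pred_mask holds per-pixel foreground probabilities in [0, 1];
    pred_ratios and true_ratios hold (foreground, background) correction values.
    """
    p = np.clip(pred_mask, eps, 1.0 - eps)  # avoid log(0)
    seg_loss = -np.mean(true_mask * np.log(p) + (1.0 - true_mask) * np.log(1.0 - p))
    ratio_loss = np.mean(np.abs(np.asarray(pred_ratios) - np.asarray(true_ratios)))
    return seg_loss, ratio_loss
```

In a real training loop these quantities would be computed in an automatic differentiation framework; NumPy is used here only to show the arithmetic.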

FIG. 6 is a diagram illustrating an ISP network according to an embodiment of the present invention.

For the foreground region obtained from the attention network, an image with corrected foreground brightness can be generated by multiplying by the reciprocal of the foreground brightness correction value.

Likewise, a brightness-corrected image can be generated for the background region.
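The reciprocal multiplication can be sketched as below, assuming RAW values normalized to [0, 1]. The name `compensate_region` is hypothetical. Zeroing the pixels outside the region and clipping the result are also assumptions, because the source does not say whether each corrected image covers the whole frame or only its own region.

```python
import numpy as np

def compensate_region(raw, region_mask, ratio, eps=1e-6):
    """Brighten one region of a RAW frame by the reciprocal of its correction value.

    A region darkened by ratio = 0.2 is multiplied by 1 / 0.2 = 5. The eps
    guard avoids division by zero for a degenerate estimate.
    """
    gain = 1.0 / max(ratio, eps)
    return np.clip(raw * region_mask * gain, 0.0, 1.0)
```

Calling the function once with the foreground mask and once with `1.0 - mask` yields the two images that the ISP network receives.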

The ISP network takes the two images generated in this way as input and generates a single high-quality RGB image as output.

Here, the ISP network can be trained to output the high-quality RGB images contained in the DB. The present invention does not limit the detailed structure of the ISP network, but because it generates an RGB image from RAW images, the ISP network may internally perform demosaicing, denoising, tone mapping, and similar operations.

According to the method for improving the image quality of a self-photographed image using deep learning according to an embodiment of the present invention, the brightness correction values and the foreground/background region extraction results are computed only internally, and only the image restored in high quality is provided to the user as the result image.

In addition, according to an embodiment of the present invention, post-processing techniques such as sharpness enhancement, tone mapping, and color correction may be applied to the generated image as a supplement.

As described above, according to the present invention, the image quality of a selfie captured in a low-light environment is improved to generate an image in which both the user, who is the foreground, and the background are rendered in high quality, so that selfie images can be provided with further improved quality.

Although the present invention has been described above with reference to its embodiments, a person of ordinary skill in the relevant technical field will readily understand that the invention can be modified and changed in various ways without departing from the spirit and scope of the invention set forth in the claims below.

Claims

A method for improving the image quality of a self-photographed image using deep learning, the method comprising:
(a) separating a foreground region and a background region from an input image through an attention network;
(b) extracting brightness compensation values for the foreground region and the background region through the attention network; and
(c) restoring a high-quality RGB image by applying the brightness compensation values to the foreground region and the background region, respectively, through an ISP network.

The method according to claim 1,
wherein step (c) comprises:
generating a foreground image and a background image by applying the brightness compensation values to the foreground region and the background region, respectively, through the ISP network; and
restoring a high-quality RGB image using the foreground image and the background image.

The method according to claim 1,
wherein, in step (a),
the foreground region and the background region are extracted from the input image through a segmentation network.

The method according to claim 1,
wherein, in step (b),
the brightness compensation values for the foreground region and the background region are extracted through an amplification ratio estimation network.

The method according to claim 1,
further comprising, before step (a):
generating a database by means of a deep learning network that generates a RAW image from a foreground RGB image and a background RGB image; and
training the attention network and the ISP network using the database.

The method according to claim 5,
wherein generating the database comprises:
generating a foreground RAW image and a background RAW image by applying the deep learning network to the foreground RGB image and the background RGB image, respectively; and
generating a RAW image by compositing the foreground RAW image and the background RAW image, and constructing a database including the RAW image.