KR102236904B1

KR102236904B1 - Method and apparatus for compositing images

Info

Publication number: KR102236904B1
Application number: KR1020190167199A
Authority: KR
Inventors: 나인섭; 이윤혁; 김영심
Original assignee: 조선대학교산학협력단
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2021-04-06

Abstract

Disclosed are a method and an apparatus for synthesizing images by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm in a 5G environment connected for the Internet of Things. According to one embodiment of the present invention, the image synthesis method includes the steps of: obtaining a first image including a face image; detecting feature points of the first image by applying a pre-trained deep neural network model; obtaining a second image to be synthesized with the first image; extracting a boundary of the second image; matching coordinate values corresponding to the boundary of the second image based on the coordinate values of the feature points of the first image; and merging and outputting the first and second images.

Description

Image composition method and apparatus {METHOD AND APPARATUS FOR COMPOSITING IMAGES}

The present invention relates to an image synthesis method and apparatus that synthesize images by executing an onboard artificial intelligence (AI) algorithm and/or machine learning algorithm.

Smartphones have recently become widespread, and the performance of the small cameras built into mobile devices, including smartphones, has steadily improved. With such a camera, users can easily photograph themselves and then modify the resulting pictures to suit their own taste. Software that makes use of captured photos is therefore being developed continuously. In particular, with the growth of the SNS (Social Network System) industry, the volume of digital content produced and the speed at which it spreads are increasing exponentially.

Meanwhile, photographing a face with the small camera of a mobile device and then modifying the photo or compositing another image onto it requires face detection technology as a foundation. Face detection is a field of computer vision that reports where a face is located in an image. It can be implemented with a variety of algorithms.

Unexamined Patent Publication No. 10-1206132 (corresponding to the publication listed in the invention disclosure form)

Registered Patent Publication No. 10-1862128 (corresponding to the publication listed in the invention disclosure form)

However, when conventional image synthesis software (applications and the like) applies face recognition technology in a real environment, recognition performance degrades depending on the surroundings, which leads to inaccurate face image synthesis. Moreover, the synthesis techniques built into conventional image synthesis software do not consider the position or size of the face in the image. They composite an image at an arbitrary size at a preset position or a position chosen by the developer, so the result does not harmonize with the face in the image.

The background art described above is technical information that the inventors possessed for deriving the present invention or acquired in the course of deriving it, and it is not necessarily prior art disclosed to the general public before the filing of the present application.

One object of the present disclosure is to enable images to be synthesized by executing an onboard artificial intelligence (AI) algorithm and/or machine learning algorithm.

One object of the present disclosure is to extract feature points from an image containing a face using AI deep learning technology, so that the face in the image and the image to be composited blend naturally.

One object of the present disclosure is to enable optimal image synthesis by combining intelligent techniques, such as face recognition and facial feature point detection, with a technique for accurately scaling down an image.

One object of the present disclosure is to detect accurate facial feature points using a predefined deep learning network, thereby improving recognition performance regardless of the surrounding environment and enabling accurate face image synthesis.

One object of the present disclosure is to composite an image at an optimal position, taking into account the position and size of the face in the image, so that the result harmonizes with the face.

The objects of the embodiments of the present disclosure are not limited to those mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will be understood more clearly through the embodiments of the present invention. It will also be appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

An image synthesis method according to an embodiment of the present disclosure may include executing an onboard artificial intelligence (AI) algorithm and/or machine learning algorithm so that images can be synthesized.

Specifically, an image synthesis method according to an embodiment of the present disclosure may include: obtaining a first image including a face image; detecting feature points of the first image by applying a pre-trained deep neural network model; obtaining a second image to be composited onto the first image; extracting a boundary of the second image; matching coordinate values corresponding to the boundary of the second image based on the coordinate values of the feature points of the first image; and merging the first image and the second image and outputting the result.
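The boundary extraction step is not spelled out at this point in the text. The sketch below shows one plausible reading, stated here as an assumption: the "boundary" is taken to be the tight bounding box of the non-background pixels of the second image. The function names, the list-of-lists image representation, and the background value 0 are all hypothetical choices for illustration.

```python
def content_bounds(image, background=0):
    """Return (top, left, bottom, right) of the non-background pixels, or None."""
    rows = [r for r, row in enumerate(image)
            if any(px != background for px in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] != background for row in image)]
    if not rows:
        return None  # the image contains only background
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, bounds):
    """Cut the image down to the inclusive bounds returned by content_bounds."""
    top, left, bottom, right = bounds
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical second image: a small shape surrounded by background (0).
sticker = [
    [0, 0, 0, 0, 0],
    [0, 0, 5, 5, 0],
    [0, 5, 5, 5, 0],
    [0, 0, 0, 0, 0],
]
bounds = content_bounds(sticker)   # (1, 1, 2, 3)
trimmed = crop(sticker, bounds)    # [[0, 5, 5], [5, 5, 5]]
```

Trimming to the content bounds first means that any later scaling operates on the visible shape rather than on empty margins.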

With the image synthesis method according to an embodiment of the present disclosure, feature points are extracted from an image containing a face using AI deep learning technology, which can improve recognition performance regardless of the surrounding environment. In addition, images are synthesized by combining intelligent techniques, such as face recognition and facial feature point detection, with a technique for accurately scaling down an image, so that the face in the image and the image to be composited blend naturally.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may be further provided.

Aspects, features, and advantages other than those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to an embodiment of the present disclosure, feature points are extracted from an image containing a face using AI deep learning technology, which can improve recognition performance regardless of the surrounding environment.

In addition, images are synthesized by combining intelligent techniques, such as face recognition and facial feature point detection, with a technique for accurately scaling down an image. The face in the image and the image to be composited therefore blend naturally, which can improve the performance of the image synthesis apparatus.

In addition, because images are synthesized by executing an onboard artificial intelligence (AI) algorithm and/or machine learning algorithm, accurate facial feature points can be detected and accurate face image synthesis becomes possible.

In addition, by compositing the image at an optimal position that takes into account the position and size of the face in the image, the result can harmonize with the face.

In addition, because facial feature point detection is implemented with deep learning, the face in the image and the image to be composited blend naturally, and the technique can be applied to a variety of face-based image services.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a schematic diagram illustrating a deep neural network model according to an embodiment of the present disclosure.
FIG. 2 is a block diagram schematically illustrating an image synthesis apparatus according to an embodiment of the present disclosure.
FIG. 3 is a block diagram schematically illustrating a processing unit according to an embodiment of the present disclosure.
FIG. 4 is an exemplary diagram illustrating boundary extraction according to an embodiment of the present disclosure.
FIGS. 5 and 6 are exemplary diagrams illustrating feature point detection according to an embodiment of the present disclosure.
FIG. 7 is an exemplary diagram showing an image synthesis result according to an embodiment of the present disclosure.
FIG. 8 is a flowchart illustrating an image synthesis method according to an embodiment of the present disclosure.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments presented below. It may be implemented in various different forms and should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention. The embodiments presented below are provided so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the present invention pertains. In describing the present invention, detailed descriptions of related known technology are omitted where they would obscure the gist of the present invention.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "include" or "have" are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. These terms are used only to distinguish one component from another.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

First, the present embodiment relates to detecting feature points of a face image based on a pre-trained deep neural network model and compositing another image onto the face image. In particular, the deep neural network model in this embodiment may be a convolutional neural network (CNN). Briefly, the image synthesis method of this embodiment may train the model to recognize a face in a face image and extract feature points. The boundary of the image to be composited may then be extracted, and its size adjusted to correspond to the face image. The scaled image may then be merged at the coordinates predicted in the face image, and the synthesis result output.

Here, the feature points of a face image are specific parts of the face that can serve as reference points for compositing another image onto the face image. Examples include both ends of each eyebrow, both ends of each eye, the pupils, the tip of the nose, both corners of the lips, the center of the upper lip, and the center of the lower lip. These points may be chosen arbitrarily.
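The scale-and-merge steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: feature points are taken as already detected `(x, y)` pairs, the overlay is stretched to the bounding box of those points using nearest-neighbor sampling, and `None` marks transparent overlay pixels. All function names and the sample data are hypothetical.

```python
def bounding_box(points):
    """Inclusive (left, top, right, bottom) box around (x, y) feature points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def resize_nearest(image, new_h, new_w):
    """Scale a list-of-lists image with nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [[image[r * old_h // new_h][c * old_w // new_w]
             for c in range(new_w)]
            for r in range(new_h)]


def merge(base, overlay, top, left):
    """Copy non-transparent overlay pixels onto a copy of the base image."""
    out = [row[:] for row in base]
    for r, row in enumerate(overlay):
        for c, px in enumerate(row):
            y, x = top + r, left + c
            inside = 0 <= y < len(out) and 0 <= x < len(out[0])
            if px is not None and inside:
                out[y][x] = px
    return out


# Hypothetical data: a 6x6 face image and two detected eye-corner points.
face = [[0] * 6 for _ in range(6)]
eye_points = [(1, 2), (4, 2)]
overlay = [[7, 8]]  # a 1x2 image to be composited

left, top, right, bottom = bounding_box(eye_points)
scaled = resize_nearest(overlay, bottom - top + 1, right - left + 1)
merged = merge(face, scaled, top, left)
# merged[2] is now [0, 7, 7, 8, 8, 0]; every other row is unchanged.
```

Deriving the overlay size from the feature point coordinates, rather than from a fixed preset, is what lets the composited image follow the position and size of the face.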

FIG. 1 is a schematic diagram illustrating a deep neural network model according to an embodiment of the present disclosure.

The network structure of a CNN may include convolutional layers and pooling layers. The convolutional layer serves to analyze how a system responds when an input is applied to it. In image processing, convolution is used mainly for filter operations, and it can be used to implement filters that extract specific features from an image. For example, when a window or mask of size 3*3 or larger is applied repeatedly across the entire image, a suitable result is obtained according to the coefficient (weight) values of the mask. The input and output data of a convolutional layer may be referred to as feature maps.
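As an illustration of sliding a 3*3 mask across an image, the sketch below computes the weighted sum at every position where the mask fits. It uses no padding and does not flip the mask (the cross-correlation form commonly used in CNN implementations). The image values and the edge-detecting mask are hypothetical examples, and in a CNN the mask weights would be learned rather than hand-picked.

```python
def filter2d(image, mask):
    """Slide `mask` over `image` (no padding) and return the weighted sums."""
    mh, mw = len(mask), len(mask[0])
    out_h = len(image) - mh + 1
    out_w = len(image[0]) - mw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            for i in range(mh):
                for j in range(mw):
                    acc += image[r + i][c + j] * mask[i][j]
            row.append(acc)
        out.append(row)
    return out


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-picked 3x3 mask that responds to left-to-right intensity increases.
edge_mask = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = filter2d(image, edge_mask)  # [[27, 27], [27, 27]]
```

Each output value is large because every 3*3 window in this image straddles the edge; on a flat region the same mask would produce 0.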

The pooling layer performs an operation that reduces the spatial size in the vertical and horizontal directions, and it can be used to prevent overfitting. Pooling may include max pooling, average pooling, and the like. Max pooling takes the maximum value in the target region, and average pooling computes the average of the target region.
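The two pooling variants can be compared on a small example. The sketch below assumes non-overlapping square windows (window size equal to stride), which is a common configuration but is not stated in the text. The helper names and the sample feature map are hypothetical.

```python
def pool2d(feature_map, size, reduce):
    """Apply `reduce` to each non-overlapping size x size window."""
    out = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


fm = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [8, 2, 1, 1],
    [0, 6, 3, 3],
]
max_pooled = pool2d(fm, 2, max)    # [[7, 6], [8, 3]]
avg_pooled = pool2d(fm, 2, mean)   # [[4.0, 3.0], [4.0, 2.0]]
```

Both variants halve the height and width here; they differ only in how each window is summarized.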

As shown in FIG. 1, the CNN can be divided into a feature extraction part, in which convolution layers and max pooling layers are stacked repeatedly, and a classification part, which consists of fully connected layers (layers connected to every neuron of the previous layer) with softmax applied to the final output layer.
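For reference, softmax turns the raw scores of the final layer into values that are positive and sum to 1. The sketch below is the standard definition with the usual max-subtraction for numerical stability; the input scores are arbitrary illustrative values.

```python
import math


def softmax(scores):
    """Map raw scores to positive values that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
# The largest score receives the largest share, and the shares sum to 1.
```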

For example, for an input image of size 96*96*1, the CNN may first apply a first convolution using a kernel (the variables to be learned) of size 5*5 to generate feature maps of size 96*96*n1. The padding option may be set to valid here. The feature maps produced by the first convolution may then pass through max pooling to produce intermediate images of size 48*48*n1. A second convolution using a 5*5 kernel may then be performed to generate feature maps of size 48*48*n2. The feature maps produced by the second convolution may again pass through max pooling to produce images of size 24*24*n2. In the CNN of this embodiment, 256 of these may be selected and connected to a fully connected (FC) layer, 128 may then be selected and connected to an FC layer, and finally 30 classes may be classified.
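The spatial sizes in this example can be traced with a few lines of arithmetic. The sketch below assumes stride-1 convolutions and 2*2 pooling, neither of which is stated in the text. Under those assumptions, the quoted sequence 96, 96, 48, 48, 24 is what size-preserving ("same"-style) padding produces, whereas a convolution with no padding at all, which is what "valid" denotes in common frameworks, would shrink a 96*96 input to 92*92 with a 5*5 kernel. The function names are hypothetical.

```python
def conv_out(size, kernel, padding):
    """Output side length of a stride-1 convolution."""
    if padding == "same":
        return size               # input is padded so the size is preserved
    if padding == "valid":
        return size - kernel + 1  # no padding: the kernel must fit entirely
    raise ValueError(f"unknown padding: {padding}")


def pool_out(size, window=2):
    """Output side length of non-overlapping pooling."""
    return size // window


def trace(size, padding, kernel=5, stages=2):
    """List the side length after each conv and pool stage."""
    sizes = [size]
    for _ in range(stages):
        size = conv_out(size, kernel, padding)
        sizes.append(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes


same_sizes = trace(96, "same")     # [96, 96, 48, 48, 24]
valid_sizes = trace(96, "valid")   # [96, 92, 46, 42, 21]
```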

In other words, the filter coefficients of a convolution in a CNN are not fixed. Their values are determined through learning. The coefficients of the final convolution kernel therefore vary depending on the task the CNN is meant to handle. Even for the same task, they may vary with the training data used and with the hyperparameter values that are set. The coefficient values may be determined by gradient-based backpropagation.
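The idea that filter coefficients are learned rather than fixed can be shown on a toy problem. The sketch below is a deliberately small stand-in for backpropagation in a CNN: it fits a single 3-tap one-dimensional filter by plain gradient descent on mean squared error. The signal, the "true" kernel, the learning rate, and the step count are all hypothetical values chosen for illustration.

```python
def correlate1d(signal, kernel):
    """Slide `kernel` over `signal` (no padding) and return the weighted sums."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def train_kernel(signal, target, steps=2000, lr=0.01):
    """Fit a 3-tap kernel by gradient descent on mean squared error."""
    kernel = [0.0, 0.0, 0.0]
    n = len(target)
    for _ in range(steps):
        pred = correlate1d(signal, kernel)
        errors = [p - t for p, t in zip(pred, target)]
        # Gradient of the mean squared error with respect to each coefficient.
        grads = [2.0 / n * sum(errors[i] * signal[i + j] for i in range(n))
                 for j in range(len(kernel))]
        kernel = [w - lr * g for w, g in zip(kernel, grads)]
    return kernel


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0, -2.0, 0.5]
true_kernel = [0.5, -1.0, 0.25]          # the filter we pretend not to know
target = correlate1d(signal, true_kernel)
learned = train_kernel(signal, target)   # converges toward true_kernel
```

Changing `signal`, `lr`, or `steps` changes the learned values, which mirrors the point that the final coefficients depend on the training data and hyperparameters.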

A CNN performs neural network operations that exploit the properties of convolution and can therefore exhibit local connectivity. That is, a CNN can use local information in a manner similar to a receptive field. Here, a receptive field means that an external stimulus affects only a specific region rather than the whole, and it may refer to the region of the input image that influences one pixel of the final output. A CNN can extract the correlation between spatially adjacent signals by applying nonlinear filters. Applying several such filters makes it possible to extract a variety of local features. As subsampling reduces the image size and filter operations on local features are applied repeatedly, global features are gradually obtained. In addition, by repeatedly applying a filter with the same coefficients across the entire image, a CNN can dramatically reduce the number of variables and obtain invariance to changes in topology. The network structure of the CNN is not limited to the above description.
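The reduction in the number of variables from reusing one filter everywhere can be made concrete with a parameter count. The sketch below compares a 5*5 convolution on a 96*96*1 input against a hypothetical fully connected layer producing an output of the same size. The value of `n1` is an assumption for illustration only (the text does not give it), and each output unit is assumed to carry one bias term.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_units * out_units + out_units


n1 = 32  # hypothetical number of feature maps

shared = conv_params(5, 1, n1)                        # 832
unshared = dense_params(96 * 96 * 1, 96 * 96 * n1)    # over 2.7 billion
ratio = unshared // shared
```

With these assumed numbers the shared-filter layer needs several million times fewer parameters than the fully connected alternative.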

FIG. 2 is a block diagram schematically illustrating an image synthesis apparatus according to an embodiment of the present disclosure.

Referring to FIG. 2, the image synthesis apparatus may include a communication interface 100, a camera 200, a sensing unit 300, a processor 400, a memory 500, a display unit 600, and a processing unit 700.

The communication interface 100 may be a communication means that receives a first image including a face image and a second image to be composited onto the first image. The communication interface 100 may receive the first image and the second image from the camera 200, and may also receive images through a server (not shown) or a separate input means. The communication interface 100 may then transmit the received images to the processor 400.

The communication interface 100 may operate in conjunction with a network (not shown) to provide signals transmitted and received between the image synthesis apparatus and/or the server in the form of packet data. The communication interface 100 may also transmit a given information request signal from the image synthesis apparatus to the server, or from the server to the image synthesis apparatus. Further, the communication interface 100 may receive a response signal processed by the server and transmit it to the image synthesis apparatus, or receive a response signal processed by the image synthesis apparatus and transmit it to the server.

The communication interface 100 may also be a device that includes the hardware and software needed to transmit and receive signals, such as control signals or data signals, over a wired or wireless connection with other network devices.

In addition, the communication interface 100 may support various forms of intelligent communication between things, such as IoT (internet of things), IoE (internet of everything), and IoST (internet of small things), and may support M2M (machine to machine) communication, V2X (vehicle to everything) communication, D2D (device to device) communication, and the like.

Meanwhile, the server may be a database server that provides the big data needed to apply various AI algorithms and the data needed to operate the image synthesis apparatus. The functions performed by the server may vary depending on the processing capability of the image synthesis apparatus.

When the server is an AI server, it may include a server that performs AI processing and a server that performs computation on big data. The server may also include a web server or application server that lets a user use the image synthesis system through an image synthesis system application installed on a user terminal (not shown) or through an image synthesis system web browser.

Here, after connecting to the image synthesis system application or the image synthesis system site, the user terminal may be provided, through an authentication process, with a service for operating or controlling the image synthesis system. In this embodiment, a user terminal that has completed the authentication process may operate and control the image synthesis system. The user terminal in this embodiment may be a user-operated desktop computer, smartphone, notebook computer, tablet PC, smart TV, mobile phone, PDA (personal digital assistant), laptop, media player, micro server, GPS (global positioning system) device, e-book reader, digital broadcasting terminal, navigation device, kiosk, MP3 player, digital camera, home appliance, or other mobile or non-mobile computing device, but is not limited to these. The user terminal may also be a wearable terminal with communication and data processing functions, such as a watch, glasses, a hair band, or a ring. The user terminal is not limited to the above, and any terminal capable of web browsing may be used without limitation.

Meanwhile, the server may be connected to AI devices through the network and may assist with at least part of the AI processing of the connected AI devices. The AI devices may include, for example, not only user terminals but also robots, autonomous vehicles, XR devices, home appliances, and the like, and the image synthesis system environment may be configured through these AI devices. In this case, the server may train an artificial neural network according to a machine learning algorithm on behalf of an AI device, and may store the trained model itself or transmit it to the AI device.

Here, artificial intelligence (AI) is a field of computer engineering and information technology that studies how to enable computers to perform the thinking, learning, and self-improvement that human intelligence is capable of. It may mean enabling computers to imitate intelligent human behavior.

Artificial intelligence does not exist in isolation. It is directly and indirectly related to many other fields of computer science. In recent years in particular, there have been very active attempts to introduce AI elements into various fields of information technology and use them to solve problems in those fields.

Machine learning is a branch of artificial intelligence and may include the field of study that gives computers the ability to learn without being explicitly programmed. Specifically, machine learning can be described as the technology of studying and building systems, and the algorithms for them, that learn from empirical data, make predictions, and improve their own performance. Rather than executing strictly defined static program instructions, machine learning algorithms may build a specific model in order to derive predictions or decisions from input data.

Meanwhile, the network may serve to connect the image synthesis apparatus and the server. The network may encompass, for example, wired networks such as local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), and integrated services digital networks (ISDNs), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited thereto. The network may also transmit and receive information using short-range communication and/or long-range communication. Here, short-range communication may include Bluetooth, radio frequency identification (RFID), infrared data association (IrDA), ultra-wideband (UWB), ZigBee, and wireless fidelity (Wi-Fi) technologies, and long-range communication may include code division multiple access (CDMA), frequency division multiple access (FDMA), time division multiple access (TDMA), orthogonal frequency division multiple access (OFDMA), and single carrier frequency division multiple access (SC-FDMA) technologies.

The network may also include connections of network elements such as hubs, bridges, routers, switches, and gateways. The network may include one or more connected networks, for example a multi-network environment, including a public network such as the Internet and a private network such as a secure corporate private network. Access to the network may be provided through one or more wired or wireless access networks. Furthermore, the network may support an Internet of Things (IoT) network, which exchanges and processes information between distributed components such as things, and/or 5G communication.

The camera 200 photographs the surrounding environment and, in the present embodiment in particular, may capture an image that includes a face. The camera 200 is a means for capturing images, and an image captured by the camera 200 may be transmitted to the processor 400.

The sensing unit 300 may include various sensors that sense the surroundings of the image synthesis apparatus. In the present embodiment, information other than that obtained by the camera 200 may be acquired through the sensing unit 300. For example, when an image corresponding to the user's age is to be synthesized with a face image, the age may be determined not only through face recognition but also, more accurately, through the sensing unit 300, so that the image corresponding to that age can be selected accurately.

Meanwhile, the sensing unit 300 may include an image sensor (not shown). The image sensor may include a camera (not shown) capable of photographing the surroundings of the image synthesis apparatus, and a plurality of such cameras may be installed for photographing efficiency. The camera included in the image sensor may be the same module as the camera 200.

For example, the camera may include an image sensor (for example, a CMOS image sensor) comprising at least one optical lens and a plurality of photodiodes (for example, pixels) on which an image is formed by light passing through the optical lens, and a digital signal processor (DSP) that constructs an image from the signals output by the photodiodes. The digital signal processor may generate not only still images but also moving images made up of frames of still images. Meanwhile, an image captured by the camera serving as the image sensor may be stored in the memory 500.

In the present embodiment, the sensing unit 300 is not limited to an image sensor and may include at least one sensor capable of detecting the surroundings of the image synthesis apparatus, for example, a lidar sensor, a weight sensor, an illumination sensor, a touch sensor, an acceleration sensor, a magnetic sensor, a gravity sensor (G-sensor), a gyroscope sensor, a motion sensor, an RGB sensor, an infrared (IR) sensor, a fingerprint (finger scan) sensor, an ultrasonic sensor, an optical sensor, a microphone, a battery gauge, an environmental sensor (for example, a barometer, a hygrometer, a thermometer, a radiation sensor, a heat sensor, or a gas sensor), or a chemical sensor (for example, an electronic nose, a healthcare sensor, or a biometric sensor). Meanwhile, in the present embodiment, the image synthesis apparatus may combine and use information sensed by two or more of these sensors.

The processor 400 may transmit the first image and the second image, received through the camera 200 and/or the communication interface 100, to the processing unit 700. The processor 400 may then match the coordinate values corresponding to the boundary of the second image on the basis of the coordinate values of the feature points of the first image, which are detected by applying a pre-trained deep neural network model. The processor 400 may then merge the first image and the second image and output the result through the display unit 600.

Here, the pre-trained deep neural network model that detects the feature points may be a neural network model trained in advance by supervised learning, using training data that includes a large number of face images labeled with feature points, so that it can locate the corresponding feature points when a face image is input. The processor 400 is a kind of central processing unit and may control the overall operation of the image synthesis apparatus by running control software loaded in the memory 500. The processor 400 may include any type of device capable of processing data. Here, a "processor" may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. Examples of such a hardware-embedded data processing device include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

In the present embodiment, the processor 400 may perform machine learning, such as deep learning, on the detection of the feature points of the first image so that the image synthesis apparatus outputs an optimal image synthesis result, and the memory 500 may store the data used for the machine learning, the result data, and the like.

Deep learning, a type of machine learning, learns from data at progressively deeper levels through multiple stages. Deep learning may refer to a set of machine learning algorithms that extract essential data from a plurality of data items as the stages progress.

The deep learning structure may include an artificial neural network (ANN); for example, it may consist of a deep neural network (DNN) such as a convolutional neural network (CNN), a recurrent neural network (RNN), or a deep belief network (DBN). The deep learning structure according to the present embodiment may use any of various known structures, including a CNN, an RNN, and a DBN. An RNN is widely used in natural language processing and the like; it is effective for processing time-series data that changes over time, and may form an artificial neural network structure by stacking a layer at each time step. A DBN may include a deep learning structure formed by stacking multiple layers of restricted Boltzmann machines (RBMs), a deep learning technique. RBM training is repeated, and once a given number of layers is reached, a DBN having that number of layers is formed. A CNN may include a model that mimics human brain function, built on the assumption that when a person recognizes an object, the brain extracts the basic features of the object, performs complex computations, and recognizes the object based on the result.

Meanwhile, an artificial neural network may be trained by adjusting the weights of the connections between nodes (and, where necessary, the bias values) so that a desired output is produced for a given input. The artificial neural network may continuously update its weight values through learning, and a method such as backpropagation may be used for the training.
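
The weight and bias adjustment described above can be illustrated with a minimal sketch. It assumes a single linear neuron trained by gradient descent on a squared-error loss; the function name `train_neuron`, the learning rate, and the sample data are hypothetical and are not taken from the disclosure.

```python
def train_neuron(samples, lr=0.1, epochs=500):
    """Fit pred = w * x + b by gradient descent on the loss 0.5 * (pred - target)**2."""
    w, b = 0.0, 0.0  # weight and bias start at zero (an arbitrary choice)
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x + b       # forward pass
            err = pred - target    # derivative of the loss with respect to pred
            w -= lr * err * x      # chain rule: dLoss/dw = err * x
            b -= lr * err          # chain rule: dLoss/db = err
    return w, b


# Hypothetical training data generated by y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_neuron(samples)
print(round(w, 3), round(b, 3))
```

In a multi-layer network the same chain-rule step is propagated backwards through every layer, which is what backpropagation refers to.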

The memory 500 stores various kinds of information necessary for the operation of the image synthesis apparatus, may store control software capable of operating the image synthesis apparatus, and may include a volatile or nonvolatile recording medium.

The memory 500 is connected to one or more processors and may store code that, when executed by a processor, causes the processor to control the image synthesis apparatus.

Here, the memory 500 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited thereto. The memory 500 may include internal memory and/or external memory, and may include volatile memory such as DRAM, SRAM, or SDRAM; nonvolatile memory such as OTPROM (one-time programmable ROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND flash memory, or NOR flash memory; a flash drive such as an SSD, a CF (compact flash) card, an SD card, a Micro-SD card, a Mini-SD card, an xD card, or a memory stick; or a storage device such as an HDD.

The display unit 600 is an output means for outputting the image synthesis result and may include, for example, a display. The display may show the image synthesis process or result of the image synthesis apparatus under the control of the processor 400. Depending on the embodiment, the display may form a mutual layer structure with a touch pad and thus be configured as a touch screen. In this case, the display may also be used as an operation unit through which information can be entered by the user's touch. To this end, the display may be configured with a touch-sensitive display controller or various other input/output controllers. Such a display may be a display member capable of touch recognition, for example an OLED (organic light-emitting diode) display, an LCD (liquid crystal display), or an LED (light-emitting diode) display.

In addition, the present embodiment may include input means such as a microphone and a display, and a face image may be generated by analyzing the speech characteristics of a speaker whose voice is input through the microphone. In the present embodiment, a face image of the speaker speaking may be generated from the microphone input, or a face image may be generated from text entered through the display. The feature points of the generated face image may then be detected, and the image to be synthesized may be matched to them and synthesized.

The processing unit 700 may perform learning in cooperation with the processor 400 or may receive learning results from the processor 400. In the present embodiment, the processing unit 700 may be provided outside the processor 400 as shown in FIG. 2, or may be provided inside the processor 400 and operate as the processor 400 does. The processing unit 700 may also be implemented in the server. That is, the processing unit 700 may perform processing, based on the code stored in the memory 500, so that optimal image synthesis is carried out. The processing unit 700 is described in detail below with reference to FIG. 3.

FIG. 3 is a block diagram schematically illustrating a processing unit according to an embodiment of the present disclosure. Descriptions that overlap with those of the preceding figures are omitted.

Referring to FIG. 3, the processing unit 700 may include a boundary extraction unit 710, a face recognition unit 720, a preprocessing unit 730, a feature point detection unit 740, a matching unit 750, a scale unit 760, and an output unit 770.

The processing unit 700 may perform image synthesis by receiving a first image that includes a face image and a second image to be synthesized with the first image. The first image may be acquired through the camera 200, but may also be received from, or generated by, the user's input interface or a separate server. Likewise, the second image may be received from the user or a separate server, but may also be acquired through a camera or generated by the user or a separate server.

First, the boundary extraction unit 710 may extract the boundary of the second image. That is, the boundary extraction unit 710 may extract contours from a two-dimensional array based on the level values of the second image and convert the second image into a grayscale image. The boundary extraction unit 710 may then compute the median value of the pixels in the second image and designate it as the median value of the array, thereby extracting the boundary of the second image.

More specifically, the boundary extraction unit 710 may compute the strength and direction of a contour by taking derivatives over the two-dimensional array of level values of the second image. In its dictionary sense, an edge (contour) is a line marking the outline of an object; in image processing, it may mean a line element that characterizes an image. Recognizing boundary lines, that is, finding the pixels that belong to a contour, is called edge detection. A contour is where pixel values in the image change abruptly. In other words, a contour may be the boundary between two regions of different intensity, that is, a place where pixel brightness changes by more than a threshold. Here, the threshold may be an arbitrary reference value used to decide whether a boundary is present.

A contour extraction algorithm may make its decision from the magnitude of the gradient vector obtained by differentiating the image. That is, the slope is obtained by evaluating partial derivative operators. Because a contour is where the gray level changes sharply, a differential operation, which measures the change of a function, can be used for contour extraction. Differentiation includes the first derivative (gradient) and the second derivative (Laplacian): the magnitude of the first derivative indicates whether a contour is present in the image, and the sign of the second derivative indicates whether a contour pixel lies on the bright side or the dark side of the edge. Contour extraction algorithms may include Sobel edge detection, Canny edge detection, and the like.
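
As an illustration of the first-derivative approach described above, the following sketch applies the standard 3×3 Sobel kernels to a grayscale image stored as a list of rows and thresholds the gradient magnitude. The function names and the threshold value are assumptions made for the example; the disclosure does not specify an implementation.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(gray, kernel, y, x):
    """Weighted sum of the 3x3 neighbourhood centred on pixel (y, x)."""
    return sum(
        kernel[j][i] * gray[y + j - 1][x + i - 1]
        for j in range(3)
        for i in range(3)
    )


def sobel_edges(gray, threshold):
    """Return (magnitude, direction, is_edge) maps; border pixels are left at 0 / False."""
    h, w = len(gray), len(gray[0])
    magnitude = [[0.0] * w for _ in range(h)]
    direction = [[0.0] * w for _ in range(h)]
    is_edge = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_kernel(gray, SOBEL_X, y, x)
            gy = apply_kernel(gray, SOBEL_Y, y, x)
            magnitude[y][x] = math.hypot(gx, gy)   # contour strength
            direction[y][x] = math.atan2(gy, gx)   # contour direction in radians
            is_edge[y][x] = magnitude[y][x] > threshold
    return magnitude, direction, is_edge


# A vertical step from dark (0) to bright (255) between columns 2 and 3.
gray = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
_, _, edge = sobel_edges(gray, threshold=100)
print(edge[2])
```

Only the two columns adjacent to the step exceed the threshold, which matches the description of a contour as the place where pixel values change abruptly.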

In addition, to locate the boundary of the second image accurately, the boundary extraction unit 710 may convert the second image into a grayscale image so that "bright values" and "dark values" can be identified.

The boundary extraction unit 710 may extract the boundary of the second image by computing the median value of the pixels in the second image and designating it as the median value of the array. That is, the grayscale image may be binarized to obtain the median value, and the contour may then be extracted. For example, a mask may be used as a way of applying differentiation to the image: a mask is placed over a 3×3 block of pixels, the computation is performed, and the value of the center pixel is determined, so that the contour is detected. A 3×3 mask is typical, but 5×5 and 7×7 masks may also be used; a larger mask produces thicker edges, which may appear more clearly.
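
The grayscale conversion and median-based binarization described above might be sketched as follows. The luma weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 coefficients and are an assumption, since the disclosure does not state how the grayscale value is computed; the function names are likewise hypothetical.

```python
from statistics import median


def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to rows of integer gray levels."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb
    ]


def binarize_at_median(gray):
    """Mark pixels brighter than the image-wide median as 1 and the rest as 0."""
    mid = median(v for row in gray for v in row)
    return [[1 if v > mid else 0 for v in row] for row in gray]


rgb = [
    [(0, 0, 0), (255, 255, 255)],
    [(10, 10, 10), (200, 200, 200)],
]
print(binarize_at_median(to_gray(rgb)))
```

Using the median as the split point separates the "bright" and "dark" halves of the image regardless of its overall exposure.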

That is, the boundary extraction unit 710 may perform filtering that takes the median of the pixels within the mask region as the representative value. Because the filter uses the median and sets adaptive weights according to the difference from surrounding values, it can reduce impulsive noise such as spot noise, smooth the image while preserving edges and boundaries, and remove noise without a large loss of sharpness.
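
A plain 3×3 median filter, the simplest form of the filtering described above, could look like the sketch below. It does not implement the adaptive weighting mentioned in the text, and leaving the border pixels unchanged is an assumption made to keep the example short.

```python
from statistics import median


def median_filter3(gray):
    """Replace each interior pixel with the median of its 3x3 neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [gray[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


# A single bright impulse (255) in an otherwise uniform patch is removed.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
print(median_filter3(noisy)[1][1])
```

Because the median ignores outliers rather than averaging them in, an isolated impulse disappears while a genuine step edge, where at least half of the window lies on one side, is left in place.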

In the present embodiment, before the final edge detection, the boundary extraction unit 710 may apply a smoothing filter such as a median filter to remove noise, and then apply a spatial filter for edge detection. That is, the boundary extraction unit 710 may perform a recursive call on a candidate homogeneous region determined from the median value of the array and, as a result of the recursive call, compute the average change in pixel values of the second image. The boundary extraction unit 710 may then detect the final homogeneous region of the second image based on that average, determine the array of the final homogeneous region that is to be synthesized from the second image into the first image, and thereby extract the boundary of the second image.

That is, starting from one or more reference pixels, the boundary extraction unit 710 may compare each with its neighboring pixels and treat pixels with similar characteristics as belonging to the same region. The boundary extraction unit 710 may set the pixels regarded as the same region as an adjacent region, and may apply a statistical method to decide whether the pixels have similar attributes.

FIG. 4 is an exemplary diagram for describing boundary extraction according to an embodiment of the present disclosure. For example, the boundary extraction unit 710 may assign a label to every pixel and, on the assumption that pixels whose brightness values differ by 10 or less belong to the same region, perform boundary extraction based on homogeneous regions. As shown in FIG. 4, the boundary extraction unit 710 may determine the beard-shaped region and the outer region to be separate homogeneous regions, and may thus extract the boundary of the beard shape. However, the boundary extraction method is not limited thereto, and boundary extraction may be performed by various algorithms.
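
The homogeneous-region approach in this example can be sketched as a region-growing pass followed by a boundary scan. The sketch below assumes 4-connected neighbours and compares each pixel with the neighbour it was reached from; both choices, and the function names, are assumptions, since the text states only the brightness criterion of 10 or less.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)


def label_regions(gray, max_diff=10):
    """Give the same label to connected pixels whose brightness differs by <= max_diff."""
    h, w = len(gray), len(gray[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx]:
                continue  # already assigned to a region
            next_label += 1
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and abs(gray[ny][nx] - gray[y][x]) <= max_diff):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


def region_boundary(labels, target):
    """Pixels of region `target` that touch at least one pixel of another region."""
    h, w = len(labels), len(labels[0])
    boundary = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != target:
                continue
            if any(0 <= y + dy < h and 0 <= x + dx < w
                   and labels[y + dy][x + dx] != target
                   for dy, dx in NEIGHBOURS):
                boundary.append((y, x))
    return boundary


# A dark 3x3 shape (around 20) on a bright background (200).
gray = [
    [200, 200, 200, 200, 200],
    [200, 20, 25, 20, 200],
    [200, 22, 20, 24, 200],
    [200, 20, 21, 20, 200],
    [200, 200, 200, 200, 200],
]
labels = label_regions(gray)
print(region_boundary(labels, labels[2][2]))
```

Here the background and the dark shape end up with two different labels, and the boundary of the shape is every one of its pixels except the enclosed center pixel.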

FIGS. 5 and 6 are exemplary diagrams for describing feature point detection according to an embodiment of the present disclosure.

Referring to FIG. 5, the face recognition unit 720 may adjust the number of face regions included in the first image. That is, the face recognition unit 720 may extract face candidate regions included in the first image and compute the accuracy of each face candidate region. The face recognition unit 720 may then determine the final face region based on the accuracy of the face candidate regions, thereby adjusting the number of face regions included in the first image. Here, the process of adjusting the number of face regions may be a process for deciding which of a plurality of candidate face regions is to be the target of feature point detection.

That is, the face recognition unit 720 may recognize a face region in the image corresponding to the image signal acquired through the camera 200. In doing so, the face recognition unit 720 may first extract candidate regions judged to be faces and then quantify the accuracy of each extracted candidate region based on a face recognition algorithm.
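
One possible way to turn per-candidate accuracy values into a final set of face regions is sketched below. The dictionary layout, the 0.9 cut-off, and the `max_faces` limit are hypothetical; the disclosure says only that the final face region is determined from the accuracy of the candidates.

```python
def select_face_regions(candidates, min_score=0.9, max_faces=1):
    """Keep the highest-scoring candidates whose accuracy reaches min_score."""
    kept = [c for c in candidates if c["score"] >= min_score]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:max_faces]


# Hypothetical candidates: bounding box as (x, y, width, height) plus an accuracy score.
candidates = [
    {"box": (0, 0, 10, 10), "score": 0.55},
    {"box": (5, 5, 50, 50), "score": 0.97},
    {"box": (60, 5, 40, 40), "score": 0.92},
]
print(select_face_regions(candidates))
```

Raising `max_faces` would allow several faces in the same first image to be passed on to feature point detection.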

For example, the face recognition unit 720 may detect skin color using an HR ratio, which is less sensitive to changes in lighting and detects a variety of skin colors, divide the image into regions by labeling, and then select regions of a given size as candidate face regions. The face recognition unit 720 may also detect the eyes and mouth in a candidate face region using color information and geometric position information such as facial features, and thereby detect the final face region. However, the face recognition method is not limited to the above, and various face recognition algorithms may be applied.

Here, the face recognition algorithm may be a neural network model trained to recognize human faces. Such a model may be trained in advance using training data that includes a data set of many human face images, each labeled as a human face.

In addition to human faces, the training data may further include animal face images labeled as animal faces.

A neural network model trained for face recognition can identify which part of an input image is a face, and can also output, as a probability, its confidence in the judgment that the part is a human face.

Meanwhile, the face recognition unit 720 may also obtain a three-dimensional face profile using the two-dimensional feature points of at least one of a plurality of similar images captured in succession.

Referring to FIG. 6, the preprocessing unit 730 may preprocess the face region included in the first image. That is, the preprocessing unit 730 may preprocess the face region by adjusting at least one of the size, position, color, brightness, and orientation of the finally determined face region.

The preprocessing unit 730 may perform preprocessing so that the feature points of the first image can be detected easily. In doing so, the preprocessing unit 730 may automatically resize the first image to match the input size expected for feature point detection. For example, the preprocessing unit 730 may extract the part of the first image corresponding to the region of interest and make it square. The preprocessing unit 730 may then change the first image to a size suitable for feature extraction and computation. The preprocessing unit 730 may also use interpolation to reduce or enlarge the image to an appropriate range; in some cases the output may be identical to the input. In addition, the preprocessing unit 730 may subtract a constant from the R, G, and B values of each pixel of the first image and then divide the result, so that the brightness values of all pixels fall within a preset range. Besides these methods, the preprocessing unit 730 may perform preprocessing that changes the values of preset categories in the first image to a preset range.
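
The crop, resize, and normalize steps described above can be sketched with plain Python lists. Nearest-neighbour sampling stands in for the unspecified interpolation method, and the shift and scale of 127.5 (mapping 0 to 255 onto -1 to 1) are assumed values; the disclosure states only that a constant is subtracted and the result divided.

```python
def crop_square(img, top, left, size):
    """Cut a size x size region of interest out of an image given as rows of pixels."""
    return [row[left:left + size] for row in img[top:top + size]]


def resize_nearest(img, out_size):
    """Resize a square image to out_size x out_size by nearest-neighbour sampling."""
    src = len(img)
    return [
        [img[y * src // out_size][x * src // out_size] for x in range(out_size)]
        for y in range(out_size)
    ]


def normalize(img, shift=127.5, scale=127.5):
    """Subtract a constant from each R, G, B value and divide, giving values in [-1, 1]."""
    return [[tuple((c - shift) / scale for c in px) for px in row] for row in img]


# A 4x4 test image with alternating black and white columns.
black, white = (0, 0, 0), (255, 255, 255)
img = [[black, white, black, white] for _ in range(4)]

face = crop_square(img, top=0, left=0, size=2)   # hypothetical region of interest
model_input = normalize(resize_nearest(face, 4))  # hypothetical 4x4 model input size
print(model_input[0])
```

When the region of interest already has the target size, `resize_nearest` returns the same pixels, which corresponds to the case in which the output is identical to the input.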

The feature point detection unit 740 may detect the feature points of the first image by applying a pretrained deep neural network model. In this embodiment, the pretrained deep neural network model is a learning model trained to extract feature points from the face data of the first image as input, and may be based on a convolutional neural network (CNN). Accordingly, in this embodiment, facial feature points can be detected quickly with a single model, even in an unconstrained environment.

A facial feature point (facial landmark) is a point marked on a characteristic part of the face, such as the eyes, nose, mouth, or ears. That is, the feature point detection unit 740 may detect and track the positions of the eyebrows, eyes, nose, mouth, chin, ears, and the like in the first image.

In this embodiment, the deep neural network model may be used to extract, for example, 15 optimal feature points, starting from the center of the pupil. Through feature point detection, this embodiment may also detect the facial contour, pupils, eye shape, nose shape, mouth shape, forehead shape, cheekbone shape, jaw shape, and the like. Furthermore, this embodiment may extract feature information about the face in the camera image by calculating face ratio data, including the width-to-height ratio of the face, the eye size, the mouth size, and the relative positions of the forehead, eyebrows, nose tip, and chin tip.

Meanwhile, in this embodiment, feature points may be extracted according to the characteristics of the second image or the position at which it is to be synthesized. For example, when the second image is a beard, sideburns, a hairstyle, or the like, recognizing gender and face shape is important, so the feature point detection unit 740 may extract feature points (recognition areas) along the outline of the face. When the synthesis is based on age recognition, such as wrinkles, the feature point detection unit 740 may extract feature points inside the face. In this way, this embodiment can be applied to a program that estimates the altered appearance of a criminal.

Meanwhile, the output of a deep neural network model pretrained to find feature points on a face, as described above, may be insufficiently accurate, so a process that determines the feature points more precisely may be added, as described below.

Hereinafter, for convenience of description, the feature points output by the deep neural network model are referred to as primary feature points, and the feature points determined through the refinement process based on the primary feature points are referred to as secondary feature points.

The processor of the image synthesizing apparatus may apply the pretrained deep neural network model to the input first image, which includes a face image, to detect the primary feature points.

Because the positions of the primary feature points may not coincide exactly with the intended target feature points, the processor may determine more precisely located secondary feature points by using a normal distribution of pixel values defined over the pixels adjacent to each primary feature point. Here, a pixel value may be any value that a pixel carries, such as its color, brightness, or intensity.

Here, the adjacent pixels of a primary feature point that are subject to the normal-distribution graph may initially be taken to be the pixels contained in a window of a predetermined size centered on the primary feature point.

The processor generates a distribution graph of the pixel values of the adjacent pixels and evaluates the similarity between the generated distribution graph and a normal distribution. If the similarity is at or above a certain level, the processor fixes the window size and finds the peak of the distribution graph for the pixels in that window.

If the similarity evaluation shows that the generated distribution graph is less similar to the normal distribution than that level, the window size is enlarged, and the similarity between the normal distribution and the distribution graph for the pixels within the enlarged window is reevaluated. The enlargement step may be chosen freely, taking into account the resolution of the image being analyzed, the processing power of the processor, and so on.

This reevaluation process is repeated, increasing the window size each time, until the similarity reaches a predetermined threshold. Once the similarity is at or above the threshold, the window size is fixed and the peak of the distribution graph for the corresponding adjacent pixels can be identified.

The processor may take the positions corresponding to the peaks as the more precise secondary feature points, and the feature points used as the reference in the subsequent synthesis operations may be the secondary feature points.
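As a non-limiting illustration, the window-growing refinement from primary to secondary feature points might be sketched in Python as follows. Several details are assumptions of this sketch and are not specified by the embodiment: the similarity measure (overlap between the pixel-value histogram and a normal curve fitted to the same mean and standard deviation), the threshold of 0.8, the window sizes, and the rule that maps the histogram peak back to a position (the pixel in the peak bin nearest to the primary feature point).

```python
import numpy as np

def normal_similarity(values, bins=16):
    """Return (overlap in [0, 1], histogram, bin edges) against a fitted normal curve."""
    values = np.asarray(values, dtype=np.float64).ravel()
    mu, sigma = values.mean(), values.std()
    if sigma == 0:                      # constant window: no distribution to compare
        return 0.0, None, None
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    q = np.exp(-0.5 * ((centers - mu) / sigma) ** 2)
    q /= q.sum()
    return float(np.minimum(p, q).sum()), hist, edges

def refine_feature_point(gray, point, start=5, step=2, max_size=31,
                         threshold=0.8, bins=16):
    """Grow a window around `point` (row, col) until its histogram looks normal,
    then return (secondary point, final window size)."""
    h, w = gray.shape
    r0, c0 = point
    size = start
    while True:
        half = size // 2
        top, left = max(r0 - half, 0), max(c0 - half, 0)
        window = gray[top:min(r0 + half + 1, h), left:min(c0 + half + 1, w)]
        sim, hist, edges = normal_similarity(window, bins)
        if sim >= threshold or size + step > max_size:
            break
        size += step                    # enlarge the window and reevaluate
    if hist is None:
        return point, size
    peak = int(np.argmax(hist))         # peak of the distribution graph
    # Bin index of every pixel, using the same binning rule as np.histogram.
    idx = np.clip(np.searchsorted(edges, window, side="right") - 1, 0, bins - 1)
    rows, cols = np.nonzero(idx == peak)
    d2 = (rows + top - r0) ** 2 + (cols + left - c0) ** 2
    i = int(np.argmin(d2))
    return (int(rows[i] + top), int(cols[i] + left)), size
```

The `max_size` cap is added only so that the loop terminates on images whose local histograms never resemble a normal distribution.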

The matching unit 750 may match the coordinate values corresponding to the boundary of the second image based on the coordinate values of the feature points of the first image. That is, the matching unit 750 may extract the ratio between the coordinate values of the feature points of the first image and the coordinate values of the boundary of the second image, and may calculate the length of a vector with the coordinate values of the feature points of the first image as the reference. In other words, based on the ratio and the direction, the matching unit 750 may use the maximum and minimum values in the boundary array of the second image to reduce or enlarge the second image, thereby matching the coordinate values corresponding to the boundary of the second image to the coordinate values of the feature points of the first image.

Accordingly, in this embodiment, even when the region of the second image to be synthesized is the same, it may be matched differently depending on the feature points of the first image, and a different synthesis result may therefore be produced.
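As a non-limiting illustration, the matching of boundary coordinates to feature point coordinates might be sketched in Python as follows. The sketch assumes that the ratio is taken per axis between the extents (maximum minus minimum) of the two point sets and that the vector runs from the centroid of the boundary to the centroid of the feature points; the embodiment does not fix either choice, and the function name is hypothetical.

```python
import numpy as np

def match_boundary_to_landmarks(landmarks, boundary):
    """Map boundary (x, y) points of the second image onto the extent of the
    feature points of the first image.

    Returns (per-axis ratio, vector length, vector direction in radians,
    matched boundary coordinates). Assumes the boundary is not degenerate,
    i.e. it has non-zero width and height.
    """
    landmarks = np.asarray(landmarks, dtype=np.float64)
    boundary = np.asarray(boundary, dtype=np.float64)
    lm_min, lm_max = landmarks.min(axis=0), landmarks.max(axis=0)
    bd_min, bd_max = boundary.min(axis=0), boundary.max(axis=0)
    # Ratio between the feature point extent and the boundary extent.
    ratio = (lm_max - lm_min) / (bd_max - bd_min)
    # Vector from the boundary centroid to the feature point centroid.
    vector = landmarks.mean(axis=0) - boundary.mean(axis=0)
    length = float(np.hypot(vector[0], vector[1]))
    direction = float(np.arctan2(vector[1], vector[0]))
    # Scale the boundary and shift it onto the feature point extent.
    matched = (boundary - bd_min) * ratio + lm_min
    return ratio, length, direction, matched
```

Because the ratio is derived from the feature points of the particular face, the same second image is mapped differently for different first images.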

The scale unit 760 may reduce or enlarge the second image based on the coordinate values corresponding to the position in the first image at which the second image is to be synthesized. That is, based on the ratio and the direction, the scale unit 760 may use the maximum and minimum values in the boundary array of the second image to reduce or enlarge the second image. In other words, based on the ratio between the feature points of the first image and the boundary of the second image and on the direction of the vector, the scale unit 760 may reduce or enlarge the second image within the range between the maximum and minimum values, changing the second image to a size that can be synthesized with the first image. That is, the scale unit 760 may bring the face region of the first image and the region of the second image to be synthesized into the same position and pose. Size information may also be determined from the size of the face. For example, the size information may be chosen so that the size of the recognized face and the size of the synthesis region of the second image are similar, which makes the synthesis look more natural. Position information may likewise be determined from the position of the face. For example, the position information may be determined with the position of the face in mind, so that the synthesis region of the second image, when output, does not cover the recognized face.
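As a non-limiting illustration, reducing or enlarging the second image by a ratio might be sketched in Python as follows. The sketch assumes a single uniform ratio, nearest-neighbor sampling, and a clamp of the output side length to a minimum and maximum; the function name and the limits `min_side` and `max_side` are hypothetical and are only one possible reading of scaling "within the range between the maximum and minimum values".

```python
import numpy as np

def scale_image(image, ratio, min_side=1, max_side=4096):
    """Reduce (ratio < 1) or enlarge (ratio > 1) an image by nearest-neighbor sampling."""
    h, w = image.shape[:2]
    # Clamp the target size to the assumed allowed range.
    new_h = int(np.clip(round(h * ratio), min_side, max_side))
    new_w = int(np.clip(round(w * ratio), min_side, max_side))
    # For each output row/column, pick the source row/column it falls on.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols]
```

A ratio such as the face width divided by the width of the synthesis region would make the two sizes similar, as described above.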

FIG. 7 is an exemplary view showing an image synthesis result according to an embodiment of the present disclosure. As shown in FIG. 7, the output unit 770 may merge the first image and the second image and output the result. That is, the output unit 770 may merge the region of the second image to be synthesized into the face region of the first image and output the result. In other words, in this embodiment, the reduced or enlarged second image may be synthesized with the first image, based on the coordinate values corresponding to the position in the first image at which the second image is to be synthesized, and then output.
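As a non-limiting illustration, merging the scaled second image into the first image at a given position might be sketched in Python as follows. The sketch assumes that the synthesis region of the second image is available as a boolean mask and that merging is a direct pixel replacement; blending at the edges is omitted, and the function name is hypothetical.

```python
import numpy as np

def merge(first, second, mask, top_left):
    """Paste `second` onto a copy of `first` wherever `mask` is True.

    top_left is the (row, col) in `first` where the top-left corner of
    `second` is placed; parts falling outside `first` are clipped.
    """
    out = first.copy()
    r0, c0 = top_left
    h, w = second.shape[:2]
    r_start, c_start = max(r0, 0), max(c0, 0)
    r_end = min(r0 + h, first.shape[0])
    c_end = min(c0 + w, first.shape[1])
    if r_start >= r_end or c_start >= c_end:
        return out                      # no overlap with the first image
    src = second[r_start - r0:r_end - r0, c_start - c0:c_end - c0]
    m = mask[r_start - r0:r_end - r0, c_start - c0:c_end - c0]
    region = out[r_start:r_end, c_start:c_end]   # view into `out`
    region[m] = src[m]
    return out
```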

Meanwhile, in this embodiment, the face region of the first image may be recognized; face information including at least one of the age, gender, skin type, facial features, and emotional state of the face region may be detected; and the second image to be synthesized with the first image may be set based on the face information.

For example, if analysis of the first image indicates 'female' and 'child', content suitable for a girl may be synthesized with the image and output. Here, suitable content may be content selected, according to a preset criterion, from content previously stored in a database. For example, when the criterion is the content most preferred by girls, the content most often selected by girls may be set as the suitable content. The gender, age, race, skin type, emotional state, and so on of the face image in the first image may be determined by neural network models trained to analyze each of these characteristics.
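As a non-limiting illustration, selecting suitable content from pre-stored content under the "most often selected" criterion might be sketched in Python as follows. The `Content` record, its fields, and the in-memory list standing in for the database are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Content:
    name: str
    gender: str        # e.g. "female"
    age_group: str     # e.g. "child"
    selections: int    # how often users in this group chose the content

def select_content(database: List[Content], gender: str,
                   age_group: str) -> Optional[Content]:
    """Return the most-selected content for the analyzed gender and age group."""
    candidates = [c for c in database
                  if c.gender == gender and c.age_group == age_group]
    # default=None covers the case where no stored content matches.
    return max(candidates, key=lambda c: c.selections, default=None)
```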

In addition, in this embodiment, the first image may be analyzed, content corresponding to any one of gender recognition, age recognition, and facial expression recognition may be read from a pre-stored database, and the read content may be synthesized with the image and output. For example, when the face in the first image is analyzed as male and in his twenties, content for estimating the altered appearance of a criminal may be synthesized and output based on the analysis result. That is, even if the user does not select a second image (content) to synthesize, user-customized content is synthesized based on the information obtained from analyzing the user's face, providing a natural and interactive face synthesis service.

FIG. 8 is a flowchart illustrating an image synthesis method according to an embodiment of the present disclosure. Descriptions that overlap with those of FIGS. 1 to 7 are omitted.

Referring to FIG. 8, in step S100, the processor 400 obtains a first image including a face image. The first image may be obtained through the camera 200. However, the disclosure is not limited to this: the first image may be input by a user through an input interface or received from a separate server. An image may also be generated from an image description or the like entered through the input interface.

In addition, in this embodiment, after the first image is obtained, the number of face regions included in the first image may be adjusted. That is, the processor 400 may extract face candidate regions included in the first image and calculate the accuracy of each face candidate region. The processor 400 may then determine the final face region based on the accuracy of the face candidate regions, thereby adjusting the number of face regions included in the first image.
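As a non-limiting illustration, adjusting the number of face regions from scored candidates might be sketched in Python as follows. The candidate format (a box paired with an accuracy score), the score threshold of 0.9, and the limit of one final face are assumptions of this sketch.

```python
def select_final_faces(candidates, min_score=0.9, max_faces=1):
    """Keep the most accurate face candidates.

    candidates: list of (box, score) pairs, where box is any box description
    and score is the calculated accuracy of that candidate region.
    """
    kept = [c for c in candidates if c[1] >= min_score]
    kept.sort(key=lambda c: c[1], reverse=True)   # most accurate first
    return kept[:max_faces]
```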

The processor 400 may also preprocess the face region included in the first image. That is, the processor 400 may preprocess the face region by adjusting at least one of the size, position, color, brightness, and orientation of the finally determined face region, so that the feature points of the first image are easier to detect.

In step S200, the processor 400 detects the feature points of the first image by applying the pretrained deep neural network model. Here, the pretrained deep neural network model is a learning model trained to extract feature points from the face data of the first image as input, and may be based on a convolutional neural network (CNN). Accordingly, in this embodiment, facial feature points can be detected quickly with a single model, even in an unconstrained environment.

A facial feature point (facial landmark) is a point marked on a characteristic part of the face, such as the eyes, nose, mouth, or ears. That is, the feature point detection unit 740 may detect and track the positions of the eyebrows, eyes, nose, mouth, chin, ears, and the like in the first image. In this embodiment, the deep neural network model may be used to extract, for example, 15 optimal feature points, starting from the center of the pupil.

In step S300, the processor 400 obtains a second image to be synthesized with the first image. The second image may be received from a user or a separate server, obtained through a camera, or generated by a user or a separate server.

In step S400, the processor 400 extracts the boundary of the second image. That is, the processor 400 may extract a contour from a two-dimensional array based on the level values of the second image, and may convert the second image into a grayscale image. The processor 400 may then extract the boundary of the second image by calculating the median value of the pixels in the second image and designating it as the median value of the array. In other words, the grayscale image may be binarized, the median value extracted, and the contour extracted. For example, a mask may be used as a way of applying differentiation to the image. That is, the processor 400 may perform filtering that takes the median of the pixels within the mask region as the representative value. Because the filter sets adaptive weights from the median value according to its difference from the surrounding values, it can reduce impulsive noise such as spot noise, smooth the image while protecting edges and boundaries, and remove noise without a large loss of sharpness.

In this embodiment, before the final edge detection, a smoothing filter such as a median filter may be applied to remove noise, and a spatial filter may be applied for edge detection. That is, the processor 400 may perform a recursive call on a candidate homogeneous region determined from the median value of the array and, as a result of the recursive call, calculate the average change in the pixel values of the second image. The processor 400 may then detect the final homogeneous region of the second image based on that average, determine the array of the final homogeneous region of the second image that is to be synthesized with the first image, and thereby extract the boundary of the second image.
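As a non-limiting illustration, part of the boundary extraction of step S400 (grayscale conversion, 3x3 median filtering, binarization at the median value, and contour extraction) might be sketched in Python as follows. The luminance weights, the 3x3 mask size, and the 4-neighbor contour rule are assumptions of this sketch; the adaptive weighting and the recursive search for homogeneous regions described above are not reproduced here.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted sum of R, G, B (common luminance weights, assumed here)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def median_filter3(gray):
    """3x3 median filter: each pixel becomes the median of its mask region."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    stack = np.stack([padded[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)

def extract_boundary(rgb):
    """Return (row, col) coordinates of the boundary of the binarized image."""
    gray = median_filter3(to_grayscale(rgb))
    mask = gray > np.median(gray)       # binarize at the median pixel value
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A pixel is interior if its up, down, left, and right neighbors are all set.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)
```

The returned coordinate array plays the role of the boundary array whose maximum and minimum values are used in the matching step.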

In step S500, the processor 400 matches the first image and the second image. That is, the processor 400 may match the coordinate values corresponding to the boundary of the second image based on the coordinate values of the feature points of the first image. In other words, the processor 400 may extract the ratio between the coordinate values of the feature points of the first image and the coordinate values of the boundary of the second image, and may calculate the length of a vector with the coordinate values of the feature points of the first image as the reference. Based on the ratio and the direction, the processor 400 may use the maximum and minimum values in the boundary array of the second image to reduce or enlarge the second image, thereby matching the coordinate values corresponding to the boundary of the second image to the coordinate values of the feature points of the first image. Accordingly, in this embodiment, even when the region of the second image to be synthesized is the same, it may be matched differently depending on the feature points of the first image, and a different synthesis result may therefore be produced.

In step S600, the processor 400 merges the first image and the second image and outputs the result. First, the processor 400 may reduce or enlarge the second image based on the coordinate values corresponding to the position in the first image at which the second image is to be synthesized. That is, based on the ratio between the feature points of the first image and the boundary of the second image and on the direction of the vector, the processor 400 may reduce or enlarge the second image within the range between the maximum and minimum values, changing the second image to a size that can be synthesized with the first image. The processor 400 may then synthesize the reduced or enlarged second image with the first image, based on the coordinate values corresponding to the position in the first image at which the second image is to be synthesized, and output the result.

The embodiments of the present invention described above may be implemented as a computer program executable through various components on a computer, and such a computer program may be recorded on a computer-readable medium. The medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Meanwhile, the computer program may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer programs include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

In the specification of the present invention (especially in the claims), the term "the" and similar referring terms may cover both the singular and the plural. In addition, where a range is stated in the present invention, the invention includes inventions to which the individual values within that range are applied (unless stated otherwise), as if each individual value constituting the range were stated in the detailed description of the invention.

Unless an order is explicitly stated for the steps constituting the method according to the present invention, or stated to the contrary, the steps may be performed in any suitable order. The present invention is not necessarily limited by the order in which the steps are described. The use of any examples or exemplary terms (for example, "and the like") in the present invention is merely to describe the present invention in detail, and the scope of the present invention is not limited by those examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Accordingly, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set out below but also all scopes equivalent to, or equivalently modified from, those claims fall within the scope of the spirit of the present invention.

100: communication interface
200: camera
300: sensing unit
400: processor
500: memory
600: display unit
700: processing unit
710: boundary extraction unit
720: face recognition unit
730: preprocessor
740: feature point detection unit
750: matching unit
760: scale unit
770: output unit

Claims

An image composition method comprising:
obtaining a first image including a face image;
applying a pretrained deep neural network model to detect a plurality of primary feature points for each facial feature of the first image, with the center of each facial feature as a starting point;
determining secondary feature points based on a normal distribution of pixel values defined over the pixels adjacent to each of the primary feature points;
obtaining a second image to be combined with the first image;
extracting a boundary of the second image;
matching coordinate values corresponding to the boundary of the second image based on coordinate values of the secondary feature points of the first image; and
merging and outputting the first image and the second image,
wherein the image composition method further comprises:
recognizing a face region of the first image;
detecting face information including two or more of age, gender, skin type, facial features, and emotional state of the face region; and
setting the second image to be combined with the first image based on the two or more items of face information, and
wherein determining the secondary feature points comprises:
generating a distribution graph of pixel values for adjacent pixels included in a window of a predetermined size based on the primary feature points;
evaluating the similarity between the distribution graph and the normal distribution;
increasing the size of the window and reevaluating the similarity and, when the similarity is greater than or equal to a predetermined threshold, fixing the size of the window and identifying a peak in the distribution graph of the adjacent pixels; and
determining positions corresponding to the peak as the secondary feature points.

The method of claim 1,
wherein obtaining the first image comprises:
adjusting the number of face regions included in the first image; and
preprocessing the face region included in the first image.

The method of claim 2,
wherein adjusting the number of face regions comprises:
extracting a face candidate region included in the first image;
calculating the accuracy of the face candidate region; and
determining a final face region based on the accuracy of the face candidate region.

The method of claim 3,
wherein preprocessing the face region comprises
adjusting at least one of the size, position, color, brightness, and orientation of the determined final face region.

(Deleted)

The method of claim 1,
wherein extracting the boundary of the second image comprises:
extracting a contour from a two-dimensional array based on the level values of the second image;
converting the second image to a grayscale image; and
calculating the median value of the pixels in the second image and designating it as the median value of the array.

The method of claim 1,
wherein the pretrained deep neural network model is
a neural network model trained in advance, using training data including a plurality of face images labeled with feature points, to specify the positions of the corresponding feature points in an input face image when the face image is input.

The method of claim 6,
Wherein the matching step comprises:
Extracting a ratio between the size of the image to be merged of the second image and the size of the face image of the first image, based on the coordinate values of the secondary feature points of the first image and the coordinate values of the boundary of the second image;
Calculating a vector connecting the secondary feature points of the first image and the coordinate values of the boundary of the second image, based on the coordinate values of the secondary feature points of the first image and the coordinate values of the boundary of the second image; and
Reducing or enlarging the second image based on the ratio and on the length and direction of the vector,
Image composition method.
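The matching step recited above might be sketched as follows. The claim does not say which extent is used for the ratio or exactly how the length and direction of the vector enter the scaling, so this example assumes a bounding-box width ratio, a centroid-to-centroid vector, and nearest-neighbor resampling; the function names are hypothetical.

```python
import math


def bbox(points):
    """Axis-aligned bounding box (x0, y0, x1, y1) of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def centroid(points):
    """Mean position of (x, y) points."""
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def match(feature_points, boundary_points):
    """Return (ratio, vector, length, direction) relating the boundary of the
    second image to the secondary feature points of the first image."""
    fx0, _, fx1, _ = bbox(feature_points)
    bx0, _, bx1, _ = bbox(boundary_points)
    if bx1 == bx0:
        raise ValueError("boundary has zero width")
    ratio = (fx1 - fx0) / (bx1 - bx0)  # assumed: ratio of bounding-box widths
    fc, bc = centroid(feature_points), centroid(boundary_points)
    vector = (fc[0] - bc[0], fc[1] - bc[1])  # from boundary to feature points
    return ratio, vector, math.hypot(*vector), math.atan2(vector[1], vector[0])


def resize_nearest(image, ratio):
    """Reduce or enlarge a list-of-rows image by nearest-neighbor sampling."""
    h, w = len(image), len(image[0])
    new_h, new_w = max(1, round(h * ratio)), max(1, round(w * ratio))
    return [[image[min(h - 1, int(y / ratio))][min(w - 1, int(x / ratio))]
             for x in range(new_w)]
            for y in range(new_h)]
```

In this sketch the ratio alone drives the resize, while the vector indicates how far and in which direction the resized second image would need to be moved to sit on the feature points.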

The method of claim 8,
Wherein the merging and outputting step comprises:
Synthesizing the reduced or enlarged second image with the first image based on a coordinate value corresponding to a position in the first image where the second image is to be combined,
Image composition method.
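As a rough illustration of the merging step recited above, the following sketch pastes the resized second image onto a copy of the first image at a given coordinate. Representing transparent overlay pixels as `None` and silently clipping at the image border are assumptions of this example, and `composite` is a hypothetical helper name.

```python
def composite(base, overlay, top_left):
    """Return a copy of `base` with `overlay` pasted so that its upper-left
    corner lands on `top_left` = (x, y). Overlay pixels equal to None are
    treated as transparent, and pixels falling outside `base` are dropped."""
    out = [row[:] for row in base]  # leave the first image untouched
    ox, oy = top_left
    h, w = len(base), len(base[0])
    for y, row in enumerate(overlay):
        for x, value in enumerate(row):
            ty, tx = oy + y, ox + x
            if value is None or not (0 <= tx < w and 0 <= ty < h):
                continue
            out[ty][tx] = value
    return out
```

The `top_left` coordinate would typically be derived from the secondary feature points, for example by offsetting the overlay so that its boundary centroid coincides with theirs.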

delete

An image synthesis device, comprising:
A communication interface for receiving a first image including a face image and a second image to be combined with the first image;
One or more processors; and
A memory connected to the one or more processors,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Apply a pretrained deep neural network model to detect a plurality of primary feature points for each feature, with the center of each feature of the face of the first image as a starting point,
Determine secondary feature points based on a normal distribution of pixel values defined for adjacent pixels of each of the primary feature points,
Extract the boundary of the second image,
Match a coordinate value corresponding to the boundary of the second image based on the coordinate values of the secondary feature points of the first image, and
Merge and output the first image and the second image,
Wherein the memory additionally stores codes that, when executed by the processor, cause the processor to
Recognize the face region of the first image, detect face information including two or more of age, gender, skin type, characteristic features, and emotional state of the face region, and set the second image to be composited with the first image based on the two or more pieces of face information, and
Wherein the operation of determining the secondary feature points includes:
Generating a distribution graph of pixel values for adjacent pixels included in a window of a predetermined size based on the primary feature points;
Evaluating the similarity between the distribution graph and the normal distribution;
Increasing the size of the window and evaluating the similarity and, when the similarity is greater than or equal to a predetermined threshold, determining the size of the window and identifying a vertex in the distribution graph for the adjacent pixels; and
Determining a position corresponding to the vertex as the secondary feature points,
Image compositing device.

The device of claim 11,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Adjust the number of face regions included in the first image and perform preprocessing of the face regions included in the first image,
Image compositing device.

The device of claim 12,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Extract a face candidate region included in the first image, calculate an accuracy of the face candidate region, and determine a final face region based on the accuracy of the face candidate region, so that the number of face regions included in the first image is adjusted,
Image compositing device.

The device of claim 13,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Adjust at least one of the size, position, color, brightness, and direction of the determined final face region, so that preprocessing of the face region is performed,
Image compositing device.

delete

The device of claim 11,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Extract an outline from a two-dimensional array based on a level value of the second image, convert the second image to a grayscale image, and calculate the median value of the pixels in the second image to designate the median value of the array, so that the boundary of the second image is extracted,
Image compositing device.

The device of claim 11,
Wherein the pretrained deep neural network model is
A neural network model trained in advance, using training data including a plurality of face images labeled with feature points, to specify the positions of the corresponding feature points in face images when the face images are input,
Image compositing device.

The device of claim 16,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Extract a ratio between the size of the image to be merged of the second image and the size of the face image of the first image, based on the coordinate values of the secondary feature points of the first image and the coordinate values of the boundary of the second image, calculate a vector connecting the secondary feature points of the first image and the coordinate values of the boundary of the second image, based on the coordinate values of the secondary feature points of the first image and the coordinate values of the boundary of the second image, and reduce or enlarge the second image based on the ratio and on the length and direction of the vector,
Image compositing device.

The device of claim 18,
Wherein the memory stores codes that, when executed by the processor, cause the processor to:
Combine the reduced or enlarged second image with the first image based on a coordinate value corresponding to a position in the first image at which the second image is to be combined,
Image compositing device.

delete