KR101840563B1

KR101840563B1 - Method and device for reconstructing 3d face using neural network

Info

Publication number: KR101840563B1
Application number: KR1020160131165A
Authority: KR
Inventors: 송연우; 김유찬; 이민식
Original assignee: 한양대학교 에리카산학협력단
Priority date: 2016-07-04
Filing date: 2016-10-11
Publication date: 2018-03-20
Also published as: KR20180004635A

Abstract

A three-dimensional (3D) face reconstruction method and apparatus using a convolutional neural network are disclosed. A 3D face reconstruction method according to an embodiment of the present invention includes: obtaining, through training with a neural network, a learning model for a predetermined plurality of face images and depth data; receiving a first face image captured by a single camera; obtaining depth data for the first face image based on the learning model; and reconstructing a 3D face for the first face image using the obtained depth data.

Description

Method and apparatus for reconstructing a three-dimensional face using a neural network

The present invention relates to a three-dimensional (3D) face reconstruction method and apparatus, and more particularly to a 3D face reconstruction method and apparatus capable of reconstructing a 3D face from an image captured by a single camera using a neural network, for example a convolutional neural network.

A conventional 3D face reconstruction system separates the user's face region from the background in order to reconstruct the user's 3D face. To improve reconstruction quality and speed, such a system uses a chroma key background.

The user also wears cloth or clothing of the same color as the chroma key background over the head and body, leaving only the face exposed. In other words, the system chroma-keys everything except the user's face to separate the face region from the background.

The system also adjusts the lighting so that no reflections or highlights appear on the user's face, and it secures enough light that the background region is not dark.

Because the user must cover the head and body with cloth or clothing during this process, the user suffers discomfort such as heat, depending on the surrounding environment. The system must also keep repeating this complicated process until adequate reconstruction quality is achieved. Conversely, a 3D face generated simply, without this complicated process, has low reconstruction accuracy and is difficult to use in applications such as beauty and medical care. Such faces are therefore used mainly for entertainment, where the reconstruction attempt itself is the point.

Meanwhile, some 3D face reconstruction systems reconstruct the user's 3D face from images captured by multiple cameras of the same or different types. These systems require synchronization between the cameras, but the synchronization process is complicated, which makes widespread adoption difficult.

More recently, depth cameras have been used to acquire the user's 3D depth information at high speed. A depth camera makes it easy to separate the user from the background not only against a chroma key background but also in external environments. However, a depth camera has low resolution and quality for color information, so it is used mainly for user detection and motion recognition. In addition, a depth camera can extract 3D depth information only when the object is at least about 80 cm to 1 m away, which limits its use.

Accordingly, there is a need for a method and apparatus capable of effectively reconstructing a 3D face.

Embodiments of the present invention provide a 3D face reconstruction method and apparatus capable of reconstructing a 3D face from an image captured by a single imaging means, using a neural network such as a convolutional neural network.

A 3D face reconstruction method according to an embodiment of the present invention includes: obtaining, through training with a neural network, a learning model for a predetermined plurality of face images and depth data; receiving a first face image captured by a single imaging means; obtaining depth data for the first face image based on the learning model; and reconstructing a 3D face for the first face image using the obtained depth data.

The step of obtaining the learning model may obtain the learning model using a convolutional neural network (CNN).

Furthermore, the 3D face reconstruction method according to an embodiment of the present invention may further include: extracting feature points from the received first face image; and obtaining a converted first face image by converting the extracted feature points to pre-aligned positions, wherein the step of obtaining the depth data obtains depth data for the converted first face image.

The step of receiving the first face image may track, in real time, a face image of a subject captured in real time by the single camera, and receive the tracked face image as the first face image.

The step of receiving the first face image may receive the first face image by recognizing a face in the subject, extracting a face region, and tracking the extracted face region in real time.

A 3D face reconstruction method according to another embodiment of the present invention includes: obtaining, in real time, a face image from a subject captured by a single imaging means; obtaining depth data for the obtained face image using a learning model for a predetermined plurality of face images and depth data; and reconstructing a 3D face for the obtained face image using the obtained depth data.

The step of obtaining the learning model may obtain the learning model by training with a surface normal map element added to the plurality of face images.

A 3D face reconstruction apparatus according to an embodiment of the present invention includes: a learning unit that obtains, through training with a neural network, a learning model for a predetermined plurality of face images and depth data; a receiving unit that receives a first face image captured by a single imaging means; an acquiring unit that obtains depth data for the first face image based on the learning model; and a reconstruction unit that reconstructs a 3D face for the first face image using the obtained depth data.

The learning unit may obtain the learning model using a convolutional neural network (CNN).

Furthermore, the 3D face reconstruction apparatus according to an embodiment of the present invention may further include: an extraction unit that extracts feature points from the received first face image; and a conversion unit that obtains a converted first face image by converting the extracted feature points to pre-aligned positions, wherein the acquiring unit obtains depth data for the converted first face image.

The receiving unit may track, in real time, a face image of a subject captured in real time by the single camera, and receive the tracked face image as the first face image.

The receiving unit may receive the first face image by recognizing a face in the subject, extracting a face region, and tracking the extracted face region in real time.

The learning unit may obtain the learning model by training with a surface normal map element added to the plurality of face images.

According to embodiments of the present invention, a 3D face can be reconstructed from an image captured by a single imaging means, using a neural network such as a convolutional neural network.

According to embodiments of the present invention, a face can be recognized with a single imaging means such as a webcam and a 3D face can be reconstructed from a single image, so the invention is easy to apply to games, virtual reality, and the like.

The present invention can be applied to various fields such as virtual makeup systems, personal avatars for games, digital actor creation, skin analysis systems, and 3D special-effects makeup simulation, and it can also be applied to applications in any field that perform 3D reconstruction from captured images.

FIG. 1 is a flowchart of a 3D face reconstruction method according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram for explaining the process of obtaining a learning model.
FIG. 3 is an exemplary diagram for explaining the process of converting a face image.
FIG. 4 is a conceptual diagram for explaining the process of reconstructing a 3D face.
FIG. 5 is an exemplary diagram for explaining the overall process of the method according to the present invention.
FIG. 6 shows an example of a neural network structure.
FIG. 7 shows an example of a performance metric as a function of erosion (erode).
FIG. 8 shows the configuration of a 3D face reconstruction apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The present invention is not, however, restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements.

The gist of the embodiments of the present invention is to reconstruct a 3D face for a face image captured by a single imaging means, for example a webcam.

To this end, the present invention trains a neural network, for example a convolutional neural network or a deep neural network, on face images and their depth data, and then uses the resulting learning model and the neural network to reconstruct a 3D face for a captured face image.

Existing 3D face reconstruction requires a great deal of time and expensive equipment such as laser scanners or high-performance stereo cameras, requires images from multiple viewpoints, and needs a large amount of computation to turn those images into a 3D face shape.

In contrast, the present invention makes 3D face reconstruction easily accessible to individuals: the face is recognized using a single imaging means such as a webcam, without expensive equipment, so a face can be reconstructed in 3D from a single photograph or single image.

The time taken for a face image to pass through the neural network and produce an output is about 152.961 ms, so reconstruction is fast, and using a GPU or similar hardware capable of parallel computation allows 3D face reconstruction to be performed at high speed.

FIG. 1 is a flowchart of a 3D face reconstruction method according to an embodiment of the present invention.

Referring to FIG. 1, the method according to an embodiment of the present invention obtains, through training with a neural network, a learning model for a predetermined plurality of face images and depth data (S110).

Here, the neural network may be any one of a convolutional neural network (CNN), a deep neural network, and a deep convolutional neural network, and the number of face images used for training may be determined by the business or individual providing the present invention.

For example, as shown in FIG. 2, step S110 may receive face images and depth data (image and depth) as the input data of the neural network 210, and may obtain the learning model as output data (trained data) through a training process that tunes the weight values for each face image and its depth data using forward inference and back propagation.
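The weight tuning described for step S110 can be illustrated with a minimal sketch. It assumes a single fully connected layer and a squared-error (Euclidean) loss, which is far simpler than the network of the invention; the function names, learning rate, and toy data are hypothetical and chosen only for illustration.

```python
import random


def forward(weights, x):
    """Forward inference: one linear layer, one weight row per output value."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def euclidean_loss(pred, target):
    """Half the sum of squared differences between prediction and target."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


def train_step(weights, x, target, lr):
    """One forward pass followed by one back-propagation update (in place)."""
    pred = forward(weights, x)
    for j, row in enumerate(weights):
        err = pred[j] - target[j]          # dL/dpred[j]
        for i in range(len(row)):
            row[i] -= lr * err * x[i]      # dL/dw[j][i] = err * x[i]
    return euclidean_loss(pred, target)


def train(pairs, n_in, n_out, epochs=200, lr=0.1, seed=0):
    """Supervised training on (input, answer) pairs; returns weights and loss history."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    history = []
    for _ in range(epochs):
        history.append(sum(train_step(weights, x, t, lr) for x, t in pairs))
    return weights, history


# Toy stand-ins for (face image, depth data) pairs.
pairs = [([1.0, 0.0], [2.0]), ([0.0, 1.0], [-1.0])]
weights, history = train(pairs, n_in=2, n_out=1)
```

In this sketch the per-epoch loss shrinks as the weights are tuned, which is the behavior the training process of step S110 relies on.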

Here, step S110 is supervised learning: the learning model may be obtained by receiving, as input data, 500 face images together with the 500 depth data that serve as their answers, and training on them.

That is, because step S110 is supervised learning, a dataset in which each face image is paired with its depth data may be constructed for training, and real-valued data may be used. In other words, every pixel can serve as both information and answer. Such a dataset may be built in Matlab using the HDF5 file format, and, to improve performance, the amount of data may be increased by applying normalization, flipping, warping and morphing, noise, surface normal maps, and the like.
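A minimal sketch of three of the augmentations named above (normalization, flip, and noise) is given below. It assumes 8-bit grayscale images stored as nested lists and a Gaussian noise model; these choices, the function names, and the default noise level are assumptions for illustration, not details from the patent.

```python
import random


def normalize(image, max_value=255.0):
    """Scale pixel values to the range [0, 1] (assumes 8-bit input)."""
    return [[px / max_value for px in row] for row in image]


def flip_horizontal(grid):
    """Mirror a 2D grid left to right."""
    return [list(reversed(row)) for row in grid]


def add_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]


def augment(image, depth, sigma=0.01, seed=0):
    """Return several (image, depth) training pairs derived from one pair."""
    rng = random.Random(seed)
    base = normalize(image)
    return [
        (base, depth),
        # A flipped image needs an identically flipped depth map as its answer.
        (flip_horizontal(base), flip_horizontal(depth)),
        # Noise perturbs only the image; the answer stays unchanged.
        (add_noise(base, sigma, rng), depth),
    ]


image = [[0, 255], [51, 102]]
depth = [[1.0, 2.0], [3.0, 4.0]]
samples = augment(image, depth)
```

The key design point is that geometric augmentations must be applied to the image and its depth map together, whereas photometric ones such as noise leave the depth answer untouched.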

For example, step S110 may use surface-normal fine-tuning: the network is first trained on surface normals, and then trained on depth data to obtain the learning model.

In the present invention, the process of obtaining the learning model need not simply train on pairs of a 2D image and depth data; a surface normal map element corresponding to the 2D image may be added to help learn the actual depth data. For example, a three-channel surface normal map element corresponding to the 2D image may be added during training on the 2D image, which helps in obtaining the depth data that is ultimately learned.
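The patent does not say how the three-channel surface normal map is produced. One common way, shown here purely as an assumption, is to derive it from the depth map by finite differences: the unnormalized normal at each pixel is (-dz/dx, -dz/dy, 1).

```python
import math


def surface_normals(depth):
    """Estimate a 3-channel unit normal per pixel from a 2D depth grid.

    Uses central differences in the interior and one-sided differences
    at the borders. This derivation is an illustrative assumption.
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            row.append((-dzdx / norm, -dzdy / norm, 1.0 / norm))
        normals.append(row)
    return normals


flat = surface_normals([[5.0] * 3 for _ in range(3)])
ramp = surface_normals([[0.0, 1.0, 2.0] for _ in range(3)])
```

A flat depth map yields normals pointing straight along the z axis, while a ramp in x tilts every normal by the same amount.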

The face image used as input data in step S110 may be an image from which only the face region has been extracted, and it may be aligned data in which feature points are extracted from the face region and the feature points of the eyes, nose, mouth, and so on are placed on the same lines. The depth data holds the depth information corresponding to the face image and may be real-valued.

With the learning model obtained in step S110, a face captured by a single imaging means, for example a single camera (or webcam), is tracked in real time to receive the face image whose 3D face is to be reconstructed (hereinafter, the "first face image"). Feature points are extracted from the face region of the received first face image, and the extracted feature points are converted to pre-aligned positions, so that the first face image is converted into a first face image whose feature points are at the aligned positions (S120 to S140).

In the present invention, a single imaging means, for example a webcam, may be connected to Matlab to receive the first face image for 3D face reconstruction and to obtain the first face image with aligned feature points.

For example, as shown in FIG. 3, step S120 may recognize a face in the subject captured by the single imaging means, extract the face region, and track the extracted face region in real time (face recognition and tracking) (310), thereby receiving the first face image, that is, the face region. Step S130 extracts a predetermined number of feature points, for example 68 (face feature extraction 68) (320), to locate the eyes, nose, mouth, and face contour in the extracted face region.

Here, if no face is recognized in the captured video, step S120 may issue a warning that no face was recognized or that there is no face image. The face region may be extracted after a certain number of frames, for example 15, have elapsed since the face was recognized. Step S130 may extract the 68 feature points based on a previously trained learning model.
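The warning and the 15-frame delay described for step S120 can be sketched as a small gating function. The detector itself is out of scope here: `detections` is a hypothetical per-frame list holding a face box or `None`, and resetting the counter when the face is lost is an assumption the patent does not state.

```python
def gate_frames(detections, warmup_frames=15):
    """Yield a warning when no face is found, and a face box only after
    `warmup_frames` frames have elapsed since the face was first recognized."""
    events = []
    frames_since_detection = None  # None means no face is currently tracked
    for frame_index, face_box in enumerate(detections):
        if face_box is None:
            frames_since_detection = None
            events.append((frame_index, "warning: no face detected"))
            continue
        if frames_since_detection is None:
            frames_since_detection = 0
        else:
            frames_since_detection += 1
        if frames_since_detection >= warmup_frames:
            events.append((frame_index, face_box))
    return events


box = (0, 0, 10, 10)  # hypothetical (x, y, width, height)
events = gate_frames([None] + [box] * 20)
```

Waiting a few frames before extraction gives the tracker time to stabilize, which is one plausible reason for the delay.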

Because the data used for training is aligned data, in which the eyes, nose, mouth, and so on are aligned, the first face image must go through the same process. Step S140 performs this process: as shown in FIG. 3, the extracted feature points are converted to the aligned positions (face morphing) (330), yielding a first face image whose feature points are at the aligned positions.
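The patent calls this conversion face morphing and does not specify the transform. As a simpler stand-in, the sketch below fits a least-squares similarity transform (rotation, uniform scale, and translation) that moves detected feature points toward reference positions. Representing 2D points as complex numbers keeps the fit to a few lines; the function names are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares fit of q = a * p + b over 2D points treated as complex numbers.

    `a` encodes rotation and uniform scale, `b` encodes translation.
    """
    n = len(src)
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / n
    q_mean = sum(q) / n
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def apply_similarity(a, b, points):
    """Map points through the fitted transform."""
    mapped = []
    for x, y in points:
        z = a * complex(x, y) + b
        mapped.append((z.real, z.imag))
    return mapped


# Detected points and their reference (aligned) positions: the reference is the
# source rotated by 90 degrees, scaled by 2, and shifted by (3, 4).
detected = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
reference = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
a, b = fit_similarity(detected, reference)
aligned = apply_similarity(a, b, detected)
```

A similarity transform cannot move each of the 68 points exactly onto its reference position; a piecewise warp would be needed for that, so this sketch only approximates the alignment the patent describes.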

Step S140 may, if necessary, apply a mask to remove unnecessary areas from the extracted face region.

Once step S140 has generated the first face image with its feature points at the aligned positions, depth data for the first face image is obtained based on the neural network and the learning model, and the obtained depth data is used to reconstruct the 3D face for the first face image (S150, S160).

For example, as shown in FIG. 4, to reconstruct the 3D face, the learning model trained in step S110 is loaded (load the trained data) (410) and the first face image converted in step S140 is loaded (load the face aligned image) (420). Step S150 then outputs the depth data for the first face image (output the labels) (430), and step S160 reconstructs the 3D face from the output depth data (reconstruction depth map) (440).

Here, once training is complete, the learning model (trained data) may be saved as a file with the 'caffemodel' extension. In step S150, when new face image data is loaded, it passes through the neural network, which may output 1024 depth values (labels). The depth values are real numbers and may carry actual depth information. They are saved to a text (txt) file, the txt file is loaded in Matlab and arranged into an image of a fixed size, for example 32 × 32, and finally the Matlab surf function is used to reconstruct the 3D face.
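The rearrangement of the 1024 labels into a 32 × 32 depth map, and then into surface points, can be sketched as follows. The patent performs this step in Matlab; this Python version is only an illustration, and the row-major ordering of the labels is an assumption (Matlab's own reshape is column-major).

```python
def labels_to_depth_map(labels, size=32):
    """Arrange a flat list of size*size depth labels into a square grid (row-major)."""
    if len(labels) != size * size:
        raise ValueError(f"expected {size * size} labels, got {len(labels)}")
    return [labels[r * size:(r + 1) * size] for r in range(size)]


def depth_map_to_points(depth_map):
    """Convert a depth grid into (x, y, z) points, the data a surface plot draws."""
    return [(x, y, z) for y, row in enumerate(depth_map) for x, z in enumerate(row)]


# Placeholder labels standing in for the network output.
labels = [float(i) for i in range(1024)]
depth_map = labels_to_depth_map(labels)
points = depth_map_to_points(depth_map)
```

Each point pairs a pixel position with its predicted depth, so plotting the points as a surface gives the reconstructed face shape.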

The method of the present invention can be summarized by the process shown in FIG. 5. With a learning model already obtained from face images and depth data, when a face image is captured by the single imaging means, the face is recognized and the face region is tracked in real time (face tracking); a predetermined number of feature points are extracted from the tracked face region (feature point); and the extracted feature points are converted to the aligned positions (face morphing). This yields a face image whose feature points are at the aligned positions (face alignment, aligned data).

The aligned face image then passes through the neural network (deep neural networks), which uses the previously trained learning model to output depth data for the new face image (output the depth labels). The output depth data is used to reconstruct the 3D face (reconstruction depth map).

The neural network used in the present invention may be a convolutional neural network, a deep neural network, or a deep convolutional neural network. The neural network is not limited to these, however; any neural network suitable for achieving the purpose of the present invention may be used.

FIG. 6 shows an example structure of a neural network that can be used in the present invention. It may include a data layer, a convolution layer, a ReLU layer, a pooling layer, a first inner product layer (inner_product layer; output 64), a second inner product layer (inner_product layer; output 1024), and a Euclidean loss (Euclidean_loss) layer.
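To make the layer types concrete, the sketch below implements three of them (ReLU, pooling, and inner product) on toy data. The patent gives no kernel sizes or pooling parameters, so the 2 × 2 max pooling, the tiny input, and the weights here are all assumptions; the convolution layer is omitted for brevity.

```python
def relu(grid):
    """ReLU layer: clamp negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in grid]


def max_pool_2x2(grid):
    """Pooling layer: maximum over non-overlapping 2x2 windows."""
    h = len(grid) // 2 * 2
    w = len(grid[0]) // 2 * 2
    return [
        [max(grid[y][x], grid[y][x + 1], grid[y + 1][x], grid[y + 1][x + 1])
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def flatten(grid):
    """Turn a 2D grid into a flat vector for the inner product layer."""
    return [v for row in grid for v in row]


def inner_product(vector, weights, bias):
    """Inner product (fully connected) layer: one weight row per output."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


activations = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -5.0, 1.0, 0.0],
    [0.0, 0.0, -2.0, -1.0],
    [3.0, 1.0, -4.0, -6.0],
]
pooled = max_pool_2x2(relu(activations))
output = inner_product(flatten(pooled), [[1.0, 1.0, 1.0, 1.0], [0.5, 0.0, 0.0, 0.0]], [0.0, 1.0])
```

In the network of FIG. 6 the final inner product layer would have 1024 outputs, matching the 32 × 32 depth map, instead of the two outputs used in this toy example.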

The neural network of the present invention is kept as simple as the structure shown in FIG. 6 in order to prevent overfitting.

In this way, the method according to an embodiment of the present invention can reconstruct a 3D face from an image captured by a single imaging means, using a neural network such as a convolutional neural network.

The method according to the present invention can have the following effects.

First, it can be applied to applications that create a 3D avatar of a person, for example by taking a photo with a mobile phone, immediately reconstructing the 3D face, and fitting the reconstructed mask onto a 3D face model.

Second, as the virtual reality market expands, demand in the 3D avatar market may also grow. An application using the method of the present invention, which can create a 3D avatar with a single camera and no special tools, could increase the number of individual prosumers and significantly stimulate the market.

Third, because celebrity avatars can be reconstructed from photos of celebrities, in addition to personal avatars, the method may also have a significant impact on the entertainment industry.

FIG. 7 shows an example of a performance metric as a function of erosion (erode).

The neural network performance metric is obtained by feeding the 500 training samples into the network, each of which yields 1024 labels, and measuring the mean absolute error (MAE) between the values output by the network and the ground truth values. The resulting MAE is about 2.5. The smaller the MAE, the better the performance.
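For reference, the MAE over one sample's labels is the average of the absolute differences between predicted and ground-truth values. A minimal sketch with made-up numbers (not the patent's data) follows.

```python
def mean_absolute_error(predicted, ground_truth):
    """Average absolute difference between predictions and ground truth."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


# Made-up values for illustration only.
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```

For a full evaluation the same average would be taken over all 1024 labels of every sample.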

Regarding the performance metric as a function of erosion: as the erode operation applied to the mask is made progressively stronger, the mask shrinks, and the depth pixels at the edges disappear. When performance is measured in this state, it improves because those edge pixel values are no longer included.
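The effect can be reproduced with a small sketch: a binary mask is eroded with a 3 × 3 structuring element, and the MAE is computed only over pixels the mask keeps. The structuring element, the treatment of out-of-bounds pixels as background, and the toy error values are assumptions for illustration.

```python
def erode(mask):
    """Binary erosion with a 3x3 square; pixels outside the grid count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            keep = 1
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or mask[ny][nx] == 0:
                        keep = 0
            out[y][x] = keep
    return out


def masked_mae(pred, truth, mask):
    """MAE restricted to pixels where the mask is non-zero."""
    errors = [
        abs(p - t)
        for pred_row, truth_row, mask_row in zip(pred, truth, mask)
        for p, t, m in zip(pred_row, truth_row, mask_row)
        if m
    ]
    if not errors:
        raise ValueError("mask selects no pixels")
    return sum(errors) / len(errors)


size = 5
full_mask = [[1] * size for _ in range(size)]
eroded_mask = erode(full_mask)
pred = [[0.0] * size for _ in range(size)]
# Toy ground truth: large error on the border, small error in the interior.
truth = [
    [10.0 if y in (0, size - 1) or x in (0, size - 1) else 1.0 for x in range(size)]
    for y in range(size)
]
mae_full = masked_mae(pred, truth, full_mask)
mae_eroded = masked_mae(pred, truth, eroded_mask)
```

When errors concentrate at the face boundary, as in this toy data, shrinking the mask lowers the measured MAE without the prediction itself changing.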

FIG. 8 shows the configuration of a 3D face reconstruction apparatus according to an embodiment of the present invention, that is, an apparatus that performs the operations described above with reference to FIGS. 1 to 7.

Referring to FIG. 8, the apparatus 800 according to an embodiment of the present invention includes a learning unit 810, a receiving unit 820, an extraction unit 830, a conversion unit 840, an acquiring unit 850, and a reconstruction unit 860.

The learning unit 810 obtains, through training with a neural network, a learning model for a predetermined plurality of face images and depth data.

Here, the neural network may be any one of a convolutional neural network (CNN), a deep neural network, and a deep convolutional neural network.

For example, the learning unit 810 may receive face images and depth data as the input data of the neural network, and may obtain the learning model through a training process that tunes the weight values for each face image and its depth data using forward inference and back propagation.

Here, the learning unit 810 may construct, for training, a dataset in which each face image is paired with its depth data, and, to improve performance, may increase the amount of data by applying normalization, flipping, warping and morphing, noise, surface normal maps, and the like.

The receiving unit 820 receives a first face image, for which a 3D face is to be reconstructed, by tracking in real time a face photographed through a single photographing means, for example a single camera (or webcam).

At this time, the receiving unit 820 may recognize a face in the subject photographed by the single photographing means, extract a face region, and track the extracted face region in real time, thereby receiving the first face image or the face region of the first face image.

The extracting unit 830 extracts a predetermined number of feature points from the face region of the received first face image.

Here, the extracting unit 830 may extract a predetermined number of feature points, for example 68, in order to locate the eyes, nose, mouth, and face contour in the extracted face region.

The converting unit 840 converts the extracted feature points to pre-aligned positions, thereby converting the first face image into a first face image whose feature points are at the aligned positions.
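The text does not say which kind of transform moves the feature points to the pre-aligned positions. One common choice, assumed here purely for illustration, is a least-squares similarity transform (scale, rotation, and translation), which has a compact closed form when 2D points are written as complex numbers. All names and coordinates below are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst points.

    Points are (x, y) tuples. The transform is w = a * z + b, where z and w
    are the points written as complex numbers.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    zm = sum(zs) / len(zs)
    wm = sum(ws) / len(ws)
    num = sum((z - zm).conjugate() * (w - wm) for z, w in zip(zs, ws))
    den = sum(abs(z - zm) ** 2 for z in zs)
    a = num / den      # complex factor encoding scale and rotation
    b = wm - a * zm    # translation
    return a, b


def apply_similarity(transform, points):
    """Apply a fitted transform to a list of (x, y) points."""
    a, b = transform
    out = []
    for x, y in points:
        w = a * complex(x, y) + b
        out.append((w.real, w.imag))
    return out


# Hypothetical pre-aligned positions for three feature points, and the same
# points as detected in a rotated, scaled, and shifted face image.
canonical = [(30.0, 40.0), (70.0, 40.0), (50.0, 80.0)]
detected = [(-75.0, 63.0), (-75.0, 143.0), (-155.0, 103.0)]

transform = fit_similarity(detected, canonical)
aligned = apply_similarity(transform, detected)
```

In practice the same fitted transform would also be used to resample the image pixels, so that the whole face image, not just its feature points, ends up in the aligned frame.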

Here, the converting unit 840 may also remove unnecessary areas from the extracted face region by applying a mask.
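The shape of the mask is not specified. As a minimal sketch, the example below assumes an elliptical mask inscribed in the cropped face region and sets every pixel outside it to a fill value; the function names and the ellipse are illustrative assumptions.

```python
def elliptical_mask(height, width):
    """Binary mask that is 1 inside the ellipse inscribed in the grid."""
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    ry, rx = height / 2.0, width / 2.0
    return [
        [1 if ((r - cy) / ry) ** 2 + ((c - cx) / rx) ** 2 <= 1.0 else 0
         for c in range(width)]
        for r in range(height)
    ]


def apply_mask(image, mask, fill=0.0):
    """Keep pixels where the mask is 1 and replace the rest with fill."""
    return [
        [px if m else fill for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


face_region = [[9.0] * 5 for _ in range(5)]
masked = apply_mask(face_region, elliptical_mask(5, 5))
```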

When the first face image converted so that its feature points are at the aligned positions has been generated, the acquiring unit 850 acquires depth data for the first face image based on the neural network and the learning model.

The restoration unit 860 reconstructs a 3D face for the first face image using the depth data thus acquired.
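How the depth data is turned into a 3D face is not detailed here. One simple possibility, sketched below under the assumption of an orthographic projection, is to treat each pixel as a vertex at (column, row, depth) and to connect neighboring pixels with two triangles per grid cell. The function name and the sample depth map are hypothetical.

```python
def depth_to_mesh(depth):
    """Turn an H x W depth map into vertices (x, y, z) and triangle faces."""
    h, w = len(depth), len(depth[0])
    vertices = [
        (float(c), float(r), float(depth[r][c]))
        for r in range(h)
        for c in range(w)
    ]
    faces = []
    for r in range(h - 1):
        for c in range(w - 1):
            i = r * w + c  # index of the top-left vertex of this grid cell
            faces.append((i, i + 1, i + w))          # upper-left triangle
            faces.append((i + 1, i + w + 1, i + w))  # lower-right triangle
    return vertices, faces


depth_map = [[0.0, 1.0, 0.0],
             [1.0, 2.0, 1.0]]
vertices, faces = depth_to_mesh(depth_map)
```

The face image itself can then serve as the texture for the mesh, since each vertex corresponds to exactly one pixel.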

Although the components of FIG. 8 are described as performing their functions sequentially, the present invention is not limited thereto. FIG. 8 shows functional blocks for the operations of FIGS. 1 to 7 described above; these may be performed by at least one processor, and all of the components may operate in real time in a single operation. That is, reconstruction of the 3D face for the first face image may be complete at the point when the acquiring unit 850 acquires the depth data.

Even where not described with reference to FIG. 8, the apparatus of FIG. 8 may perform all of the operations of FIGS. 1 to 7 described above and may include all of the content of FIGS. 1 to 7.

In the description of the method and apparatus according to the present invention, the learning function and the other functions may in places be shown or described as proceeding sequentially. In the method and apparatus according to the present invention, however, the learning process is performed only once to acquire the learning model, and thereafter, whenever an image to be reconstructed is input, reconstruction is performed using that learning model. That is, the present invention does not repeat the learning each time; the learning process is performed only once in advance, and at reconstruction time the learning model is used without any additional learning.
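The separation between a one-time learning phase and repeated reconstruction can be sketched as saving the learned weights once and only loading them afterwards. The example assumes a toy two-parameter model stored as JSON; the file name, weight values, and function names are placeholders rather than anything specified in the embodiment.

```python
import json
import os
import tempfile


def save_model(weights, path):
    """Persist the learned weights at the end of the learning phase."""
    with open(path, "w") as f:
        json.dump({"w": weights[0], "b": weights[1]}, f)


def load_model(path):
    """Load previously learned weights; no learning happens here."""
    with open(path) as f:
        data = json.load(f)
    return data["w"], data["b"]


def infer_depth(weights, image):
    """Forward inference only: the weights are read, never updated."""
    w, b = weights
    return [w * px + b for px in image]


# Learning phase, performed once in advance (placeholder weights).
model_path = os.path.join(tempfile.mkdtemp(), "face_depth_model.json")
save_model((2.0, 1.0), model_path)

# Reconstruction phase: load once, then reuse for every incoming image.
weights = load_model(model_path)
depths = [infer_depth(weights, image) for image in ([0.0, 0.5], [1.0])]
```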

The system or apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the systems, apparatuses, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims below.

Claims

1. A three-dimensional face reconstruction method comprising:
acquiring a learning model for a plurality of predetermined face images and depth data through learning using a neural network;
receiving a first face image captured by a single camera;
acquiring depth data for the first face image based on the learning model; and
reconstructing a three-dimensional face for the first face image using the acquired depth data,
wherein the acquiring of the learning model comprises receiving the plurality of face images and the depth data as input data of the neural network, and acquiring the learning model through a supervised learning process that tunes weight values using forward inference and back propagation on the plurality of face images and the depth data for the face images.

2. The method of claim 1,
wherein the acquiring of the learning model comprises acquiring the learning model using a convolutional neural network (CNN).

3. The method of claim 1, further comprising:
extracting feature points from the received first face image; and
obtaining a converted first face image by converting the extracted feature points to pre-aligned positions,
wherein the acquiring of the depth data comprises acquiring depth data for the converted first face image.

4. The method of claim 1,
wherein the receiving of the first face image comprises tracking a face image in real time from a subject photographed in real time by the single camera, and receiving the tracked face image as the first face image.

5. The method of claim 4,
wherein the receiving of the first face image comprises recognizing a face in the subject to extract a face region, and receiving the first face image by tracking the extracted face region in real time.

6. The method of claim 1,
wherein the acquiring of the learning model comprises acquiring the learning model by adding a surface normal map element to the plurality of face images.

7. A three-dimensional face reconstruction method comprising:
acquiring a face image in real time from a subject photographed by a single camera;
acquiring depth data for the acquired face image using a learning model for a plurality of predetermined face images and depth data; and
reconstructing a three-dimensional face for the acquired face image using the acquired depth data,
wherein the learning model is acquired through a supervised learning process of a neural network that receives the plurality of face images and the depth data as input data and tunes weight values using forward inference and back propagation on the plurality of face images and the depth data for the face images.

8. The method of claim 7,
wherein the learning model is acquired using a convolutional neural network (CNN).

9. A three-dimensional face reconstruction apparatus comprising:
a learning unit that acquires a learning model for a plurality of predetermined face images and depth data through learning using a neural network;
a receiving unit that receives a first face image captured by a single camera;
an acquiring unit that acquires depth data for the first face image based on the learning model; and
a restoration unit that reconstructs a three-dimensional face for the first face image using the acquired depth data,
wherein the learning unit receives the plurality of face images and the depth data as input data of the neural network, and acquires the learning model through a supervised learning process that tunes weight values using forward inference and back propagation on the plurality of face images and the depth data for the face images.

10. The apparatus of claim 9,
wherein the learning unit acquires the learning model using a convolutional neural network (CNN).

11. The apparatus of claim 9, further comprising:
an extracting unit that extracts feature points from the received first face image; and
a converting unit that obtains a converted first face image by converting the extracted feature points to pre-aligned positions,
wherein the acquiring unit acquires depth data for the converted first face image.

12. The apparatus of claim 9,
wherein the receiving unit tracks a face image in real time from a subject photographed in real time by the single camera, and receives the tracked face image as the first face image.

13. The apparatus of claim 12,
wherein the receiving unit recognizes a face in the subject to extract a face region, and receives the first face image by tracking the extracted face region in real time.

14. The apparatus of claim 9,
wherein the learning unit acquires the learning model by learning with a surface normal map element added to the plurality of face images.