KR20210086905A

KR20210086905A - Landmarks Decomposition Apparatus, Method and Computer Readable Recording Medium Thereof

Info

Publication number: KR20210086905A
Application number: KR1020190179927A
Authority: KR
Inventors: 안상일; 하성주; 김동영; 마틴 커스너; 김범수; 서석준
Original assignee: 주식회사 하이퍼커넥트
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2021-07-09
Also published as: KR102422779B1

Abstract

Disclosed are a landmark decomposition apparatus, a method, and a computer-readable recording medium. A landmark decomposition method according to an embodiment of the present invention comprises the steps of: receiving the face image of a first person and landmark information corresponding to the face image; estimating a transformation matrix corresponding to the landmark information; calculating the expression landmark of the first person using the transformation matrix; and calculating the unique landmark of the first person by using the expression landmark. It is possible to perform landmark decomposition even for targets with only a small amount of data.

Description

Landmarks Decomposition Apparatus, Method and Computer Readable Recording Medium Thereof

The present invention relates to a landmark decomposition apparatus, method, and computer-readable recording medium, and more particularly to a landmark decomposition apparatus, method, and computer-readable recording medium capable of decomposing landmarks from a single frame or a small number of frames.

Facial landmark analysis extracts the key points of the main elements of a face, or the outlines drawn by connecting those key points. Facial landmarks underlie techniques for the analysis, synthesis, morphing, reenactment, and classification of facial images, such as facial expression classification and pose analysis.

Existing facial image analysis and utilization techniques based on facial landmarks do not distinguish, when processing facial landmarks, between the subject's appearance characteristics and characteristics caused by emotion, such as facial expression, and their performance degrades as a result. For example, when classifying the emotion of a person whose eyebrows sit higher than most people's, the person may be misclassified as looking surprised even when the face is actually expressionless.

An object of the present invention is to provide a landmark decomposition apparatus, method, and computer-readable recording medium capable of performing landmark decomposition even for a subject for whom only a small amount of data is available.

A landmark decomposition method according to an embodiment of the present invention includes: receiving a face image of a first person and landmark information corresponding to the face image; estimating a transformation matrix corresponding to the landmark information; calculating an expression landmark of the first person using the transformation matrix; and calculating a unique landmark of the first person using the expression landmark.

In addition, the step of estimating the transformation matrix may use a learning model trained to estimate a PCA transformation matrix from an arbitrary face image and landmark information corresponding to the arbitrary face image.

In addition, the learning model may classify a plurality of pieces of landmark information into a plurality of semantic groups and output PCA transform coefficients corresponding to each of the plurality of semantic groups.

In addition, in the step of calculating the expression landmark, the expression landmark of the first person may be calculated by performing an operation on the estimated transformation matrix and the PCA unit vector.

In addition, the landmark information may be defined as the sum of the expression landmark, the unique landmark, and the average landmark of the faces of a sufficiently large number of people.

In addition, in the step of calculating the unique landmark, the unique landmark of the first person may be calculated by operating on the landmark information with the expression landmark and the average landmark.

Meanwhile, a computer-readable recording medium on which a program for performing the landmark decomposition method according to the present invention is recorded may be provided.

Meanwhile, a landmark decomposition apparatus according to an embodiment of the present invention includes: a receiving unit that receives a face image of a first person and landmark information corresponding to the face image; a transformation matrix estimation unit that estimates a transformation matrix corresponding to the landmark information; and a calculation unit that calculates an expression landmark of the first person using the transformation matrix and calculates a unique landmark of the first person using the expression landmark.

In addition, the transformation matrix estimation unit may use a learning model trained to estimate a PCA transformation matrix from an arbitrary face image and landmark information corresponding to the arbitrary face image.

In addition, the learning model may classify the landmark information into a plurality of semantic groups and output PCA transform coefficients corresponding to each of the plurality of semantic groups.

In addition, the calculation unit may calculate the expression landmark of the first person by performing an operation on the estimated transformation matrix and the PCA unit vector.

In addition, in the step of calculating the unique landmark of the first person, the unique landmark of the first person may be calculated by operating on the landmark information with the expression landmark and the average landmark.

The present invention can provide a landmark decomposition apparatus, method, and computer-readable recording medium capable of performing landmark decomposition even for a subject for whom only a small amount of data is available.

FIG. 1 is a diagram schematically illustrating an environment in which a landmark decomposition apparatus according to the present invention operates.
FIG. 2 is a flowchart schematically illustrating a landmark decomposition method according to an embodiment of the present invention.
FIG. 3 is a diagram schematically illustrating a method of calculating a transformation matrix according to an embodiment of the present invention.
FIG. 4 is a diagram schematically illustrating the configuration of a landmark decomposition apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating, by way of example, a method of reenacting a face using the present invention.

Advantages and features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the present invention pertains; the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Although terms such as "first" and "second" are used to describe various elements, these elements are not limited by such terms. Such terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise. As used herein, "comprises" or "comprising" means that the stated element or step does not exclude the presence or addition of one or more other elements or steps.

Unless otherwise defined, all terms used herein may be interpreted with the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

FIG. 1 is a diagram schematically illustrating an environment in which a landmark decomposition apparatus according to the present invention operates. Referring to FIG. 1, the environment in which a first terminal 10 and a second terminal 20 operate may include a server 100, and the first terminal 10 and the second terminal 20 connected to the server 100. For convenience of description, FIG. 1 shows only two terminals, namely the first terminal 10 and the second terminal 20, but more than two terminals may be included. Except where specifically noted, the description of the first terminal 10 and the second terminal 20 may apply to any additional terminals.

The server 100 may be connected to a communication network. The server 100 may be connected to other external devices through the communication network. The server 100 may transmit data to, or receive data from, the other devices to which it is connected.

The communication network connected to the server 100 may include a wired communication network, a wireless communication network, or a hybrid communication network. The communication network may include a mobile communication network such as 3G, LTE, or LTE-A. The communication network may include a wired or wireless communication network such as Wi-Fi, UMTS/GPRS, or Ethernet. The communication network may include a short-range communication network such as Magnetic Secure Transmission (MST), Radio Frequency Identification (RFID), Near Field Communication (NFC), ZigBee, Z-Wave, Bluetooth, Bluetooth Low Energy (BLE), or infrared (IR) communication. The communication network may include a Local Area Network (LAN), a Metropolitan Area Network (MAN), or a Wide Area Network (WAN).

The server 100 may receive data from at least one of the first terminal 10 and the second terminal 20. The server 100 may perform an operation using the data received from at least one of the first terminal 10 and the second terminal 20. The server 100 may transmit the result of the operation to at least one of the first terminal 10 and the second terminal 20.

The server 100 may receive a mediation request from at least one of the first terminal 10 and the second terminal 20. The server 100 may select the terminals that transmitted the mediation request. For example, the server 100 may select the first terminal 10 and the second terminal 20.

The server 100 may mediate a communication connection between the selected first terminal 10 and second terminal 20. For example, the server 100 may mediate a video call connection or a text transmission and reception connection between the first terminal 10 and the second terminal 20. The server 100 may transmit connection information for the first terminal 10 to the second terminal 20, and may transmit connection information for the second terminal 20 to the first terminal 10.

The connection information for the first terminal 10 may include, for example, an IP address and a port number of the first terminal 10. The first terminal 10, having received the connection information for the second terminal 20, may attempt to connect to the second terminal 20 using the received connection information.
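The exchange of connection information described above can be sketched as follows. This is only an illustrative sketch: the `ConnectionInfo` record, the `mediate` function, and the example addresses are hypothetical names and values introduced here, and the specification does not prescribe any particular data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionInfo:
    """Hypothetical record; the text only says it may hold an IP address and a port number."""
    ip: str
    port: int


def mediate(first: ConnectionInfo, second: ConnectionInfo) -> dict:
    """Return what the server would send to each terminal: the peer's connection info."""
    return {"to_first_terminal": second, "to_second_terminal": first}


# Example values from the documentation address range; not taken from the specification.
terminal_10 = ConnectionInfo(ip="192.0.2.10", port=5000)
terminal_20 = ConnectionInfo(ip="192.0.2.20", port=5001)
exchange = mediate(terminal_10, terminal_20)
```

Each terminal can then attempt a connection to the address it received.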

When the attempt by the first terminal 10 to connect to the second terminal 20, or the attempt by the second terminal 20 to connect to the first terminal 10, succeeds, a video call session may be established between the first terminal 10 and the second terminal 20. Through the video call session, the first terminal 10 may transmit video or sound to the second terminal 20. The first terminal 10 may encode the video or sound into a digital signal and transmit the encoded result to the second terminal 20.

In addition, through the video call session, the first terminal 10 may receive video or sound from the second terminal 20. The first terminal 10 may receive video or sound encoded as a digital signal and decode the received video or sound.

Through the video call session, the second terminal 20 may transmit video or sound to the first terminal 10. In addition, through the video call session, the second terminal 20 may receive video or sound from the first terminal 10. In this way, the user of the first terminal 10 and the user of the second terminal 20 can make a video call with each other.

The first terminal 10 and the second terminal 20 may each be, for example, a desktop computer, a laptop computer, a smartphone, a smart tablet, a smart watch, a mobile terminal, a digital camera, a wearable device, or a portable electronic device. The first terminal 10 and the second terminal 20 may execute a program or an application. The first terminal 10 and the second terminal 20 may be devices of the same type or devices of different types.

FIG. 2 is a flowchart schematically illustrating a landmark decomposition method according to an embodiment of the present invention. Referring to FIG. 2, the landmark decomposition method according to an embodiment of the present invention includes receiving a face image and landmark information (S110), estimating a transformation matrix (S120), calculating an expression landmark (S130), and calculating a unique landmark (S140).

In step S110, a face image of a first person and landmark information corresponding to the face image are received. Here, the landmark may be understood as a facial landmark. The landmark may refer to the major elements of the face, for example the eyes, eyebrows, nose, mouth, and jawline.

The landmark information may include information about the position, size, or shape of the major elements of the face. The landmark information may also include information about the color or texture of the major elements of the face.

The first person means an arbitrary person; in step S110, a face image of an arbitrary person and landmark information corresponding to the face image are received. The landmark information can be obtained through known techniques, and any known method may be used. The present invention is not limited by the method of obtaining the landmark.

In step S120, a transformation matrix corresponding to the landmark information is estimated. The transformation matrix, together with predetermined unit vectors, may constitute the landmark information. For example, first landmark information may be computed as the product of the unit vectors and a first transformation matrix. Likewise, second landmark information may be computed as the product of the unit vectors and a second transformation matrix.

The transformation matrix is a matrix that converts high-dimensional landmark information into low-dimensional data, and may be used in Principal Component Analysis (PCA). PCA is a dimensionality reduction technique that converts variables in a high-dimensional space into variables in a low-dimensional space by finding new, mutually orthogonal axes that preserve as much of the variance of the data as possible. PCA first finds the hyperplane closest to the data and then reduces the dimension of the data by projecting the data onto that lower-dimensional hyperplane.

In PCA, the unit vector defining the i-th axis is called the i-th principal component (PC), and high-dimensional data can be converted into low-dimensional data by linearly combining these axes.

Here, X denotes the high-dimensional landmark information, Y denotes the low-dimensional principal components, and α denotes the transformation matrix.

As described above, the unit vectors, that is, the principal components, may be determined in advance. Accordingly, when new landmark information is received, a transformation matrix corresponding to it can be determined. In this case, a plurality of transformation matrices may exist for a single piece of landmark information.
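The PCA procedure described above can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, synthetic data standing in for flattened 68-point landmarks, a basis size of `k = 10`, and an orthogonal projection as the way of obtaining coefficients; none of these values comes from the specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for landmark data: 200 samples of 68 (x, y) points, flattened to 136-D.
X = rng.normal(size=(200, 136))

# Fit the basis once: centre the data and take the top-k right singular vectors.
mean = X.mean(axis=0)
k = 10  # assumed number of principal components
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
basis = vt[:k]                      # (k, 136); rows are orthonormal unit vectors

# With the basis fixed, a new landmark vector determines its coefficients by projection.
x_new = rng.normal(size=136)
coeffs = basis @ (x_new - mean)     # (k,) low-dimensional representation
x_approx = mean + coeffs @ basis    # back-projection into the 136-D space
```

Orthogonal projection selects one particular set of coefficients (the least-squares choice) for a given landmark vector; the embodiment instead obtains the coefficients from a trained model, as described below.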

Meanwhile, step S120 may use a learning model trained to estimate the transformation matrix. The learning model may be understood as a model trained to estimate a PCA transformation matrix from an arbitrary face image and landmark information corresponding to the arbitrary face image.

The learning model may be trained to estimate the transformation matrix from face images of different people and the landmark information corresponding to each face image. Several transformation matrices may correspond to a single piece of high-dimensional landmark information, and the learning model may be trained to output only one of them.

The landmark information used as an input to the learning model may be obtained through a known method of extracting landmark information from a face image and rendering it as an image (visualization).

Accordingly, in step S120, the face image of the first person and the landmark information corresponding to the face image are received as inputs, and a single transformation matrix is estimated from them and output.

Meanwhile, the learning model may be trained to classify the landmark information into a plurality of semantic groups corresponding respectively to the right eye, left eye, nose, and mouth, and to output PCA transform coefficients corresponding to each of the plurality of semantic groups.

The semantic groups need not correspond to the right eye, left eye, nose, and mouth; they may instead correspond to the eyebrows, eyes, nose, mouth, and jawline, or to the eyebrows, right eye, left eye, nose, mouth, jawline, and ears. In step S120, the landmark information may be classified into finer-grained semantic groups according to the learning model, and the PCA transform coefficients corresponding to the classified semantic groups may be estimated.
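A per-group PCA of this kind can be sketched as follows. The index ranges assume the widely used 68-point facial landmark layout, and the function names, the number of components per group, and the random training data are all hypothetical; the specification does not fix any of them. The projection here only illustrates what the per-group coefficients represent, whereas the embodiment has a trained model output them.

```python
import numpy as np

# Hypothetical index ranges following the widely used 68-point facial landmark layout.
SEMANTIC_GROUPS = {
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "nose": range(27, 36),
    "mouth": range(48, 68),
}


def fit_group_bases(landmarks, n_components):
    """Fit one PCA basis per semantic group from landmarks of shape (N, 68, 2)."""
    bases = {}
    for name, idx in SEMANTIC_GROUPS.items():
        pts = landmarks[:, list(idx), :].reshape(len(landmarks), -1)
        mean = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - mean, full_matrices=False)
        bases[name] = (mean, vt[:n_components])
    return bases


def group_coefficients(landmark, bases):
    """Project a single (68, 2) landmark set onto each group's basis."""
    out = {}
    for name, idx in SEMANTIC_GROUPS.items():
        mean, basis = bases[name]
        out[name] = basis @ (landmark[list(idx), :].reshape(-1) - mean)
    return out


rng = np.random.default_rng(0)
train = rng.normal(size=(100, 68, 2))          # synthetic stand-in for training landmarks
bases = fit_group_bases(train, n_components=4)
coeffs = group_coefficients(train[0], bases)
n_exp = sum(len(c) for c in coeffs.values())   # total number of basis vectors over all groups
```

Changing the grouping (for example, adding eyebrows or the jawline) only requires editing the `SEMANTIC_GROUPS` mapping.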

In step S130, an expression landmark of the first person is calculated using the transformation matrix. Landmark information can be decomposed into a plurality of pieces of sub-landmark information, and the present invention assumes that the landmark information can be expressed as follows.
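Based on the term definitions given immediately below and on the later description of Equation 2 as a sum of three terms, the relation can be written as follows; the typeset form is a reconstruction.

```latex
l(c, t) = l_{m} + l_{\mathrm{id}}(c) + l_{\mathrm{exp}}(c, t) \tag{2}
```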

Here, l(c, t) is the landmark information in the t-th frame of a video containing person c, l_m is the mean facial landmark information of humans, l_id(c) is the unique landmark (facial landmark of identity geometry) information of person c, and l_exp(c, t) is the expression landmark (facial landmark of expression geometry) of person c in the t-th frame of the video containing person c.

That is, the landmark information of a specific person in a specific frame can be expressed as the sum of the average landmark information of all people's faces, the landmark information unique to that specific person, and the facial expression and movement information of that specific person in that specific frame.

The average landmark information may be defined by the following equation, and may be calculated from a large amount of video that can be collected in advance.
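Consistent with the explanation that follows (T as the total number of frames, and an average over all persons and frames), the definition can be reconstructed as below. The exact typeset form is an assumption.

```latex
l_{m} = \frac{1}{T} \sum_{c} \sum_{t} l(c, t)
```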

Here, T is the total number of frames in the videos, so l_m is the average of the landmarks l(c, t) of all persons appearing in the videos collected in advance.

Meanwhile, the expression landmark may be calculated using the following equation.
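From the symbols defined in the next paragraph (n_exp basis vectors b_exp, each with a coefficient α_k(c, t)), the equation can be reconstructed as a linear combination of the expression bases. The index notation b_{exp,k} is an assumption introduced here.

```latex
l_{\mathrm{exp}}(c, t) = \sum_{k=1}^{n_{\mathrm{exp}}} \alpha_{k}(c, t)\, b_{\mathrm{exp},k}
```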

The above equation represents the result of performing PCA on each semantic group of person c. n_exp is the sum of the numbers of expression bases over all semantic groups, b_exp is the expression basis, that is, the PCA basis, and α is the PCA coefficient.

In other words, b_exp denotes the eigenvectors described above (the unit vectors, that is, the principal components), and a high-dimensional expression landmark can be defined as a combination of low-dimensional eigenvectors. n_exp means the total number of expressions and movements that person c can express through the right eye, left eye, nose, mouth, and so on.

Accordingly, the expression landmark of the first person may be defined as a set of expression information for each of the main parts of the face, namely the right eye, left eye, nose, and mouth. A coefficient α_k(c, t) may exist for each eigenvector.

The learning model described above may be trained to estimate the PCA coefficients α(c, t), taking as inputs the photograph x(c, t) of the person c whose landmark information is to be decomposed as in Equation 2 and the landmark information l(c, t). Through this training, the learning model can estimate the PCA coefficients from an image of a specific person and the corresponding landmark information, and can estimate the low-dimensional eigenvectors.

When the trained neural network is applied, the photograph x(c', t) and the landmark information l(c', t) of a person c' on whom landmark decomposition is to be performed are input to the neural network, and the PCA transformation matrix is estimated. Here, b_exp takes the value obtained from the training data, and the expression landmark can be estimated from the predicted (estimated) PCA coefficients and b_exp as follows.
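Following the same form as the expression-landmark equation above, with the coefficients replaced by their estimates, Equation 5 can be reconstructed as below. The hat notation and the absence of any additional scaling factor are assumptions.

```latex
\hat{l}_{\mathrm{exp}}(c', t) = \sum_{k=1}^{n_{\mathrm{exp}}} \hat{\alpha}_{k}(c', t)\, b_{\mathrm{exp},k} \tag{5}
```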

Here, $\hat{l}_{\mathrm{exp}}(c', t)$ denotes the estimated expression landmark, and $\hat{\alpha}(c', t)$ denotes the estimated PCA transformation matrix.

In step S140, a unique (identity) landmark of the first person is calculated using the expression landmark. As described with reference to Equation 2, the landmark information may be defined as the sum of the average landmark information, the unique landmark information, and the expression landmark information, and the expression landmark information may be estimated through Equation 5 in step S130.

Accordingly, the unique landmark may be calculated as follows.
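Rearranging the three-term sum of Equation 2 and substituting the estimated expression landmark gives the following reconstruction of Equation 6; the hat notation is an assumption.

```latex
\hat{l}_{\mathrm{id}}(c') = l(c', t) - l_{m} - \hat{l}_{\mathrm{exp}}(c', t) \tag{6}
```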

상기 수학식은 수학식 2로부터 도출될 수 있으며, 단계(S130)에서 표현 랜드마크가 산출되면, 단계(S140)에서는 수학식 6을 통해 고유 랜드마크를 산출할 수 있다. 평균 랜드마크 정보 l_m은 사전에 수집 가능한 다량의 비디오를 바탕으로 계산할 수 있다.The Equation may be derived from Equation 2, and when an expression landmark is calculated in step S130, a unique landmark may be calculated through Equation 6 in step S140. Average landmark information l _m can be calculated based on a large amount of video that can be collected in advance.

Accordingly, when a face image of an arbitrary person is given, landmark information can be obtained from it, and expression landmark information and unique landmark information can be calculated from the face image and the landmark information.

FIG. 3 is a diagram schematically illustrating a method of computing a transformation matrix according to an embodiment of the present invention. Referring to FIG. 3, an artificial neural network receives a face image of an arbitrary person (input image) as input. Any of the known artificial neural networks may be used; in one embodiment, the artificial neural network may be ResNet. ResNet is a type of convolutional neural network (CNN), and the present invention is not limited to a specific type of artificial neural network.

An MLP (multi-layer perceptron) is a type of artificial neural network in which several layers of perceptrons are stacked to overcome the limitations of a single-layer perceptron. Referring to FIG. 3, the MLP receives as inputs the output of the artificial neural network and the landmark information (landmark) corresponding to the face image. The MLP then outputs a transformation matrix.

The artificial neural network and the MLP in FIG. 3 can also be understood as together constituting a single trained artificial neural network.
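The data flow of FIG. 3 can be sketched as a forward pass. This is a toy illustration under stated assumptions: the CNN is replaced by a stand-in feature vector, the MLP has two layers with ReLU, and the weights are random rather than trained. All dimensions (512 features, 68 two-dimensional landmarks, 20 outputs) and the names `mlp_forward` and `init_params` are hypothetical.

```python
import numpy as np

def init_params(in_dim, hidden_dim, out_dim, seed=0):
    """Random (untrained) weights for a two-layer MLP; illustration only."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (hidden_dim, in_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (out_dim, hidden_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp_forward(image_features, landmarks, params):
    """Concatenate CNN features with flattened landmarks and apply the MLP."""
    x = np.concatenate([np.ravel(image_features), np.ravel(landmarks)])
    hidden = np.maximum(0.0, params["W1"] @ x + params["b1"])  # ReLU
    return params["W2"] @ hidden + params["b2"]

features = np.ones(512)        # stand-in for the CNN (e.g. ResNet) output
landmarks = np.zeros((68, 2))  # stand-in for the landmark information
params = init_params(512 + 68 * 2, 64, 20)
alpha_hat = mlp_forward(features, landmarks, params)  # estimated coefficients
```

In a trained system the CNN and MLP weights would be learned jointly, which matches the reading of the two networks as one trained network.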

Once the transformation matrix has been estimated by the trained artificial neural network, the expression landmark information and the unique landmark information can be calculated as described with reference to FIG. 2. The landmark separation method according to the present invention can be applied even when only a very small number of face images, or only a single frame of a face image, is available.

The trained artificial neural network has been trained to estimate low-dimensional eigenvectors and transform coefficients from a large number of face images and the corresponding landmark information, so it can estimate the eigenvectors and transform coefficients even when only a single frame of a face image is given.

When the expression landmark and the unique landmark of an arbitrary person are separated in this way, the quality of facial image processing techniques based on facial landmarks, such as face reenactment, face classification, and face morphing, can be improved.

Face reenactment is a technique that, given a target face and a driver face, synthesizes a face video or photo that follows the movement of the driver face while retaining the identity of the target face.

Face morphing is a technique that, given face videos or photos of person 1 and person 2, synthesizes a face video or photo of a third person who follows the characteristics of both. A traditional morphing algorithm finds the face key points and then divides the face into non-overlapping triangular or quadrilateral pieces based on those key points. The photos of person 1 and person 2 are then combined to synthesize a photo of the third person. Because the key points of person 1 and person 2 lie at different positions, combining the two photos pixel-wise can produce a noticeably unnatural result. Existing face morphing techniques do not distinguish between the subject's appearance characteristics and characteristics caused by emotion, such as facial expression, so the quality of the morphing result can be low.

Because the landmark separation method according to the present invention can separate expression landmark information and unique landmark information from a single piece of landmark information, it can help improve the results of facial image processing techniques that use facial landmarks. In particular, the method can separate landmarks even when only a very small amount of face image data is given, which makes it widely applicable.

FIG. 4 is a diagram schematically showing the configuration of a landmark separation apparatus according to an embodiment of the present invention. Referring to FIG. 4, the landmark separation apparatus 100 according to an embodiment of the present invention includes a receiving unit 110, a transformation matrix estimator 120, and a calculator 130.

The receiving unit 110 receives a face image of a first person and landmark information corresponding to the face image. Here, the landmark is a facial landmark and can be understood as a concept covering the main elements of the face, for example the eyes, eyebrows, nose, mouth, and jaw line.

The first person means an arbitrary person, and the receiving unit 110 receives a face image of the arbitrary person and landmark information corresponding to the face image. The landmark information can be obtained with known techniques, and any of the known methods may be used. The present invention is not limited by the method used to obtain the landmark.

The transformation matrix estimator 120 estimates a transformation matrix corresponding to the landmark information. Together with a predetermined unit vector, the transformation matrix can constitute the landmark information. For example, first landmark information can be computed as the product of the unit vector and a first transformation matrix, and second landmark information can be computed as the product of the unit vector and a second transformation matrix.

Meanwhile, the transformation matrix estimator 120 may use a learning model trained to estimate the transformation matrix. The learning model can be understood as a model trained to estimate a PCA transformation matrix from an arbitrary face image and the landmark information corresponding to that face image.

Accordingly, the transformation matrix estimator 120 receives the face image of the first person and the landmark information corresponding to the face image as inputs, and estimates and outputs one transformation matrix from them.

Here, the semantic groups are not necessarily classified to correspond to the right eye, left eye, nose, and mouth; they may instead be classified to correspond to the eyebrows, eyes, nose, mouth, and jaw line, or to the eyebrows, right eye, left eye, nose, mouth, jaw line, and ears. The transformation matrix estimator 120 can classify the landmark information into fine-grained semantic groups according to the learning model and estimate the PCA transform coefficients corresponding to each classified semantic group.
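The idea of grouping landmarks semantically and computing PCA coefficients per group can be sketched as follows. This sketch makes several assumptions that are not in the specification: a 68-point landmark layout with the index ranges shown, and plain PCA via SVD as a stand-in for the coefficients that the learning model would output. The names `SEMANTIC_GROUPS`, `split_into_groups`, `fit_group_pca`, and `group_coefficients` are hypothetical.

```python
import numpy as np

# Hypothetical grouping for a 68-point layout (index ranges are an assumption).
SEMANTIC_GROUPS = {
    "jaw_line": range(0, 17),
    "eyebrows": range(17, 27),
    "nose": range(27, 36),
    "right_eye": range(36, 42),
    "left_eye": range(42, 48),
    "mouth": range(48, 68),
}

def split_into_groups(landmarks, groups=SEMANTIC_GROUPS):
    """Map a (68, 2) landmark array to one flattened vector per group."""
    landmarks = np.asarray(landmarks, dtype=float)
    return {name: landmarks[list(idx)].ravel() for name, idx in groups.items()}

def fit_group_pca(samples, n_components):
    """Fit PCA on (N, D) samples of one group; returns mean and basis."""
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def group_coefficients(vector, mean, basis):
    """Project one group's landmark vector onto that group's PCA basis."""
    return basis @ (vector - mean)
```

Handling each group separately keeps every PCA problem small, and a different grouping only requires a different index table.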

The calculator 130 calculates the expression landmark of the first person using the transformation matrix, and calculates the unique landmark of the first person using the expression landmark. Landmark information can be decomposed into a plurality of pieces of sub-landmark information, for example average landmark information, unique landmark information, and expression landmark information.

When the trained neural network is applied, a photograph x(c', t) of a person c' on whom landmark separation is to be performed and the landmark information l(c', t) are given as inputs to the neural network, which estimates the PCA transformation matrix. Here, b_exp takes the value obtained from the training data, and the expression landmark can be estimated from the predicted (estimated) PCA coefficients and b_exp as in Equation 5.

Meanwhile, as described with reference to Equation 2, landmark information can be defined as the sum of average landmark information, unique landmark information, and expression landmark information, and the expression landmark information can be estimated through Equation 5 in step S130.

Therefore, the unique landmark can be calculated as in Equation 6. When a face image of an arbitrary person is given, landmark information can be obtained from it, and expression landmark information and unique landmark information can be calculated from the face image and the landmark information.
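The two operations of the calculator 130 can be sketched together. As before, this assumes that the expression landmark is a linear combination of the basis b_exp weighted by the estimated coefficients, and that the unique landmark is the residual after the average and expression components are subtracted. The function name `decompose_landmark` and the flattened array shapes are hypothetical.

```python
import numpy as np

def decompose_landmark(landmark, mean_landmark, alpha_hat, b_exp):
    """Split one landmark vector into expression and unique components.

    landmark, mean_landmark: shape (D,), flattened coordinates (assumed layout).
    alpha_hat: shape (K,), estimated PCA coefficients.
    b_exp:     shape (K, D), expression basis from training data.
    Returns (expression_landmark, unique_landmark), each of shape (D,).
    """
    landmark = np.asarray(landmark, dtype=float)
    mean_landmark = np.asarray(mean_landmark, dtype=float)
    # Expression part: coefficients times basis (assumed linear form).
    expression = np.asarray(alpha_hat, dtype=float) @ np.asarray(b_exp, dtype=float)
    # Unique part: what remains of the landmark after removing the other two terms.
    unique = landmark - mean_landmark - expression
    return expression, unique
```

By construction, adding the average, unique, and expression components back together reproduces the input landmark, so any error in the estimated coefficients shows up in the unique component.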

FIG. 5 is a diagram illustrating, by way of example, a method of reenacting a face using the present invention. Referring to FIG. 5, a target image 41 and a driver image 42 are shown, and the target image 41 can be reenacted so that it takes on the appearance corresponding to the driver image 42.

The reenacted image 43 has the characteristics of the target image 41, but its facial expression corresponds to that of the driver image 42. That is, the reenacted image 43 has the unique landmark of the target image 41, while its expression landmark corresponds to the driver image 42.
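At the landmark level, this combination can be sketched as a recombination of separated components. The specification does not state how the reenacted image itself is generated, so the sketch below covers only the landmark, assuming the same sum decomposition (average plus unique plus expression); the function name `reenacted_landmark` is hypothetical.

```python
import numpy as np

def reenacted_landmark(mean_landmark, target_unique, driver_expression):
    """Combine the target's unique landmark with the driver's expression landmark."""
    parts = [np.asarray(p, dtype=float)
             for p in (mean_landmark, target_unique, driver_expression)]
    if not (parts[0].shape == parts[1].shape == parts[2].shape):
        raise ValueError("all landmark components must share one shape")
    # Same sum as the decomposition, but with components taken from two people.
    return parts[0] + parts[1] + parts[2]
```

If the driver's unique component leaked into its expression landmark, the result would drift toward the driver's face shape, which is why a clean separation matters for this use.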

Therefore, for natural face reenactment it is important to properly separate the unique landmark and the expression landmark within a single landmark.

The embodiments described above may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium can be any available medium that can be accessed by a computer, and can include volatile and non-volatile media as well as removable and non-removable media.

In addition, computer-readable media may include computer storage media. Computer storage media can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: server, landmark separation apparatus
110: receiving unit
120: transformation matrix estimator
130: calculator

Claims

1. A landmark separation method comprising:
receiving a face image of a first person and landmark information corresponding to the face image;
estimating a transformation matrix corresponding to the landmark information;
calculating an expression landmark of the first person by using the transformation matrix; and
calculating a unique landmark of the first person by using the expression landmark.

2. The method of claim 1,
wherein the estimating of the transformation matrix uses a learning model trained to estimate a PCA transformation matrix from an arbitrary face image and landmark information corresponding to the arbitrary face image.

3. The method of claim 2,
wherein the learning model classifies the landmark information into a plurality of semantic groups and outputs a PCA transform coefficient corresponding to each of the plurality of semantic groups.

4. The method of claim 3,
wherein the calculating of the expression landmark comprises calculating the expression landmark of the first person by performing an operation on the estimated transformation matrix and a PCA unit vector.

5. The method of claim 1,
wherein the landmark information is defined as the sum of the expression landmark, the unique landmark, and an average landmark of the faces of a sufficiently large number of people.

6. The method of claim 5,
wherein the calculating of the unique landmark comprises calculating the unique landmark of the first person by performing an operation on the landmark information, the expression landmark, and the average landmark.

7. A computer-readable recording medium on which a program for performing the method according to any one of claims 1 to 6 is recorded.

8. A landmark separation apparatus comprising:
a receiving unit configured to receive a face image of a first person and landmark information corresponding to the face image;
a transformation matrix estimator configured to estimate a transformation matrix corresponding to the landmark information; and
a calculator configured to calculate an expression landmark of the first person by using the transformation matrix, and to calculate a unique landmark of the first person by using the expression landmark.

9. The apparatus of claim 8,
wherein the transformation matrix estimator uses a learning model trained to estimate a PCA transformation matrix from an arbitrary face image and landmark information corresponding to the arbitrary face image.

10. The apparatus of claim 9,
wherein the learning model classifies the landmark information into a plurality of semantic groups and outputs a PCA transform coefficient corresponding to each of the plurality of semantic groups.

11. The apparatus of claim 10,
wherein the calculator calculates the expression landmark of the first person by performing an operation on the estimated transformation matrix and a PCA unit vector.

12. The apparatus of claim 8,
wherein the landmark information is defined as the sum of the expression landmark, the unique landmark, and an average landmark of the faces of a sufficiently large number of people.

13. The apparatus of claim 12,
wherein the calculator calculates the unique landmark of the first person by performing an operation on the landmark information, the expression landmark, and the average landmark.