KR20230105110A

KR20230105110A - Method of generating a target object model imitating the characteristics of the source object and device for the same method

Info

Publication number: KR20230105110A
Application number: KR1020220000267A
Authority: KR
Inventors: 블랑코 리베라 로저; 강도균; 백창주
Original assignee: 주식회사 씨제스걸리버스튜디오
Priority date: 2022-01-03
Filing date: 2022-01-03
Publication date: 2023-07-11

Abstract

The present invention relates to a method of generating a model that mimics the characteristics of a source object, and to a rendering method. A method of generating a target object model that mimics the characteristics of a source object according to the present invention includes: acquiring a plurality of source images of the source object according to lighting conditions; generating a texture map from the source images; extracting a rendered image by applying the generated texture map to a pre-generated target base model; calculating a per-pixel error between the extracted rendered image and the source image; and updating parameters of the texture map according to the calculated error, wherein the step of extracting the rendered image is preferably repeated so that the calculated per-pixel error is minimized. According to the present invention, a realistic digital human that mimics a person's facial features can be produced using only captured camera images.

Description

Method of generating a target object model imitating the characteristics of the source object and device for the same method

The present invention relates to a method of generating a model that mimics the characteristics of a source object, and to a rendering method.

Recently, advances in content production technology based on graphics engines, such as computer graphics (CG) and visual effects (VFX), have made it possible to realize diverse environments in a virtual studio with a minimal set and to shoot films, advertisements, music videos, live performances, and the like.

In addition, since the spread of COVID-19, the performance, exhibition, and event industries have moved their activities from offline to online, and demand for VFX technology that creates virtual spaces has grown sharply. As a result, the boundary between the virtual and the real has recently broken down, and the virtual space known as the metaverse makes it possible to offer experiences that are not bound by the temporal and spatial constraints of the real world.

Demand for technology that creates and renders more realistic humans in virtual spaces offering unlimited services is expanding beyond visual effects for entertainment into the education and medical fields.

However, a digital human created with computer graphics must virtually reproduce many components, including the attributes of the human body, so it is far more complex than an ordinary object, and modeling a virtual character that reflects a person's distinctive characteristics requires a great deal of time and money.

In general, human modeling reproduces a person's characteristics by rendering the way light interacts with skin while taking the properties of the light and the camera into account. However, the distribution of the features that make up a highly complex human face must be represented in high dimensions, and sampling the incoming and outgoing light at every point on the surface is itself very difficult.

Accordingly, more efficient methods of rendering humans are continually being devised.

An object of the present invention is to propose a method of producing a digital human that mimics the characteristics of a person.

More specifically, an object of the present invention is to propose an efficient rendering method that extracts a person's facial features using various cameras and lighting and applies them to a base model.

A further object of the present invention is to use a deep-learning-based feature extraction method to reflect a person's individual expressions and characteristics.

To solve the above technical problem, a method of generating a target object model that mimics the characteristics of a source object according to the present invention includes: acquiring a plurality of source images of the source object according to lighting conditions; generating a texture map from the source images; extracting a rendered image by applying the generated texture map to a pre-generated target base model; calculating a per-pixel error between the extracted rendered image and the source image; and updating parameters of the texture map according to the calculated error, wherein the step of extracting the rendered image is preferably repeated so that the calculated per-pixel error is minimized.

Preferably, in the step of obtaining the intermediate rendered image, an updated rendered image is extracted by applying the texture map having the updated parameters to the target base model.

The method includes: extracting a plurality of predetermined two-dimensional first landmarks from the plurality of source images; registering the extracted first landmarks to second landmarks in three-dimensional space in consideration of anatomical features; and generating the target base model by applying the registered second landmarks to scan data of a target object.

The method includes aligning (editing) the topology of the interest points of the target object according to the topology of the interest points of the source object.

The updated parameters include at least one of albedo, specular, and shape values.

The shape value includes a geometry vector that defines the global shape of the target object and a normal vector that defines its local shape.

Preferably, in the step of extracting the first landmarks, the first landmarks are extracted using a pre-trained neural network.

Preferably, the step of updating the parameters uses the extracted second landmarks as a constraint on the parameter update.

To solve the above technical problem, a method of generating a target object model that mimics the characteristics of a source object according to the present invention includes: acquiring a plurality of source images of the source object according to lighting conditions; generating a texture map from the source images; extracting a rendered image by applying the generated texture map to a target base model generated by applying second landmarks, extracted from the plurality of source images, to scan data of a target object; calculating a per-pixel error between the extracted rendered image and the source image; and updating parameters of the texture map according to the calculated error, wherein the step of extracting the rendered image is preferably repeated so that the calculated per-pixel error is minimized.

To solve the above technical problem, a method of generating a target object model that mimics the characteristics of a source object according to the present invention includes: acquiring a plurality of source images of the face of the source object according to lighting conditions; generating a texture map from the source images; extracting a rendered image by applying the generated texture map to a target base model generated by applying second landmarks, extracted from the plurality of source images, to scan data of a target object; calculating a per-pixel error between the extracted rendered image and the source image; updating parameters of the texture map according to the calculated error; and applying the texture map having the updated parameters to the target base model and applying the resulting updated rendered image to the face of a virtual character.

According to the present invention, a realistic digital human that mimics a person's facial features can be produced using only captured camera images.

In addition, the present invention enables dynamic depiction by generating a three-dimensional model that reflects the characteristics of the facial muscles producing a person's distinctive expressions and by rendering the facial features onto the generated model.

In addition, the present invention enables more efficient production of digital humans by extracting facial landmarks with deep learning and using those landmarks for three-dimensional modeling.

Furthermore, the present invention can extend the digital human by providing an interface for modifying the topology of various objects.

FIG. 1 shows the structure of a system that performs a method of generating a target object model that mimics the characteristics of a source object according to an embodiment of the present invention.
FIGS. 2 and 3 show the process of the method of generating a target object model according to an embodiment of the present invention.
FIG. 4 shows examples of source images used in the method according to an embodiment of the present invention.
FIG. 5 shows examples of texture maps used in the method according to an embodiment of the present invention.
FIG. 6 shows the base model generation process of the method according to an embodiment of the present invention.
FIG. 7 shows an example of the neural network used in the method according to an embodiment of the present invention.
FIG. 8 shows an example of line landmarks used in the method according to an embodiment of the present invention.
FIG. 9 shows an example of base model generation in the method according to an embodiment of the present invention.
FIG. 10 shows the iterative process of the method according to an embodiment of the present invention.

The following merely illustrates the principles of the invention. Those skilled in the art will therefore be able to devise various devices that, although not explicitly described or shown herein, embody the principles of the invention and fall within its concept and scope. In addition, all conditional terms and embodiments listed in this specification are, in principle, expressly intended only to aid understanding of the concept of the invention, and the invention should be understood as not being limited to the specifically listed embodiments and conditions.

The above objects, features, and advantages will become clearer from the following detailed description taken together with the accompanying drawings, so that a person of ordinary skill in the art to which the invention belongs will be able to practice the technical idea of the invention without difficulty.

In describing the invention, detailed descriptions of known technology related to the invention are omitted where they could unnecessarily obscure the gist of the invention. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 shows the structure of a system that performs a method of generating a target object model that mimics the characteristics of a source object 10 according to an embodiment of the present invention.

Referring to FIG. 1, the system according to this embodiment may consist of a camera 15 that photographs the source object 10 (for example, an actor or a specific person) under various camera settings, a lighting device (not shown), and a model generation device 100 that generates the model.

The model generation device 100 receives the plurality of source images captured by the camera, extracts the features of the source object 10 from the source images, and uses them for rendering.

Specifically, the image input unit receives the plurality of source images from the camera, and the model generation unit may generate the target base model using point information among the source images, such as a three-dimensional depth map.

Depending on the type of object, the model generation unit may take a basic template model, for example a male or female template, and modify it using information such as a three-dimensional depth map or point cloud, thereby generating a target base model that reflects the three-dimensional characteristics specific to the source object 10.

The rendering unit may generate the target object model by performing a rendering operation that applies the texture map generated from the source images onto the target base model.

Because the target object model is initially rendered from the first texture map, it may differ from the actual characteristics of the source object 10 (a loss), and the rendering may be repeated to minimize this difference.

Specifically, the target object model may be updated by extracting a rendered image from the target object model, calculating the per-pixel difference from the original source image, and changing the weights of the texture map attributes according to that difference.

Furthermore, in this embodiment the model generation unit extracts landmarks from the source images using a trained neural network and generates the three-dimensional target base model using those landmarks.

The method of generating a target object model that mimics the characteristics of the source object 10 according to this embodiment is described in more detail below with reference to FIG. 2.

First, a plurality of source images of the source object 10 are acquired according to lighting conditions (S100).

That is, the source object 10 is illuminated under various lighting conditions so that its skin tone and texture can be sensed accurately.

For example, an image containing the color information of the source object 10 can be obtained from a color camera. To obtain color information free of facial shading, the subject's face is lit as uniformly as possible and cross-polarization is used to capture an image (an albedo image) from which shading and light specularly reflected directly off the skin have been removed.

In addition, an RGB-D camera can be used to obtain, from an RGB color image and depth information, a colored point cloud in which color values are assigned to 3D points carrying depth information about the source object 10.
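The construction of a colored point cloud from an RGB-D frame can be sketched as follows. This is a minimal illustration under assumptions the patent does not state: a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a depth value of zero meaning "no measurement", and the hypothetical function name `colored_point_cloud`.

```python
def colored_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project each pixel with valid depth to a 3D point and attach its color.

    depth: 2D list of depth values (assumed: 0 means no measurement).
    rgb:   2D list of (r, g, b) tuples with the same shape as depth.
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a depth measurement
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), rgb[v][u]))
    return points


# Tiny 2x2 frame; one pixel has no depth.
depth = [[1.0, 0.0], [2.0, 2.0]]
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
cloud = colored_point_cloud(depth, rgb, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Each entry of `cloud` pairs a 3D position with the color sampled at the same pixel, which is the data layout the paragraph above describes.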

Next, a texture map is generated from the source images (S200).

A texture map for expressing the texture of the face is generated from the plurality of images captured with the cameras under various lighting conditions.

Besides using the images received from the camera as they are, the captured images can also be post-processed to generate feature maps in which particular characteristics are emphasized.

For example, a specular image can be generated from the outputs of the color camera and the cross-polarized camera. The specular image models the light reflected from the skin surface and may contain detail arising from pores and fine wrinkles.

The texture map generated through the above process is applied to a pre-generated target base model to extract a rendered image (S300).

A first rendered image can be extracted from the initial target object model, which is generated by applying the texture map obtained from the source images to the target base model.

However, the first rendered image may not fully reflect the characteristics of the source object 10 to be mimicked, so the per-pixel error between the extracted rendered image and the source image is calculated.
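A per-pixel error of this kind could be computed as in the sketch below. The patent does not name the error metric; squared difference with a mean over all pixels is assumed here, images are single-channel nested lists, and `pixel_error` is a hypothetical name.

```python
def pixel_error(rendered, source):
    """Return (per-pixel squared-difference map, mean squared error).

    Both images are 2D lists of floats with identical shape.
    The squared-error metric is an assumption, not taken from the patent.
    """
    if len(rendered) != len(source) or any(
        len(r) != len(s) for r, s in zip(rendered, source)
    ):
        raise ValueError("images must have the same shape")
    diff = [
        [(r - s) ** 2 for r, s in zip(r_row, s_row)]
        for r_row, s_row in zip(rendered, source)
    ]
    n_pixels = sum(len(row) for row in diff)
    return diff, sum(sum(row) for row in diff) / n_pixels


rendered = [[0.5, 0.5], [1.0, 0.0]]
source = [[0.5, 1.0], [1.0, 0.0]]
error_map, mse = pixel_error(rendered, source)
```

The error map shows where the render deviates from the photograph, while the scalar mean is the quantity a later optimization step would try to reduce.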

Next, the parameters of the texture map are updated according to the calculated error, the updated texture map is applied to the target base model, and a second rendered image is extracted; this process is performed repeatedly.

The step of extracting the rendered image (S300) may be repeated until the calculated per-pixel error is minimized or falls to or below a predetermined threshold.

The method of generating a target object model according to this embodiment is described in more detail below with reference to FIGS. 3 to 5.

Referring to FIG. 3, the source object 10, as the subject, is illuminated under predetermined lighting conditions, and the camera receives the light reflected from the source object 10 to produce a source image. The lighting conditions used for capture may later be supplied to the rendering process as lighting correction values, which makes it possible to generate a target object model in which the influence of lighting is minimized.

In addition to an ordinary color camera, a cross-polarized camera or an RGB-D camera may be used to acquire the source images, and the source images obtained from the multiple cameras may be post-processed to derive additional source images.

Referring to FIG. 4, the source images in this embodiment may include an original image captured by the color camera, an albedo image captured using cross-polarization, and a specular image generated by taking the difference between the original image and the albedo image.
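The difference operation that yields the specular image can be sketched as below. The images are assumed to be aligned, single-channel, and normalized to [0, 1]; clamping negative values to zero is an added assumption to absorb sensor noise, and `specular_image` is a hypothetical name.

```python
def specular_image(original, albedo):
    """Subtract the albedo image from the original image, pixel by pixel.

    Negative results are clamped to 0.0 (an assumption; the patent only
    states that the two images are differenced).
    """
    return [
        [max(o - a, 0.0) for o, a in zip(o_row, a_row)]
        for o_row, a_row in zip(original, albedo)
    ]


original = [[0.75, 0.5], [0.25, 1.0]]
albedo = [[0.5, 0.5], [0.5, 0.5]]
specular = specular_image(original, albedo)
```

Pixels where the original is brighter than the albedo capture keep the excess as specular reflection; everywhere else the result is zero.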

In the case of an RGB-D camera, three-dimensional information such as a point cloud may also be obtained as a source image in the form of depth map information.

Next, the source images, captured from various angles, go through a projection process and are then blended to produce texture maps that express various features. As one example, the blending process may generate a texture by compositing Laplacian pyramids using a Gaussian pyramid and a mask for the object.
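Pyramid blending of this kind can be illustrated with a deliberately simplified one-dimensional sketch. It is not the patent's implementation: a two-tap box filter stands in for the Gaussian kernel, upsampling is nearest-neighbor, signal lengths are assumed divisible by 2 at every level, and all function names are hypothetical.

```python
def downsample(x):
    """Halve the resolution by averaging neighboring pairs (box-filter stand-in)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x, n):
    """Nearest-neighbor upsampling to length n."""
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]


def gaussian_pyramid(x, levels):
    pyramid = [list(x)]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def laplacian_pyramid(x, levels):
    """Band-pass levels plus the coarsest low-pass level as the last entry."""
    g = gaussian_pyramid(x, levels)
    bands = [
        [a - b for a, b in zip(g[i], upsample(g[i + 1], len(g[i])))]
        for i in range(levels)
    ]
    return bands + [g[-1]]


def pyramid_blend(a, b, mask, levels):
    """Blend a and b per frequency band, weighted by the mask's low-pass pyramid."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask, levels)
    merged = [
        [m * p + (1 - m) * q for p, q, m in zip(pa, pb, pm)]
        for pa, pb, pm in zip(la, lb, gm)
    ]
    out = merged[-1]
    for band in reversed(merged[:-1]):  # collapse from coarse to fine
        out = [h + u for h, u in zip(band, upsample(out, len(band)))]
    return out


a = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
b = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mask = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # 1 selects a, 0 selects b
blended = pyramid_blend(a, b, mask, levels=2)
```

Because the mask is smoothed at coarser levels, low frequencies transition gradually across the seam while fine detail switches sharply, which is the reason pyramid blending is used for stitching textures from several views.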

Referring to FIG. 5, the texture map in this embodiment includes an albedo map, which is a feature map defining albedo values, a specular map, and a normal map.

The albedo map and the specular map may be generated by blending the albedo source images and the specular source images described above, and the normal map may contain normal vector information that defines the orientation of the object's surface.

The texture maps generated above may be applied to a base model that models the geometric shape of the object, producing the initial target model.

Here, the base model may be generated from the scan data of the RGB-D camera described above, and in this embodiment the base model may be constructed in consideration of both the shape of the source object 10 and the shape of the object's muscles, which express movement.

Referring to FIG. 6, in this embodiment the base model may be generated using point cloud information as scan data together with line landmarks, which are landmarks located where movement occurs in the anatomical structure of the face and which define the shape of the muscles.

The line landmarks are three-dimensional information and can be reconstructed from two-dimensional landmarks.

In this embodiment, the two-dimensional landmarks may be extracted using a trained neural network.

That is, feeding an image obtained from the color camera into the trained neural network yields the coordinates of the anatomical features of the face. For example, the facial contour and the positions of the eyes, nose, and mouth can be extracted as feature points.

Referring to FIG. 7, the neural network 70 may be trained to extract the positions of facial features in a source image as feature points, so that the positions of key features such as the eyes, the nose, and the facial contour can be output without manual work.
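How feature-point coordinates might be read off a network's output can be sketched as follows. The patent does not describe the output format of the neural network 70; this sketch assumes, purely for illustration, that the network emits one response heatmap per landmark and that the landmark lies at the heatmap maximum. No network is run here, and `landmarks_from_heatmaps` is a hypothetical name.

```python
def landmarks_from_heatmaps(heatmaps):
    """Return one (x, y) pixel coordinate per heatmap, at its strongest response.

    heatmaps: list of 2D lists of floats, one per landmark (assumed format).
    Ties resolve to the first maximum in row-major order.
    """
    coords = []
    for hm in heatmaps:
        y, x = max(
            ((row_idx, col_idx) for row_idx, row in enumerate(hm) for col_idx in range(len(row))),
            key=lambda p: hm[p[0]][p[1]],
        )
        coords.append((x, y))
    return coords


# Two toy 2x2 heatmaps standing in for network output.
heatmaps = [
    [[0.0, 0.1], [0.9, 0.2]],  # peak at x=0, y=1
    [[0.1, 0.8], [0.0, 0.3]],  # peak at x=1, y=0
]
landmarks_2d = landmarks_from_heatmaps(heatmaps)
```

Other output formats, such as direct coordinate regression, would skip this step entirely.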

Line landmarks can be generated by reconstructing the positions of the feature points obtained above in three dimensions, taking the shooting angle into account. In addition, by incorporating the depth information of the scan data when generating the line landmarks, characteristic lines can be extracted that reflect the structural features of the whole face.
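One common way to lift a feature point seen from two known shooting angles into 3D is two-ray triangulation, sketched below with the midpoint method. The patent only says that the shooting angle is considered, so this technique, the ray representation (camera center plus viewing direction), and the name `triangulate_midpoint` are all assumptions.

```python
def triangulate_midpoint(origin1, dir1, origin2, dir2):
    """Return the point midway between the closest points of two viewing rays.

    Each ray is origin + t * direction, with origins and directions as 3-tuples.
    Raises ValueError when the rays are (nearly) parallel.
    """
    def dot(p, q):
        return sum(a * b for a, b in zip(p, q))

    w = tuple(a - b for a, b in zip(origin1, origin2))
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(origin1, dir1))
    p2 = tuple(o + t * k for o, k in zip(origin2, dir2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


# Two cameras one unit apart, both observing the same feature point.
point_3d = triangulate_midpoint(
    (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
    (1.0, 0.0, 0.0), (-1.0, 0.0, 1.0),
)
```

Applying this to each feature point along a contour gives an ordered list of 3D points, which can serve as a polyline approximation of a line landmark.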

Referring to FIG. 8, a line landmark, unlike a two-dimensional landmark, may take the form of a curve in three-dimensional space, and it defines the anatomical structure that moves a given region rather than the position of a key feature such as an eye or the nose.

For example, the shape and position of the muscles around the eyes and of the muscles that move the cheeks and mouth may be reconstructed as line landmarks 125.

The base model is generated by fitting a template model, together with its corresponding template landmarks, to the line landmarks generated through the above process.

Referring to FIG. 9, the specific shape of the base model may be fitted on the basis of the template model 94 using the line landmarks 125 and the template landmarks 92. In addition, depth information 96 obtained in advance is used in the fitting process to realize a more precise shape for the base model 50.
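The idea of fitting template landmarks to measured landmarks can be illustrated with a much simpler stand-in than the fitting the patent describes: a least-squares uniform scale and translation between corresponding points. Rotation, non-rigid deformation, and the depth term are omitted, point correspondence is assumed known, and the function names are hypothetical.

```python
def fit_scale_translation(template_pts, target_pts):
    """Least-squares uniform scale and offset mapping template_pts onto target_pts.

    Both arguments are equal-length lists of coordinate tuples in matching order.
    """
    n, dim = len(template_pts), len(template_pts[0])
    c_tpl = [sum(p[k] for p in template_pts) / n for k in range(dim)]
    c_tgt = [sum(p[k] for p in target_pts) / n for k in range(dim)]
    num = sum(
        (p[k] - c_tpl[k]) * (q[k] - c_tgt[k])
        for p, q in zip(template_pts, target_pts)
        for k in range(dim)
    )
    den = sum((p[k] - c_tpl[k]) ** 2 for p in template_pts for k in range(dim))
    scale = num / den
    offset = [c_tgt[k] - scale * c_tpl[k] for k in range(dim)]
    return scale, offset


def apply_fit(points, scale, offset):
    """Transform template vertices with the fitted scale and offset."""
    return [tuple(scale * p[k] + offset[k] for k in range(len(offset))) for p in points]


template_landmarks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
line_landmarks = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (1.0, 3.0, 1.0)]
scale, offset = fit_scale_translation(template_landmarks, line_landmarks)
fitted = apply_fit(template_landmarks, scale, offset)
```

A production fit would follow this coarse alignment with a non-rigid stage that also pulls the mesh toward the depth data.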

The generated base model may then be turned into a smooth three-dimensional model through surface optimization. In addition, in this embodiment a topology-aligned base model can be generated by aligning the topology so that the base model reflects the positional characteristics of the individual's eyes, nose, mouth, and so on.

The topology alignment process aligns the topology of the interest points of the base model 50 with the topology of the interest points of the source object 10, so that the relative positions of the eyes, nose, and mouth specific to the source object are reflected.

Furthermore, the topology alignment may be carried out through correction input from a user, and an interface for this purpose may be provided directly to the user. The interface may also be implemented so that the line landmarks described above serve as tools for the alignment.

The user's corrections may be applied through an elastic deformation process, after which the topology-aligned model can be updated by optimizing the surface again.
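The effect of propagating a user's edit to nearby vertices can be sketched with a Gaussian falloff, shown below. This is a crude stand-in for elastic deformation, which the patent names but does not specify; the falloff weighting, the `radius` parameter, and the name `deform_with_falloff` are assumptions.

```python
import math


def deform_with_falloff(vertices, handle_index, displacement, radius):
    """Move the handle vertex by `displacement` and nearby vertices by a smaller amount.

    Weights follow exp(-d^2 / (2 * radius^2)), where d is the distance to the
    handle vertex, so the handle itself moves by the full displacement.
    """
    handle = vertices[handle_index]
    moved = []
    for v in vertices:
        dist_sq = sum((a - b) ** 2 for a, b in zip(v, handle))
        weight = math.exp(-dist_sq / (2 * radius ** 2))
        moved.append(tuple(a + weight * d for a, d in zip(v, displacement)))
    return moved


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
edited = deform_with_falloff(vertices, handle_index=0, displacement=(0.0, 1.0, 0.0), radius=1.0)
```

The dragged vertex follows the edit exactly, its neighbor follows partially, and distant vertices are left essentially untouched, after which a surface optimization pass would smooth the result.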

Referring again to FIG. 3, the initial target model, generated by applying the texture map to the generated target base model, is optimized through the rendering process to produce the final target model.

Specifically, referring also to FIG. 10, an inverse rendering process takes as input the lighting correction and camera correction values corresponding to the lighting and camera conditions of the capture, and obtains from the target model an image 104 under the same conditions as the source image 102. The difference between the obtained image and the source image can then be calculated.

As a result of the topology alignment process described above, the obtained image and the source image can be compared pixel by pixel, which allows the per-pixel error to be calculated.

The iterative process may compare the calculated error with a predetermined threshold and change the parameters in the texture map, or the weights of those parameters, so as to reduce the error. This process may be repeated until the error falls to or below the threshold, or until the error is minimized through numerical prediction.
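A minimal sketch of such a loop is given below, under strong simplifying assumptions that are not taken from the specification: the renderer is a toy in which each pixel equals an albedo value multiplied by a known shading term, the error is the mean squared difference, and the update rule is plain gradient descent. The names `render`, `mse`, and `fit_albedo`, and the `lr` and `max_iters` parameters, are hypothetical.

```python
from typing import List, Tuple


def render(albedo: List[float], shading: List[float]) -> List[float]:
    """Toy renderer (assumption): pixel = albedo * known shading term."""
    return [a * s for a, s in zip(albedo, shading)]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared difference between two equally long pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_albedo(
    source: List[float],
    shading: List[float],
    threshold: float = 1e-6,
    lr: float = 1.0,
    max_iters: int = 10_000,
) -> Tuple[List[float], float, int]:
    """Repeat render -> compare -> update until the error is <= threshold."""
    n = len(source)
    albedo = [0.5] * n  # arbitrary initial texture parameters
    for step in range(max_iters):
        rendered = render(albedo, shading)
        error = mse(rendered, source)
        if error <= threshold:
            return albedo, error, step
        # Gradient of the mean squared error with respect to each albedo value.
        albedo = [
            a - lr * 2.0 * s * (r - t) / n
            for a, s, r, t in zip(albedo, shading, rendered, source)
        ]
    return albedo, mse(render(albedo, shading), source), max_iters
```

The loop has two exits, mirroring the text: it stops as soon as the error reaches the threshold, and otherwise continues reducing the error until the iteration budget is spent.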

Specifically, the parameters 106 updated through the iterative process include at least one of albedo (β), specular (γ), and shape values. The shape value includes a geometry vector (α) defining the global shape of the target object and a normal vector (δ) defining its local shape.
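The parameter set can be pictured as a simple container, together with an example of how albedo, specular, and normal values might jointly determine a pixel intensity. The specification does not disclose a reflectance model; the sketch below assumes Lambertian diffuse shading plus a Blinn-Phong highlight purely for illustration. The names `TextureParameters` and `shade_texel` and the `shininess` parameter are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TextureParameters:
    geometry: List[float]  # alpha: global shape coefficients (unused by this toy shader)
    albedo: List[float]    # beta: diffuse reflectance per texel
    specular: List[float]  # gamma: specular intensity per texel
    normals: List[Vec3]    # delta: per-texel normal describing local shape


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Vec3) -> Vec3:
    length = math.sqrt(_dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / length for c in v)


def shade_texel(
    params: TextureParameters,
    index: int,
    light_dir: Vec3,
    view_dir: Vec3,
    shininess: float = 32.0,
) -> float:
    """Intensity of one texel: Lambert diffuse + Blinn-Phong highlight (assumed model)."""
    n = _normalize(params.normals[index])
    l = _normalize(light_dir)
    v = _normalize(view_dir)
    n_dot_l = _dot(n, l)
    if n_dot_l <= 0.0:
        return 0.0  # the light is behind the surface
    diffuse = params.albedo[index] * n_dot_l
    half = tuple(a + b for a, b in zip(l, v))
    if _dot(half, half) == 0.0:
        return diffuse  # light and view directions are exactly opposite
    h = _normalize(half)
    highlight = params.specular[index] * max(_dot(n, h), 0.0) ** shininess
    return diffuse + highlight
```

Grouping the four quantities in one structure reflects the text: any subset of them may be updated during the iteration while the others are held fixed.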

The target object model 55 may be generated by applying the texture map optimized through the above process to the base model.

In addition, by applying the texture map with the updated parameters to the face of a virtual character in a three-dimensional space, objects in a metaverse environment can be made to act directly as avatars of users or of other objects.

As described above, according to the present invention, a realistic digital human that mimics a person's facial features can be produced using only captured camera images.

Furthermore, the present invention can extend the digital human by providing an interface for modifying the topology of various objects.

Furthermore, the various embodiments described herein may be implemented in a recording medium readable by a computer or a similar device, using, for example, software, hardware, or a combination thereof.

According to a hardware implementation, the embodiments described herein may be implemented using at least one of application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions. In some cases, the embodiments described herein may be implemented by the control module itself.

According to a software implementation, embodiments such as the procedures and functions described herein may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described herein. The software code may be implemented as a software application written in a suitable programming language. The software code may be stored in a memory module and executed by a control module.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from the essential characteristics of the present invention.

Therefore, the embodiments disclosed in the present invention and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments or drawings. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

Claims

acquiring a plurality of source images of a source object according to lighting conditions;
generating a texture map from the source image;
extracting a rendered image by applying the generated texture map to a pre-generated target base model;
calculating an error between pixels of the extracted rendered image and the source image; and
updating parameters of the texture map according to the calculated error,
wherein the step of extracting the rendered image is repeatedly performed so that the calculated inter-pixel error is minimized.

According to claim 1,
wherein the extracting of the rendered image extracts an updated rendered image by applying the texture map having the updated parameters to the target base model.

According to claim 2,
further comprising:
extracting a plurality of predetermined two-dimensional first landmarks from the plurality of source images;
matching the extracted first landmarks with second landmarks in a three-dimensional space in consideration of anatomical features; and
generating the target base model by applying the matched second landmarks to scan data of the target object.

According to claim 3,
further comprising editing the topology of an interest point of the target object according to the topology of an interest point of the source object.

According to claim 1,
wherein the updated parameters include at least one of albedo, specular, and shape values.

According to claim 5,
wherein the shape value includes a geometry vector defining a global shape of the target object and a normal vector defining a local shape of the target object.

According to claim 3,
The method of generating a target object model that mimics the characteristics of a source object, wherein, in the step of extracting the first landmarks, the first landmarks are extracted using a pre-trained neural network.

According to claim 3,
wherein the updating of the parameters uses the extracted second landmarks as a constraint condition on the parameter update.

acquiring a plurality of source images of a source object according to lighting conditions;
generating a texture map from the source image;
extracting a rendered image by applying the generated texture map to a target base model generated by applying second landmarks extracted from the plurality of source images to scan data of a target object;
calculating an error between pixels of the extracted rendered image and the source image; and
updating parameters of the texture map according to the calculated error,
wherein the step of extracting the rendered image is repeatedly performed so that the calculated inter-pixel error is minimized.

According to claim 9,
wherein the extracting of the rendered image extracts an updated rendered image by applying the texture map having the updated parameters to the target base model.

According to claim 10,
wherein the updating of the parameters uses the extracted second landmarks as a constraint condition on the parameter update.

acquiring a plurality of source images of a face of a source object according to lighting conditions;
generating a texture map from the source image;
extracting a rendered image by applying the generated texture map to a target base model generated by applying second landmarks extracted from the plurality of source images to scan data of a target object;
calculating an error between pixels of the extracted rendered image and the source image;
updating a parameter of the texture map according to the calculated error; and
applying the updated rendered image to the face of a virtual character by applying the texture map having the updated parameters to the target base model.