KR102373608B1

KR102373608B1 - Electronic apparatus and method for digital human image formation, and program stored in computer readable medium performing the same

Info

Publication number: KR102373608B1
Application number: KR1020210079831A
Authority: KR
Inventors: 오병기
Original assignee: 주식회사 쓰리디팩토리
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2022-03-14

Abstract

Provided are an electronic apparatus and method for forming a sophisticated and realistic digital human image, and a program stored in a computer-readable recording medium to execute the same. According to the present invention, the electronic apparatus acquires an image of an object; extracts, from the image of the object, a first feature point for forming a face image of a digital human image and a second feature point for forming a hair image of the digital human image; assigns a first weight to the first feature point in consideration of intimacy between the object and the digital human image and the age of the object; assigns a second weight to the second feature point in consideration of motion of the object, wind, and a force acting outside the object; assigns a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object; uses the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected to select a stand-in model having the highest degree of agreement with the object; scans the stand-in model and takes images from various angles to form first three-dimensional (3D) data; uses the first 3D data to form a first image of the object; photographs a predetermined facial expression template with a depth camera to form second 3D data; uses the second 3D data to form a second image representing a facial expression of the object; and uses 3D animation points to synthesize the first image and the second image, thereby forming a digital human image.

Description

Electronic device and method for digital human image formation, and program stored in a computer-readable recording medium to perform the same

The present invention relates to an electronic device for forming a digital human image. More specifically, it relates to an electronic device and method for forming a digital human image that can provide a realistic deepfake-style digital human image using an interaction system based on real-time motion capture, big data, and an emotion database, and to a program stored in a computer-readable recording medium to perform the same.

Deepfake is a human image synthesis technology based on artificial intelligence (AI). The term collectively refers to edited video in which the face or a specific body part of an existing person is composited using the computer graphics (CG) techniques used in film production. In the past, deepfake videos were made by crudely compositing photographs or footage of a person, but advances in hardware and software have made the results more sophisticated. Deepfake synthesis works by performing machine learning, such as deep learning, on high-definition video in which the face of the person to be synthesized appears, and then compositing the video of the target person frame by frame according to the learning result.

In addition, by its nature, a machine learning method cannot form an accurate video for information it has not been given. Ordinary facial expressions with no obstruction are synthesized well. However, when another object is near the face, when the face itself is partly cropped out of the frame, or when the person makes a highly unusual expression that is rarely seen, the result is a very unnatural deepfake image that looks roughly pasted over, and in extreme cases the synthesis fails and the original image or video is displayed instead. A similar phenomenon occurs when little facial expression data for the synthesis target has been learned, and the result can look bizarre enough to produce the uncanny valley effect.

Because such deepfake image forming methods are implemented using preset facial expressions, mouth shapes, eye shapes, and the like, users do not feel familiarity with the resulting deepfake image. That is, conventional methods could not form a deepfake image in consideration of intimacy, age, facial movement, wind, external force, real-time face shape, real-time posture, perspective, and the like.

One technical problem to be solved by the present invention is to provide an electronic device and method for forming a deepfake-style digital human image that enable image production in consideration of intimacy with an object and the age of the object, and a program stored in a computer-readable recording medium to perform the same.

Another technical problem to be solved by the present invention is to provide an electronic device and method for forming a digital human image that enable image production in consideration of the movement of the object's face, wind, and external force, and a program stored in a computer-readable recording medium to perform the same.

Another technical problem to be solved by the present invention is to provide an electronic device and method for forming a digital human image that enable image production in consideration of the real-time face shape and real-time posture of the object, and a program stored in a computer-readable recording medium to perform the same.

The technical problems to be solved by the present invention are not limited to those described above.

An electronic device for forming a digital human image according to an embodiment of the present invention includes one or more processors, and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. The one or more processors may: acquire an image of an object; extract, from the image of the object, a first feature point for forming a face image of a digital human image and a second feature point for forming a hair image of the digital human image; assign a first weight to the first feature point in consideration of intimacy between the object and the digital human image and the age of the object; assign a second weight to the second feature point in consideration of a display distance of the digital human image; assign a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object; select a stand-in model having the highest degree of agreement with the object using the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected; form first three-dimensional (3D) data by 3D-scanning the stand-in model and photographing images from various angles; form a first image of the object using the first 3D data; form second 3D data by photographing a preset facial expression template with a depth camera; form a second image representing a facial expression of the object using the second 3D data; and form the digital human image by synthesizing the first image and the second image using 3D animation points.
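The selection of the stand-in model with the highest degree of agreement can be illustrated with a minimal sketch. The specification does not state how the weights are combined or how agreement is scored, so the per-feature multiplication of weights, the weighted Euclidean distance, and the names `Candidate` and `select_stand_in` are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
import math


@dataclass
class Candidate:
    """A hypothetical stand-in model described by numeric feature values."""
    name: str
    face: list  # face feature values, same order as the object's
    hair: list  # hair feature values, same order as the object's


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature lists."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def select_stand_in(obj_face, obj_hair, candidates, w1, w2, w3_face, w3_hair):
    """Return the candidate with the highest agreement score.

    Assumption: the first and third weights multiply per face feature,
    and the second and third weights multiply per hair feature.
    """
    face_w = [a * b for a, b in zip(w1, w3_face)]
    hair_w = [a * b for a, b in zip(w2, w3_hair)]

    def agreement(c):
        d = (weighted_distance(obj_face, c.face, face_w)
             + weighted_distance(obj_hair, c.hair, hair_w))
        return 1.0 / (1.0 + d)  # assumed score: smaller distance, higher agreement

    return max(candidates, key=agreement)
```

Under these assumptions, setting a weight to zero removes that feature from the comparison, which is one simple way for the weighting steps to change which stand-in model is chosen.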

In an embodiment, the one or more processors may measure the intimacy between the object and the digital human image based on the number of utterances of the object per unit time, the utterance time of the object per unit conversation, and the feedback time taken from an utterance of the digital human image until the object speaks; measure the age of the object based on the frequency of the object's voice; and assign the first weight to the first feature point in consideration of the measured intimacy and the age of the object.

In an embodiment, the one or more processors may assign the second weight to the second feature point so that resources are allocated such that computation is performed at lower detail in inverse proportion to the display distance of the digital human image.

In an embodiment, the one or more processors may form the digital human image by applying different fourth weights according to the thickness state and the wrinkle state of the object's skin.

In an embodiment, the one or more processors may apply the fourth weight by calculating a percentage for the outermost state of the skin in consideration of the color differences that depend on skin thickness.

In an embodiment, the one or more processors may photograph the real-time face shape and posture of the object, extract a degree of correlation between the first and second feature points and the photographed face shape and posture, and assign the third weight to the first feature point and the second feature point in consideration of the extracted correlation.

A method for forming a digital human image according to an embodiment of the present invention uses an electronic device for forming a digital human image that includes one or more processors and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. The method may include: acquiring, by the one or more processors, an image of an object; extracting, by the one or more processors, from the image of the object, a first feature point for forming a face image of a digital human image and a second feature point for forming a hair image of the digital human image; assigning, by the one or more processors, a first weight to the first feature point in consideration of intimacy between the object and the digital human image and the age of the object; assigning a second weight to the second feature point in consideration of a display distance of the digital human image; assigning, by the one or more processors, a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object; selecting, by the one or more processors, a stand-in model having the highest degree of agreement with the object using the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected; forming, by the one or more processors, first three-dimensional (3D) data by 3D-scanning the stand-in model and photographing images from various angles; forming, by the one or more processors, a first image of the object using the first 3D data; forming, by the one or more processors, second 3D data by photographing a preset facial expression template with a depth camera; forming, by the one or more processors, a second image representing a facial expression of the object using the second 3D data; and forming, by the one or more processors, the digital human image by synthesizing the first image and the second image using 3D animation points.

A computer program according to an embodiment of the present invention is stored in a computer-readable recording medium so as to be executable on a computer that includes one or more processors and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations. The computer program may include: acquiring, by the one or more processors, an image of an object; extracting, by the one or more processors, from the image of the object, a first feature point for forming a face image of a digital human image and a second feature point for forming a hair image of the digital human image; assigning, by the one or more processors, a first weight to the first feature point in consideration of intimacy between the object and the digital human image and the age of the object; assigning a second weight to the second feature point in consideration of a display distance of the digital human image; assigning, by the one or more processors, a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object; selecting, by the one or more processors, a stand-in model having the highest degree of agreement with the object using the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected; forming, by the one or more processors, first three-dimensional (3D) data by 3D-scanning the stand-in model and photographing images from various angles; forming, by the one or more processors, a first image of the object using the first 3D data; forming, by the one or more processors, second 3D data by photographing a preset facial expression template with a depth camera; forming, by the one or more processors, a second image representing a facial expression of the object using the second 3D data; and forming, by the one or more processors, the digital human image by synthesizing the first image and the second image using 3D animation points.

The electronic device and method for forming a digital human image according to an embodiment of the present invention, and the program stored in a computer-readable recording medium to perform the same, can form a more sophisticated and realistic digital human image by weighting the digital human image in consideration of the intimacy between the object and the digital human image and the age of the object.

In addition, the electronic device and method for forming a digital human image according to an embodiment of the present invention, and the program stored in a computer-readable recording medium to perform the same, can form a more sophisticated and realistic digital human image by assigning weights in consideration of the display distance of the digital human image.

In addition, the electronic device and method for forming a digital human image according to an embodiment of the present invention, and the program stored in a computer-readable recording medium to perform the same, can form a sophisticated and realistic digital human image by assigning weights in consideration of the real-time face shape and real-time posture of the object.

FIG. 1 is an exemplary diagram showing the configuration of an electronic device for forming a digital human image according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the training of a neural network according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the procedure of a digital human image forming method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the technical idea of the present invention is not limited to the embodiments described here and may be embodied in other forms. Rather, the embodiments introduced here are provided so that the disclosure is thorough and complete and so that the idea of the invention is fully conveyed to those skilled in the art.

In this specification, when a component is described as being on another component, it may be formed directly on the other component, or a third component may be interposed between them.

In addition, although terms such as first, second, and third are used in the various embodiments of this specification to describe various components, the components should not be limited by these terms. The terms are used only to distinguish one component from another. Accordingly, what is referred to as a first component in one embodiment may be referred to as a second component in another embodiment. Each embodiment described and illustrated here also includes its complementary embodiment. In this specification, 'and/or' is used to mean at least one of the components listed before and after it.

In this specification, singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, components, or combinations thereof described in the specification, and should not be understood as excluding the presence or possible addition of one or more other features, numbers, steps, components, or combinations thereof.

In this specification, "connection" covers both indirectly and directly connecting a plurality of components. "Connection" also covers electrical connection as well as physical connection.

In the following description of the present invention, a detailed description of a related well-known function or configuration will be omitted when it is judged that it could unnecessarily obscure the gist of the invention.

According to the present invention, modeling for forming a digital human image is carried out using the various techniques described below, and interactive data is secured using artificial intelligence (AI) technology such as deep learning, so that a conversational, deepfake-style digital human image can be formed.

Further, the present invention does not stop at restoring the face image of a person or a deceased person by reconstructing it with computer graphics technology and synthesizing the animation portion with artificial intelligence; it can form a digital human image capable of interacting with the current user.

FIG. 1 is an exemplary diagram showing the configuration of an electronic device for forming a digital human image according to an embodiment of the present invention.

As shown in FIG. 1, the electronic device 100 for forming a digital human image may include one or more processors 110, one or more memories 120, a transceiver 130, and a camera 140. In an embodiment, at least one of these components may be omitted from the electronic device 100, or other components may be added to it. Additionally or alternatively, some components may be integrated, or may be implemented as a single entity or as multiple entities. At least some of the components inside and outside the electronic device 100 may be connected to one another through a system bus, general purpose input/output (GPIO), serial peripheral interface (SPI), mobile industry processor interface (MIPI), or the like, to exchange data and/or signals. In an embodiment, the electronic device 100 may form a deepfake image from an image of a deceased person using machine learning, in particular a deep reinforcement learning algorithm such as deep learning.

The one or more processors 110 may run software (for example, instructions or programs) to control at least one component of the electronic device 100 connected to the processor 110. The processor 110 may also perform the various computation, processing, data generation, and data manipulation operations related to the present invention, and may load data from, or store data in, the one or more memories 120.

The one or more processors 110 may acquire an image of an object. According to an embodiment, the one or more processors 110 may use the camera 140 included in the electronic device 100 to acquire the image of the object in the form of data packets that a computer can process.

The one or more processors 110 may extract, from the image of the object, a first feature point for forming a face image of the digital human image and a second feature point for forming a hair image of the digital human image. According to an embodiment, the one or more processors 110 may extract, from the acquired image of the object, first feature points for forming the face image of a deepfake-style digital human image (for example, rigging feature points for forming 30 facial muscles), and second feature points for forming the hair image of the digital human image.

The one or more processors 110 may assign a first weight to the first feature point in consideration of the intimacy between the object and the digital human image and the age of the object. According to an embodiment, the processor 110 may measure the intimacy between the object and the digital human image based on the number of utterances of the object per unit time, the utterance time of the object per unit conversation, and the feedback time taken from an utterance of the digital human image until the object speaks; measure the age of the object based on the frequency of the object's voice; and assign the first weight to the first feature point in consideration of the measured intimacy and age. For example, when the object feels greater intimacy with the digital human image, the number of utterances and the utterance time tend to increase and the feedback time tends to decrease. The processor 110 may detect this and assign a weight so that the 30 facial muscles of the digital human image expand in proportion to the intimacy between the object and the digital human image. In addition, a person's voice tends to become thinner with age. The processor 110 measures the age of the object based on the frequency of the object's voice, and may assign a weight so that the 30 facial muscles of the digital human image expand in inverse proportion to the measured age. That is, the processor 110 may be set so that the younger the object, the brighter the expression of the digital human image.
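The relationship between the conversational measurements, the intimacy, and the first weight can be sketched as follows. The specification gives only the directions of the relationships (more and longer utterances and shorter feedback times indicate higher intimacy; the weight is proportional to intimacy and inversely proportional to age). The concrete formulas, the function names, and the constant `k` are assumptions made for illustration, and the age is taken as already estimated from the voice frequency.

```python
def intimacy_score(utterances_per_minute, seconds_per_turn, feedback_seconds):
    """Assumed intimacy score.

    Rises with the utterance count and utterance time, and falls as the
    object takes longer to reply to the digital human.
    """
    if min(utterances_per_minute, seconds_per_turn, feedback_seconds) < 0:
        raise ValueError("measurements must be non-negative")
    return (utterances_per_minute * seconds_per_turn) / (1.0 + feedback_seconds)


def first_weight(intimacy, age_years, k=1.0):
    """Assumed first weight: proportional to intimacy, inversely to age."""
    if age_years <= 0:
        raise ValueError("age must be positive")
    return k * intimacy / age_years


# Example: a talkative object who replies quickly yields a larger weight,
# and a younger object yields a larger weight for the same intimacy.
score = intimacy_score(utterances_per_minute=6, seconds_per_turn=5, feedback_seconds=1)
weight_young = first_weight(score, age_years=30)
weight_old = first_weight(score, age_years=60)
```

In this sketch the resulting weight would scale how far the 30 facial-muscle rigging points are allowed to expand.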

The one or more processors 110 may assign a second weight to the second feature point in consideration of the display distance of the digital human image. According to an embodiment, the processor 110 may assign the second weight to the second feature point so that resources are allocated such that, in inverse proportion to the display distance of the digital human image, computation is performed at lower detail at long distances.
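A distance-dependent second weight can be sketched as a simple level-of-detail rule. The inverse-proportional form follows the text; the reference distance, the clamping bounds, and the idea of spending the weight on a hair-strand budget are assumptions introduced here for illustration.

```python
def second_weight(display_distance_m, reference_distance_m=1.0, min_weight=0.05):
    """Assumed second weight, inversely proportional to display distance.

    At or inside the reference distance the weight is 1.0 (full detail);
    farther away it falls off as reference / distance, down to min_weight.
    """
    if display_distance_m <= 0:
        raise ValueError("display distance must be positive")
    weight = reference_distance_m / display_distance_m
    return max(min_weight, min(1.0, weight))


def hair_strand_budget(max_strands, weight):
    """Number of hair strands to simulate for a given weight (at least 1)."""
    return max(1, int(max_strands * weight))
```

With these assumed defaults, a digital human displayed twice as far away as the reference distance would be computed with half the hair-strand budget.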

The one or more processors 110 may additionally assign a weight to the second feature point in consideration of the movement of the object, wind, and a force acting from outside the object. According to an embodiment, the processor 110 may measure, from the image of the object, the object speed at which the face of the object moves; measure the strength of wind occurring within a preset range of the object (for example, within a radius of 1 m or within 50 cm of the object); measure an external force acting from outside the object; and assign a weight to the second feature point in consideration of the measured object speed, wind strength, and external force. For example, the processor 110 may use the measured object speed to assign a weight so that the hair of the digital human image moves in proportion to the speed at which the object moves. That is, when the object moves quickly, the weight may be assigned so that the hair of the digital human image also moves quickly. In addition, the processor 110 may assign a weight so that the hair of the digital human image moves in proportion to the measured strength of the wind around the object. That is, the faster the wind blows around the object, the faster the hair of the digital human image may be made to move.
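The proportional relationships above can be sketched as a simple linear combination. The description states only that hair motion is proportional to object speed and to wind strength and that external force is also considered; the additive form, the gain parameters, and the name `hair_motion_weight` are assumptions.

```python
def hair_motion_weight(object_speed, wind_strength, external_force,
                       k_speed=1.0, k_wind=1.0, k_force=1.0):
    """Linear combination of the three measured quantities (hypothetical gains)."""
    for value in (object_speed, wind_strength, external_force):
        if value < 0:
            raise ValueError("measured magnitudes must be non-negative")
    return (k_speed * object_speed
            + k_wind * wind_strength
            + k_force * external_force)
```

Holding wind and external force at zero, doubling the object speed doubles the weight, which matches the stated proportionality.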

The one or more processors 110 may assign a third weight to the first feature point and the second feature point in consideration of the real-time face shape and real-time posture of the object. According to an embodiment, the processor 110 may capture the real-time face shape and posture of the object, extract a correlation between the first and second feature points and the captured face shape and posture, and assign the third weight to the first feature point and the second feature point in consideration of the extracted correlation. For example, the processor 110 may extract the face shape of the object using real-time facial motion capture technology and, when the extracted face shape indicates that the expression of the object is bright, assign the third weight so that the 30 facial muscles of the digital human image expand in proportion, brightening the expression of the digital human image. In addition, the processor 110 may extract the posture of the object using real-time motion capture technology and, when the extracted posture indicates that the object moves a great deal, assign a weight so that the 30 facial muscles of the digital human image expand in proportion.

The one or more processors 110 may select the stand-in model having the highest degree of matching with the object by using the first feature point to which the first weight and the third weight are applied and the second feature point to which the second weight and the third weight are applied. According to an embodiment, the one or more processors 110 may use these weighted feature points to select the stand-in model whose face shape best matches that of the object. For example, the processor 110 may use the weighted first and second feature points to extract third feature points of at least one 3D model stored in the one or more memories 120, calculate the degree of matching between the first and second feature points and the third feature points, and select the 3D model with the highest calculated degree of matching as the stand-in model.
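Selecting the best-matching stored model can be sketched as an arg-max over a matching score. The description does not define how the degree of matching is computed; the sketch below assumes feature points are flattened into equal-length numeric vectors and uses a weighted Euclidean distance mapped into (0, 1]. The names `matching_degree` and `select_stand_in` are hypothetical.

```python
import math


def matching_degree(target, candidate, weights):
    """Return a score in (0, 1]; 1.0 means the weighted features are identical."""
    if not (len(target) == len(candidate) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    distance = math.sqrt(sum(w * (t - c) ** 2
                             for t, c, w in zip(target, candidate, weights)))
    return 1.0 / (1.0 + distance)


def select_stand_in(target, weights, models):
    """Pick the name of the stored 3D model with the highest matching degree.

    `models` maps a model name to its feature vector.
    """
    return max(models,
               key=lambda name: matching_degree(target, models[name], weights))
```

Any monotone similarity measure could be substituted for the distance-based score without changing the selection logic.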

The one or more processors 110 may 3D-scan the selected stand-in model and capture images from various angles to form first 3D data. According to an embodiment, the one or more processors 110 may 3D-scan the selected stand-in model and, at the same time, capture images of the stand-in model from various angles to form the first 3D data. For example, the processor 110 may select and photograph a stand-in model that is as similar as possible to the object in order to form the digital human image, and may use a per-angle camera map of the stand-in model to form the first 3D data so that its angle is the same as the angle at which the image of the object was captured. In addition, the processor 110 may blend texture information of the object image into the formed first 3D data.

The one or more processors 110 may form a first image of the object by using the first 3D data. According to an embodiment, the processor 110 may form the first image of the object by using the first 3D data of the object formed using the stand-in model. In this process, the processor 110 may set up a plurality of cameras that project images of the object from various angles onto the first 3D data, and may adjust the field of view (FOV), position value, rotation value, and the like of each of the plurality of cameras so that the outline of the 3D image appears natural, thereby forming the first image.

The one or more processors 110 may form second 3D data by capturing a preset expression template with a depth camera. According to an embodiment, the one or more memories 120 may store expression templates including information on at least one of: the lip position when a vowel is pronounced, the movement of the pupils, the muscles around the eyes, the shape and wrinkles of the skin around the eyes according to the height of the eyebrows, the shape of the nostrils, and the skin sliding and wrinkles of a smiling face. The processor 110 may capture the expression templates stored in the memory 120 with the depth camera to form the second 3D data. For example, the processor 110 may analyze typical human expressions to form 68 or more expression templates. The processor 110 may form 40 expression templates according to lip shape, based on vowels. The facial expressions for pronouncing five vowels (a, e, i, o, u) may be subdivided by separating the positions of the upper lip and the lower lip into four levels, and data for additional expressions (for example, puffing the mouth out tightly) may be split into left and right and composited, so that the expression templates can represent more natural facial expressions. In addition, 14 expression templates may be formed from the muscles around the eyes in four directions (up, down, left, right) according to the movement of the pupils, from blinking, and by separating the left and right eyes. In addition, 4 expression templates for the flaring and drooping of the nostrils, 3 expression templates that additionally express the sliding and wrinkling of the skin when smiling or angry, and 7 expression templates that express in detail the shape and wrinkles of the skin around the eyes according to the height of the eyebrows may be formed. However, the number and form of the expression templates are not limited thereto.
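The template counts given in this example can be tallied to confirm that they add up to the stated total of 68. The dictionary keys below are shorthand labels chosen for this sketch, not terms defined in the description.

```python
# Counts as listed in the example above; the labels are informal shorthand.
EXPRESSION_TEMPLATE_COUNTS = {
    "lip shapes based on vowels": 40,
    "eye-area muscles, blinking, left/right eyes": 14,
    "nostril flaring and drooping": 4,
    "skin sliding and wrinkles when smiling or angry": 3,
    "skin around the eyes by eyebrow height": 7,
}

total_templates = sum(EXPRESSION_TEMPLATE_COUNTS.values())
```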

The one or more processors 110 may form a second image representing the expression of the object by using the second 3D data. According to an embodiment, the processor 110 may form the second image from the second 3D data using a 3D rendering method.

The one or more processors 110 may form the digital human image by compositing the first image and the second image using 3D animation points (for example, both ends of the eyebrows, both ends of the eyes, the philtrum, and the bridge of the nose). According to an embodiment, the processor 110 may extract the 3D animation points of the first image and the 3D animation points of the second image, and composite the two images so that their 3D animation points coincide, thereby forming the digital human image. For example, the processor 110 may form the digital human image by appropriately combining the marker method conventionally used for forming composite images in film production with a markerless method that uses a depth camera. That is, the processor 110 may use the marker method for precise expressions, rapid changes of expression, and parts that are difficult to detect using depth information, and may use the markerless method for general movements.

According to another embodiment, the processor 110 may apply the 3D animation points, formed by tracking marker positions in the first image and the second image, to expression templates in which the amount of change in the point data values of each expression, predefined at vertex positions, has been measured and converted into percentage data, thereby forming a digital human image capable of expressing various expressions of the object.
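One way to picture the percentage data is as the fraction of the way a tracked point has moved from its neutral position toward its predefined full-expression position. The description does not specify this computation; the projection-based formula, the clamping to 0 to 100, and the name `expression_percentage` are assumptions for illustration.

```python
def expression_percentage(neutral, full, tracked):
    """Percent of the neutral-to-full displacement covered by a tracked point.

    All three arguments are (x, y, z) positions of the same animation point.
    """
    direction = [f - n for f, n in zip(full, neutral)]
    moved = [t - n for t, n in zip(tracked, neutral)]
    length_sq = sum(d * d for d in direction)
    if length_sq == 0:
        raise ValueError("neutral and full positions must differ")
    # Project the tracked displacement onto the predefined direction.
    ratio = sum(d * m for d, m in zip(direction, moved)) / length_sq
    return max(0.0, min(100.0, 100.0 * ratio))
```

For example, a point tracked halfway between the neutral and full positions yields 50.0, which could then drive the corresponding expression template at half strength.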

In addition, the one or more processors 110 may form the second image by optimizing the polygon data representing the surface of the object from the second 3D data to form low polygons, forming a normal map representing 3D RGB information of the second image, and forming a displacement map representing black-and-white (grayscale) information of the second image. According to an embodiment, if an elaborately produced expression template is used as-is to form the second image, the animation may be difficult to check and rendering may take a long time. Accordingly, the processor 110 may optimize polygon data of 100 million polygons or more to form low polygons, and may form a normal map representing RGB (red, green, blue) information that directly corresponds to the x, y, and z axes in the 3D coordinate space of the second image. In addition, the processor 110 may form a displacement map representing black-and-white information in the 3D coordinate space of the second image, and then combine the low polygons, the normal map, and the displacement map to form the second image.
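The correspondence between the x, y, z axes and the RGB channels can be sketched with the encoding commonly used for 8-bit normal maps, in which each component of a unit normal in [-1, 1] is mapped to [0, 255]. Whether the described device uses exactly this encoding is not stated, so this is an assumption; the grayscale displacement helper and both function names are likewise illustrative.

```python
import math


def normal_to_rgb(nx, ny, nz):
    """Encode a surface normal as an 8-bit RGB triple (x->R, y->G, z->B)."""
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("normal must be non-zero")
    # Normalize, then map each component from [-1, 1] to [0, 255].
    return tuple(round((c / length * 0.5 + 0.5) * 255) for c in (nx, ny, nz))


def height_to_gray(height, min_height, max_height):
    """Encode a surface height as an 8-bit grayscale displacement value."""
    if max_height <= min_height:
        raise ValueError("max_height must exceed min_height")
    t = (height - min_height) / (max_height - min_height)
    return round(max(0.0, min(1.0, t)) * 255)
```

Under this encoding, a normal pointing straight out of the surface along z maps to the light blue typical of normal maps, and fine detail removed by the polygon reduction is carried in the two maps instead of in geometry.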

In addition, the one or more processors 110 may form the digital human image by applying different fourth weights according to the skin thickness state and the wrinkle state of the second image. According to an embodiment, the processor 110 may form the digital human image by applying the normal map and the displacement map of each expression template using the percentage values of the 3D animation points. For example, the processor 110 may calculate a percentage for the outermost state of the skin in consideration of the color difference according to skin thickness, and apply the fourth weight. The processor 110 may then overlay detail values, such as skin pores, onto the composite image and apply them to the normal map and the displacement map, thereby forming a digital human image with a realistic expression.

It is difficult to express the transparent skin of a person using ordinary images. The normal map and the displacement map affect the quality of the curved surface of the skin, but it may be difficult to form a realistic image of a human face with the normal map and the displacement map alone. Accordingly, the one or more processors 110 may form the composite image by applying the sub-surface map of the skin shader, the normal map, and the displacement map of each expression template using the percentage values of the 3D animation points, thereby expressing texture according to skin thickness. Human skin can be divided by depth into an upper layer (上皮), a middle layer (中皮), and a lower layer (下皮), and the red color may appear deeper from the upper layer toward the lower layer because of the influence of blood vessels. The one or more processors 110 may control the images of the skin thickness state and the wrinkle state of each expression template with percentage values of 1 to 100% to form a more realistic composite image. That is, parts where the skin stretches or differs in thickness may be added in by the percentage value of the 3D animation points for each expression template, making it possible to express a skin texture in which the skin stretches or thickens.
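Adding each template's contribution "by the percentage value" can be sketched as an additive blend of per-template maps over a neutral base map. The flat-list map representation, the additive formula, and the name `blend_maps` are assumptions; the same routine could stand in for normal, displacement, or sub-surface maps.

```python
def blend_maps(base, template_maps, percentages):
    """Add each template's offset from the base, scaled by its percentage.

    `base` and every template map are equal-length lists of texel values;
    `percentages` maps a template name to a value between 0 and 100.
    """
    result = list(base)
    for name, template in template_maps.items():
        pct = percentages.get(name, 0.0)
        if not 0.0 <= pct <= 100.0:
            raise ValueError("percentage must be between 0 and 100")
        if len(template) != len(base):
            raise ValueError("maps must have the same size")
        for i, (b, t) in enumerate(zip(base, template)):
            result[i] += (pct / 100.0) * (t - b)
    return result
```

A template at 0% leaves the base map unchanged, and a single template at 100% reproduces that template's map exactly.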

The one or more processors 110 may use the depth camera in a markerless manner and may form the second image by capturing only the expression data with the depth camera, separately from forming the first image. Because the lip-sync timing and the head rotation differ between the first image and the second image, the focus of the pupils may appear to waver in the digital human image. The one or more processors 110 may therefore perform a reaction process in which the values of the expression templates change automatically according to the rotation values of the 3D animation points of the first image and the second image, and may correct the animation timing of the 40 mouth-shape templates among the 68 expression templates. The one or more processors 110 may capture emotional expressions of the stand-in model, such as a smile, composite them as layers, and perform correction so that the expression appears as natural as possible before forming the digital human image. According to an embodiment, the one or more processors 110 may use a commercially available markerless tracking program to extract the 3D animation points.

The one or more processors 110 may process noise in the voice input from the object, extract linguistic features to build an acoustic model, and convert the voice into text by referring to a language model and a vocabulary dictionary. A natural language processing algorithm may take the text produced by speech recognition through lexical analysis, syntactic analysis, and semantic analysis, derive a result stored in the memory 120, and output it as an answer. The one or more processors 110 may combine the speech recognition algorithm and the natural language processing algorithm described above, apply them to an interactive digital human image forming system, and generate an unlimited range of facial expressions through an emotion database accumulated with real-time facial motion capture technology and artificial-intelligence deepfake technology.

The one or more processors 110 may reduce the number of polygons to generate a digital human image that is a 3D image renderable on a graphics processing unit (GPU). In addition, the processor 110 may determine an efficient polygon count for forming the digital human image and optimize GPU rendering. For example, the number of polygons used to generate the digital human image may be reduced from 5 million to 82,000. In addition, to form the digital human image, the processor 110 may use techniques used in forming 3D images, such as normal map technology, smoothing group technology, curvature map technology, and Unity- and Unreal-based real-time rendering technology.

The one or more processors 110 may form the digital human image by making effective use of a 4K skin texture shader capable of real-time GPU rendering and of hardware resources. In addition, the processor 110 may use a rigging technique capable of controlling each of the 30 facial muscles of the digital human image to drive, in real time, a controller that controls each muscle separately. In addition, the processor 110 may classify facial expressions and generate data using a technique and classification scheme for subdividing the elements of facial expressions. According to an embodiment, the processor 110 may implement 80 facial expressions based on the first weight.

The one or more processors 110 may implement the physical movement of the object's hair with real-time GPU rendering. According to an embodiment, the processor 110 may quantify the movement of the object's hair and implement real-time visualization based on the data distribution ratio. For example, the processor 110 may implement the movement of about 300,000 hairs in real time. According to another embodiment, the processor 110 may control the movement of the digital human image caused by the object speed at which the face of the object moves, the strength of wind occurring within a preset range of the object, and an external force acting from outside the object. According to another embodiment, the processor 110 may form the digital human image so that motion detection for internal and external actions of the object is linked. For example, the processor 110 may control the movement of the digital human image using an attractor.
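The description mentions an attractor without defining how it operates. One common interpretation, offered here purely as an assumption, is a point that pulls a simulated particle (for example, a hair tip) toward itself with a damped spring force. The function name, the spring strength, the damping factor, and the 60 Hz time step are all hypothetical.

```python
def attractor_step(position, velocity, attractor,
                   strength=4.0, damping=0.9, dt=1.0 / 60.0):
    """Advance one particle by one time step toward an attractor point.

    Uses a damped spring: acceleration is proportional to the offset from
    the attractor, and velocity is scaled by `damping` every step.
    """
    new_velocity = [
        (v + strength * (a - p) * dt) * damping
        for p, v, a in zip(position, velocity, attractor)
    ]
    new_position = [p + nv * dt for p, nv in zip(position, new_velocity)]
    return new_position, new_velocity
```

Calling the function once per rendered frame moves the particle smoothly toward the attractor; moving the attractor with the measured wind or head motion would then drag the hair along with it.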

The one or more processors 110 may form the digital human image using real-time facial motion capture and real-time GPU output interworking technology. In addition, the processor 110 may form the digital human image by mapping fine muscle movements of the object onto anatomical facial expressions. In addition, the processor 110 may detect the face of the object in real time and estimate the posture of the object to form the digital human image. According to an embodiment, the processor 110 may test whether real-time interaction is reflected and whether the digital human image runs under GPU rendering. For example, the processor 110 may drive the digital human image with a face recognition accuracy of 90% or more and a transmission delay of 0.5 sec or less.

The one or more processors 110 may realistically optimize the face shape of the object using 3D modeling and deepfake technology, and may perform content optimization using techniques such as deferred rendering.

The one or more memories 120 may store various data. The data stored in the memory 120 is data acquired, processed, or used by at least one component of the electronic device 100 for forming a digital human image, and may include software (for example, instructions and programs). The memory 120 may include volatile and/or non-volatile memory. In the present invention, instructions or programs are software stored in the memory 120 and may include an operating system for controlling the resources of the electronic device 100 for forming a digital human image, applications, and/or middleware that provides various functions to the applications so that the applications can use the resources of the electronic device.

The one or more memories 120 may store the above-described image of the object, the first and second feature points of the object, the first to fourth weights, the stand-in model, the first and second 3D data, and the like. In addition, the one or more memories 120 may store the first and second images and the digital human image. In addition, the one or more memories 120 may store instructions that, when executed by the one or more processors 110, cause the one or more processors 110 to perform operations.

According to an embodiment, the electronic device 100 for forming a digital human image may further include a transceiver 130. The transceiver 130 may perform wireless or wired communication between the electronic device 100 for forming a digital human image and a server, a database, client devices, and/or other devices. For example, the transceiver 130 may perform wireless communication according to a scheme such as eMBB (enhanced Mobile Broadband), URLLC (Ultra Reliable Low-Latency Communications), MMTC (Massive Machine Type Communications), LTE (Long-Term Evolution), LTE-A (LTE Advanced), UMTS (Universal Mobile Telecommunications System), GSM (Global System for Mobile communications), CDMA (code division multiple access), WCDMA (wideband CDMA), WiBro (Wireless Broadband), WiFi (wireless fidelity), Bluetooth, NFC (near field communication), GPS (Global Positioning System), or GNSS (global navigation satellite system). As another example, the transceiver 130 may perform wired communication according to a scheme such as USB (universal serial bus), HDMI (high definition multimedia interface), RS-232 (recommended standard 232), or POTS (plain old telephone service).

According to an embodiment, the one or more processors 110 may control the transceiver 130 to obtain information from a server and a database. The information obtained from the server and the database may be stored in the one or more memories 120. As an embodiment, the information obtained from the server and the database may include at least one 3D model and the like.

According to an embodiment, the electronic device 100 for forming a digital human image may be any of various types of devices. For example, the electronic device 100 for forming a digital human image may be a portable communication device, a computer device, or a device combining one or more of these devices. The electronic device 100 for forming a digital human image of the present invention is not limited to the above-described devices.

Various embodiments of the electronic device 100 for forming a digital human image according to the present invention may be combined with one another. The embodiments may be combined in any possible combination, and embodiments of the electronic device 100 for forming a digital human image produced by such combinations also fall within the scope of the present invention. In addition, the internal and external components of the electronic device 100 for forming a digital human image according to the present invention described above may be added, changed, replaced, or deleted depending on the embodiment. In addition, the internal and external components of the electronic device 100 for forming a digital human image described above may be implemented as hardware components.

In the present invention, artificial intelligence (AI) refers to technology that imitates human learning, reasoning, and perceptual abilities and implements them on a computer, and may include concepts such as machine learning and symbolic logic. Machine learning (ML) is an algorithmic technology that classifies or learns the characteristics of input data by itself. AI technology uses machine learning algorithms to analyze input data, learn from the results of the analysis, and make judgments or predictions based on what has been learned. In addition, technologies that use machine learning algorithms to simulate functions of the human brain such as cognition and judgment may also be understood as falling within the category of artificial intelligence. For example, technical fields such as linguistic understanding, visual understanding, reasoning/prediction, knowledge representation, and motion control may be included.

Machine learning may refer to the process of training a neural network model using experience gained from processing data. Through machine learning, computer software can improve its own data processing capability. A neural network model is built by modeling the correlations between data, and those correlations may be expressed by a plurality of parameters. The neural network model extracts and analyzes features from given data to derive correlations between the data, and machine learning can be described as repeating this process to optimize the parameters of the neural network model. For example, the neural network model may learn the mapping (correlation) between input and output for data given as input/output pairs. Alternatively, even when only input data is given, the neural network model may derive regularities in the data and learn the relationships among them.

The artificial intelligence learning model, or neural network model, may be designed to implement the structure of the human brain on a computer, and may include a plurality of weighted network nodes that simulate the neurons of a human neural network. The network nodes may be connected to one another in a way that simulates the synaptic activity of neurons, which exchange signals through synapses. In the artificial intelligence learning model, the network nodes may be located in layers of different depths and exchange data according to convolutional connection relationships. The artificial intelligence learning model may be, for example, an artificial neural network (ANN) model or a convolutional neural network (CNN) model. In an embodiment, the artificial intelligence learning model may be trained by supervised learning, unsupervised learning, reinforcement learning, or the like. Machine learning algorithms that may be used include decision trees, Bayesian networks, support vector machines, artificial neural networks, AdaBoost, perceptrons, genetic programming, and clustering.

Among these, a CNN is a type of multilayer perceptron designed to require minimal preprocessing. A CNN consists of one or more convolutional layers with ordinary artificial neural network layers stacked on top, and additionally uses shared weights and pooling layers. This structure allows a CNN to make full use of input data with a two-dimensional structure. Compared with other deep learning architectures, CNNs perform well in both image and speech applications. CNNs can also be trained with standard backpropagation. They tend to be easier to train than other feed-forward neural network techniques and have the advantage of using fewer parameters.
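As a rough illustration of the convolution and pooling layers described above, the following pure-Python sketch applies one small kernel to a 2D input and then max-pools the result. It is not taken from the patent: the function names `conv2d` and `max_pool2d`, the "valid" (no padding) convention, and the window size are assumptions chosen for brevity.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers compute.

    The same kernel weights are reused at every position, which is why a
    convolutional layer needs far fewer parameters than a dense layer.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]
```

A real feature extractor would stack many such layers with learned kernels; this sketch only shows the two building blocks on plain lists.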

Convolutional networks are neural networks that contain sets of nodes with tied parameters. Growth in the size of available training data and in the availability of computing power, combined with algorithmic advances such as piecewise linear units and dropout training, has greatly improved many computer vision tasks. On very large data sets, such as those available for many tasks today, overfitting is not a significant concern, and increasing the size of the network improves test accuracy. Optimal use of computing resources then becomes the limiting factor. For this purpose, a distributed, scalable implementation of deep neural networks may be used.

FIG. 2 is a diagram for explaining the training of a neural network according to an embodiment of the present invention.

As shown in FIG. 2, the learning apparatus may train the neural network 114 to extract the first and second feature points contained in the image of the object. According to an embodiment, the learning apparatus may be an entity separate from the electronic device 100 for forming a digital human image, but is not limited thereto.

The neural network 114 includes an input layer 112 that receives training samples and an output layer 116 that produces training outputs, and may be trained based on the difference between the training outputs and the labels. Here, the labels may be defined based on body part information corresponding to the first and second feature point objects. The neural network 114 is formed of groups of connected nodes and is defined by the weights between the connected nodes and by the activation function that activates the nodes.

The learning apparatus may train the neural network 114 using gradient descent (GD) or stochastic gradient descent (SGD). The learning apparatus may use a loss function designed from the outputs of the neural network and the labels.

The learning apparatus may calculate the training error using a predefined loss function. The loss function may be predefined with the labels, outputs, and parameters as input variables, where the parameters may be set by the weights in the neural network 114. For example, the loss function may be designed in mean square error (MSE) form, entropy form, or the like, and various techniques or schemes may be employed in designing the loss function.
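The two loss forms named above can be sketched as follows. This is a minimal illustration, not the patent's loss function: the names `mse_loss` and `cross_entropy_loss` are assumptions, and the entropy form is taken here to mean the usual cross-entropy between one-hot labels and predicted class probabilities.

```python
import math


def mse_loss(outputs, labels):
    """Mean square error between network outputs and labels."""
    return sum((o - y) ** 2 for o, y in zip(outputs, labels)) / len(outputs)


def cross_entropy_loss(probs, labels, eps=1e-12):
    """Cross-entropy for one sample.

    probs  : predicted class probabilities (assumed to sum to 1)
    labels : one-hot label vector
    eps    : floor that avoids log(0); an implementation detail of this sketch
    """
    return -sum(y * math.log(max(p, eps)) for p, y in zip(probs, labels))
```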

The learning apparatus may use backpropagation to identify the weights that affect the training error. Here, the weights represent the relationships between nodes in the neural network 114. To optimize the weights identified through backpropagation, the learning apparatus may apply SGD using the labels and outputs. For example, the learning apparatus may use SGD to update the weights on which the loss function, defined from the labels, outputs, and weights, depends.
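The interplay of backpropagation and SGD can be illustrated on the smallest possible model, a single linear node trained on squared error. This is a hedged sketch, not the patent's training procedure: `sgd_train`, the learning rate, and the epoch count are illustrative assumptions, and a real network 114 would apply the same chain-rule update layer by layer.

```python
import random


def sgd_train(samples, lr=0.05, epochs=200, seed=0):
    """Fit y ≈ w*x + b with per-sample SGD on the loss L = 0.5 * (out - y)**2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)             # "stochastic": visit samples in random order
        for x, y in data:
            out = w * x + b           # forward pass
            err = out - y             # dL/d(out)
            w -= lr * err * x         # backpropagated gradient dL/dw = err * x
            b -= lr * err             # backpropagated gradient dL/db = err
    return w, b
```

For example, training on points drawn from y = 2x + 1 drives `w` toward 2 and `b` toward 1.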

According to an embodiment, the learning apparatus may acquire images of a training object and extract training feature point objects from those images. The learning apparatus may obtain pre-labeled information (first labels) for each of the training feature point objects. Specifically, it may obtain first labels indicating the body part information predefined for the training feature point objects (for example, rigging feature points for forming 30 facial muscles, eyes, nose, mouth, ears, jawline, eyebrows, and so on).

According to an embodiment, the learning apparatus may generate first training feature vectors based on the appearance features, pattern features, and color features of the training feature point objects. Various methods may be employed to extract the features of the training feature point objects.

According to an embodiment, the learning apparatus may obtain training outputs by applying the first training feature vectors to the neural network 114. The learning apparatus may train the neural network 114 based on the training outputs and the first labels. It may do so by calculating the training errors corresponding to the training outputs and optimizing the connection relationships between nodes in the neural network 114 to minimize those errors. The electronic device 100 for forming a digital human image may then use the trained neural network 114 to extract the first and second feature points of the object from the image of the object.

FIG. 3 is a flowchart showing the procedure of a digital human image forming method according to an embodiment of the present invention. Although the process steps, method steps, algorithms, and the like in the flowchart of FIG. 3 are described in sequential order, such processes, methods, and algorithms may be configured to operate in any suitable order. In other words, the steps of the processes, methods, and algorithms described in the various embodiments of the present invention need not be performed in the order described. Also, even where some steps are described as not being performed simultaneously, in other embodiments those steps may be performed simultaneously. Further, the illustration of a process in the drawings does not mean that the illustrated process excludes other changes and modifications, does not mean that the illustrated process or any of its steps is essential to one or more of the various embodiments of the present invention, and does not mean that the illustrated process is preferred.

As shown in FIG. 3, in step S310, an image of an object is acquired. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may acquire an image of the object. According to an embodiment, the processor 110 may use the camera 140 included in the electronic device 100 to acquire the image of the object in the form of a data packet that a computer can process.

In step S320, a first feature point for forming the face image of the digital human image and a second feature point for forming the hair image of the digital human image are extracted from the image of the object. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may extract, from the image of the object acquired in step S310, the first feature point for forming the face image of the digital human image and the second feature point for forming the hair image of the digital human image. According to an embodiment, the processor 110 may extract the first and second feature points from the acquired image of the object using a machine learning algorithm such as deep learning.

In step S330, a first weight is assigned to the first feature point in consideration of the intimacy between the object and the digital human image and the age of the object. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may measure the intimacy between the object and the digital human image based on the number of utterances by the object per unit time, the object's utterance time per unit conversation, and the feedback time that elapses between an utterance by the digital human image and the object's next utterance. The processor 110 may also measure the age of the object based on the frequency of the object's voice, and may assign the first weight to the first feature point in consideration of the measured intimacy and age.
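The patent names the inputs to the intimacy measurement but gives no formula. The sketch below shows one hypothetical way to combine them; every constant, the averaging rule, the direction of the age factor, and the names `intimacy_score` and `first_weight` are assumptions made for illustration only. Estimating age from voice frequency is left outside the sketch, so age is passed in directly.

```python
def intimacy_score(utterances_per_min, seconds_per_turn, feedback_delay_s):
    """Hypothetical intimacy score in [0, 1].

    More frequent utterances, longer turns, and quicker replies to the
    digital human all raise the score. The saturation points (10 per
    minute, 30 seconds) are arbitrary assumptions.
    """
    rate = min(utterances_per_min / 10.0, 1.0)
    length = min(seconds_per_turn / 30.0, 1.0)
    promptness = 1.0 / (1.0 + feedback_delay_s)   # 1.0 for an immediate reply
    return (rate + length + promptness) / 3.0


def first_weight(intimacy, age_years):
    """Hypothetical first weight for the face feature points.

    Scales from 0.5 (no intimacy) to 1.0 (full intimacy), multiplied by an
    assumed age factor between 1.0 and 1.5.
    """
    age_factor = 1.0 + min(age_years, 100) / 200.0
    return (0.5 + 0.5 * intimacy) * age_factor
```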

In step S340, a second weight is assigned to the second feature point in consideration of the display distance of the digital human image. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may assign the second weight to the second feature point so that computing resources are allocated in inverse proportion to the display distance of the digital human image, with distant views computed at lower detail.
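An inverse-proportional weight of this kind might be computed as in the sketch below. The function name `second_weight`, the reference distance of 1 m, and the clamp at 1.0 are assumptions for illustration; the patent states only that detail decreases in inverse proportion to display distance.

```python
def second_weight(display_distance_m, reference_distance_m=1.0):
    """Hypothetical level-of-detail weight for the hair feature points.

    Full detail (1.0) at or inside the reference distance, then falling
    off as reference / distance for more distant displays.
    """
    if display_distance_m <= 0:
        raise ValueError("display distance must be positive")
    return min(1.0, reference_distance_m / display_distance_m)
```

A renderer could, for instance, multiply a base hair-strand budget by this weight so that a digital human shown 4 m away receives a quarter of the close-up budget.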

In step S350, a third weight is assigned to the first feature point and the second feature point in consideration of the real-time face shape and real-time posture of the object. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may capture the real-time face shape and posture of the object, extract the correlation between the first and second feature points and the captured face shape and posture, and assign the third weight to the first and second feature points in consideration of the extracted correlation. According to an embodiment, the processor 110 may extract the face shape of the object using real-time facial motion capture. Using the extracted face shape, when the object's expression is bright, the processor 110 may assign the third weight so that the 30 facial muscles of the digital human image expand in proportion, brightening the expression of the digital human image.
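The patent does not say how the correlation is computed. One plausible reading, shown here purely as an assumption, is a Pearson correlation between values derived from the stored feature points and the corresponding values measured from the real-time capture, with negative correlations clamped to zero. The names `pearson` and `third_weight` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0                      # no variation: treat as uncorrelated
    return cov / (sx * sy)


def third_weight(feature_values, captured_values):
    """Hypothetical third weight: correlation clamped to the range [0, 1]."""
    return min(1.0, max(0.0, pearson(feature_values, captured_values)))
```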

In step S360, the stand-in model that best matches the object is selected using the first feature point, to which the first and third weights are reflected, and the second feature point, to which the second and third weights are reflected. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may use the first feature point reflecting the first and third weights and the second feature point reflecting the second and third weights to select the stand-in model that best matches the face shape of the object.
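Selecting the best-matching stand-in model can be viewed as a weighted nearest-neighbor search. The sketch below assumes that each candidate is described by a feature vector of the same length as the object's and that the degree of agreement is measured by a weighted Euclidean distance; neither assumption comes from the patent, and the names `weighted_distance` and `select_stand_in` are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def select_stand_in(target, candidates, weights):
    """Return the name of the candidate whose features are closest to the target.

    target     : feature vector of the object
    candidates : dict mapping a stand-in model name to its feature vector
    weights    : per-feature weights (e.g. combined first/second/third weights)
    """
    return min(
        candidates,
        key=lambda name: weighted_distance(target, candidates[name], weights),
    )
```

Changing the weights changes which features dominate the match, which is the role the three weights play in this step.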

In step S370, first 3D data is formed by 3D-scanning the stand-in model and capturing images from various angles. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may form the first 3D data by 3D-scanning the selected stand-in model while simultaneously capturing images of the stand-in model from various angles.

In step S380, a first image of the object is formed using the first 3D data. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may form the first image of the object using the first 3D data obtained from the stand-in model.

In step S390, second 3D data is formed by capturing a preset facial expression template with a depth camera. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may form the second 3D data by capturing the facial expression template stored in the memory 120 with a depth camera.

In step S400, a second image representing the facial expression of the object is formed using the second 3D data. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may form the second image from the second 3D data using 3D rendering.

In step S410, the digital human image is formed by synthesizing the first image and the second image using 3D animation points. For example, referring to FIGS. 1 and 2, the processor 110 of the electronic device 100 for forming a digital human image may extract the 3D animation points of the first image and of the second image and synthesize the two images so that their 3D animation points coincide, thereby forming the digital human image.
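Making two sets of 3D animation points coincide is, in its simplest form, an alignment problem. The sketch below handles only the translation part by matching centroids; it assumes the two point lists are already in one-to-one correspondence and ignores rotation and scale, which a full registration method would also estimate. The names `centroid` and `align_by_translation` are hypothetical and not from the patent.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def align_by_translation(src_points, dst_points):
    """Shift src_points so that their centroid matches that of dst_points.

    Returns the applied offset and the shifted copy of src_points.
    """
    cs, cd = centroid(src_points), centroid(dst_points)
    offset = tuple(d - s for s, d in zip(cs, cd))
    moved = [tuple(c + o for c, o in zip(p, offset)) for p in src_points]
    return offset, moved
```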

Although the method has been described through specific embodiments, it can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed across computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the above embodiments can be readily inferred by programmers in the technical field to which the present invention pertains.

Although the present invention has been described in detail above using preferred embodiments, the scope of the present invention is not limited to any specific embodiment and should be construed according to the appended claims. Those of ordinary skill in the art will also understand that many modifications and variations are possible without departing from the scope of the present invention.

100: Electronic device for forming a digital human image 110: Processor
120: Memory 130: Transceiver
140: Camera 112: Input layer
114: Neural network 116: Output layer

Claims

An electronic device for forming a digital human image, comprising:
one or more processors; and
one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations,
wherein the one or more processors:
acquire an image of an object,
extract, from the image of the object, a first feature point for forming a face image and a second feature point for forming a hair image,
assign a first weight to the first feature point in consideration of intimacy with the object and the age of the object,
assign a second weight to the second feature point in consideration of a display distance,
assign a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object, and
form the digital human image using the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected.

The electronic device of claim 1,
wherein the one or more processors:
measure the intimacy between the object and the digital human image based on the number of utterances by the object per unit time, the utterance time of the object per unit conversation, and the feedback time taken from an utterance by the digital human image until the object speaks,
measure the age of the object based on the frequency of the object's voice, and
assign the first weight to the first feature point in consideration of the measured intimacy and the age of the object.

The electronic device of claim 1,
wherein the one or more processors:
assign the second weight to the second feature point so that resources are allocated in inverse proportion to the display distance of the digital human image, with distant views computed at low detail,
form the digital human image by applying different fourth weights according to the thickness state and wrinkle state of the object's skin,
apply the fourth weight by calculating a percentage for the outermost state of the skin in consideration of the color difference according to the thickness of the skin,
capture the real-time face shape and posture of the object,
extract the correlation between the first and second feature points and the captured face shape and posture, and
assign the third weight to the first feature point and the second feature point in consideration of the extracted correlation.

A method of forming a digital human image using an electronic device for forming a digital human image, the electronic device including one or more processors and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, the method comprising:
acquiring, by the one or more processors, an image of an object;
extracting, by the one or more processors, a first feature point for forming a face image and a second feature point for forming a hair image from the image of the object;
assigning, by the one or more processors, a first weight to the first feature point in consideration of intimacy with the object and the age of the object;
assigning a second weight to the second feature point in consideration of a display distance;
assigning, by the one or more processors, a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object; and
forming, by the one or more processors, the digital human image using the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected.

A computer program stored in a computer-readable recording medium so as to be executable by a computer including one or more processors and one or more memories storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, the computer program causing the one or more processors to perform:
acquiring, by the one or more processors, an image of an object;
extracting, by the one or more processors, a first feature point for forming a face image and a second feature point for forming a hair image from the image of the object;
assigning, by the one or more processors, a first weight to the first feature point in consideration of intimacy with the object and the age of the object;
assigning a second weight to the second feature point in consideration of a display distance;
assigning, by the one or more processors, a third weight to the first feature point and the second feature point in consideration of a real-time face shape and a real-time posture of the object; and
forming a digital human image using the first feature point to which the first weight and the third weight are reflected and the second feature point to which the second weight and the third weight are reflected.