KR20210070874A - 3d human body model reconstruction apparatus and method

Info

Publication number: KR20210070874A
Application number: KR1020190179350A
Authority: KR
Inventors: 이상훈; 허정우
Original assignee: 연세대학교 산학협력단
Priority date: 2019-12-04
Filing date: 2019-12-31
Publication date: 2021-06-15
Also published as: KR102270949B1

Abstract

The present invention provides an apparatus and method for restoring a three-dimensional (3D) human model, which comprises: a two-dimensional (2D) posture feature extraction unit for estimating a 2D joint vector corresponding to each joint position of at least one person included in a 2D image by a previously learned pattern estimation method and normalizing the estimated 2D joint vector in a predetermined mode to acquire features of a 2D posture; a multi-step human model estimation unit for receiving 2D posture features for each of the at least one person included in the 2D image, estimating a body type and a posture vector designated by a human model template in order to restore a human model corresponding to the 2D posture feature by a previously learned pattern estimation method, and estimating a location vector in order to dispose the human model being restored at a location corresponding to the 2D posture feature in the 3D space; and a 3D space disposition unit for receiving a status vector composed of a body type vector, a posture vector and a location vector and a 2D posture feature to restore a human model corresponding to the body type vector and the posture vector and disposing the human model restored according to the location vector in 3D space.

Description

3D human model restoration apparatus and method {3D HUMAN BODY MODEL RECONSTRUCTION APPARATUS AND METHOD}

The present invention relates to an apparatus and method for restoring a three-dimensional (3D) human model, and more particularly to an apparatus and method for restoring multiple 3D human models using a two-dimensional (2D) posture and a human template model.

Techniques for extracting a person's posture from an image have advanced recently and are now used not only in established application fields such as action recognition, animation, virtual reality, and augmented reality, but also in various other fields such as security and medical care. In particular, techniques that extract a 3D posture from a 2D image express the coordinates of each joint in 3D space, so they can also analyze a person's spatial movement.

However, when a person's posture is restored from joint positions alone, detailed movements such as the rotation of an arm cannot be restored, and information such as the person's body shape remains unknown. Without this information, it is difficult to project a person's movement onto an avatar in a virtual space. Moreover, as technology is increasingly directed toward creating content in augmented reality and virtual reality spaces, there is a need to go beyond simple posture extraction and restore a human model in 3D space so that a person's movement can be expressed in detail.

In addition, previously proposed 3D human model restoration methods restore only one model per input image and provide no distance information about how far the person is from the camera that captured the image. That is, they restore only the person's posture and not the person's position in 3D space. Consequently, when one image contains several people, conventional methods can neither restore a 3D human model for every person nor determine the distance relationships between the people.

Korean Laid-open Patent Publication No. 10-2018-0097949 (published September 3, 2018)

An object of the present invention is to provide a 3D human model restoration apparatus and method capable of accurately restoring, from a 2D image, 3D human models of multiple people that reflect each person's position and rotation in 3D, posture, and body shape.

Another object of the present invention is to provide a 3D human model restoration apparatus and method capable of obtaining a 3D human model that can be used for various applications in a 3D virtual space, regardless of the background of the 2D image.

To achieve the above object, a 3D human model restoration apparatus according to an embodiment of the present invention includes: a 2D posture feature extraction unit that estimates, according to a pre-learned pattern estimation method, a 2D joint vector corresponding to the joint positions of each of at least one person included in a 2D image, and normalizes the estimated 2D joint vector in a predetermined manner to obtain a 2D posture feature; a multi-stage human model estimator that receives the 2D posture feature for each of the at least one person included in the 2D image, estimates, according to a pre-learned pattern estimation method, a body shape vector and a posture vector specified by a human model template in order to restore a human model having the body shape and posture corresponding to the 2D posture feature, and estimates a position vector for placing the restored human model at a position in 3D space corresponding to the 2D posture feature; and a 3D space arrangement unit that receives the 2D posture feature and a state vector composed of the body shape vector, the posture vector, and the position vector, restores a human model corresponding to the body shape vector and the posture vector, and places the restored human model in 3D space according to the position vector.

The multi-stage human model estimator may include a plurality of model estimators sequentially connected in a multi-stage structure. Each model estimator receives the 2D posture feature and the state vector estimated by the model estimator of the previous stage and combines them to generate a combined state vector. From the combined state vector, according to a pre-learned pattern estimation method, it estimates the body shape vector and posture vector specified by the human model template in order to restore a human model having the body shape and posture corresponding to the 2D posture feature, and estimates the position vector for placing the restored human model at the position in 3D space corresponding to the 2D posture feature. It then generates a new state vector including the body shape vector, the posture vector, and the position vector, and passes it to the model estimator of the next stage.
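The stage-by-stage data flow described above can be sketched as follows. This is a minimal illustration, not the patented network: the all-zero initial state, the function names `run_multistage` and `make_toy_stage`, and the toy update rule that stands in for a trained estimator are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_multistage(pose_feature: Sequence[float],
                   stages: Sequence[Callable[[Vector], Vector]],
                   state_dim: int) -> Vector:
    """Refine a state vector through sequentially connected stages."""
    state: Vector = [0.0] * state_dim  # assumed initial state (all zeros)
    for stage in stages:
        # Combine the 2D posture feature with the previous stage's state.
        combined = list(pose_feature) + state
        # Each stage returns a new state vector for the next stage.
        state = stage(combined)
        if len(state) != state_dim:
            raise ValueError("stage returned a state of unexpected size")
    return state


def make_toy_stage(state_dim: int, step: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a trained model estimator.

    It moves every state entry toward the mean of the posture feature,
    only to show how later stages refine earlier estimates.
    """
    def stage(combined: Vector) -> Vector:
        feature = combined[:-state_dim]
        previous = combined[-state_dim:]
        target = sum(feature) / len(feature)
        return [p + step * (target - p) for p in previous]
    return stage


if __name__ == "__main__":
    stages = [make_toy_stage(3, 0.5), make_toy_stage(3, 0.5)]
    print(run_multistage([1.0, 3.0], stages, state_dim=3))
```

In a real system each stage would be a trained network whose output is split into body shape, posture, and position components; the sketch keeps the state as one flat vector to focus on the chaining.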

Each of the plurality of model estimators may include: a feature aggregation unit that receives and combines the 2D posture feature and the state vector estimated by the model estimator of the previous stage to generate a combined state vector; a position estimation unit that estimates, from the combined state vector according to a pre-learned pattern estimation method, a body shape vector corresponding to the 2D posture feature, a position vector corresponding to the position at which the human model is to be placed in 3D space, and a global rotation vector which, within the posture vector composed of a plurality of joint rotation vectors corresponding to the rotation angles of the respective human joints, represents the rotation angle of the entire human model with respect to a predetermined root joint; and a model posture estimation unit that receives the 2D posture feature and the global rotation vector and, taking the global rotation vector as a reference, estimates the relative rotation angles of the joints in a predetermined order from the joints adjacent to the root joint to the joints farther from it, thereby obtaining a body posture vector that includes the joint rotation vectors of all joints in the posture vector other than the global rotation vector.

The position estimation unit may include: a body shape estimator trained in advance to estimate, from the combined state vector, the body shape vector corresponding to the 2D posture feature; and a depth/rotation estimator that is trained independently of the body shape estimator and simultaneously estimates, from the combined state vector, the position vector and the global rotation vector corresponding to the 2D posture feature.

The model posture estimation unit may include a plurality of joint rotation estimators, one for each joint of the human model other than the root joint. Each joint rotation estimator receives the 2D posture feature together with either the global rotation vector or the joint rotation vector obtained at the previous stage, and estimates the corresponding joint rotation vector.

The plurality of joint rotation estimators may follow the joint connection structure of each body part of the human model: they are connected in parallel from the root joint, one branch per body part, and connected in series within each branch according to that part's joint connection structure, each estimating its own joint rotation vector.
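The parallel-from-root, serial-within-part arrangement can be sketched as a traversal over kinematic chains. The chain layout and joint names in `CHAINS` and the callable `estimate_joint` are hypothetical placeholders; the text does not list the actual joints or the form of each estimator.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Rotation = Tuple[float, float, float]
JointEstimator = Callable[[str, Rotation, Sequence[float]], Rotation]

# Hypothetical body-part chains, each listed from the joint nearest the
# root joint outward. The real joint set is defined by the model template.
CHAINS: Dict[str, List[str]] = {
    "left_leg": ["left_hip", "left_knee", "left_ankle"],
    "right_leg": ["right_hip", "right_knee", "right_ankle"],
    "spine": ["spine", "neck", "head"],
}


def estimate_body_pose(pose_feature: Sequence[float],
                       global_rotation: Rotation,
                       estimate_joint: JointEstimator,
                       chains: Dict[str, List[str]] = CHAINS
                       ) -> Dict[str, Rotation]:
    """Estimate one rotation per non-root joint, chain by chain."""
    body_pose: Dict[str, Rotation] = {}
    for joints in chains.values():      # chains are independent of each other
        previous = global_rotation      # every chain starts at the root
        for joint in joints:            # joints within a chain run in series
            previous = estimate_joint(joint, previous, pose_feature)
            body_pose[joint] = previous
    return body_pose


def toy_estimator(joint: str, previous: Rotation,
                  pose_feature: Sequence[float]) -> Rotation:
    """Stand-in for a trained estimator: adds 1.0 to the first component."""
    return (previous[0] + 1.0, previous[1], previous[2])


if __name__ == "__main__":
    pose = estimate_body_pose([0.0], (0.0, 0.0, 0.0), toy_estimator)
    print(pose["left_ankle"], pose["right_hip"])
```

Because each chain only depends on the global rotation and its own earlier joints, the chains could be evaluated concurrently; the loop above runs them one after another for simplicity.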

The 2D posture feature extraction unit may convert the 2D joint vector so that it has a size and position corresponding to a normalized image plane of predetermined scale and resolution, and may then translate the converted 2D joint vector so that the joint vector corresponding to the root joint lies at the origin of the normalized image plane.

The 2D posture feature extraction unit may include in the 2D posture feature the position that the joint vector corresponding to the root joint had before it was moved to the origin of the normalized image plane.
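The normalization just described (rescale to a normalized image plane, move the root joint to the origin, keep the pre-shift root position) can be sketched as below. The uniform scale factor, the `plane_size` parameter, and the choice of joint 0 as the root are assumptions for illustration; the text only says the scale and resolution are predetermined.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_joints(joints: Sequence[Point],
                     image_size: Tuple[int, int],
                     root_index: int = 0,
                     plane_size: float = 2.0
                     ) -> Tuple[List[Point], Point]:
    """Map pixel joints to a normalized plane and center them on the root.

    Returns the root-centered joints and the root position on the
    normalized plane before centering, so it can be kept as a feature.
    """
    width, height = image_size
    # Assumed: one uniform scale so the aspect ratio is preserved.
    scale = plane_size / max(width, height)
    scaled = [(x * scale, y * scale) for x, y in joints]
    root = scaled[root_index]
    centered = [(x - root[0], y - root[1]) for x, y in scaled]
    return centered, root


if __name__ == "__main__":
    pixel_joints = [(128.0, 64.0), (256.0, 64.0), (128.0, 192.0)]
    centered, root = normalize_joints(pixel_joints, image_size=(512, 256))
    print(centered, root)
```

Keeping the pre-shift root position matters because centering removes where the person stood in the image, and that location is a cue for the later position estimate.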

The 2D posture feature extraction unit may also estimate a 2D skeletal vector corresponding to the bone directions of each of the at least one person included in the 2D image, normalize the estimated 2D skeletal vector in a predetermined manner, and further include it in the 2D posture feature.

The 3D human model restoration apparatus may further include a learning unit that trains the multi-stage human model estimator. From the state vector that the multi-stage human model estimator estimates for a 2D training image and the ground-truth state vector labeled on that training image, the learning unit calculates a body shape loss, a position loss, a global rotation loss, and a body loss, applies a corresponding weight to each, additionally takes a body rotation loss into account, calculates the total loss according to an equation, and backpropagates the calculated total loss.

[Equation image not reproduced: the loss symbols, weight symbols, and total loss equation appear only as images in the original publication.]
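Since the total loss equation is not reproduced in this text, the following is only one plausible reading of the description: a weighted sum of the four named losses with the body rotation loss added on top. The dictionary keys, the function name `total_loss`, and the unweighted treatment of the body rotation loss are all assumptions, not the patent's formula.

```python
from typing import Dict


def total_loss(losses: Dict[str, float],
               weights: Dict[str, float],
               body_rotation_loss: float = 0.0) -> float:
    """Combine per-term losses into a single scalar (assumed form).

    `losses` and `weights` are keyed by term name. Every loss term must
    have a matching weight; the body rotation loss is added unweighted,
    which is an assumption made for this sketch.
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    weighted = sum(weights[name] * value for name, value in losses.items())
    return weighted + body_rotation_loss


if __name__ == "__main__":
    losses = {"shape": 1.0, "position": 2.0,
              "global_rotation": 4.0, "body": 8.0}
    weights = {name: 0.5 for name in losses}
    print(total_loss(losses, weights, body_rotation_loss=0.25))
```

In training, this scalar would be the quantity that is backpropagated through the multi-stage human model estimator.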

To achieve the other object above, a 3D human model restoration method according to another embodiment of the present invention includes: estimating, according to a pre-learned pattern estimation method, a 2D joint vector corresponding to the joint positions of each of at least one person included in a 2D image, and normalizing the estimated 2D joint vector in a predetermined manner to obtain a 2D posture feature; receiving the 2D posture feature for each of the at least one person included in the 2D image, estimating, according to a pre-learned pattern estimation method, a body shape vector and a posture vector specified by a human model template in order to restore a human model having the body shape and posture corresponding to the 2D posture feature, and estimating a position vector for placing the restored human model at a position in 3D space corresponding to the 2D posture feature, thereby obtaining a state vector; and receiving the 2D posture feature and the state vector composed of the body shape vector, the posture vector, and the position vector, restoring a human model corresponding to the body shape vector and the posture vector, and placing the restored human model in 3D space according to the position vector.

Accordingly, the 3D human model restoration apparatus and method according to embodiments of the present invention obtain a human model by estimating the 3D position, posture, and body shape of each person from a 2D image containing at least one person, and allow the obtained human model to be placed at the correct position in various 3D virtual spaces, so that highly realistic human models can be restored in applications such as augmented reality and virtual reality. In addition, because position-aware 3D human models can be restored, interactions among multiple people and group behavior become easier to recognize.

FIG. 1 is a diagram for explaining the concept by which a 3D human model restoration apparatus according to an embodiment of the present invention restores human models in a 3D virtual space from a 2D image.
FIG. 2 shows a schematic structure of a 3D human model restoration apparatus according to an embodiment of the present invention.
FIG. 3 shows an example of the configuration of the 2D posture feature extraction unit of FIG. 2.
FIG. 4 is a diagram for explaining the body shape and posture parameters of the SMPL model.
FIG. 5 is a diagram for explaining the concept of normalizing 2D joint positions onto a 2D normalized image plane.
FIG. 6 shows a schematic configuration of the multi-stage human model estimator of FIG. 2.
FIG. 7 shows a schematic configuration of each of the plurality of model estimators of FIG. 6.
FIG. 8 shows an example of the detailed configuration of each model estimator of FIG. 7.
FIG. 9 shows another example of the detailed configuration of the model posture estimation unit of FIG. 7.
FIG. 10 shows a 3D human model restoration method according to an embodiment of the present invention.
FIG. 11 shows examples of human models restored in 3D space from 2D images using the 3D human model restoration method of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the content described in those drawings.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described. To describe the present invention clearly, parts unrelated to the description are omitted, and like reference numerals in the drawings denote like members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "...er", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 is a diagram for explaining the concept by which a 3D human model restoration apparatus according to an embodiment of the present invention restores human models in a 3D virtual space from a 2D image.

Referring to FIG. 1, when a 2D image is input, the 3D human model restoration apparatus according to this embodiment identifies an object region for each of at least one person included in the 2D image, as shown in (a), and estimates the 2D joint positions of the person in each object region, as shown in (b). The estimated 2D joint positions of each person are then converted into coordinates on a predetermined normalized image plane, as shown in (c), and a 2D posture feature is obtained from the converted coordinates. Based on the obtained 2D posture feature, a 3D human body model expressing the posture and body shape of each person is estimated together with its position, and the resulting model is placed at the estimated position in 3D space, as shown in (d).

FIG. 2 shows a schematic structure of a 3D human model restoration apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the 3D human model restoration apparatus according to this embodiment may include an image acquisition unit 100, a 2D posture feature extraction unit 200, a multi-stage human model estimator 300, and a 3D space arrangement unit 400.

Each component of the 3D human model restoration apparatus according to this embodiment is described with reference to FIG. 2. First, the image acquisition unit 100 acquires a 2D image containing at least one person who is to be modeled as a 3D model.

The 2D posture feature extraction unit 200 estimates, according to a pre-learned pattern estimation method, the 2D posture of each of the at least one person included in the 2D image acquired by the image acquisition unit 100, and extracts a 2D posture feature from the estimated posture. Here, as shown in (b) of FIG. 1, the 2D posture feature extraction unit 200 may estimate the joint positions of each person included in the 2D image and normalize the estimated joint positions to extract the 2D posture feature of each person.

The multi-stage human model estimator 300 estimates the 3D posture and position of each person from the 2D posture estimated by the 2D posture feature extraction unit 200. In this embodiment, as shown in (c) of FIG. 1, the multi-stage human model estimator 300 estimates not only the 3D posture but also the body shape of each of the at least one person, and can additionally estimate each person's 3D position relative to the camera that captured the 2D image. That is, it can estimate the outward appearance of each person together with the person's posture, body shape, and position in 3D space.

The 3D space arrangement unit 400 places the human model corresponding to each person's posture and body shape, as estimated by the multi-stage human model estimator 300, at the estimated position in a 3D virtual space. Accordingly, as shown in (d) of FIG. 1, the relative positions, postures, and appearances of the people included in the 2D image can be restored in the 3D virtual space. That is, the situation of the scene in the 2D image can be converted into and expressed as a situation in the 3D virtual space. Because the human models can be placed in various spaces regardless of the background of the 2D image, highly realistic human models can be restored in environments such as augmented reality or virtual reality.
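The placement step can be sketched as a translation of each restored model by its estimated position vector. Treating the position vector as a plain camera-relative offset added to every vertex is an assumption for illustration, as are the names `place_model` and the two-vertex toy model.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def place_model(vertices: Sequence[Point3], position: Point3) -> List[Point3]:
    """Translate model-space vertices by the estimated position vector."""
    px, py, pz = position
    return [(x + px, y + py, z + pz) for x, y, z in vertices]


if __name__ == "__main__":
    # Toy "model" reduced to a root vertex and one vertex above it.
    model = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    person_a = place_model(model, (0.0, 0.0, 3.0))
    person_b = place_model(model, (4.0, 0.0, 0.0))
    # Once both people share one coordinate frame, the distance between
    # them can be read directly from their root vertices.
    print(math.dist(person_a[0], person_b[0]))
```

Placing every person in the same frame is what makes inter-person distances available, which single-person methods without a position estimate cannot provide.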

Meanwhile, the 3D human model restoration apparatus according to this embodiment may further include a learning unit 500 for training the multi-stage human model estimator 300. In this embodiment, the multi-stage human model estimator 300 may be implemented as an artificial neural network that estimates a person's 3D posture, body shape, and position from a 2D posture. For the artificial neural network to perform the required operation, its pattern estimation method must be learned in advance. To this end, a 2D training image labeled with ground-truth values for posture, body shape, and position is input to the image acquisition unit 100. The learning unit 500 obtains, as a loss, the error between the 3D posture, position, and body shape estimated by the multi-stage human model estimator 300 and the 3D posture, position, and body shape labeled on the training image, and trains the multi-stage human model estimator 300 by backpropagating the obtained loss to it.

The 2D posture feature extraction unit 200 may also be implemented as an artificial neural network, in which case the learning unit 500 may be configured to train the 2D posture feature extraction unit 200 as well. However, artificial neural networks that estimate the 2D posture of each person in a 2D image have already been widely studied and are well known. It is therefore assumed here that the 2D posture feature extraction unit 200 uses a known artificial neural network, and the following description treats the learning unit 500 as training only the multi-stage human model estimator 300.

As described above, the learning unit 500 exists to train the multi-stage human model estimator 300, so it may be removed from the 3D human model restoration apparatus once training of the multi-stage human model estimator 300 is complete.

FIG. 3 shows an example of the configuration of the 2D posture feature extraction unit of FIG. 2, FIG. 4 is a diagram for explaining the body shape and posture parameters of the SMPL model, and FIG. 5 is a diagram for explaining the concept of normalizing 2D joint positions onto a 2D normalized image plane.

Referring to FIG. 3, the 2D posture feature extraction unit 200 may include an object determination unit 210, a joint vector acquisition unit 220, a skeletal vector acquisition unit 230, a joint vector encoding unit 240, a skeletal vector encoding unit 250, a 2D posture feature acquisition unit 260, an initial state setting unit 270, and a normalization unit 280.

The object determination unit 210 identifies, according to a pre-learned pattern estimation method, each object region in which a person is located in the 2D image received from the image acquisition unit 100. The joint vector acquisition unit 220 obtains, according to a pre-learned pattern estimation method, a joint vector indicating the person's joint positions in each object region identified by the object determination unit 210. The skeletal vector acquisition unit 230 obtains, according to a pre-learned pattern estimation method, a skeletal vector indicating the positions and directions of the person's bones. That is, the joint vector acquisition unit 220 obtains the positions of the joints in the human skeleton as a joint vector, and the skeletal vector acquisition unit 230 similarly obtains a skeletal vector according to the positions and directions of the bones. The skeletal vector acquisition unit 230 may obtain the skeletal vector directly from the 2D image, independently of the joint vector acquisition unit 220. However, because bones structurally lie between joints in the human body, the skeletal vector acquisition unit 230 may instead obtain the skeletal vector from the joint vector obtained by the joint vector acquisition unit 220 rather than from the 2D image.
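The second option, deriving skeletal vectors from already estimated joints, can be sketched as the difference between the two joints that each bone connects. The joint names, the `BONES` list, and the child-minus-parent convention are hypothetical choices for this example.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Bone = Tuple[str, str]  # (parent joint, child joint)

# Hypothetical bone list; the actual skeleton layout is not given here.
BONES: List[Bone] = [
    ("pelvis", "left_hip"),
    ("left_hip", "left_knee"),
    ("left_knee", "left_ankle"),
]


def bone_vectors(joints: Dict[str, Point],
                 bones: Sequence[Bone] = BONES) -> Dict[Bone, Point]:
    """Return child-minus-parent offsets, one per bone.

    Each offset carries both the 2D direction of the bone and its
    projected length between the two joints.
    """
    return {
        (parent, child): (joints[child][0] - joints[parent][0],
                          joints[child][1] - joints[parent][1])
        for parent, child in bones
    }


if __name__ == "__main__":
    joints = {
        "pelvis": (0.0, 0.0),
        "left_hip": (0.5, 0.0),
        "left_knee": (0.5, 1.0),
        "left_ankle": (0.75, 2.0),
    }
    print(bone_vectors(joints))
```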

Here, the joint vector acquisition unit 220 and the skeletal vector acquisition unit 230 may obtain a separate joint vector and skeletal vector for each object region identified by the object determination unit 210.

The joint vector encoding unit 240 and the skeletal vector encoding unit 250 encode, in a predetermined format, the joint vector obtained by the joint vector acquisition unit 220 and the skeletal vector obtained by the skeletal vector acquisition unit 230, respectively. For example, the joint vector encoding unit 240 and the skeletal vector encoding unit 250 may convert a joint vector and a skeletal vector obtained as coordinate values into one-dimensional joint feature values and skeletal feature values.

Conventionally, only a joint vector indicating joint positions is obtained when reconstructing a three-dimensional human model from a two-dimensional image. A skeletal vector, however, represents the relative distance between joints. When three-dimensional joint positions are estimated from two-dimensional joint positions, it supplies more detailed information about the relationships between joints, so that the three-dimensional joint positions can be estimated more accurately.

Accordingly, in this embodiment the two-dimensional posture feature extraction unit 200 includes the skeletal vector acquisition unit 230 together with the joint vector acquisition unit 220, so that the joint vector and the skeletal vector are obtained together. In some cases, however, the skeletal vector acquisition unit 230 and the skeletal vector encoding unit 250 may be omitted.

The two-dimensional posture feature acquisition unit 260 normalizes the joint feature values and skeletal feature values produced by the joint vector encoding unit 240 and the skeletal vector encoding unit 250 to coordinates on a predetermined two-dimensional normalized image plane. It then combines the normalized joint and skeletal feature values and extracts features from them to obtain a two-dimensional posture feature representing the posture of a person in the two-dimensional image.

Because the joint vector acquisition unit 220 and the skeletal vector acquisition unit 230 can obtain a separate joint vector and skeletal vector for the person in each object region, the two-dimensional posture feature acquisition unit 260 can likewise obtain a separate two-dimensional posture feature for each person.

The two-dimensional posture feature acquisition unit 260 normalizes the joint vector and the skeletal vector for two reasons. The first is to compensate for and equalize differences caused by the scale and resolution of different two-dimensional images. The second is that it is not easy to estimate both the posture and the position of a person in three-dimensional space simultaneously from two-dimensional joint vectors and/or skeletal vectors. Even if the multi-stage human model estimation unit 300 is implemented as an artificial neural network and trained, it is difficult to train it to estimate the posture and position of a person in three-dimensional space together from two-dimensional joint vectors, and even when so trained it fails to deliver the required performance in most cases.

This is because there are many variables to estimate and each variable has a high degree of freedom. The problem can be mitigated by fixing some of the variables to constant values, thereby constraining the degrees of freedom. In particular, in this embodiment the two-dimensional posture feature acquisition unit 260 uses normalization to limit the variation caused by the size and resolution of the two-dimensional image and by the position of each person within it, so that the posture and position of the person in three dimensions can subsequently be estimated accurately.

In this embodiment, it is assumed that the human model is reconstructed on the basis of the SMPL (Skinned Multi-Person Linear) model template, which defines a human model with a body shape vector (β) and a posture vector (θ) as parameters.

In the SMPL model template, the body shape vector (β) sets the body shape representing a person's outward appearance by deforming a mesh of 6890 vertices according to ten predefined body shape variables (β₁ to β₁₀). FIG. 4(a) shows the body shape changes defined by the first and second body shape variables (β₁, β₂) among the ten body shape variables (β₁ to β₁₀). That is, the body shape vector (β) may be a ten-dimensional vector (β ∈ ℝ¹⁰) that consists of ten variables (β₁ to β₁₀) and defines the human body shape.

Meanwhile, because of joint interdependency, the human joints can be viewed as a kinematic tree built on the connection structure of the skeleton, as shown in FIG. 4(b). The human posture can therefore be expressed as the relative rotation angle of each joint, following the connection structure outward from the root joint, which is the main central joint, toward the extremities of the body. Under this kinematic tree structure, the positions of all human joints can be expressed by the rotation angles of the remaining joints relative to the root joint, and the posture vector (θ) represents the rotation angle of each joint, including the root joint.

That is, when the remaining joints are expressed as rotation angles relative to the root joint according to the posture vector (θ) on the basis of this kinematic tree structure, then once the position and rotation angle of the root joint are estimated, the positions of the remaining joints follow from rotation angles derived relative to the root joint, and can be estimated both easily and accurately. Therefore, setting the root joint as an anchor point and estimating the relative positions of the remaining joints from it is far more effective than estimating the position of every joint individually.

When the number of joints to be estimated is K, the posture vector (θ) may have 3 * K dimensions (θ = [θ_0,1, θ_0,2, θ_0,3, θ_1,1, …, θ_K-1,1, θ_K-1,2, θ_K-1,3] ∈ ℝ^(3K)). Here, an SMPL model with 24 joints and the hip joint as the root joint is assumed, so the posture vector (θ) may be set as a 72-dimensional (3 * 24) parameter. The first three dimensions of the 3 * K-dimensional posture vector (θ₀ = [θ_0,1, θ_0,2, θ_0,3] ∈ ℝ³) form the global rotation vector (θ₀), which represents the rotation of the entire human model about the root joint. The posture vector for the remaining K-1 joints (θ_J = [θ₁, θ₂, …, θ_K-1]), that is, the posture vector (θ) without the global rotation vector (θ₀) of the root joint, may be called the body rotation vector; it expresses the position of each joint as the relative rotation angle between successive joints connected outward from the root joint.
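The split of the 72-dimensional posture vector into the global rotation vector (θ₀) and the body rotation vector (θ_J) can be illustrated with a short sketch. The function name `split_pose` and the flat list layout (three consecutive values per joint, root joint first) are assumptions chosen to match the ordering given in the text.

```python
K = 24  # number of joints assumed in the text for the SMPL model


def split_pose(theta, num_joints=K):
    """Split a flat 3*K pose vector into (theta_0, theta_J).

    theta_0 is the 3-vector of the root joint (global rotation);
    theta_J is a list of K-1 3-vectors for the remaining joints.
    """
    if len(theta) != 3 * num_joints:
        raise ValueError("expected %d values, got %d" % (3 * num_joints, len(theta)))
    per_joint = [tuple(theta[3 * k:3 * k + 3]) for k in range(num_joints)]
    return per_joint[0], per_joint[1:]


theta = [0.0] * (3 * K)          # rest pose: every rotation is zero
theta[0:3] = [0.0, 0.0, 1.5]     # rotate the whole model about the root joint
theta_0, theta_J = split_pose(theta)
```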

Therefore, once the body shape vector (β) and the posture vector (θ) are obtained, the human model corresponding to them can be obtained from the SMPL model template.

However, the three-dimensional human model reconstruction apparatus of this embodiment must reconstruct the three-dimensional human model corresponding to each person at the correct position in the three-dimensional virtual space, even when the two-dimensional image contains several people. The position in three-dimensional space must therefore be estimated together with the three-dimensional human model. Estimating the position of each of several people located at arbitrary, different positions in a two-dimensional image is not easy. Moreover, although it is relatively easy to estimate the global rotation vector (θ₀) of the root joint within the posture vector (θ) and then estimate the remaining posture vector (θ_J) from the estimated global rotation vector (θ₀), the global rotation vector (θ₀) is not constrained by the positions of other joints and therefore has a relatively high degree of freedom.

Accordingly, the two-dimensional posture feature acquisition unit 260 converts the two-dimensional joint vector to a scale corresponding to the normalized image plane coordinates and normalizes it so that the hip joint, which is the root joint, is placed at the origin. The position of the root joint in three dimensions then needs to be estimated only in the distance direction. In other words, the degree of freedom of the global rotation vector (θ₀) can be reduced.
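A minimal sketch of the root-centering step follows, assuming the joints are already expressed in normalized image plane coordinates and that the root joint is stored at index 0. The function name `center_on_root` is hypothetical. The original root position is returned alongside the centered joints because it is needed later to undo the shift.

```python
def center_on_root(joints, root_index=0):
    """Translate 2D joints so the root joint sits at the origin.

    Returns (centered_joints, original_root_position).
    """
    rx, ry = joints[root_index]
    centered = [(x - rx, y - ry) for x, y in joints]
    return centered, (rx, ry)


# Two joints in normalized image plane coordinates (illustrative values).
joints = [(0.25, -0.5), (0.5, 0.0)]
centered, root_position = center_on_root(joints)
```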

In this embodiment, to prevent inconsistency between different input two-dimensional images, it is assumed that all images were captured by the same camera, with focal length (f) and sensor width (w). Suppose the width (W) and height (H) of a two-dimensional image of an SMPL model located at a specific position (J₀ = [x_t, y_t, z_t]) are given, and N two-dimensional joints (j = [j₀, j₁, …, j_N], j_i ∈ ℝ²) have been estimated from the input two-dimensional image. The positions of the N two-dimensional joints (j) must then be converted to coordinates on the normalized image plane in order to remove their dependence on the resolution of the two-dimensional image.

To convert to coordinates on the normalized image plane, the ratio of the width (W) to the height (H) of the two-dimensional image is first adjusted to a predetermined ratio. For example, the ratio of the width (W) to the height (H) may be set to 1: when the width (W) is greater than the height (H), an offset ((W-H)/2) is added to the y coordinate on the normalized image plane, and when the height (H) is greater than the width (W), an offset ((H-W)/2) is added to the x coordinate. The x and y coordinates are then divided by the larger of the width (W) and the height (H) (max(W, H)), and, to align the reference position with the origin of the normalized image coordinates, half of that larger value (max(W, H)/2) is subtracted from all coordinates. Here, the focal length (f_n) of the normalized image plane may be determined by dividing the camera focal length (f) by the sensor width (w).
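The normalization steps can be sketched as follows. One point is an interpretation rather than something the text states: the subtraction of max(W, H)/2 is applied here in normalized units, that is, 0.5 is subtracted after the division, which is equivalent to subtracting max(W, H)/2 before dividing. The function names are hypothetical.

```python
def normalize_joints(joints, width, height):
    """Map pixel coordinates to a square, origin-centered normalized image plane."""
    side = float(max(width, height))
    # Pad the shorter axis so the image becomes square (aspect ratio 1).
    x_offset = (height - width) / 2.0 if height > width else 0.0
    y_offset = (width - height) / 2.0 if width > height else 0.0
    # Scale by the longer side, then shift so the image center is the origin.
    return [((x + x_offset) / side - 0.5, (y + y_offset) / side - 0.5)
            for x, y in joints]


def normalized_focal_length(focal_length, sensor_width):
    """f_n = f / w, with both values in the same physical unit (e.g. mm)."""
    return focal_length / sensor_width


# The center and the top-left corner of a 1920x1080 image.
center, corner = normalize_joints([(960, 540), (0, 0)], 1920, 1080)
f_n = normalized_focal_length(50.0, 36.0)
```

With this convention every image maps to the square [-0.5, 0.5] x [-0.5, 0.5] regardless of resolution, and a landscape image occupies only the central band of the y range.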

As shown in FIG. 5, in the three-dimensional camera coordinate system (t ∈ ℝ³), the position of the human model projected into three-dimensional space according to the two-dimensional joint vector in normalized image plane coordinates is J₀ = [x_t, y_t, z_t]. When the root joint of the two-dimensional joint vector is moved to the origin (0, 0) of the normalized image plane coordinates, the position of the projected human model becomes J₀^b = [0, 0, z_t]. This means that the position of the root joint can be estimated from the distance (z_t) along the Z axis alone. The original position (J₀) of the human model projected into three-dimensional space can then be estimated by similarity: the two configurations have similar shapes only if the difference between the root joint position in normalized image plane coordinates and the normalized root joint position ([0, 0]) is proportional to the difference between the corresponding human model positions (J₀, J₀^b).

That is, once the root joint position and the normalized two-dimensional root joint position are known and the position vector (Z_t) has been estimated, the position (J₀) of the root joint of the human model in three-dimensional space can be readily estimated according to Equation 1.

(Here, ‖·‖ denotes the L2 norm.)
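Equation 1 itself is not reproduced in this text, so the following is not a transcription of it. It is a sketch of the similar-triangles idea under an ordinary pinhole camera model, assuming that z_t is the depth along the optical axis and that a point (x, y, z) projects to (f_n·x/z, f_n·y/z) on the normalized image plane. The actual Equation 1 may differ, for example in how it uses the L2 norm. The function names are hypothetical.

```python
def backproject_root(root_2d, z_t, f_n):
    """Recover the 3D root position from its 2D position and an estimated depth.

    Assumes a pinhole model in which z_t is measured along the optical axis.
    """
    u, v = root_2d
    return (u * z_t / f_n, v * z_t / f_n, z_t)


def project(point_3d, f_n):
    """Pinhole projection of a 3D point onto the normalized image plane."""
    x, y, z = point_3d
    return (f_n * x / z, f_n * y / z)


f_n = 1.5
J0 = backproject_root((0.1, -0.2), z_t=3.0, f_n=f_n)   # off-center root joint
J0_b = backproject_root((0.0, 0.0), z_t=3.0, f_n=f_n)  # root moved to the origin
```

Projecting `J0` again returns the original 2D root position, which is the consistency property the similarity argument relies on; `J0_b` has the form [0, 0, z_t] described in the text.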

However, so that the position shifted by normalization can later be converted back, the two-dimensional posture feature acquisition unit 260 may include the root joint position, taken from the estimated two-dimensional joint positions, in the two-dimensional posture feature passed to the multi-stage human model estimation unit 300.

FIG. 6 shows a schematic configuration of the multi-stage human model estimation unit of FIG. 1, FIG. 7 shows a schematic configuration of each of the model estimation units of FIG. 6, and FIG. 8 shows an example of the detailed configuration of each model estimation unit of FIG. 7. FIG. 9 shows another example of the detailed configuration of the model posture estimation unit of FIG. 7.

As shown in FIG. 6, the multi-stage human model estimation unit 300 includes a plurality of model estimation units (3001, 3002, …, 300S) connected in series. Each of the model estimation units (3001, 3002, …, 300S) receives the state vector (β, θ, Z_t) obtained in the preceding stage and the two-dimensional posture feature obtained by the two-dimensional posture feature acquisition unit 260, combines the two and extracts features to obtain a combined state vector, and estimates a new state vector from the combined state vector.

Among the model estimation units (3001, 3002, …, 300S), the first model estimation unit 3001 may receive, or store in advance, a separately set initial body shape (β⁰), initial posture (θ⁰), and initial position in the distance direction (Z_t⁰) as its state vector. The initial body shape (β⁰), initial posture (θ⁰), and initial position in the distance direction (Z_t⁰) may be set to arbitrary values.
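The series connection can be sketched as a loop in which every stage receives the previous state vector together with the same two-dimensional posture feature. The stage used here is a hypothetical stand-in, not a trained network: it simply moves each value halfway toward a fixed target so that the refinement over S stages is visible. The dictionary keys and function names are also assumptions.

```python
def run_stages(stages, pose_feature, initial_state):
    """Pass the state vector through the stages in series.

    Every stage sees the same 2D posture feature plus the previous stage's output.
    """
    state = initial_state
    for stage in stages:
        state = stage(state, pose_feature)
    return state


def make_toy_stage(target):
    """Hypothetical stand-in for a trained stage: moves each value halfway to `target`."""
    def stage(state, pose_feature):
        # A real stage would encode state + pose_feature with a neural network;
        # pose_feature is ignored in this toy version.
        return {
            key: [v + 0.5 * (t - v) for v, t in zip(state[key], target[key])]
            for key in state
        }
    return stage


# Arbitrary initial state: 10 shape values, 72 pose values, 1 depth value.
initial_state = {"beta": [0.0] * 10, "theta": [0.0] * 72, "z_t": [0.0]}
target = {"beta": [1.0] * 10, "theta": [0.0] * 72, "z_t": [4.0]}
stages = [make_toy_stage(target) for _ in range(3)]  # S = 3 stages
final_state = run_stages(stages, pose_feature=None, initial_state=initial_state)
```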

FIG. 7 shows the s-th model estimation unit (300s) as a representative of the identically structured model estimation units (3001, 3002, …, 300S). Referring to FIGS. 7 and 8, each model estimation unit may include a feature aggregation unit 310, a position estimation unit 320, and a model posture estimation unit 330.

The feature aggregation unit 310 receives the state vector obtained in the preceding stage (β^(s-1), θ^(s-1), Z_t^(s-1)) and the two-dimensional posture feature, combines them to generate a combined state vector, and passes it to the position estimation unit 320 and the model posture estimation unit 330.

The feature aggregation unit 310 may include a state vector acquisition unit 311 and a state feature combining and encoding unit 312. The state vector acquisition unit 311 receives and combines the body shape vector (β^s), the position vector (Z_t^s), the root rotation vector (θ₀^s), and the joint rotation vector (θ_J^s) to obtain the state vector (β^s, θ^s, Z_t^s). The state feature combining and encoding unit 312 combines and encodes the state vector (β^s, θ^s, Z_t^s) obtained by the state vector acquisition unit 311 with the two-dimensional posture feature supplied by the two-dimensional posture feature extraction unit 200 to obtain the combined state vector.

The position estimation unit 320 uses a pre-trained pattern estimation method to estimate, from the combined state vector supplied by the feature aggregation unit 310, the body shape vector (β^s), the position vector (Z_t^s), and, within the posture (θ), the global rotation vector (θ₀^s) representing the rotation of the root joint.

The position estimation unit 320 may include a body shape estimation unit 321 that estimates the body shape vector (β^s) and a depth/rotation estimation unit 322 that estimates the position vector (Z_t^s) and the global rotation vector (θ₀^s). The body shape estimation unit 321 and the depth/rotation estimation unit 322 may each be implemented as an independent artificial neural network. They are configured separately because the position vector (Z_t^s) and the global rotation vector (θ₀^s) exhibit similar estimation behavior, whereas the body shape vector (β^s) behaves differently.

The model posture estimation unit 330 receives the combined state vector from the preceding stage and the root rotation vector (θ₀^s) estimated by the position estimation unit 320, and uses a pre-trained pattern estimation method to estimate the joint rotation vectors of the remaining joints (θ_J^s = [θ₁^s, θ₂^s, …, θ_K-1^s], θ_i^s ∈ ℝ³).

As shown in FIG. 8, the model posture estimation unit 330 may include a plurality of joint rotation estimation units (331 to 33(K-1)) arranged in a multi-stage structure. The model posture estimation unit 330 may include as many joint rotation estimation units (331 to 33(K-1)) as correspond to the number of joints (K) of the human model to be estimated, and may estimate the rotation vector of each joint (θ_J^s = θ₁^s, θ₂^s, …, θ_K-1^s) sequentially. Each joint rotation estimation unit (331 to 33(K-1)) estimates its joint's rotation vector from the joint rotation vector estimated in the preceding stage and the corresponding two-dimensional joint position contained in the combined state vector. The reason for this sequential estimation is that, as described above, the human joints form a kinematic tree rooted at the root joint because of their interdependence. Estimating each joint as a rotation angle relative to the preceding joint, proceeding from the root joint (here, the hip joint as an example) toward the extremities of the body, therefore yields a more accurate estimate.

In the model posture estimation unit 330 shown in FIG. 8, all of the joint rotation estimation units (331 to 33(K-1)) are connected in series in a multi-stage structure. Human joints, however, are grouped by body part; for example, the joints of the two arms and the two legs can rotate independently. Accordingly, as shown in FIG. 9, the model posture estimation unit 330 may instead be configured so that the joint rotation estimation units (331 to 33(K-1)) branch in a structure corresponding to the parts of the human body, and estimate the rotation vector of each joint (θ_J^s = θ₁^s, θ₂^s, …, θ_K-1^s) in that arrangement.

For example, in the model posture estimation unit 330 of FIG. 9, the joint rotation estimation units in the first row may estimate the rotation of each joint of the left leg, and those in the second row the rotation of each joint of the right leg. The joint rotation estimation units in the third row, and in the fourth row that branches from the third, may estimate the rotation of each joint of the left arm and the right arm by way of the spinal joints.
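The branching arrangement can be illustrated by listing the root-to-extremity chains of a kinematic tree. The parent list below follows the joint ordering commonly used for the 24-joint SMPL skeleton (0 = pelvis, 1 and 2 = left and right hip, 3 = first spine joint, and so on). This description does not list the ordering, so treat it as an assumption, along with the function names.

```python
# Parent index of each of the 24 joints; -1 marks the root joint (assumed ordering).
PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
           9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]


def leaves(parents):
    """Joints that are nobody's parent, i.e. the extremities of the body."""
    has_child = set(p for p in parents if p >= 0)
    return [j for j in range(len(parents)) if j not in has_child]


def chain_from_root(joint, parents):
    """Joint indices on the path from the root joint down to `joint`."""
    chain = []
    while joint != -1:
        chain.append(joint)
        joint = parents[joint]
    return chain[::-1]


# One chain per extremity: two legs, the head, and two arms.
branches = [chain_from_root(leaf, PARENTS) for leaf in leaves(PARENTS)]
```

Within each chain a joint is always preceded by its parent, so the rotations along a chain can be estimated sequentially, while chains that share no joints beyond their common prefix can be processed in parallel. Here the two arm chains share the spine prefix, which corresponds to a row that branches from another row.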

The three-dimensional space arrangement unit 400 generates a human model from the body shape vector (β) and the rotation vector (θ) of the state vector (β, θ, Z_t) estimated by the multi-stage human model estimation unit 300, determines where to place the generated human model in the three-dimensional virtual space from the position vector (Z_t) and the root joint position, and places it there.

Meanwhile, the learning unit 500 may calculate the loss (L_β) of the body shape vector (β) and the loss of the position vector (Z_t) of the state vector (β, θ, Z_t) estimated by the multi-stage human model estimation unit 300 according to Equations 2 and 3.

Here, N is the number of samples in the mini-batch, and variables marked with ^ denote the ground-truth values labeled in the training data.

The learning unit 500 may also calculate the posture loss according to Equation 4, by converting the axis-angle representation of each joint into a rotation matrix with Rodrigues' rotation formula and then applying an L2 loss.

Here, θ_j,i is the posture parameter of body part j of sample i, and R is the conversion from the axis-angle representation to a rotation matrix; the rotation matrix is flattened to one dimension before the L2 loss is applied.
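The conversion and the loss on flattened rotation matrices can be sketched for a single joint as follows. Equation 4 is not reproduced in this text, so the averaging over samples and body parts is omitted and only the per-joint squared L2 distance is shown. Rodrigues' formula itself is standard: R = I + sin(a)·K + (1 − cos(a))·K², where a is the rotation angle and K is the skew-symmetric matrix of the unit axis. The function names are hypothetical.

```python
import math


def rodrigues(rvec):
    """Convert an axis-angle 3-vector into a 3x3 rotation matrix (list of rows)."""
    x, y, z = rvec
    angle = math.sqrt(x * x + y * y + z * z)
    if angle < 1e-12:  # no rotation: return the identity matrix
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = x / angle, y / angle, z / angle  # unit rotation axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + t * kx * kx, t * kx * ky - s * kz, t * kx * kz + s * ky],
        [t * kx * ky + s * kz, c + t * ky * ky, t * ky * kz - s * kx],
        [t * kx * kz - s * ky, t * ky * kz + s * kx, c + t * kz * kz],
    ]


def rotation_l2(pred_rvec, true_rvec):
    """Squared L2 distance between the two rotation matrices, flattened to 1D."""
    flat_pred = [v for row in rodrigues(pred_rvec) for v in row]
    flat_true = [v for row in rodrigues(true_rvec) for v in row]
    return sum((a - b) ** 2 for a, b in zip(flat_pred, flat_true))


quarter_turn = rodrigues((0.0, 0.0, math.pi / 2))        # 90 degrees about Z
same_rotation = rotation_l2((0.0, 0.0, 0.3), (0.0, 0.0, 0.3 + 2 * math.pi))
```

Comparing rotation matrices instead of raw axis-angle values means that two parameter vectors describing the same rotation, such as an angle and that angle plus 2π, incur essentially zero loss.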

However, axis-angle representations whose magnitudes differ by a multiple of 2π describe the same rotation. To constrain the magnitude to between -π and π, a hinge loss must therefore be applied to the squared norm of the posture parameter, and the posture loss can be re-expressed as Equation 5.
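Equation 5 is not reproduced in this text, so the exact form of the hinge term is unknown. One common form consistent with the description penalizes the squared norm of an axis-angle vector only when it exceeds π², as in the sketch below. The function name and this particular form are assumptions.

```python
import math


def angle_hinge_penalty(rvec):
    """Zero while the rotation angle stays within pi; grows with the squared norm beyond it."""
    squared_norm = sum(component * component for component in rvec)
    return max(0.0, squared_norm - math.pi ** 2)


inside = angle_hinge_penalty((0.0, 0.0, 1.0))   # |theta| = 1 < pi
outside = angle_hinge_penalty((0.0, 0.0, 4.0))  # |theta| = 4 > pi
```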

The body posture loss is the loss that excludes the global rotation vector (θ₀), and may be calculated by Equation 6.

The learning unit 500 applies weights to the body shape loss, the position loss, the global rotation loss, and the body loss calculated according to Equations 2 to 7, additionally takes the body rotation loss into account, and obtains the total loss according to Equation 7.

The total loss (L) thus obtained may then be backpropagated to train the multi-stage human model estimation unit 300.

Here, the weights were set to (1, 5, 10, 1) as an example.
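A weighted sum with the example weights can be sketched as follows. Equation 7 is not reproduced in this text, so two things are assumptions: that the four weights apply to the body shape, position, global rotation, and body losses in the order in which the text lists them, and that the additional term enters unweighted. The function and parameter names are hypothetical.

```python
# Example weights from the text; the loss-to-weight assignment is assumed.
WEIGHTS = (1.0, 5.0, 10.0, 1.0)


def total_loss(shape_loss, position_loss, global_rotation_loss, body_loss,
               extra_loss=0.0, weights=WEIGHTS):
    """Weighted sum of the four listed losses plus an optional unweighted extra term."""
    terms = (shape_loss, position_loss, global_rotation_loss, body_loss)
    return sum(w * t for w, t in zip(weights, terms)) + extra_loss


# Illustrative loss values only.
loss = total_loss(0.2, 0.1, 0.05, 0.3)
```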

FIG. 10 shows a three-dimensional human model reconstruction method according to an embodiment of the present invention.

Referring to FIG. 10, the three-dimensional human model reconstruction method according to this embodiment first obtains a two-dimensional image containing at least one person to be modeled as a three-dimensional model (S11). The joint vector of each person in the obtained two-dimensional image is then estimated with a pre-trained pattern estimation method (S12). Once the joint vector is estimated, it is normalized to obtain a two-dimensional posture feature for each person in the two-dimensional image (S13). A skeletal vector may be estimated together with the joint vector; in that case, the skeletal vector may be normalized together with the joint vector and combined with it to obtain the two-dimensional posture feature.

Here, normalization may be performed by converting the joint positions, estimated according to the size and resolution of the input two-dimensional image, to coordinates on the predetermined normalized image plane, and then moving the root joint to the origin of the normalized image plane coordinates so that it serves as the anchor point. The two-dimensional posture feature may include the position of the root joint as expressed in normalized image plane coordinates.

Once the two-dimensional posture feature is obtained, an initial state vector (β⁰, θ⁰, Z_t⁰) for generating and positioning the human model in three-dimensional space is set on the basis of the SMPL human model template (S14). The initial state vector (β⁰, θ⁰, Z_t⁰) and the two-dimensional posture feature are then combined and encoded with a pre-trained pattern estimation method to obtain a combined state vector (S15). From the combined state vector, the body shape vector (β) representing the outward appearance of the human model is estimated with a pre-trained pattern estimation method (S16). In parallel, the position vector (Z_t) and the global rotation vector (θ₀), which represent the position and rotation state of the human model, are obtained together from the combined state vector with a pre-trained pattern estimation method (S17).

When the global rotation vector (θ₀) is obtained, joint rotation vectors (θ_J = θ₁, θ₂, …, θ_{K-1}) for the individual joints are estimated from the combined state vector and the obtained global rotation vector (θ₀) (S18). Here, the joint rotation vectors (θ_J) may be estimated sequentially, starting from the joints adjacent to the root joint and proceeding toward the extremities of the body, or they may be estimated in parallel for each body part according to the body structure while being estimated sequentially within each part according to its joint connection structure.
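The two estimation orders for the joint rotation vectors can be illustrated with a small kinematic tree. The sketch below is hypothetical: the toy skeleton `PARENTS`, the function names, and the `estimator` callable (a stand-in for a trained joint rotation estimator) are assumptions made for illustration only.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical toy skeleton: child joint -> parent joint (joint 0 is the root).
PARENTS: Dict[int, int] = {1: 0, 2: 1, 3: 0, 4: 3, 5: 0, 6: 5}


def children_of(parents: Dict[int, int]) -> Dict[int, List[int]]:
    kids: Dict[int, List[int]] = {}
    for child, parent in parents.items():
        kids.setdefault(parent, []).append(child)
    return kids


def sequential_order(parents: Dict[int, int], root: int = 0) -> List[int]:
    """Breadth-first order: joints adjacent to the root first, extremities last."""
    kids = children_of(parents)
    order: List[int] = []
    queue = deque(sorted(kids.get(root, [])))
    while queue:
        joint = queue.popleft()
        order.append(joint)
        queue.extend(sorted(kids.get(joint, [])))
    return order


def part_chains(parents: Dict[int, int], root: int = 0) -> List[List[int]]:
    """One chain per body part hanging off the root, each in root-to-tip order.

    The chains are independent of each other and could be processed in
    parallel, while the joints inside a chain must be processed in order.
    """
    kids = children_of(parents)
    chains: List[List[int]] = []
    for start in sorted(kids.get(root, [])):
        chain: List[int] = []
        stack = [start]
        while stack:
            joint = stack.pop()
            chain.append(joint)
            stack.extend(sorted(kids.get(joint, []), reverse=True))
        chains.append(chain)
    return chains


def estimate_rotations(
    parents: Dict[int, int],
    global_rotation: float,
    estimator: Callable[[int, float], float],
    root: int = 0,
) -> Dict[int, float]:
    """Estimate each joint from its parent's result, starting at the root."""
    rotations = {root: global_rotation}
    for joint in sequential_order(parents, root):
        rotations[joint] = estimator(joint, rotations[parents[joint]])
    return rotations
```

In both orders a joint is only visited after its parent, which is the property the sequential estimation relies on.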

When the body shape vector (β), the position vector (Z_t), the global rotation vector (θ₀), and the joint rotation vectors (θ_J) have been estimated, yielding a new state vector (β, θ, Z_t), it is determined whether the number of state vector estimations has reached a predetermined number (S19). In the present embodiment, the number of state vector estimations may be determined by the number of model estimators included in the multi-stage human model estimator 300, as shown in FIG. 6. If the number of state vector estimations is less than the number given by the number of model estimators, the newly estimated state vector (β, θ, Z_t) is again combined with the 2D posture feature and encoded according to the pre-learned pattern estimation method to obtain a combined state vector (S15).
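The repeated estimation in steps S15 to S19 amounts to passing the state vector through a fixed number of stages, each of which also sees the 2D posture feature. The following is a minimal sketch, not the trained network: `combine` uses plain concatenation as a stand-in for the learned encoding, and `make_toy_stage` is a hypothetical placeholder for a trained model estimator.

```python
from typing import Callable, List, Sequence, Tuple

# (beta, theta, z_t); z_t is treated as a scalar here, which is an assumption.
State = Tuple[List[float], List[float], float]
Stage = Callable[[List[float]], State]


def combine(feature: Sequence[float], state: State) -> List[float]:
    """Stand-in for the learned encoder: plain concatenation (an assumption)."""
    beta, theta, z_t = state
    return list(feature) + list(beta) + list(theta) + [z_t]


def run_stages(
    feature: Sequence[float], initial_state: State, stages: Sequence[Stage]
) -> State:
    """Refine the state once per stage; the stage count fixes the iteration count."""
    state = initial_state
    for stage in stages:
        state = stage(combine(feature, state))
    return state


def make_toy_stage(n_beta: int, n_theta: int) -> Stage:
    """Hypothetical stage: halves beta and theta, moves z_t halfway toward 2.0."""

    def stage(combined: List[float]) -> State:
        tail = combined[-(n_beta + n_theta + 1):]
        beta = [b * 0.5 for b in tail[:n_beta]]
        theta = [t * 0.5 for t in tail[n_beta:n_beta + n_theta]]
        z_t = tail[-1] + 0.5 * (2.0 - tail[-1])
        return beta, theta, z_t

    return stage
```

Because every stage receives the same 2D posture feature again, a later stage can correct the previous estimate rather than merely propagate it.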

However, if the number of state vector estimations is equal to or greater than the specified number, a human model is generated based on the body shape vector (β) and the posture vector (θ) of the obtained state vector (β, θ, Z_t) (S20). Then, the position at which the human model is to be placed in 3D space is determined from the position vector (Z_t) and the position of the root joint obtained during normalization, and the human model is placed at that position (S21).
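The text does not state how the position vector and the stored root joint position are combined. One plausible reading, given purely as an assumption, is a pinhole back-projection in which Z_t acts as a scalar depth and the root position is measured from the principal point of the normalized image plane. The names `root_position_3d`, `place_model`, and `focal_length` are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def root_position_3d(
    root_xy: Tuple[float, float], z_t: float, focal_length: float = 1.0
) -> Vec3:
    """Back-project the normalized root position to depth z_t.

    Assumption: pinhole camera with the principal point at the plane origin.
    """
    if focal_length <= 0:
        raise ValueError("focal_length must be positive")
    x, y = root_xy
    return (x * z_t / focal_length, y * z_t / focal_length, z_t)


def place_model(vertices: List[Vec3], root_3d: Vec3) -> List[Vec3]:
    """Translate root-centered model vertices to the chosen 3D position."""
    rx, ry, rz = root_3d
    return [(vx + rx, vy + ry, vz + rz) for vx, vy, vz in vertices]
```

Under this assumption, two people with the same image position but different estimated depths end up at different lateral offsets in 3D space, which is what allows relative positions to be recovered.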

Once the human model has been placed in 3D space, it is determined whether human models for all persons in the 2D image have been placed in 3D space (S22). If a person who has not yet been placed remains, the joint positions of that person are estimated (S12). If it is determined that all persons have been placed, the 3D human model restoration is terminated.

Although not shown, the 3D human model restoration method according to the present embodiment may further include a learning step for training the pattern estimation method that estimates the state vector (β, θ, Z_t) from the combined state vector. In the learning step, a body shape loss, a position loss, a global rotation loss, and a body loss are calculated according to Equations 2 to 7, the corresponding weights are applied, and a body rotation loss is additionally taken into account to obtain the total loss according to Equation 7. Learning may then be performed by backpropagating the obtained total loss.
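How the individual losses enter the total loss is defined by Equation 7, which is not reproduced at this point in the text. The sketch below therefore only assumes a plain weighted sum of the four weighted losses plus an unweighted body rotation term; the dictionary keys and the function name `total_loss` are hypothetical, and the actual equation may differ.

```python
from typing import Dict


def total_loss(
    losses: Dict[str, float],
    weights: Dict[str, float],
    body_rotation_loss: float,
) -> float:
    """Weighted sum of the named losses plus the body rotation loss.

    Assumption: the body rotation term is added without its own weight.
    """
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    weighted = sum(weights[name] * value for name, value in losses.items())
    return weighted + body_rotation_loss
```

In a training loop the returned scalar would be the quantity that is backpropagated.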

FIG. 11 shows an example of human models restored in 3D space from a 2D image using the 3D human model restoration method of the present invention.

FIG. 11 shows the result of restoring human models in 3D space from 2D images of the MS COCO dataset. As shown in FIG. 11, when a 2D image containing a plurality of people is applied, the 3D human model restoration apparatus and method according to the present embodiment can very accurately restore not only the posture and body shape of each person in the image but also their relative positions in 3D space.

The method according to the present invention may be implemented as a computer program stored in a medium for execution on a computer. Here, the computer-readable medium may be any available medium that can be accessed by a computer, and may include all computer storage media. Computer storage media include both volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and may include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent other embodiments are possible therefrom.

Accordingly, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

100: image acquisition unit 200: 2D posture feature extraction unit
300: multi-stage human model estimator 400: 3D space arrangement unit
500: learning unit

Claims

1. A 3D human model restoration apparatus comprising:
a 2D posture feature extraction unit that estimates, according to a pre-learned pattern estimation method, a 2D joint vector corresponding to the joint positions of each of at least one person included in a 2D image, and normalizes the estimated 2D joint vector in a predetermined manner to obtain a 2D posture feature;
a multi-stage human model estimator that receives the 2D posture feature for each of the at least one person included in the 2D image, estimates, according to a pre-learned pattern estimation method, a body shape vector and a posture vector specified by a human model template in order to restore a human model having a body shape and posture corresponding to the 2D posture feature, and estimates a position vector for placing the restored human model at a position in 3D space corresponding to the 2D posture feature; and
a 3D space arrangement unit that receives the 2D posture feature and a state vector composed of the body shape vector, the posture vector, and the position vector, restores the human model corresponding to the body shape vector and the posture vector, and places the restored human model in 3D space according to the position vector.

2. The apparatus of claim 1, wherein the multi-stage human model estimator
comprises a plurality of model estimators sequentially connected in a multi-stage structure,
wherein each of the plurality of model estimators receives and combines the 2D posture feature and the state vector estimated by the model estimator of the previous stage to generate a combined state vector; estimates from the generated combined state vector, according to a pre-learned pattern estimation method, a body shape vector and a posture vector specified by the human model template in order to restore a human model having a body shape and posture corresponding to the 2D posture feature; estimates a position vector for placing the restored human model at a position in 3D space corresponding to the 2D posture feature; and generates a new state vector including the body shape vector, the posture vector, and the position vector and transmits it to the model estimator of the next stage.

3. The apparatus of claim 2, wherein each of the plurality of model estimators comprises:
a feature assembling unit that receives and combines the 2D posture feature and the state vector estimated by the model estimator of the previous stage to generate the combined state vector;
a position estimator that estimates from the combined state vector, according to a pre-learned pattern estimation method, the body shape vector corresponding to the 2D posture feature, the position vector corresponding to the position at which the human model is to be placed in 3D space, and, within the posture vector composed of a plurality of joint rotation vectors corresponding to the rotation angles of a plurality of joints of the human body, a global rotation vector representing the rotation angle of the entire human model with respect to a predetermined root joint; and
a model posture estimator that receives the 2D posture feature and the global rotation vector and estimates, in a predetermined order based on the global rotation vector, the relative rotation angles of the joints from those adjacent to the root joint to those farther away, thereby obtaining a body posture vector composed of the joint rotation vectors of the remaining joints of the posture vector other than the global rotation vector.

4. The apparatus of claim 3, wherein the position estimator comprises:
a body shape estimator trained in advance to estimate the body shape vector corresponding to the 2D posture feature from the combined state vector; and
a depth/rotation estimator trained independently of the body shape estimator to simultaneously estimate, from the combined state vector, the position vector and the global rotation vector corresponding to the 2D posture feature.

5. The apparatus of claim 4, wherein the model posture estimator
comprises a plurality of joint rotation estimators, provided in a number equal to the number of joints of the human model other than the root joint, each of which receives the 2D posture feature together with the global rotation vector or the joint rotation vector obtained in the previous stage and estimates the corresponding joint rotation vector.

6. The apparatus of claim 5, wherein the plurality of joint rotation estimators,
in accordance with the joint connection structure of each part of the human model, are connected in parallel from the root joint for each of a plurality of parts and connected in series within each part according to its joint connection structure to estimate the joint rotation vectors.

7. The apparatus of claim 1, wherein the 2D posture feature extraction unit
transforms the 2D joint vector to have a size and position corresponding to a normalized image plane of a predetermined scale and resolution, and moves the transformed 2D joint vector so that the joint vector corresponding to the root joint is placed at the origin of the normalized image plane.

8. The apparatus of claim 7, wherein the 2D posture feature extraction unit
includes, in the 2D posture feature, the position of the joint vector corresponding to the root joint before it is moved to the origin of the normalized image plane.

9. The apparatus of claim 7, wherein the 2D posture feature extraction unit
estimates a 2D skeletal vector corresponding to each skeletal direction of the at least one person included in the 2D image, normalizes the estimated 2D skeletal vector in a predetermined manner, and further includes it in the 2D posture feature.

10. The apparatus of claim 3, further comprising
a learning unit that calculates a body shape loss, a position loss, a global rotation loss, and a body loss from the state vector estimated by the multi-stage human model estimator from a 2D training image and the ground-truth state vector labeled for the 2D training image, applies the corresponding weights, additionally takes a body rotation loss into account to calculate the total loss according to the formula (equation not reproduced in this text), and backpropagates the calculated total loss to train the multi-stage human model estimator.

11. A 3D human model restoration method comprising:
obtaining a 2D posture feature by estimating, according to a pre-learned pattern estimation method, a 2D joint vector corresponding to the joint positions of each of at least one person included in a 2D image, and normalizing the estimated 2D joint vector in a predetermined manner;
obtaining a state vector by receiving the 2D posture feature for each of the at least one person included in the 2D image, estimating, according to a pre-learned pattern estimation method, a body shape vector and a posture vector specified by a human model template in order to restore a human model having a body shape and posture corresponding to the 2D posture feature, and estimating a position vector for placing the restored human model at a position in 3D space corresponding to the 2D posture feature; and
receiving the 2D posture feature and the state vector composed of the body shape vector, the posture vector, and the position vector, restoring the human model corresponding to the body shape vector and the posture vector, and placing the restored human model in 3D space according to the position vector.

12. The method of claim 11, wherein obtaining the state vector comprises:
generating a combined state vector by receiving and combining the 2D posture feature and the previously estimated state vector;
estimating from the generated combined state vector, according to a pre-learned pattern estimation method, a body shape vector and a posture vector specified by the human model template in order to restore a human model having a body shape and posture corresponding to the 2D posture feature;
estimating a position vector for placing the restored human model at a position in 3D space corresponding to the 2D posture feature; and
generating and transmitting a new state vector including the body shape vector, the posture vector, and the position vector.

13. The method of claim 12, wherein estimating the posture vector comprises:
estimating from the combined state vector, according to a pre-learned pattern estimation method, the body shape vector corresponding to the 2D posture feature, the position vector corresponding to the position at which the human model is to be placed in 3D space, and, within the posture vector composed of a plurality of joint rotation vectors corresponding to the rotation angles of a plurality of joints of the human body, a global rotation vector representing the rotation angle of the entire human model with respect to a predetermined root joint; and
receiving the 2D posture feature and the global rotation vector and estimating, in a predetermined order based on the global rotation vector, the relative rotation angles of the joints from those adjacent to the root joint to those farther away, thereby obtaining a body posture vector composed of the joint rotation vectors of the remaining joints of the posture vector other than the global rotation vector.

14. The method of claim 13, wherein estimating the global rotation vector comprises:
estimating the body shape vector corresponding to the 2D posture feature from the combined state vector; and
simultaneously estimating, from the combined state vector and independently of the step of estimating the body shape vector, the position vector and the global rotation vector corresponding to the 2D posture feature.

15. The method of claim 11, wherein obtaining the 2D posture feature comprises:
transforming the 2D joint vector to have a size and position corresponding to a normalized image plane of a predetermined scale and resolution; and
moving the transformed 2D joint vector so that the joint vector corresponding to the root joint is placed at the origin of the normalized image plane.

16. The method of claim 15, wherein obtaining the 2D posture feature further comprises
including, in the 2D posture feature, the position of the joint vector corresponding to the root joint before it is moved to the origin of the normalized image plane.

17. The method of claim 13, further comprising
calculating a body shape loss, a position loss, a global rotation loss, and a body loss from the state vector estimated from a 2D training image and the ground-truth state vector labeled for the 2D training image, applying the corresponding weights, additionally taking a body rotation loss into account to calculate the total loss according to the formula (equation not reproduced in this text), and performing learning by backpropagating the calculated total loss.