WO2022131390A1 - Self-supervised learning-based three-dimensional human posture estimation method using multi-view images - Google Patents

Self-supervised learning-based three-dimensional human posture estimation method using multi-view images Download PDF

Info

Publication number
WO2022131390A1
Authority
WO
WIPO (PCT)
Prior art keywords
posture
dimensional
estimation
view
network
Prior art date
Application number
PCT/KR2020/018365
Other languages
French (fr)
Korean (ko)
Inventor
윤주홍 (Yoon Ju-hong)
박민규 (Park Min-gyu)
장인호 (Jang In-ho)
김제우 (Kim Je-woo)
Original Assignee
Korea Electronics Technology Institute (한국전자기술연구원)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Korea Electronics Technology Institute (한국전자기술연구원)
Publication of WO2022131390A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/60 Rotation of whole images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/204 Image signal generators using stereoscopic image cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Provided is a self-supervised learning-based three-dimensional human posture estimation method using multi-view images. A three-dimensional human posture estimation method according to an embodiment of the present invention uses self-supervised learning, among deep learning techniques, to restore a three-dimensional posture from only an image, two-dimensional posture information, and camera parameters. The network can therefore be optimized using only its own input data and two-dimensional posture labels, without three-dimensional label data.

Description

Self-supervised learning-based three-dimensional human posture estimation method using multi-view images
The present invention relates to artificial intelligence technology for image processing and, more particularly, to a method of restoring a three-dimensional (3D) posture from an image, two-dimensional (2D) posture information, and camera parameters using a self-supervised learning method.
Existing 3D human posture restoration technology has mainly recovered the 3D positions of the joints by attaching a large number of expensive sensors directly to the person; its dependence on additional equipment and the resulting inconvenience make it unsuitable for real-life use.
Recently, deep learning performance has improved dramatically with faster computers, larger datasets, and better algorithms, and deep learning applied to 3D human posture restoration has shown respectable results in specific data environments.
However, deep learning techniques optimize model performance based on the error obtained by comparing a large number of input samples with their ground-truth (label) data, and applying them to 3D posture estimation raises the following problems.
First, a supervised deep learning model cannot be trained without a large amount of label data.
Second, unlike 2D human posture data, producing label data for 3D human postures requires a great deal of time, human resources, and equipment.
Third, because so many resources are required, few datasets have been published, and only limited data exist for training deep learning models.
Because of these problems, supervised deep learning cannot deliver its performance when label data are unavailable.
The present invention has been devised to solve the above problems. Its object is to improve the supervised 3D human posture estimation approach, which depends on 3D label data, into a self-supervised learning approach that is relatively free of label data, thereby providing a method capable of general-purpose posture estimation.
To achieve the above object, a network learning method for 3D posture estimation according to an embodiment of the present invention includes: a first estimation step of estimating a 3D posture of a first viewpoint from an input image of the first viewpoint, using a network for 3D posture estimation; a second estimation step of estimating a 3D posture of a second viewpoint from an input image of the second viewpoint, using the network; a first rotation step of rotating the 3D posture of the first viewpoint to the second viewpoint; and a step of training the network for 3D posture estimation using a loss function between the 3D posture obtained in the second estimation step and the 3D posture obtained in the first rotation step.
The network learning method according to an embodiment of the present invention may further include: a second rotation step of rotating the 3D posture of the second viewpoint to the first viewpoint; and a step of training the network using a loss function between the 3D posture obtained in the first estimation step and the 3D posture obtained in the second rotation step.
In the first rotation step, a camera rotation matrix converting the camera's first viewpoint to the second viewpoint may be applied to rotate the 3D posture of the first viewpoint to the second viewpoint; in the second rotation step, a camera rotation matrix converting the camera's second viewpoint to the first viewpoint may be applied to rotate the 3D posture of the second viewpoint to the first viewpoint.
The method may further include: a first conversion step of converting the 3D posture obtained in the first rotation step into a 2D posture of the second viewpoint; and a step of training the network using a loss function between the 2D posture of the second viewpoint obtained in the first conversion step and the 2D posture label of the input image of the second viewpoint.
The method may further include: a second conversion step of converting the 3D posture obtained in the second rotation step into a 2D posture of the first viewpoint; and a step of training the network using a loss function between the 2D posture of the first viewpoint obtained in the second conversion step and the 2D posture label of the input image of the first viewpoint.
In the first conversion step, camera parameters may be used to convert the 3D posture obtained in the first rotation step into the 2D posture of the second viewpoint; in the second conversion step, camera parameters may be used to convert the 3D posture obtained in the second rotation step into the 2D posture of the first viewpoint.
The input image may be a two-dimensional input image.
The method may further include estimating a 3D posture by inputting an input image to the trained network for 3D posture estimation.
The method may further include outputting the 3D posture estimated in the estimation step together with the input image.
Meanwhile, a 3D posture estimation system according to another embodiment of the present invention includes: an estimator that, using a network for 3D posture estimation, estimates a 3D posture of a first viewpoint from an input image of the first viewpoint and a 3D posture of a second viewpoint from an input image of the second viewpoint; a rotation unit that rotates the 3D posture of the first viewpoint to the second viewpoint; and a learning unit that trains the network using a loss function between the 3D posture obtained by the estimator and the 3D posture obtained by the rotation unit.
Meanwhile, a 3D posture estimation method according to another embodiment of the present invention includes: estimating a 3D posture by inputting an input image to a network for 3D posture estimation; and outputting information on the 3D posture estimated in the estimation step, wherein the network has been trained by estimating a 3D posture of a first viewpoint from an input image of the first viewpoint, estimating a 3D posture of a second viewpoint from an input image of the second viewpoint, rotating the 3D posture of the first viewpoint to the second viewpoint, and using a loss function between the 3D posture of the second viewpoint obtained through estimation and the 3D posture of the second viewpoint obtained through rotation.
Meanwhile, a 3D posture estimation system according to another embodiment of the present invention includes: an estimator that estimates a 3D posture by inputting an input image to a network for 3D posture estimation; and an output unit that outputs information on the 3D posture estimated by the estimator, wherein the network has been trained by estimating a 3D posture of a first viewpoint from an input image of the first viewpoint, estimating a 3D posture of a second viewpoint from an input image of the second viewpoint, rotating the 3D posture of the first viewpoint to the second viewpoint, and using a loss function between the 3D posture of the second viewpoint obtained through estimation and the 3D posture of the second viewpoint obtained through rotation.
As described above, according to embodiments of the present invention, the 3D human posture estimation model that used to depend on label data is improved with a self-supervised learning approach, so that a 3D human posture can be estimated using only images, 2D postures, and camera parameters.
Furthermore, according to embodiments of the present invention, providing 3D postures of other viewpoints by means of camera rotation matrices removes the need for 3D human posture label data.
In addition, according to embodiments of the present invention, posture estimation can be applied in any situation where camera settings and 2D label data are available.
FIG. 1 is a block diagram of supervised deep learning for posture estimation;
FIG. 2 is a block diagram of self-supervised deep learning;
FIG. 3 illustrates a 3D human posture estimation system according to an embodiment of the present invention;
FIG. 4 is a flowchart of the 3D posture information acquisition process for self-supervised learning;
FIG. 5 schematically illustrates the process of generating the 3D posture of viewpoint A and the 3D posture of viewpoint B from a 2D input image of viewpoint A;
FIG. 6 schematically illustrates the process of generating the 3D posture of viewpoint B and the 3D posture of viewpoint A from a 2D input image of viewpoint B;
FIG. 7 is a flowchart of the 2D posture information acquisition process for self-supervised learning;
FIG. 8 is a diagram provided to explain a method of training the network for 3D posture estimation.
Hereinafter, the present invention will be described in more detail with reference to the drawings.
An embodiment of the present invention presents a self-supervised learning-based 3D human posture estimation method using multi-view images.
In AR/VR-based services, technology by which the device recognizes the user's current posture and shape is essential for interaction between the device and the user. In the embodiment of the present invention, a self-supervised deep learning approach is used to restore the 3D posture from only the image, 2D posture information, and camera parameters.
The embodiment of the present invention presents a self-supervised learning model that uses 2D human posture labels and camera parameters, without the hard-to-obtain 3D human posture label data.
FIG. 1 is a block diagram of a supervised deep learning model for posture estimation. Because supervised learning optimizes the model by backpropagating the error between the model output and the label data, the model can be trained only when inputs and labels exist in pairs.
Because the self-supervised learning model used in the embodiment of the present invention optimizes, given the camera parameters and the 2D human posture labels, the error between the model output and the 2D projection of the rotated output, the network can be optimized with only its own input data and 2D posture labels, without 3D label data. FIG. 2 is a block diagram illustrating the concept of the self-supervised deep learning model.
Furthermore, in the embodiment of the present invention, the deep learning model is trained by rotating the 3D posture estimates with different camera rotation matrices to convert them into 3D postures of the respective directions and then computing a self-loss function.
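Read concretely, this training signal can be formalized as follows (an editorial formalization; the notation and the choice of squared norms are assumptions, not taken from the patent). Let $\hat{X}_A$ and $\hat{X}_B$ be the 3D postures estimated from the viewpoint-A and viewpoint-B images, $R_{A \to B}$ the camera rotation matrix from viewpoint A to viewpoint B, $\Pi_K$ perspective projection with camera parameters $K$, and $y_B$ the 2D posture label of viewpoint B:

$$\mathcal{L}_{3D} = \left\| \hat{X}_B - R_{A \to B}\,\hat{X}_A \right\|^2, \qquad \mathcal{L}_{2D} = \left\| \Pi_K\!\left(R_{A \to B}\,\hat{X}_A\right) - y_B \right\|^2$$

Two symmetric terms cover the B-to-A direction, and the network is trained on the sum of all four, as detailed with reference to FIG. 8 below.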
FIG. 3 illustrates a 3D human posture estimation system according to an embodiment of the present invention. As shown, the 3D human posture estimation system according to an embodiment of the present invention comprises an input unit 110, an output unit 120, an estimator 130, a learning unit 140, a rotation unit 150, and a conversion unit 160.
The input unit 110 is the means for receiving training images and inference images. Two-dimensional multi-view images are input as training and inference images, and in training mode the 2D posture information of the 2D images is also input as labels.
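For concreteness, a single training sample in this setup might bundle the following fields (a minimal sketch; the field names, tensor shapes, and the 17-joint skeleton are illustrative assumptions, not taken from the patent):

```python
import torch

J = 17  # illustrative joint count; the patent does not fix a skeleton

sample = {
    "img_A": torch.zeros(3, 256, 256),         # RGB image from viewpoint A
    "img_B": torch.zeros(3, 256, 256),         # RGB image from viewpoint B
    "pose2d_A": torch.zeros(J, 2),             # 2D posture label, viewpoint A
    "pose2d_B": torch.zeros(J, 2),             # 2D posture label, viewpoint B
    "R_A": torch.eye(3), "R_B": torch.eye(3),  # world-to-camera rotations
    "K_A": torch.eye(3), "K_B": torch.eye(3),  # intrinsic camera matrices
}
```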
The estimator 130 is a deep learning network that estimates a 3D posture from the 2D image input through the input unit 110. The network extracts useful features using ResNet-50, which performs well at image feature detection, and estimates the 3D posture from these features through a fully connected layer.
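A minimal sketch of such an estimator in PyTorch, assuming a torchvision ResNet-50 backbone with its classifier replaced by a small fully connected head (the head sizes and joint count are illustrative; the patent does not specify them):

```python
import torch
import torch.nn as nn
from torchvision import models

class PoseNet3D(nn.Module):
    """ResNet-50 feature extractor followed by a fully connected head that
    regresses a 3D posture as J joints x 3 coordinates."""

    def __init__(self, num_joints: int = 17):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()          # keep the 2048-d pooled features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(2048, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_joints * 3),
        )
        self.num_joints = num_joints

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(img)            # (B, 2048)
        return self.head(feat).view(-1, self.num_joints, 3)
```

The same network weights serve both viewpoints, which is what allows the cross-view losses described below to constrain a single model.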
The output unit 120 displays the 3D posture information estimated by the estimator 130 on the input image and outputs it.
The rotation unit 150 rotates the 3D posture estimated by the estimator 130 into the posture of another viewpoint. To this end, the rotation unit 150 applies a camera rotation matrix to the estimated 3D posture, converting it into 3D postures of different viewpoints.
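One standard way to realize this rotation is sketched below; the patent only states that a camera rotation matrix converting one viewpoint to the other is applied, so the decomposition into world-to-camera rotations and the root-relative assumption (camera translation ignored) are editorial:

```python
import torch

def rotate_pose(pose: torch.Tensor, R_src: torch.Tensor, R_dst: torch.Tensor) -> torch.Tensor:
    """Rotate a (B, J, 3) batch of root-relative postures from the source
    camera's coordinates to the destination camera's coordinates.

    R_src, R_dst: (B, 3, 3) world-to-camera rotation matrices. The relative
    rotation is R = R_dst @ R_src^T, since R_src^T maps source-camera
    coordinates back to world coordinates."""
    R = R_dst @ R_src.transpose(-1, -2)
    return pose @ R.transpose(-1, -2)   # row-vector convention: (R x)^T = x^T R^T
```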
The conversion unit 160 converts the 3D posture rotated by the rotation unit 150 into a 2D posture using the intrinsic camera parameters.
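This conversion admits a standard pinhole-camera reading; the sketch below, continuing the illustrative helpers above, assumes the rotated posture is expressed in the target camera's coordinates with positive depth:

```python
def project_to_2d(pose: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Pinhole projection of a (B, J, 3) posture with (B, 3, 3) intrinsics K;
    returns (B, J, 2) pixel coordinates. Assumes positive depth Z."""
    homog = pose @ K.transpose(-1, -2)   # rows: (fx*X + cx*Z, fy*Y + cy*Z, Z)
    return homog[..., :2] / homog[..., 2:3]
```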
The specific method by which the estimator 130, the rotation unit 150, and the conversion unit 160 obtain posture information is described in detail below with reference to FIGS. 4 to 7.
The learning unit 140 trains the deep learning network for 3D posture estimation using the 3D posture estimated by the estimator 130, the 3D posture rotated by the rotation unit 150, the 2D posture converted by the conversion unit 160, and the 2D posture labels input through the input unit 110.
To this end, the learning unit 140 computes a self-loss function between the 3D postures and a loss function for the 2D postures. The specific training method of the learning unit 140 is described in detail below with reference to FIG. 8.
FIG. 4 is a flowchart of the 3D posture information acquisition process for self-supervised learning.
As shown, the estimator 130 first estimates the 3D posture of viewpoint A from the 2D input image of viewpoint A received through the input unit 110, using the network for 3D posture estimation.
The rotation unit 150 then rotates the 3D posture of viewpoint A estimated by the estimator 130 to viewpoint B, converting it into a 3D posture of viewpoint B.
FIG. 5 schematically shows the process of generating the 3D posture of viewpoint A and the 3D posture of viewpoint B from the 2D input image of viewpoint A.
Likewise, as shown in FIG. 4, the estimator 130 estimates the 3D posture of viewpoint B from the 2D input image of viewpoint B received through the input unit 110, using the network for 3D posture estimation.
The rotation unit 150 then rotates the 3D posture of viewpoint B estimated by the estimator 130 to viewpoint A, converting it into a 3D posture of viewpoint A.
FIG. 6 schematically shows the process of generating the 3D posture of viewpoint B and the 3D posture of viewpoint A from the 2D input image of viewpoint B.
FIG. 7 is a flowchart of the 2D posture information acquisition process for self-supervised learning.
As shown, the conversion unit 160 first uses the camera parameters to convert the 3D posture of viewpoint B, which the rotation unit 150 rotated from viewpoint A to viewpoint B, into a 2D posture of viewpoint B.
Likewise, the conversion unit 160 uses the camera parameters to convert the 3D posture of viewpoint A, which the rotation unit 150 rotated from viewpoint B to viewpoint A, into a 2D posture of viewpoint A.
FIG. 8 is provided to explain how the learning unit of FIG. 3 trains the network for 3D posture estimation using the posture information generated through the processes of FIGS. 4 and 7.
As shown, the learning unit 140 first trains the network for 3D posture estimation using a self-loss function between the following two 3D postures:
1) the 3D posture 312 of viewpoint B, obtained by rotating the 3D posture 310 of viewpoint A estimated from the 2D posture 210 of viewpoint A; and
2) the 3D posture 320 of viewpoint B estimated from the 2D posture 220 of viewpoint B.
Likewise, the learning unit 140 trains the network for 3D posture estimation using a self-loss function between the following two 3D postures:
1) the 3D posture 310 of viewpoint A estimated from the 2D posture 210 of viewpoint A; and
2) the 3D posture 321 of viewpoint A, obtained by rotating the 3D posture 320 of viewpoint B estimated from the 2D posture 220 of viewpoint B.
In addition, the learning unit 140 further trains the network for 3D posture estimation using a loss function between the following two 2D postures:
1) the 2D posture 212 of viewpoint B, obtained by converting the 3D posture 312 of viewpoint B, which was itself obtained by rotating the 3D posture 310 of viewpoint A estimated from the 2D posture 210 of viewpoint A; and
2) the label 220' of the 2D posture 220 of viewpoint B.
The learning unit 140 also further trains the network for 3D posture estimation using a loss function between the following two 2D postures (a combined sketch of all four terms follows this list):
1) the 2D posture 221 of viewpoint A, obtained by converting the 3D posture 321 of viewpoint A, which was itself obtained by rotating the 3D posture 320 of viewpoint B estimated from the 2D posture 220 of viewpoint B; and
2) the label 210' of the 2D posture 210 of viewpoint A.
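Gathering the four loss terms, one training step of the learning unit could look like the sketch below. It reuses the illustrative PoseNet3D, rotate_pose, and project_to_2d helpers defined earlier, assumes the sample fields gain a leading batch dimension after DataLoader collation, and weights all terms equally, which the patent leaves open:

```python
import torch.nn.functional as F

def train_step(model, batch, optimizer):
    """One self-supervised update from a pair of synchronized views A and B."""
    pose_a = model(batch["img_A"])    # (B, J, 3) posture estimated for viewpoint A
    pose_b = model(batch["img_B"])    # (B, J, 3) posture estimated for viewpoint B

    pose_a_in_b = rotate_pose(pose_a, batch["R_A"], batch["R_B"])  # posture 312
    pose_b_in_a = rotate_pose(pose_b, batch["R_B"], batch["R_A"])  # posture 321

    # Self-loss between 3D postures: estimated vs. rotated-in from the other view
    loss_3d = F.mse_loss(pose_a_in_b, pose_b) + F.mse_loss(pose_b_in_a, pose_a)

    # 2D loss: projected, rotated postures vs. the 2D labels of the target view
    loss_2d = (F.mse_loss(project_to_2d(pose_a_in_b, batch["K_B"]), batch["pose2d_B"])
               + F.mse_loss(project_to_2d(pose_b_in_a, batch["K_A"]), batch["pose2d_A"]))

    loss = loss_3d + loss_2d          # equal weighting is an editorial assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```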
A preferred embodiment of the self-supervised learning-based 3D human posture estimation method using multi-view images has been described in detail above.
The embodiment above presents a general-purpose posture estimation model that replaces the supervised 3D human posture estimation approach, which depends on 3D label data, with a self-supervised learning approach that is relatively free of label data.
With the self-supervised learning approach, the 3D human posture estimation model that used to depend on label data can be trained from images, 2D postures, and camera parameters, and providing 3D postures of other viewpoints through camera rotation matrices removes the need for 3D human posture label data.
Meanwhile, the technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program for performing the functions of the apparatus and method according to the present embodiment. The technical ideas according to various embodiments of the present invention may also be implemented in the form of computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example, a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on the computer-readable recording medium may also be transmitted over a network connecting computers.
While preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described; various modifications may be made by those of ordinary skill in the art without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical spirit or outlook of the present invention.

Claims (12)

  1. A network learning method for three-dimensional posture estimation, comprising:
    a first estimation step of estimating a 3D posture of a first viewpoint from an input image of the first viewpoint, using a network for 3D posture estimation;
    a second estimation step of estimating a 3D posture of a second viewpoint from an input image of the second viewpoint, using the network for 3D posture estimation;
    a first rotation step of rotating the 3D posture of the first viewpoint to the second viewpoint; and
    training the network for 3D posture estimation using a loss function between the 3D posture obtained in the second estimation step and the 3D posture obtained in the first rotation step.
  2. The method according to claim 1, further comprising:
    a second rotation step of rotating the 3D posture of the second viewpoint to the first viewpoint; and
    training the network for 3D posture estimation using a loss function between the 3D posture obtained in the first estimation step and the 3D posture obtained in the second rotation step.
  3. The method according to claim 2, wherein:
    the first rotation step applies a camera rotation matrix that converts the camera's first viewpoint to the second viewpoint, rotating the 3D posture of the first viewpoint to the second viewpoint; and
    the second rotation step applies a camera rotation matrix that converts the camera's second viewpoint to the first viewpoint, rotating the 3D posture of the second viewpoint to the first viewpoint.
  4. The method according to claim 1, further comprising:
    a first conversion step of converting the 3D posture obtained in the first rotation step into a 2D posture of the second viewpoint; and
    training the network for 3D posture estimation using a loss function between the 2D posture of the second viewpoint obtained in the first conversion step and the 2D posture label of the input image of the second viewpoint.
  5. The method according to claim 4, further comprising:
    a second conversion step of converting the 3D posture obtained in the second rotation step into a 2D posture of the first viewpoint; and
    training the network for 3D posture estimation using a loss function between the 2D posture of the first viewpoint obtained in the second conversion step and the 2D posture label of the input image of the first viewpoint.
  6. The method according to claim 5, wherein:
    the first conversion step uses camera parameters to convert the 3D posture obtained in the first rotation step into the 2D posture of the second viewpoint; and
    the second conversion step uses camera parameters to convert the 3D posture obtained in the second rotation step into the 2D posture of the first viewpoint.
  7. The method according to claim 1, wherein the input image is a two-dimensional input image.
  8. The method according to claim 1, further comprising estimating a 3D posture by inputting an input image to the trained network for 3D posture estimation.
  9. The method according to claim 8, further comprising outputting the 3D posture estimated in the estimating step together with the input image.
  10. A three-dimensional posture estimation system, comprising:
    an estimator that, using a network for 3D posture estimation, estimates a 3D posture of a first viewpoint from an input image of the first viewpoint and estimates a 3D posture of a second viewpoint from an input image of the second viewpoint;
    a rotation unit that rotates the 3D posture of the first viewpoint to the second viewpoint; and
    a learning unit that trains the network for 3D posture estimation using a loss function between the 3D posture obtained by the estimator and the 3D posture obtained by the rotation unit.
  11. A three-dimensional posture estimation method, comprising:
    estimating a 3D posture by inputting an input image to a network for 3D posture estimation; and
    outputting information on the 3D posture estimated in the estimating step,
    wherein the network for 3D posture estimation has been trained by:
    estimating a 3D posture of a first viewpoint from an input image of the first viewpoint using the network,
    estimating a 3D posture of a second viewpoint from an input image of the second viewpoint using the network,
    rotating the 3D posture of the first viewpoint to the second viewpoint, and
    using a loss function between the 3D posture of the second viewpoint obtained through estimation and the 3D posture of the second viewpoint obtained through rotation.
  12. A three-dimensional posture estimation system comprising:
    an estimator configured to estimate a three-dimensional posture by inputting an input image into a network for three-dimensional posture estimation; and
    an output unit configured to output information about the three-dimensional posture estimated by the estimator,
    wherein the network for three-dimensional posture estimation is trained by:
    estimating a three-dimensional posture of a first viewpoint from an input image of the first viewpoint using the network,
    estimating a three-dimensional posture of a second viewpoint from an input image of the second viewpoint using the network,
    rotating the three-dimensional posture of the first viewpoint to the second viewpoint, and
    using a loss function between the three-dimensional posture of the second viewpoint obtained through estimation and the three-dimensional posture of the second viewpoint obtained through rotation.
PCT/KR2020/018365 2020-12-15 2020-12-15 Self-supervised learning-based three-dimensional human posture estimation method using multi-view images WO2022131390A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0175602 2020-12-15
KR1020200175602A KR20220085491A (en) 2020-12-15 2020-12-15 Self-supervised learning based 3D human posture estimation method using multi-view images

Publications (1)

Publication Number Publication Date
WO2022131390A1 true WO2022131390A1 (en) 2022-06-23

Family

ID=82057648

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/018365 WO2022131390A1 (en) 2020-12-15 2020-12-15 Self-supervised learning-based three-dimensional human posture estimation method using multi-view images

Country Status (2)

Country Link
KR (1) KR20220085491A (en)
WO (1) WO2022131390A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019029021A (en) * 2017-07-30 2019-02-21 国立大学法人 奈良先端科学技術大学院大学 Learning data set preparing method, as well as object recognition and position attitude estimation method
KR20190088379A (en) * 2018-01-18 2019-07-26 삼성전자주식회사 Pose estimating method, method of displaying virtual object using estimated pose and apparatuses performing the same
JP2019133658A (en) * 2018-01-31 2019-08-08 株式会社リコー Positioning method, positioning device and readable storage medium
US20200084427A1 (en) * 2018-09-12 2020-03-12 Nvidia Corporation Scene flow estimation using shared features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHANG, INHO; PARK, MIN-GYU; KIM, JAEWOO; YOON, JU HONG: "Self-Supervised 3D Human Pose Estimation Using Multi-View and Camera Parameter", Proceedings of the Institute of Electronics and Information Engineers (IEIE) Autumn Conference 2020, 1 November 2020, Korea, pages 421-423, XP009537606 *

Also Published As

Publication number Publication date
KR20220085491A (en) 2022-06-22

Similar Documents

Publication Publication Date Title
WO2021206284A1 (en) Depth estimation method and system using cycle gan and segmentation
WO2018190504A1 (en) Face pose correction apparatus and method
JP2021513175A (en) Data processing methods and devices, electronic devices and storage media
WO2023068795A1 (en) Device and method for creating metaverse using image analysis
WO2021261687A1 (en) Device and method for reconstructing three-dimensional human posture and shape model on basis of image
WO2023080266A1 (en) Face converting method and apparatus using deep learning network
WO2019190076A1 (en) Eye tracking method and terminal for performing same
CN107133611A Method and device for recognizing and counting students' nodding rate in a classroom
WO2015008932A1 (en) Digilog space creator for remote co-work in augmented reality and digilog space creation method using same
WO2022131390A1 (en) Self-supervised learning-based three-dimensional human posture estimation method using multi-view images
WO2022154523A1 (en) Method and device for matching three-dimensional oral scan data via deep-learning based 3d feature detection
WO2019117393A1 (en) Learning apparatus and method for depth information generation, depth information generation apparatus and method, and recording medium related thereto
WO2014010820A1 (en) Method and apparatus for estimating image motion using disparity information of a multi-view image
WO2021256640A1 (en) Device and method for reconstructing human posture and shape model on basis of multi-view image by using information on relative distance between joints
WO2023277448A1 (en) Artificial neural network model training method and system for image processing
WO2020130211A1 (en) Joint model registration device and method
WO2019103193A1 (en) System and method for acquiring 360 vr image in game using distributed virtual camera
WO2018199724A1 (en) Virtual reality system enabling bi-directional communication
WO2011040653A1 (en) Photography apparatus and method for providing a 3d object
WO2021118047A1 (en) Method and apparatus for evaluating accident fault in accident image by using deep learning
WO2023224169A1 (en) Three-dimensional skeleton estimation system and three-dimensional skeleton estimation method
WO2023120743A1 (en) Context-specific training method for point cloud-based three-dimensional object recognition model
WO2020116674A1 (en) Intelligent method and device for enhancing visual information for mobility-impaired person
WO2024117356A1 (en) Device and method for 3d reconstruction of human object based on monocular color image in real time
WO2023243785A1 (en) Virtual performance generation method and virtual performance generation system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20966044; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20966044; Country of ref document: EP; Kind code of ref document: A1)