WO2021256640A1 - Apparatus and method for reconstructing a human posture and shape model based on multi-view images using information about the relative distance between joints - Google Patents

Apparatus and method for reconstructing a human posture and shape model based on multi-view images using information about the relative distance between joints

Info

Publication number
WO2021256640A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
information
encoded
dimensional
posture
Prior art date
Application number
PCT/KR2020/017833
Other languages
English (en)
Korean (ko)
Inventor
윤주홍
박민규
김제우
Original Assignee
한국전자기술연구원
Priority date
Filing date
Publication date
Application filed by 한국전자기술연구원
Publication of WO2021256640A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tesselation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/97: Determining parameters from multiple pictures

Definitions

  • the present invention relates to AR (Augmented Reality)/VR (Virtual Reality) technology, and more particularly, to an apparatus and method for reconstructing a human posture and shape model based on a multi-viewpoint image in an AR or VR service.
  • AR Augmented Reality
  • VR Virtual Reality
  • the human posture restoration device refers to a device that restores the three-dimensional position of each joint using an active or passive sensor.
  • An active sensor-based posture restoration device directly acquires information such as position, speed, and angle from sensors attached to the user's body, whereas a passive sensor-based device attaches markers to the user's body and restores the position information of the markers using installed cameras. FIG. 1 illustrates marker-based human posture estimation.
  • The markerless method has the advantage of being easy to use, but problems arise in environments where several people are mixed together or interact with one another in complex ways.
  • The present invention has been devised to solve the above problems, and an object of the present invention is to provide an apparatus and method for accurately restoring the two-dimensional/three-dimensional postures and shapes of several people, as a human posture and shape model, from images of people obtained at various viewpoints.
  • According to an embodiment of the present invention, a method for reconstructing a human model comprises: extracting encoded two-dimensional posture information of a person from each of multi-view images; converting each piece of the extracted two-dimensional posture information into encoded three-dimensional posture information of the person; fusing the converted encoded three-dimensional posture information into a single piece of encoded three-dimensional posture information of the person; and simultaneously restoring the three-dimensional posture and shape of the person from the fused encoded three-dimensional posture information.
  • the encoded 2D posture information of the person may be 2D distance information between joints, and the encoded 3D posture information of the person may be 3D distance information between joints.
  • The extracting may include: detecting the position of a person in each of the multi-view images; and obtaining two-dimensional distance information between joints from the detected image of the person.
  • The two-dimensional distance information between joints is expressed as an N×N EDM (Euclidean Distance Matrix), where N may be the number of joints.
  • the obtaining step may include: extracting human posture characteristic information from the detected image information of the person; and inputting the extracted person's posture characteristic information into an artificial intelligence model to obtain two-dimensional distance information between joints.
  • the encoded two-dimensional distance information between the joints may be directly extracted without extracting the human joints from the multi-viewpoint image.
  • The format of the inter-joint 3D distance information may be the same as the format of the inter-joint 2D distance information.
  • The fusing may include: measuring reliability values for the inter-joint 3D distance information at each viewpoint; and fusing the inter-joint 3D distance information into one based on the measured reliability values.
  • The restoring may include: obtaining human shape model parameters from the three-dimensional distance information between joints; and simultaneously restoring the 3D posture and shape of the person by performing rendering using the obtained shape model parameters.
  • According to another embodiment, an apparatus for reconstructing a human model includes: an encoding unit that extracts encoded two-dimensional posture information of a person from each of multi-view images; a conversion unit that converts each piece of the extracted two-dimensional posture information into encoded three-dimensional posture information of the person; a fusion unit that fuses the converted encoded three-dimensional posture information into a single piece of encoded three-dimensional posture information of the person; and a restoration unit that simultaneously restores the person's three-dimensional posture and shape from the fused encoded three-dimensional posture information.
  • According to another embodiment, a computer-readable recording medium has recorded thereon a program for performing a human model restoration method comprising: extracting encoded two-dimensional posture information of a person from each of multi-view images; converting each piece of the extracted two-dimensional posture information into encoded three-dimensional posture information of the person; fusing the converted encoded three-dimensional posture information into a single piece of encoded three-dimensional posture information of the person; and simultaneously restoring the person's three-dimensional posture and shape from the fused encoded three-dimensional posture information.
  • a method for restoring a human model includes extracting encoded two-dimensional pose information of a person from multi-view images; and simultaneously reconstructing the 3D pose and shape of the person from the encoded 3D pose information of the person generated by converting the extracted 2D pose information of the person.
  • The three-dimensional posture EDM can be expressed in a way that is invariant to viewpoint changes, and the input and output have the same dimension, so efficient deep learning training and estimation are possible.
  • The human shape model can be restored by predicting the human shape model parameters from the three-dimensional posture EDM fused from multiple viewpoints. Since the EDM is information on the relative distance between joints, the shape and the relative angle information between joints can be predicted.
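  • For illustration only, the following is a minimal sketch (in Python) of how such a four-stage pipeline (encoding, 2D-to-3D conversion, reliability-weighted fusion, and restoration) could be wired together. The function arguments person_detector, encode_2d_edm, convert_2d_to_3d, estimate_reliability, fuse_edms, predict_shape_params, and render_model are hypothetical stand-ins for the components described in the detailed description, not the disclosed implementation.

    def reconstruct_human_model(multi_view_images, person_detector,
                                encode_2d_edm, convert_2d_to_3d,
                                estimate_reliability, fuse_edms,
                                predict_shape_params, render_model):
        # Hypothetical end-to-end pipeline following the four stages described
        # above: encoding -> 2D-to-3D conversion -> fusion -> restoration.
        edms_2d, reliabilities = [], []
        for image in multi_view_images:
            crop = person_detector(image)                        # detect the person in each view
            edm_2d = encode_2d_edm(crop)                         # N x N 2D posture EDM
            edms_2d.append(edm_2d)
            reliabilities.append(estimate_reliability(edm_2d))   # per-view confidence
        edms_3d = [convert_2d_to_3d(e) for e in edms_2d]         # per-view N x N 3D posture EDMs
        fused_edm_3d = fuse_edms(edms_3d, reliabilities)         # one fused N x N 3D posture EDM
        params = predict_shape_params(fused_edm_3d)              # shape model parameters
        mesh, joints_3d = render_model(params)                   # simultaneous shape and posture
        return mesh, joints_3d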
  • FIG. 3 is an example of human posture restoration based on a multi-viewpoint image.
  • FIG. 4 is a block diagram of a human posture and shape model restoration apparatus according to an embodiment of the present invention.
  • FIG. 5 is an algorithm configuration diagram of the encoding unit shown in FIG. 4;
  • FIG. 6 is a diagram illustrating the EDM formula and two-dimensional distance information between joints expressed as an EDM;
  • FIG. 7 is an algorithm block diagram of the conversion unit shown in FIG. 4;
  • FIG. 8 is an algorithm configuration diagram of the fusion unit shown in FIG. 4, and
  • FIG. 9 is an algorithm configuration diagram of the restoration unit shown in FIG. 4 .
  • the apparatus for restoring a person's posture and shape model according to an embodiment of the present invention is an apparatus for reconstructing three-dimensional posture information and shape information of several people based on a multi-viewpoint image (images from multiple viewpoints).
  • the apparatus for restoring a human posture and shape model restores a 2D/3D human posture and shape model through encoding based on information on relative distance between joints.
  • The apparatus for restoring a human posture and shape model is configured to include an encoding unit 110, a conversion unit 120, a fusion unit 130, and a restoration unit 140.
  • the encoding unit 110 extracts encoded 2D posture information for each of the multi-view images.
  • the two-dimensional posture information refers to two-dimensional relative distance information between joints of each person appearing in the image.
  • the conversion unit 120 converts the encoded multi-viewpoint two-dimensional posture information extracted by the encoding unit 110 into the encoded multi-viewpoint three-dimensional posture information.
  • The dimensional transformation of the relative distance information by the conversion unit 120 is performed for each multi-viewpoint image, converting the 2D relative distance information between joints into 3D relative distance information between joints.
  • the fusion unit 130 fuses the encoded multi-viewpoint three-dimensional posture information into one piece of three-dimensional posture information.
  • the restoration unit 140 simultaneously restores/predicts the 3D shape information of the person and the position information of the joints from the 3D posture information fused into one by the fusion unit 130 .
  • FIG. 5 is an algorithm configuration diagram of the encoding unit 110 shown in FIG. 4 .
  • the encoding unit 110 first detects the position of a person through an image-based person detector with respect to an input multi-viewpoint image (four viewpoints are exemplified in FIG. 5 ).
  • the encoding unit 110 extracts the person's posture characteristic information from the detected image information of the person.
  • the encoding unit 110 inputs the extracted human posture characteristic information to a two-dimensional posture encoding network, and encodes the human image information into two-dimensional relative distance information between joints.
  • the two-dimensional posture encoding network is an artificial intelligence model trained to encode human posture characteristic information into two-dimensional relative distance information between joints.
  • The encoded two-dimensional relative distance information between joints is expressed as an N×N matrix in EDM (Euclidean Distance Matrix) form.
  • the input of the two-dimensional posture encoding network is a multi-viewpoint image, and the output of the two-dimensional posture encoding network becomes two-dimensional posture EDMs for each of the multi-view images.
  • FIG. 6 shows the EDM formula, and the lower part of FIG. 6 exemplifies the EDM expressing the two-dimensional relative distance information between joints when the number of joints is three.
  • In the EDM, the relative distance between each joint and every other joint is recorded.
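  • As a concrete, non-limiting illustration of the EDM of FIG. 6, the sketch below (Python with NumPy) computes a posture EDM from joint coordinates using the standard Euclidean distance matrix definition d(i, j) = ||x_i - x_j||; the three example 2D joint coordinates are hypothetical values chosen only to mirror the 3-joint case.

    import numpy as np

    def posture_edm(joints):
        # Compute an N x N Euclidean Distance Matrix from N joint coordinates.
        # joints: array of shape (N, 2) for a 2D posture or (N, 3) for a 3D posture.
        # Returns a symmetric N x N matrix whose (i, j) entry is the distance
        # between joint i and joint j (the diagonal is zero).
        diffs = joints[:, None, :] - joints[None, :, :]   # pairwise coordinate differences
        return np.linalg.norm(diffs, axis=-1)

    # Example with three hypothetical 2D joints, matching the 3-joint EDM of FIG. 6.
    joints_2d = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
    print(posture_edm(joints_2d))   # 3 x 3 matrix of relative distances between joints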
  • the encoding unit 110 directly estimates EDM, which is encoded two-dimensional relative distance information between joints, rather than extracting/estimating joints first.
  • Since connection information between the joints is directly acquired, there is no need for post-processing to connect the estimated joints.
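  • A minimal sketch, assuming PyTorch, of what a two-dimensional posture encoding network of this kind might look like: a small convolutional backbone mapping a detected person crop directly to an N×N EDM with no intermediate joint detection. The layer sizes, the joint count, and the symmetrization step are assumptions for illustration, not the network disclosed here.

    import torch
    import torch.nn as nn

    class Pose2DEncodingNet(nn.Module):
        # Maps a cropped person image to an N x N 2D posture EDM.
        def __init__(self, num_joints=17):   # the number of joints is an assumption
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.head = nn.Linear(64, num_joints * num_joints)
            self.num_joints = num_joints

        def forward(self, image):             # image: (batch, 3, H, W)
            edm = self.head(self.backbone(image))
            edm = edm.view(-1, self.num_joints, self.num_joints)
            edm = 0.5 * (edm + edm.transpose(1, 2))   # enforce symmetry of the EDM
            return edm.relu()                         # distances are non-negative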
  • FIG. 7 is an algorithm configuration diagram of the conversion unit 120 shown in FIG. 4 .
  • The conversion unit 120 converts the two-dimensional relative distance information between joints obtained through the encoding unit 110 into three-dimensional relative distance information between joints by using a 2D-3D transformation network.
  • the 2D-3D transformation network is an artificial intelligence model trained to convert 2D relative distance information between joints into 3D relative distance information between joints.
  • The 3D relative distance information between joints is also expressed as an N×N matrix, in the same way as the 2D relative distance information between joints.
  • Since the two-dimensional posture information (i.e., the two-dimensional posture EDM) and the three-dimensional posture EDM are expressed with the same amount of information, the conversion from two-dimensional to three-dimensional information is efficient.
  • In contrast, when posture information is expressed directly by N joint coordinates, two-dimensional posture information has N-by-2 dimensions and three-dimensional posture information has N-by-3 dimensions, so the problem of estimating larger information from smaller information arises.
  • The input of the 2D-3D transformation network is the encoded 2D posture EDMs for each of the multi-view images, and the output of the 2D-3D transformation network is the encoded 3D posture EDMs for each of the multi-view images.
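  • Because the 2D-3D transformation network consumes and produces matrices of the same N×N size, it can be sketched as a simple regression on flattened EDMs; the hidden sizes below are assumptions, and the sketch is illustrative rather than the disclosed architecture.

    import torch
    import torch.nn as nn

    class EDM2Dto3DNet(nn.Module):
        # Maps a 2D posture EDM (N x N) to a 3D posture EDM of the same size.
        def __init__(self, num_joints=17, hidden=1024):   # sizes are assumptions
            super().__init__()
            n2 = num_joints * num_joints
            self.mlp = nn.Sequential(
                nn.Linear(n2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n2),
            )
            self.num_joints = num_joints

        def forward(self, edm_2d):                         # edm_2d: (batch, N, N)
            flat = edm_2d.flatten(1)                       # (batch, N*N)
            edm_3d = self.mlp(flat).view(-1, self.num_joints, self.num_joints)
            edm_3d = 0.5 * (edm_3d + edm_3d.transpose(1, 2))   # keep the matrix symmetric
            return edm_3d.relu()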
  • FIG. 8 is an algorithm configuration diagram of the fusion unit 130 shown in FIG. 4 .
  • The fusion unit 130 fuses the encoded multi-view 3D posture information obtained through the conversion unit 120 into one piece of encoded 3D posture information.
  • the fusion unit 130 includes a reliability measurement network and a convergence network.
  • The reliability measurement network is an artificial intelligence model trained to measure the reliability of each viewpoint by receiving the multi-viewpoint two-dimensional posture EDMs. Because the reliability of the EDM differs at each viewpoint due to occlusion and the like, the reliability of the two-dimensional posture EDM is measured. The reliability measurement network is trained to output high reliability when there is little occlusion and low reliability when there is severe occlusion.
  • The convergence network is an artificial intelligence model trained to fuse the multi-view 3D posture EDMs into one by using the reliabilities measured by the reliability measurement network as weights. To this end, the reliabilities measured by the reliability measurement network are input to the convergence network in addition to the multi-view 3D posture EDMs.
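  • A minimal sketch of the reliability-weighted fusion step: the per-view reliability values are normalized and used as weights in a convex combination of the per-view 3D posture EDMs. The softmax normalization is an assumption about one way such weights could be applied.

    import torch

    def fuse_3d_edms(edms_3d, reliabilities):
        # Fuse per-view 3D posture EDMs into one using reliability weights.
        # edms_3d:       tensor of shape (V, N, N), one 3D posture EDM per viewpoint.
        # reliabilities: tensor of shape (V,), higher means less occlusion.
        weights = torch.softmax(reliabilities, dim=0)          # normalize weights to sum to 1
        return (weights[:, None, None] * edms_3d).sum(dim=0)   # weighted average, shape (N, N)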
  • FIG. 9 is an algorithm configuration diagram of the restoration unit 140 shown in FIG. 4 .
  • 3D shape information of a person expressed as a point cloud or a mesh model can be restored by transforming a pre-generated 3D skeletal model into a desired posture.
  • the restoration unit 140 obtains human shape model parameters (3D posture, shape, and relative angle between joints) from the three-dimensional posture EDM fused into one by the fusion unit 130 by using the shape model parameter prediction network.
  • the shape model parameter prediction network is an artificial intelligence model trained to obtain shape model parameters from the three-dimensional posture EDM.
  • the restoration unit 140 restores the 3D shape information and the joint position of a person at the same time by performing rendering using the extracted shape model parameters.
  • The restoration unit 140 learns the shape information conversion end-to-end based on deep learning technology to increase efficiency.
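  • The restoration stage could be sketched as a small regression network that maps the fused 3D posture EDM to shape model parameters (body shape coefficients and relative joint angles), which a separate renderer then turns into a mesh and joint positions. The parameter split below (10 shape coefficients and 3 angles per joint) is an assumption, not the disclosed model.

    import torch
    import torch.nn as nn

    class ShapeParamPredictionNet(nn.Module):
        # Predicts human shape model parameters from a fused 3D posture EDM.
        def __init__(self, num_joints=17, num_shape_coeffs=10):   # assumed sizes
            super().__init__()
            n2 = num_joints * num_joints
            self.regressor = nn.Sequential(
                nn.Linear(n2, 512), nn.ReLU(),
                nn.Linear(512, num_shape_coeffs + 3 * num_joints),  # shape + joint angles
            )
            self.num_shape_coeffs = num_shape_coeffs

        def forward(self, fused_edm_3d):                    # fused_edm_3d: (batch, N, N)
            out = self.regressor(fused_edm_3d.flatten(1))
            shape = out[:, :self.num_shape_coeffs]          # body shape coefficients
            joint_angles = out[:, self.num_shape_coeffs:]   # relative angles between joints
            return shape, joint_angles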
  • With the apparatus for restoring a human posture and shape model, when a multi-viewpoint image is given as input, 2D posture, 3D posture, and shape information can be acquired simultaneously in one framework.
  • In the embodiments described above, an apparatus and method were presented in which encoded 2D postures are generated from the multi-viewpoint images, the encoded 2D postures are converted into encoded 3D postures, and three-dimensional shape information and joint position information are restored from the encoded 3D postures.
  • Even if a specific joint is not detected, the entire posture can be restored using the relative distance information from adjacent joints. Through the encoding, a three-dimensional posture can be expressed in a way that is invariant to viewpoint changes, and since the input and output have the same dimension, efficient deep learning training and estimation are possible.
  • Since the EDM is relative distance information, the two-dimensional posture EDM, which is calculated from the two-dimensional posture, varies depending on the viewpoint, but the three-dimensional posture EDM information is the same at every viewpoint.
  • Accordingly, the three-dimensional posture EDMs of the several viewpoints are fused into one based on the reliability of the two-dimensional posture EDM at each viewpoint. Since the 3D posture EDM is not affected by the camera viewpoint, camera calibration is not required.
  • By using a deep learning network that takes the 3D posture EDM as input and predicts the human shape model parameters (the relative angles between joints and the shape of the person), the user's shape information and the position information of each joint can be restored from the 3D posture EDM at the same time.
  • Since the EDM excludes image information about the person and the background, human posture and shape information is compressed into a fairly small data size, which makes it possible to design a lightweight deep learning network.
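  • As a rough worked example (the joint count and image size are assumptions, not values from the disclosure): with N = 17 joints, an EDM holds 17 × 17 = 289 values, whereas even a modest 256 × 256 RGB person crop holds 256 × 256 × 3 = 196,608 values, so the EDM representation is smaller by roughly three orders of magnitude.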
  • the technical idea of the present invention can be applied to a computer-readable recording medium containing a computer program for performing the functions of the apparatus and method according to the present embodiment.
  • the technical ideas according to various embodiments of the present invention may be implemented in the form of computer-readable codes recorded on a computer-readable recording medium.
  • the computer-readable recording medium may be any data storage device readable by the computer and capable of storing data.
  • the computer-readable recording medium may be a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, hard disk drive, or the like.
  • the computer-readable code or program stored in the computer-readable recording medium may be transmitted through a network connected between computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an apparatus and a method for reconstructing a human posture and shape model based on multi-view images using information about the relative distance between joints. A human model reconstruction method according to an embodiment of the present invention comprises: extracting pieces of encoded two-dimensional human posture information from multi-view images, respectively, and converting them into pieces of encoded three-dimensional human posture information; and fusing the converted pieces of information into a single piece of encoded three-dimensional human posture information, and simultaneously reconstructing a three-dimensional human posture and shape. By means of the method, even if a specific joint is not detected, the entire posture can be reconstructed using information about the distance to the surrounding joints.
PCT/KR2020/017833 2019-12-11 2020-12-08 Apparatus and method for reconstructing a human posture and shape model based on multi-view images using information about the relative distance between joints WO2021256640A1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR20190164276 2019-12-11
KR10-2020-0074753 2020-06-19
KR1020200074753A KR102338488B1 (ko) 2019-12-11 2020-06-19 Apparatus and method for restoring a human posture and shape model based on multi-view images using relative distance information between joints

Publications (1)

Publication Number Publication Date
WO2021256640A1 (fr) 2021-12-23

Family

ID=76600001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/017833 WO2021256640A1 (fr) 2019-12-11 2020-12-08 Apparatus and method for reconstructing a human posture and shape model based on multi-view images using information about the relative distance between joints

Country Status (2)

Country Link
KR (1) KR102338488B1 (fr)
WO (1) WO2021256640A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102531977B1 (ko) * 2022-05-17 2023-05-15 한국전자기술연구원 Three-dimensional skeleton estimation system and three-dimensional skeleton estimation method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080105698A * 2007-06-01 2008-12-04 (주)제어와정보 Method and implementation of three-dimensional shape estimation using multi-view images
KR20110070058A * 2009-12-18 2011-06-24 한국전자통신연구원 Method and apparatus for motion capture of dynamic objects
KR20160022576A * 2014-08-20 2016-03-02 한국전자통신연구원 Apparatus and method for registering three-dimensional human joint coordinates based on an RGB sensor and a depth sensor
JP2018013999A * 2016-07-21 2018-01-25 日本電信電話株式会社 Posture estimation apparatus, method, and program
KR20200076276A * 2018-12-19 2020-06-29 전자부품연구원 Apparatus and method for restoring an image-based human posture and shape model using inter-joint distance information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FRANCESC MORENO-NOGUER: "3D Human Pose Estimation from a Single Image via Distance Matrix Regression", ARXIV, no. 1611.09010v1, 28 November 2016 (2016-11-28), pages 1 - 10, XP055881595 *

Also Published As

Publication number Publication date
KR20210074165A (ko) 2021-06-21
KR102338488B1 (ko) 2021-12-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20941371

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20941371

Country of ref document: EP

Kind code of ref document: A1