KR20230078777A - 3D reconstruction methods, devices and systems, media and computer equipment

Info

Publication number: KR20230078777A
Application number: KR1020237014677A
Authority: KR
Inventors: 즈제 차오; 민 왕; 원타오 류; 천 첸; 리좡 마
Original assignee: 상하이 센스타임 인텔리전트 테크놀로지 컴퍼니 리미티드
Priority date: 2021-05-10
Filing date: 2022-02-09
Publication date: 2023-06-02
Also published as: TW202244853A; CN113160418A; WO2022237249A1; JP2023547888A

Abstract

The present invention provides a 3D reconstruction method, apparatus and system, medium, and computer device. The method includes: performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a 3D model of the target object; optimizing the initial values of the parameters based on pre-obtained supervision information representing features of the target object, to obtain optimized values of the parameters; and performing skeletal skinning based on the optimized values of the parameters to build the 3D model of the target object.

Description

3D reconstruction methods, devices and systems, media and computer equipment

[Cross-reference to related applications]

The present invention claims priority to the Chinese patent application filed on May 10, 2021, with application number 202110506464.X and entitled "Three-dimensional reconstruction method, apparatus and system, medium and computer device", the entire contents of which are incorporated herein by reference.

[Technical field]

The present invention relates to the field of computer vision, and in particular to a 3D reconstruction method, apparatus and system, medium, and computer device.

3D reconstruction is one of the important techniques in computer vision and has many potential applications in augmented reality, virtual reality, and other fields. Performing 3D reconstruction on a target object makes it possible to reconstruct the posture and limb rotations of the target object. However, existing 3D reconstruction methods cannot achieve both accuracy and reliability of the reconstruction result at the same time.

The present invention provides a 3D reconstruction method, apparatus and system, medium, and computer device.

According to a first aspect of the embodiments of the present invention, a 3D reconstruction method is provided. The method includes: performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a 3D model of the target object; optimizing the initial values of the parameters based on pre-obtained supervision information representing features of the target object, to obtain optimized values of the parameters; and performing skeletal skinning (skeleton skinning) based on the optimized values of the parameters to build the 3D model of the target object.
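The text does not spell out the skinning step. One common formulation, assumed here purely for illustration, is linear blend skinning, in which each vertex is moved by a weighted sum of per-joint rigid transforms. The function names, the weight layout, and the transform representation below are hypothetical and are not taken from the application.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def transform(rot: Mat3, trans: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform p -> rot @ p + trans."""
    return tuple(
        sum(rot[r][c] * p[c] for c in range(3)) + trans[r] for r in range(3)
    )


def linear_blend_skinning(
    vertices: List[Vec3],
    weights: List[List[float]],  # weights[v][j]; each row is assumed to sum to 1
    joint_transforms: List[Tuple[Mat3, Vec3]],
) -> List[Vec3]:
    """Move every vertex by the weighted blend of the per-joint transforms."""
    skinned = []
    for v, w_row in zip(vertices, weights):
        acc = [0.0, 0.0, 0.0]
        for w, (rot, trans) in zip(w_row, joint_transforms):
            moved = transform(rot, trans, v)
            for k in range(3):
                acc[k] += w * moved[k]
        skinned.append(tuple(acc))
    return skinned
```

In this sketch the optimized rotation parameters would determine `joint_transforms`, while `weights` would come from the template model.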

In some embodiments, the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of initial two-dimensional key points of the target object and semantic information of a plurality of pixel points of the target object in the image; and the second supervision information includes an initial 3D point cloud of the surface of the target object. In the embodiments of the present invention, the initial values of the parameters may be optimized using only the initial two-dimensional key points or the semantic information of the pixel points of the target object as supervision information, in which case optimization efficiency is high and optimization complexity is low. Alternatively, the initial 3D point cloud of the target object surface may be used as supervision information together with the initial two-dimensional key points or the semantic information of the pixel points, which improves the accuracy of the optimized parameter values.

In some embodiments, the method further includes extracting information of the initial two-dimensional key points of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information makes it possible to generate natural and plausible motions for the 3D model.

In some embodiments, the image includes a depth map of the target object, and the method further includes: extracting depth information of a plurality of pixel points of the target object from the depth map; and back-projecting, based on the depth information, the plurality of pixel points of the target object in the depth map into three-dimensional space to obtain the initial 3D point cloud of the surface of the target object. By extracting the depth information and back-projecting pixel points of the two-dimensional image into three-dimensional space based on it, the initial 3D point cloud of the target object surface is obtained. This point cloud can then be used as supervision information to optimize the initial values of the parameters, which further improves the accuracy of parameter optimization.
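The back-projection step can be sketched as follows, assuming a pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy` and the convention that a non-positive depth marks a missing measurement are assumptions made for this example; the text does not specify the camera model.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def back_project(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3]:
    """Back-project every pixel with a valid (positive) depth into camera space."""
    cloud = []
    for v, row in enumerate(depth):      # v: pixel row
        for u, z in enumerate(row):      # u: pixel column
            if z <= 0.0:                 # assumed marker for missing depth
                continue
            x = (u - cx) * z / fx        # invert u = fx * x / z + cx
            y = (v - cy) * z / fy        # invert v = fy * y / z + cy
            cloud.append((x, y, z))
    return cloud
```

Each returned point is expressed in the coordinate frame of the image acquisition device, which is the frame in which the displacement parameter would also be defined.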

In some embodiments, the image further includes an RGB image of the target object, and extracting the depth information of the plurality of pixel points of the target object from the depth map includes: performing image segmentation on the RGB image, determining, based on the result of the image segmentation, the image region in which the target object is located in the RGB image, and determining, based on that region, the image region in which the target object is located in the depth map; and acquiring depth information of a plurality of pixel points in the image region of the depth map in which the target object is located. Performing image segmentation on the RGB image allows the position of the target object to be determined accurately, so that the depth information of the target object can be extracted accurately.
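A minimal sketch of carrying the segmented region over to the depth map, assuming the RGB image and the depth map are pixel-aligned and of equal size (the text does not state how the two are registered). The binary `mask` stands in for the output of whatever segmentation method is used; the function name is hypothetical.

```python
from typing import List


def masked_depth(depth: List[List[float]], mask: List[List[bool]]) -> List[List[float]]:
    """Keep depth only where the mask marks the target object; write 0.0 elsewhere."""
    return [
        [z if m else 0.0 for z, m in zip(d_row, m_row)]
        for d_row, m_row in zip(depth, mask)
    ]
```

Writing 0.0 outside the mask is a convention chosen for this example so that background pixels can be skipped as missing depth in a later step.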

In some embodiments, the method further includes filtering outliers out of the initial 3D point cloud and using the filtered initial 3D point cloud as the second supervision information. Filtering the outliers mitigates their interference and further improves the accuracy of the parameter optimization process.
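The filtering criterion is not specified in the text. The sketch below assumes a simple statistical rule: a point is dropped when its mean distance to its `k` nearest neighbours exceeds the cloud-wide mean by more than `std_ratio` standard deviations. Both parameters and the rule itself are illustrative assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def filter_outliers(points: List[Point3], k: int = 2, std_ratio: float = 1.0) -> List[Point3]:
    """Drop points whose mean k-nearest-neighbour distance is unusually large."""
    if len(points) <= k:
        return list(points)  # too few points to estimate neighbourhood statistics
    avg_knn = []
    for i, p in enumerate(points):
        # Brute-force neighbour search; adequate for a small illustrative cloud.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        avg_knn.append(mean(dists[:k]))
    threshold = mean(avg_knn) + std_ratio * pstdev(avg_knn)
    return [p for p, d in zip(points, avg_knn) if d <= threshold]
```

The brute-force search is quadratic in the number of points; a spatial index would be the usual replacement for real depth maps.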

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a shape parameter of the target object, and a displacement parameter of the image acquisition device. Optimizing the initial values of the parameters based on the pre-obtained supervision information representing features of the target object includes: with the initial value of the shape parameter and the initial values of the key point rotation parameters held unchanged, optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and optimizing the initial values of the key point rotation parameters and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain optimized values of the key point rotation parameters and an optimized value of the shape parameter. During optimization, changing the position of the image acquisition device and changing the positions of the 3D key points can both change the two-dimensional projections of the 3D key points, which makes the optimization process unstable. The two-stage approach first fixes the initial values of the key point rotation parameters and of the shape parameter while optimizing the initial values of the displacement parameter and of the global rotation parameter, and then fixes the displacement parameter and the global rotation parameter while optimizing the initial values of the key point rotation parameters and of the shape parameter, which improves the stability of the optimization process.
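The two-stage scheme can be illustrated with a toy sketch in which each parameter group is reduced to a single scalar and gradients are taken by finite differences. The parameter names, the learning rate, and the use of plain gradient descent are assumptions for illustration; the text does not name an optimizer.

```python
from typing import Callable, Dict, Iterable, Tuple

Params = Dict[str, float]
Loss = Callable[[Params], float]


def minimize_subset(
    loss: Loss,
    params: Params,
    free: Iterable[str],
    lr: float = 0.1,
    steps: int = 200,
    eps: float = 1e-5,
) -> Params:
    """Gradient descent on the `free` keys only; every other key stays frozen."""
    free = tuple(free)
    p = dict(params)
    for _ in range(steps):
        grads = {}
        for name in free:
            hi = dict(p)
            hi[name] += eps
            lo = dict(p)
            lo[name] -= eps
            grads[name] = (loss(hi) - loss(lo)) / (2 * eps)  # central difference
        for name in free:
            p[name] -= lr * grads[name]
    return p


def two_stage(loss_stage1: Loss, loss_stage2: Loss, init: Params) -> Tuple[Params, Params]:
    """Stage 1 frees displacement and global rotation; stage 2 frees the rest."""
    stage1 = minimize_subset(loss_stage1, init, free=("translation", "global_rot"))
    stage2 = minimize_subset(loss_stage2, stage1, free=("joint_rot", "shape"))
    return stage1, stage2
```

Freezing one group at a time means each stage sees only one source of change in the 2D projections, which is the stability argument made in the paragraph above.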

In some embodiments, the supervision information includes initial two-dimensional key points of the target object, and optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes: acquiring, from among the two-dimensional projection key points corresponding to the 3D key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters, and the initial value of the shape parameter, and the two-dimensional projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be a part such as the torso. Because different motions have little effect on the key points of the torso, determining the first loss from the torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. The two-dimensional key points are supervision information in the two-dimensional plane, whereas the displacement parameter of the image acquisition device is a parameter in three-dimensional space; acquiring the second loss therefore reduces the cases in which the optimization result falls into a local optimum in the two-dimensional plane and deviates from the actual point.
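A minimal sketch of how the first and second losses might be combined. The squared-error form, the pinhole projection with a single `focal` length, and the weight `reg_weight` are assumptions; the text only states that the first loss compares torso projections with the detected 2D key points and the second compares the current displacement with its initial value. The 3D key points passed in are assumed to already include the global rotation.

```python
from typing import Iterable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(point: Vec3, translation: Vec3, focal: float = 1.0) -> Vec2:
    """Pinhole projection of a 3D key point after applying the camera displacement."""
    x, y, z = (p + t for p, t in zip(point, translation))
    return (focal * x / z, focal * y / z)


def stage1_loss(
    kp3d: Sequence[Vec3],
    kp2d_init: Sequence[Vec2],
    torso_ids: Iterable[int],      # indices of the preset part (e.g. torso)
    translation: Vec3,             # current value of the displacement parameter
    translation_init: Vec3,        # initial value predicted by the network
    reg_weight: float = 1.0,       # hypothetical weight of the second loss
) -> float:
    """First loss (torso reprojection error) plus weighted second loss."""
    first = 0.0
    for i in torso_ids:
        u, v = project(kp3d[i], translation)
        u0, v0 = kp2d_init[i]
        first += (u - u0) ** 2 + (v - v0) ** 2
    second = sum((t - t0) ** 2 for t, t0 in zip(translation, translation_init))
    return first + reg_weight * second
```

The second term keeps the displacement close to the network's estimate, so a set of 2D projections that happens to fit well cannot pull the camera arbitrarily far along the depth axis.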

In some embodiments, the supervision information includes initial two-dimensional key points of the target object, and optimizing the initial values of the key point rotation parameters and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter includes: acquiring a third loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the initial values of the key point rotation parameters, and the initial value of the shape parameter; acquiring a fourth loss that represents the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the initial values of the key point rotation parameters, and the initial value of the shape parameter; and optimizing the initial values of the key point rotation parameters and the initial value of the shape parameter based on the third loss and the fourth loss. This embodiment optimizes the initial values of the key point rotation parameters and of the shape parameter based on the optimized values of the displacement parameter and of the global rotation parameter, which improves the stability of the optimization process; at the same time, the fourth loss ensures the plausibility of the posture corresponding to the optimized parameters.
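The second stage can be summarized as an objective, written here under assumed notation: $R^{*}$ and $t^{*}$ are the optimized global rotation and displacement from the first stage, $\theta$ and $\beta$ are the key point rotation and shape parameters, $J_i$ is the $i$-th 3D key point, $\Pi$ is the projection, $k_i$ is the $i$-th initial 2D key point, and $\lambda_4$ is a hypothetical weight. The squared-norm form of $L_3$ is an assumption, and the form of $L_4$ is left open because the text does not define it.

```latex
\begin{aligned}
(\theta^{*}, \beta^{*}) &= \arg\min_{\theta,\,\beta}\; L_3(\theta, \beta) + \lambda_4\, L_4(R^{*}, \theta, \beta), \\
L_3(\theta, \beta) &= \sum_{i} \bigl\lVert \Pi\bigl(J_i(R^{*}, \theta, \beta);\, R^{*}, t^{*}\bigr) - k_i \bigr\rVert_2^{2}.
\end{aligned}
```

Only $\theta$ and $\beta$ are free in this stage; $R^{*}$ and $t^{*}$ enter as constants.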

In some embodiments, the method further includes, after optimizing the initial values of the key point rotation parameters and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, performing joint optimization on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized value of the shape parameter, and the optimized value of the displacement parameter. Building on the preceding optimization, this embodiment jointly optimizes all of the optimized parameters, which further improves the accuracy of the optimization result.

In some embodiments, the supervision information includes initial two-dimensional key points of the target object and an initial 3D point cloud of the surface of the target object, and optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes: acquiring, from among the two-dimensional projection key points corresponding to the 3D key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters, and the initial value of the shape parameter, and the two-dimensional projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; acquiring a fifth loss between a first 3D point cloud of the surface of the target object and the initial 3D point cloud, where the first 3D point cloud is obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters, and the initial value of the shape parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss. This embodiment adds the 3D point cloud to the supervision information when optimizing the initial parameters, which improves the accuracy of the optimization result.
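The text does not say how the distance between the two point clouds is measured. A symmetric Chamfer distance is one common choice and is assumed in the sketch below; a one-sided variant may be more suitable when the depth map covers only the visible side of the object.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def chamfer_distance(a: List[Point3], b: List[Point3]) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""

    def one_way(src: List[Point3], dst: List[Point3]) -> float:
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

Here `a` would be the point cloud sampled from the model surface under the current parameters and `b` the initial point cloud back-projected from the depth map; both lists are assumed to be non-empty.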

In some embodiments, performing joint optimization on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized value of the shape parameter, and the optimized value of the displacement parameter includes: acquiring a sixth loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, and the optimized value of the shape parameter; acquiring a seventh loss that represents the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, and the optimized value of the shape parameter; acquiring an eighth loss between a second 3D point cloud of the surface of the target object and the initial 3D point cloud, where the second 3D point cloud is obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, and the optimized value of the shape parameter; and performing joint optimization on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized value of the shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss. This embodiment adds the 3D point cloud to the supervision information when optimizing the initial parameters, which improves the accuracy of the optimization result.
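Under assumed notation ($R$ global rotation, $\theta$ key point rotations, $\beta$ shape, $t$ displacement, with hypothetical weights $\lambda_7$ and $\lambda_8$), the joint optimization can be written as a single objective in which all four parameter groups are free and the search starts from the values produced by the two earlier stages:

```latex
(\hat{R}, \hat{\theta}, \hat{\beta}, \hat{t})
  = \arg\min_{R,\,\theta,\,\beta,\,t}\;
    L_6(R, \theta, \beta, t) + \lambda_7\, L_7(R, \theta, \beta) + \lambda_8\, L_8(R, \theta, \beta),
\qquad \text{initialized at } (R^{*}, \theta^{*}, \beta^{*}, t^{*}).
```

The argument lists follow the text: the sixth loss depends on the projection and therefore on $t$, while the seventh and eighth losses are described only in terms of $R$, $\theta$, and $\beta$. The unweighted sum and the weights themselves are assumptions.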

According to a second aspect of the embodiments of the present invention, a 3D reconstruction apparatus is provided. The apparatus includes a first 3D reconstruction module, an optimization module, and a second 3D reconstruction module. The first 3D reconstruction module is configured to perform 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a 3D model of the target object; the optimization module is configured to optimize the initial values of the parameters based on pre-obtained supervision information representing features of the target object, to obtain optimized values of the parameters; and the second 3D reconstruction module is configured to perform skeletal skinning based on the optimized values of the parameters to build the 3D model of the target object.

In some embodiments, the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of initial two-dimensional key points of the target object and semantic information of a plurality of pixel points of the target object in the image; and the second supervision information includes an initial 3D point cloud of the surface of the target object. In the embodiments of the present invention, the initial values of the parameters may be optimized using only the initial two-dimensional key points or the semantic information of the pixel points of the target object as supervision information, in which case optimization efficiency is high and optimization complexity is low. Alternatively, the initial 3D point cloud of the target object surface may be used as supervision information together with the initial two-dimensional key points or the semantic information of the pixel points, which improves the accuracy of the optimized parameter values.

In some embodiments, the apparatus further includes a two-dimensional key point extraction module configured to extract information of the initial two-dimensional key points of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information makes it possible to generate natural and plausible motions for the 3D model.

In some embodiments, the image includes a depth map of the target object, and the apparatus further includes a depth information extraction module and a back-projection module. The depth information extraction module is configured to extract depth information of a plurality of pixel points of the target object from the depth map; the back-projection module is configured to back-project, based on the depth information, the plurality of pixel points of the target object in the depth map into three-dimensional space to obtain the initial 3D point cloud of the surface of the target object. By extracting the depth information and back-projecting pixel points of the two-dimensional image into three-dimensional space based on it, the initial 3D point cloud of the target object surface is obtained. This point cloud can then be used as supervision information to optimize the initial values of the parameters, which further improves the accuracy of parameter optimization.

In some embodiments, the image further includes an RGB image of the target object, and the depth information extraction module includes an image segmentation unit, an image region determination unit, and a depth information acquisition unit. The image segmentation unit is configured to perform image segmentation on the RGB image; the image region determination unit is configured to determine, based on the result of the image segmentation, the image region in which the target object is located in the RGB image, and to determine, based on that region, the image region in which the target object is located in the depth map; and the depth information acquisition unit is configured to acquire depth information of a plurality of pixel points in the image region of the depth map in which the target object is located. Performing image segmentation on the RGB image allows the position of the target object to be determined accurately, so that the depth information of the target object can be extracted accurately.

In some embodiments, the apparatus further includes a filtering module configured to filter outliers out of the initial 3D point cloud and to use the filtered initial 3D point cloud as the second supervision information. Filtering the outliers mitigates their interference and further improves the accuracy of the parameter optimization process.

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include a global rotation parameter of the target object, a key point rotation parameter for each key point of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device. The optimization module includes: a first optimization unit configured to, with the initial value of the body shape parameter and the initial value of the key point rotation parameter held unchanged, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and a second optimization unit configured to optimize the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the body shape parameter. During optimization, changing either the position of the image acquisition device or the positions of the three-dimensional key points changes the two-dimensional projections of the three-dimensional key points, which can make the optimization unstable. The two-stage approach first fixes the key point rotation parameter and the body shape parameter at their initial values while the displacement parameter and the global rotation parameter are optimized, and then fixes the displacement parameter and the global rotation parameter while the key point rotation parameter and the body shape parameter are optimized, which improves the stability of the optimization process.
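The two-stage idea can be illustrated on a toy two-dimensional problem. The sketch below is not the patented optimizer: the planar model, the finite-difference gradient descent, and all names (`project`, `descend`, `global_params`, `local_offsets`) are assumptions chosen to keep the example self-contained. Stage 1 fits a translation and a global rotation angle while the per-point offsets stay frozen; stage 2 fits the offsets while the stage-1 result stays frozen.

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at the 1-D array x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

def descend(f, x0, lr=0.05, steps=500):
    """Plain gradient descent with a fixed step size."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        x -= lr * numeric_grad(f, x)
    return x

def project(global_params, local_offsets, template):
    """Toy model: offset the template points, rotate by phi, translate by (tx, ty)."""
    tx, ty, phi = global_params
    rot = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
    return (template + local_offsets) @ rot.T + np.array([tx, ty])

template = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
true_local = np.array([[0.05, 0.0], [0.0, -0.05], [0.03, 0.02], [-0.04, 0.01]])
observed = project(np.array([2.0, -1.0, 0.3]), true_local, template)

def loss(global_params, local_offsets):
    """Sum of squared distances between projected and observed points."""
    return float(np.sum((project(global_params, local_offsets, template) - observed) ** 2))

global_init = np.zeros(3)   # translation (2 values) and rotation angle
local_init = np.zeros(8)    # flattened per-point offsets

# Stage 1: optimize translation and global rotation, offsets frozen.
global_opt = descend(lambda g: loss(g, local_init.reshape(4, 2)), global_init)
# Stage 2: optimize the offsets, stage-1 result frozen.
local_opt = descend(lambda l: loss(global_opt, l.reshape(4, 2)), local_init)

loss_init = loss(global_init, local_init.reshape(4, 2))
loss_stage1 = loss(global_opt, local_init.reshape(4, 2))
loss_stage2 = loss(global_opt, local_opt.reshape(4, 2))
```

Each stage only moves one group of variables, so the two sources of projection change never compete within a single stage.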

In some embodiments, the supervision information includes initial two-dimensional key points of the target object. The first optimization unit is configured to: obtain, from the two-dimensional projected key points corresponding to the three-dimensional key points of the target object, target two-dimensional projected key points that belong to a preset part of the target object, wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projected key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projected key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be, for example, the torso. Because different motions have little effect on the torso key points, determining the first loss from the torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. The two-dimensional key points are supervision information in the two-dimensional image plane, whereas the displacement parameter of the image acquisition device is a parameter in three-dimensional space; obtaining the second loss therefore reduces the chance that the optimization result falls into a local optimum in the two-dimensional plane and deviates from the true value.
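A minimal sketch of the first and second losses, assuming squared-error forms, a pinhole camera with the principal point at the origin, and a hypothetical index list `torso_idx` for the torso key points. None of these choices is stated in the text, and `project`, `stage_one_loss`, and `reg_weight` are illustrative names.

```python
import numpy as np

def project(points_3d, rotation, translation, focal=1000.0):
    """Rotate, translate, then apply a pinhole projection (assumed camera model)."""
    cam = points_3d @ rotation.T + translation
    return focal * cam[:, :2] / cam[:, 2:3]

def stage_one_loss(points_3d, keypoints_2d, rotation, translation,
                   translation_init, torso_idx, reg_weight=1.0):
    """First loss (torso reprojection error) plus second loss (displacement drift)."""
    projected = project(points_3d, rotation, translation)
    first = np.sum((projected[torso_idx] - keypoints_2d[torso_idx]) ** 2)
    second = np.sum((translation - translation_init) ** 2)
    return float(first + reg_weight * second)

points_3d = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [0.0, 0.5, 0.0],
                      [0.2, 0.5, 0.0], [0.6, 0.9, 0.1]])
torso_idx = [0, 1, 2, 3]             # hypothetical torso key point indices
rotation = np.eye(3)                 # global rotation as a 3x3 matrix
translation_init = np.array([0.0, 0.0, 5.0])
keypoints_2d = project(points_3d, rotation, translation_init)

loss_at_init = stage_one_loss(points_3d, keypoints_2d, rotation,
                              translation_init, translation_init, torso_idx)
loss_shifted = stage_one_loss(points_3d, keypoints_2d, rotation,
                              translation_init + np.array([0.1, 0.0, 0.0]),
                              translation_init, torso_idx)
```

The second term penalizes drift of the displacement away from the network's initial estimate, which is the role the text assigns to the second loss.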

In some embodiments, the supervision information includes initial two-dimensional key points of the target object. The second optimization unit is configured to: obtain a third loss between optimized two-dimensional projected key points of the target object and the initial two-dimensional key points, wherein the optimized two-dimensional projected key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; obtain a fourth loss that characterizes the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and optimize the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the third loss and the fourth loss. In this embodiment, optimizing the initial values of the key point rotation parameter and the body shape parameter based on the optimized values of the displacement parameter and the global rotation parameter improves the stability of the optimization process, while the fourth loss ensures that the posture corresponding to the optimized parameters remains plausible.

In some embodiments, the apparatus further includes a joint optimization module configured to, after the initial value of the key point rotation parameter and the initial value of the body shape parameter have been optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, jointly optimize the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter. In this embodiment, jointly optimizing all of the already optimized parameters on top of the preceding stages further improves the accuracy of the optimization result.

In some embodiments, the supervision information includes initial two-dimensional key points of the target object and an initial three-dimensional point cloud of the surface of the target object. The first optimization unit is configured to: obtain, from the two-dimensional projected key points corresponding to the three-dimensional key points of the target object, target two-dimensional projected key points that belong to a preset part of the target object, wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the two-dimensional projected key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projected key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; obtain a fifth loss between a first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, wherein the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss. This embodiment improves the accuracy of the optimization result by adding the three-dimensional point cloud to the supervision information used to optimize the initial parameters.
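The text does not define the distance used for the fifth loss. One plausible choice, shown here purely as an assumption, is a one-sided nearest-neighbor distance: for each observed point, take the squared distance to the closest model surface point, and average. The function name `nearest_neighbor_loss` is hypothetical.

```python
import numpy as np

def nearest_neighbor_loss(model_points: np.ndarray, observed_points: np.ndarray) -> float:
    """Mean squared distance from each observed point to its nearest model point.

    A brute-force O(N*M) sketch; the actual point cloud loss is not
    specified in the source.
    """
    diff = observed_points[:, None, :] - model_points[None, :, :]
    squared = np.sum(diff ** 2, axis=2)          # shape (N_observed, N_model)
    return float(np.mean(np.min(squared, axis=1)))

model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
observed = model + np.array([0.0, 0.0, 0.1])     # same shape, shifted 0.1 along z
loss_value = nearest_neighbor_loss(model, observed)
```

For large clouds a spatial index would replace the dense distance matrix, but the quantity being minimized stays the same.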

In some embodiments, the joint optimization module includes: a first obtaining unit configured to obtain a sixth loss between optimized two-dimensional projected key points of the target object and the initial two-dimensional key points, wherein the optimized two-dimensional projected key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; a second obtaining unit configured to obtain a seventh loss, wherein the seventh loss characterizes the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; a third obtaining unit configured to obtain an eighth loss between a second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, wherein the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; and a joint optimization unit configured to jointly optimize the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss. This embodiment improves the accuracy of the optimization result by adding the three-dimensional point cloud to the supervision information used to optimize the parameters.

According to a third aspect of the embodiments of the present invention, a three-dimensional reconstruction system is provided. The system includes an image acquisition device and a processing unit. The image acquisition device is configured to capture an image of a target object. The processing unit is communicatively connected to the image acquisition device and is configured to: perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, wherein the initial values of the parameters are used to build a three-dimensional model of the target object; optimize the initial values of the parameters based on supervision information that represents features of the target object, to obtain optimized values of the parameters; and perform skeletal skinning based on the optimized values of the parameters to build the three-dimensional model of the target object.

According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium storing a computer program is provided. When the program is executed by a processor, the method according to any of the embodiments is implemented.

According to a fifth aspect of the embodiments of the present invention, a computer device is provided. The computer device includes a memory, a processor, and a computer program that is stored in the memory and executable by the processor. When the program is executed by the processor, the method according to any of the embodiments described above is implemented.

According to a sixth aspect of the embodiments of the present invention, a computer program product is provided. The computer program product is stored in a storage medium and includes a computer program executable by a processor. When the computer program is executed by the processor, the method according to any of the embodiments is implemented.

In an embodiment of the present invention, three-dimensional reconstruction is performed on an image of a target object through a three-dimensional reconstruction network to obtain initial values of parameters, the initial values are then optimized based on supervision information, and a three-dimensional model of the target object is built from the optimized values obtained through this parameter optimization. Parameter optimization has the advantage of producing more accurate three-dimensional reconstruction results that agree with the two-dimensional observed features of the image, but it often yields unnatural and implausible poses, so its reliability is low. Network regression through a three-dimensional reconstruction network, by contrast, tends to yield more natural and plausible poses. Using the output of the three-dimensional reconstruction network as the initial values for parameter optimization therefore preserves the reliability of the reconstruction result while also achieving accuracy.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

The accompanying drawings are incorporated in and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the specification, explain the technical solutions of the present invention.
FIGS. 1A and 1B are schematic diagrams of three-dimensional models according to some embodiments.
FIG. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present invention.
FIG. 3 is an overall flowchart of an embodiment of the present invention.
FIGS. 4A and 4B are schematic diagrams of application scenarios of an embodiment of the present invention.
FIG. 5 is a block diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a three-dimensional reconstruction system according to an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.

Examples are described in detail herein and are illustrated in the drawings. Where the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods that are consistent with some aspects of the present disclosure as recited in the appended claims.

The terms used in the present disclosure serve only to describe particular examples and are not intended to limit the present disclosure. The singular forms "a", "the", and "said" used in the present disclosure and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein covers any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" as used herein denotes any one of several items or any combination of at least two of them.

Although the terms "first", "second", "third", and so on may be used in the present disclosure to describe various kinds of information, the information should not be limited by these terms. The terms serve only to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information and, similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

To help those skilled in the art better understand the technical solutions of the embodiments of the present invention, and to make the above objects, features, and advantages of the embodiments of the present disclosure clearer and easier to understand, the technical solutions of the embodiments of the present disclosure are described in more detail below with reference to the drawings.

Three-dimensional reconstruction of a target object requires reconstructing both the body shape and the limb rotations of the target object, so a parametric model, rather than three-dimensional key points alone, is generally used to represent them. For example, three-dimensional reconstruction of different people may yield a three-dimensional model of a thin person (shown in FIG. 1A) and a three-dimensional model of a heavy person (shown in FIG. 1B). Because the person in FIG. 1A and the person in FIG. 1B have the same key point information when in the same pose, key point information alone cannot express the difference in body shape between the two.

In the related art, three-dimensional reconstruction is generally performed by one of two approaches: parameter optimization or network regression. Parameter optimization methods typically start from a set of standard parameters and use gradient descent to iteratively optimize the initial parameter values of the three-dimensional model according to two-dimensional visual features of the image of the target object, where the two-dimensional visual features may be, for example, two-dimensional key points. Parameter optimization has the advantage of producing more accurate parameter estimates that agree with the two-dimensional visual features of the image, but it often yields unnatural and implausible poses, and its final performance depends heavily on the initial parameter values. Three-dimensional reconstruction based on parameter optimization is therefore of low reliability.

Network regression methods typically train an end-to-end neural network to learn the mapping from images to three-dimensional model parameters. Network regression has the advantage of yielding more natural and plausible poses, but because the amount of training data is limited, the three-dimensional reconstruction result may not match the two-dimensional visual features of the image. Three-dimensional reconstruction based on network regression is therefore of low accuracy. As a result, conventional three-dimensional reconstruction methods cannot achieve both accuracy and reliability in the reconstruction result.

In view of the above, an embodiment of the present invention provides a three-dimensional reconstruction method. As shown in FIG. 2, the method includes the following steps:

Step 201: perform three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, wherein the initial values of the parameters are used to build a three-dimensional model of the target object;

Step 202: optimize the initial values of the parameters based on previously obtained supervision information that represents features of the target object, to obtain optimized values of the parameters;

Step 203: perform skeletal skinning based on the optimized values of the parameters to build the three-dimensional model of the target object.
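Steps 201 to 203 can be read as a three-stage pipeline. The sketch below shows only that control flow. The callables `regress`, `refine`, and `skin` and their toy implementations are hypothetical placeholders, not the network, optimizer, or skinning procedure of the embodiments.

```python
import numpy as np

def reconstruct(image, supervision, regress, refine, skin):
    """Chain the three steps: regress initial values, refine them, then skin."""
    initial_params = regress(image)                         # step 201
    optimized_params = refine(initial_params, supervision)  # step 202
    return skin(optimized_params)                           # step 203

# Toy stand-ins so the pipeline can be executed end to end.
def toy_regress(image):
    return np.zeros(85)                      # pretend network output

def toy_refine(params, supervision):
    return params + 0.5 * supervision.mean() # pretend optimization step

def toy_skin(params):
    return np.tile(params[:3], (4, 1))       # pretend mesh with 4 vertices

mesh = reconstruct(np.zeros((8, 8, 3)), np.ones((5, 2)),
                   toy_regress, toy_refine, toy_skin)
```

Passing the stages in as callables keeps the regression network and the optimizer independently replaceable.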

In step 201, the target object may be a three-dimensional object in physical space, such as a person, an animal, or a robot, or one or more regions of such a three-dimensional object, such as a person's face or limbs. For ease of description, the following takes a person as the target object and human body reconstruction as the three-dimensional reconstruction. The image of the target object may be a single image, or may include multiple images obtained by photographing the target object from multiple different viewing angles. Three-dimensional human body reconstruction based on a single image is referred to as monocular three-dimensional human body reconstruction, and reconstruction based on multiple images from different viewing angles is referred to as multi-view three-dimensional human body reconstruction. Each image may be a grayscale image, an RGB image, or an RGBD image. The image may be captured in real time by an image capture device (for example, a still camera or a video camera) near the target object, or may be an image captured and stored in advance.

Three-dimensional reconstruction can be performed on the image of the target object through a three-dimensional reconstruction network, which may be a pre-trained neural network. The network performs three-dimensional reconstruction based on the image and can estimate natural and plausible initial values of the parameters. The initial values can be represented as a single vector, for example an 85-dimensional vector, that contains three kinds of information: limb rotation information describing the motion of the human body (that is, the initial values of the pose parameters, comprising the initial value of the global rotation parameter of the human body and the initial values of the key point rotation parameters of 23 key points), the initial values of the body shape parameters, and the initial values of the camera parameters. The human body can be represented by key points and the limb bones that connect them. The key points of the human body may include one or more of the top of the head, the nose, the neck, the left and right eyes, the left and right ears, the chest, the left and right shoulders, the left and right elbows, the left and right wrists, the left and right hip joints, the left and right buttocks, the left and right knees, and the left and right ankles. The initial values of the pose parameters are used to determine the positions of the key points of the human body in three-dimensional space. The initial values of the body shape parameters are used to determine body information such as height and build. The initial values of the camera parameters are used to determine the absolute position of the human body in the three-dimensional space of the camera coordinate system. The camera parameters include a displacement parameter between the camera and the human body and a pose parameter of the camera, where the initial value of the camera's pose parameter may be replaced by the initial value of the global rotation parameter of the human body. The human body parameters can be expressed in the parameter format of the SMPL (Skinned Multi-Person Linear) model, referred to as SMPL parameters. Once the values of the SMPL parameters are obtained, skeletal skinning can be performed based on them; that is, a mapping function maps the initial values of the body shape parameters and the initial values of the pose parameters to a three-dimensional model of the human body surface. This three-dimensional model contains 6890 vertices, which form triangular patches through fixed connectivity. A pre-trained regressor W can further regress the three-dimensional key points of the human body from the vertices of the human body surface model.
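The 85-dimensional vector can be unpacked into its parts. The sketch below assumes axis-angle rotations (three values each), 10 body shape coefficients, 3 camera values, and a camera-first ordering, which gives 3 + 3 + 23 × 3 + 10 = 85. Only the total of 85 and the count of 23 key points come from the text; the ordering, the per-group sizes, and the name `split_parameters` are assumptions.

```python
import numpy as np

def split_parameters(vec):
    """Split an 85-D parameter vector into camera, rotation, and shape parts.

    Assumed layout (not stated in the source):
    [camera (3) | global rotation (3) | key point rotations (23 x 3) | shape (10)]
    """
    vec = np.asarray(vec, dtype=float)
    if vec.shape != (85,):
        raise ValueError("expected a vector of length 85")
    return {
        "camera": vec[:3],
        "global_rotation": vec[3:6],
        "keypoint_rotation": vec[6:75].reshape(23, 3),
        "shape": vec[75:85],
    }

params = split_parameters(np.arange(85))
```

Keeping the groups separate makes it straightforward to freeze one group while another is optimized.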

In step 202, the supervision information may be two-dimensional visual features (also called two-dimensional observed features), for example at least one of two-dimensional key points of the target object in the image and semantic information of multiple pixel points of the target object. The semantic information of a pixel point indicates which region of the target object the pixel point lies in; the region may be, for example, the head, an arm, the torso, or a leg. When two-dimensional key point information is used as supervision information, a two-dimensional key point extraction network can be used to estimate the positions of the human body key points in the image, and any two-dimensional pose estimation method, for example OpenPose, may be chosen. Besides using two-dimensional visual features alone, the two-dimensional visual features and an initial three-dimensional point cloud of the surface of the target object may be used together as supervision information to further improve the accuracy of the three-dimensional reconstruction.

When the image includes a depth map (for example, the image is an RGBD image), depth information of a plurality of pixel points of the target object can be extracted from the depth map, and, based on the depth information, the plurality of pixel points of the target object in the depth map can be projected into 3D space to obtain the initial 3D point cloud of the target object surface.

The plurality of pixel points may be some or all of the pixel points of the target object in the image. For example, they may include pixel points of each region of the target object that requires 3D reconstruction, and the number of pixel points in each region may be greater than or equal to the number required for 3D reconstruction.

In general, an image contains a background region as well as the target object. Therefore, image segmentation can be performed on the RGB image included in the image to obtain the image region where the target object is located in the RGB image; the image region where the target object is located in the depth map is then determined based on that region; and depth information of a plurality of pixel points in the image region where the target object is located in the depth map is obtained. Extracting, through image segmentation, the image region of the target object that requires 3D reconstruction avoids the influence of the background region on the 3D reconstruction. In some embodiments, the pixel points in the depth map correspond one-to-one to the pixel points in the RGB image. For example, the image may be an RGBD image.
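The projection of segmented depth pixels into 3D space can be sketched as follows. This is a minimal illustration that assumes a pinhole camera with known intrinsics; the parameters `fx`, `fy`, `cx`, `cy` and the function name are hypothetical, and the document does not specify the camera model.

```python
import numpy as np


def backproject_masked_depth(depth, mask, fx, fy, cx, cy):
    """Lift the masked depth pixels to 3D camera coordinates (pinhole model).

    depth: (H, W) depth values; zero is treated as "no measurement".
    mask:  (H, W) boolean segmentation mask of the target object.
    """
    valid = mask & (depth > 0)
    v, u = np.nonzero(valid)          # row index v, column index u
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Tiny example: a 4x4 depth map with a two-pixel object mask.
depth = np.zeros((4, 4))
depth[1:3, 1:3] = 2.0
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:2] = True

points = backproject_masked_depth(depth, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Applying the mask before back-projection is what keeps background pixels out of the resulting point cloud.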

Furthermore, outliers can be filtered out of the 3D point cloud (that is, the initial 3D point cloud), and the supervision information may include the filtered 3D point cloud. The filtering can be implemented with a point cloud filter. Filtering out outliers yields a more precise 3D point cloud of the target object surface and thus further improves the accuracy of the 3D reconstruction. Specifically, for each target 3D point in the 3D point cloud, the average distance from the n 3D points nearest to that target 3D point is obtained. Assuming that the average distances of the target 3D points follow a statistical distribution (for example, a Gaussian distribution), the mean and variance of that distribution can be calculated and a threshold s can be set based on them. A 3D point whose average distance falls outside the range defined by the threshold s is regarded as an outlier and is filtered out of the 3D point cloud.
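The outlier filtering described above can be sketched in a few lines. The sketch assumes the threshold is set to the mean plus a multiple of the standard deviation of the per-point average distances; the document only states that the threshold s is based on the mean and variance, so the rule, the parameter `std_ratio`, and the function name are assumptions. The brute-force distance matrix is quadratic in the number of points and is meant only for small clouds.

```python
import numpy as np


def filter_outliers(points, n_neighbors=8, std_ratio=2.0):
    """Drop points whose mean distance to their n nearest neighbors is too large."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)              # a point is not its own neighbor
    nearest = np.sort(dists, axis=1)[:, :n_neighbors]
    mean_dist = nearest.mean(axis=1)
    # Assumed rule: threshold s = mean + std_ratio * standard deviation.
    threshold = mean_dist.mean() + std_ratio * mean_dist.std()
    keep = mean_dist <= threshold
    return points[keep], keep


# 50 points in the unit cube plus one far-away point.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.random((50, 3)), [[100.0, 100.0, 100.0]]])
filtered, keep_mask = filter_outliers(cloud)
```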

In practical applications, when the image is an RGB image, the 2D observation features can be used as supervision information to iteratively optimize the initial values of the parameters. When the image is an RGBD image, the 2D observation features and the 3D point cloud of the target object surface can be used together as supervision information to optimize the initial values of the parameters. The optimization may use, for example, gradient descent; the present invention does not limit this.

In step 203, skeletal skinning is performed based on the optimized values of the parameters to obtain the 3D model of the target object.

Fig. 3 is an overall flowchart of an embodiment of the present invention. When the input is an RGB image, 3D reconstruction is performed on the RGB image through the 3D reconstruction network to obtain the human body parameter values of the person in the image, and a key point extraction network is used to extract key points of the person in the image to obtain the 2D key points of the human body. Next, the human body parameter values are used as the initial values of the parameters and the 2D key points of the human body are used as supervision information; a parameter optimization module optimizes the initial values to obtain optimized values of the human body parameters, and skeletal skinning is performed based on the optimized values to obtain a human body reconstruction model.

When the input is an RGBD image, the image can be decomposed into an RGB image and a TOF (Time of Flight) depth map, where the TOF depth map contains the depth information of each pixel point in the RGB image. 3D reconstruction is performed on the RGB image through the 3D reconstruction network to obtain the human body parameter values of the person in the image, and a key point extraction network is used to extract key points of the person in the image to obtain the 2D key points of the human body. A point cloud reconstruction module can also reconstruct a point cloud of the human body surface based on the depth information in the TOF depth map. Next, the human body parameter values are used as the initial values of the parameters, and the 2D key points of the human body and the human body surface point cloud are used together as supervision information; a parameter optimization module optimizes the initial values to obtain optimized values of the human body parameters, and skeletal skinning is performed based on the optimized values to obtain a human body reconstruction model.

Furthermore, after the human body reconstruction model is obtained, color processing can be performed on it based on the color information in the RGB image or RGBD image so that the color information of the human body reconstruction model matches that of the person in the image.

In the embodiments of the present invention, 3D reconstruction is performed on the target object in the image through the 3D reconstruction network to obtain the initial values of the parameters, the initial values are then optimized based on the supervision information, and the 3D model of the target object is built based on the optimized values. Parameter optimization on its own has the advantage of producing 3D reconstruction results that are accurate and consistent with the 2D observation features of the image, but it often produces unnatural and implausible motion results and has low reliability. Network regression through the 3D reconstruction network produces more natural and plausible motion results. Using the output of the 3D reconstruction network as the initial values for parameter optimization therefore secures the reliability of the 3D reconstruction result while also taking its accuracy into account.

In some embodiments, a multi-stage optimization method can be used in the parameter optimization step. The multi-stage optimization method may include a camera optimization stage and a pose optimization stage. In the camera optimization stage, the optimization targets are the value R of the global rotation parameter and the current value t of the displacement parameter between the image acquisition device and the target object. Here, t and R are both three-dimensional vectors, and R is expressed in axis-angle form. In the pose optimization stage, the optimization targets are the value of the key point rotation parameter and the value of the shape parameter.
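Since R is stored as a three-dimensional axis-angle vector, it has to be converted to a rotation matrix before it can be applied to 3D points. A minimal sketch of that conversion using the standard Rodrigues formula is given below; the function name is hypothetical, and the document does not state which conversion routine is used.

```python
import numpy as np


def axis_angle_to_matrix(r):
    """Convert an axis-angle vector (direction = axis, norm = angle) to a 3x3 rotation."""
    r = np.asarray(r, dtype=float)
    angle = np.linalg.norm(r)
    if angle < 1e-12:
        return np.eye(3)                      # no rotation
    kx, ky, kz = r / angle
    # Skew-symmetric cross-product matrix of the unit axis.
    k = np.array([[0.0, -kz, ky],
                  [kz, 0.0, -kx],
                  [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


# A quarter turn about the z axis maps the x axis onto the y axis.
rotation = axis_angle_to_matrix([0.0, 0.0, np.pi / 2])
rotated_x = rotation @ np.array([1.0, 0.0, 0.0])
```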

During optimization, changing the camera position and changing the positions of the 3D key points of the human body both change the 2D projections of the 3D key points, which makes the optimization unstable. Therefore, the body pose is fixed in the camera optimization stage and the camera position is fixed in the pose optimization stage, which improves the stability of the optimization. That is, with the initial value of the shape parameter and the initial value of the key point rotation parameter held unchanged, the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter are optimized based on the supervision information and the initial value of the displacement parameter, yielding an optimized value of the displacement parameter and an optimized value of the global rotation parameter. Then, with those two optimized values held unchanged, the initial value of the key point rotation parameter and the initial value of the shape parameter are optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, yielding an optimized value of the key point rotation parameter and an optimized value of the shape parameter.
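The idea of freezing one group of parameters while optimizing the other can be sketched with a toy example. Everything here is illustrative: the parameter names (`R`, `t`, `joint_rot`, `shape`), the toy quadratic losses, and the finite-difference gradient descent are assumptions standing in for the actual losses and optimizer, which the document does not fix beyond mentioning gradient descent as one option.

```python
import numpy as np


def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


def optimize_stage(params, free_keys, loss_fn, lr=0.1, steps=200):
    """Gradient descent that updates only the parameters named in free_keys."""
    params = {k: v.copy() for k, v in params.items()}
    for _ in range(steps):
        for key in free_keys:
            def loss_of(value, key=key):
                trial = dict(params)
                trial[key] = value
                return loss_fn(trial)
            params[key] = params[key] - lr * numeric_grad(loss_of, params[key])
    return params


# Toy stand-ins for the camera-stage and pose-stage losses.
def camera_loss(p):
    return np.sum((p["t"] - 1.0) ** 2) + np.sum((p["R"] - 0.5) ** 2)


def pose_loss(p):
    return np.sum((p["joint_rot"] - p["t"].sum()) ** 2) + np.sum(p["shape"] ** 2)


init = {"R": np.zeros(3), "t": np.zeros(3),
        "joint_rot": np.zeros(4), "shape": np.ones(2)}
stage1 = optimize_stage(init, ["R", "t"], camera_loss)               # pose frozen
stage2 = optimize_stage(stage1, ["joint_rot", "shape"], pose_loss)   # camera frozen
```

In the second stage the camera parameters enter the loss only as constants, which mirrors how the pose optimization stage depends on, but does not modify, the optimized displacement and global rotation.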

Furthermore, among the 2D projection key points corresponding to the 3D key points of the target object, the target 2D projection key points belonging to a preset part of the target object can be obtained. Here, the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter; the 2D projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target 2D projection key points and the initial 2D key points is obtained. A second loss between the initial value of the displacement parameter and the current value of the displacement parameter is obtained. The current value of the displacement parameter and the initial value of the global rotation parameter are optimized based on the first loss and the second loss.

Here, the preset part may be the torso, and the target 2D projection key points may include key points such as the left and right shoulder points, the left and right hip joint points, and the spine center point. Because different motions have little influence on the torso key points, setting the first loss using the torso key points reduces the influence of different motions on the key point positions and improves the accuracy of the optimization result. The first loss may also be referred to as the torso key point projection loss, and the second loss may also be referred to as the camera displacement regularization loss. The first loss may be obtained through Equation (1), and the second loss through Equation (2).

Here, Equation (1) defines the first loss from the target 2D projection key points and the initial 2D key points, and Equation (2) defines the second loss from the current value and the initial value of the displacement parameter between the image acquisition device and the target object. A first target loss may be determined based on the first loss and the second loss. For example, the first target loss may be determined as the sum of the first loss and the second loss, as given by Equation (3).
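The camera-stage objective can be sketched as follows. This is a hedged illustration, not the document's equations: it assumes a pinhole projection, squared Euclidean distances, and unit weights on both terms, and the names `project`, `camera_stage_loss`, `torso_idx`, `focal`, and `center` are hypothetical.

```python
import numpy as np


def project(points_3d, rotation, t, focal, center):
    """Rotate and translate 3D key points, then apply a pinhole projection."""
    cam = points_3d @ rotation.T + t
    return focal * cam[:, :2] / cam[:, 2:3] + center


def camera_stage_loss(points_3d, rotation, t, t_init,
                      keypoints_2d, torso_idx, focal, center):
    """Torso key point projection loss plus camera displacement regularization."""
    projected = project(points_3d, rotation, t, focal, center)
    torso_loss = np.sum((projected[torso_idx] - keypoints_2d[torso_idx]) ** 2)
    displacement_loss = np.sum((t - t_init) ** 2)
    return torso_loss + displacement_loss


# Three key points; the first two play the role of torso key points.
points_3d = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
keypoints_2d = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
t_init = np.array([0.0, 0.0, 2.0])
torso_idx = [0, 1]

loss_at_init = camera_stage_loss(points_3d, np.eye(3), t_init, t_init,
                                 keypoints_2d, torso_idx, 2.0, np.zeros(2))
loss_moved = camera_stage_loss(points_3d, np.eye(3),
                               t_init + np.array([0.0, 0.0, 1.0]), t_init,
                               keypoints_2d, torso_idx, 2.0, np.zeros(2))
```

The regularization term keeps the optimized displacement near the network's estimate, so the camera cannot drift arbitrarily far just to reduce the projection error.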

A third loss between the optimized 2D projection key points of the target object and the initial 2D key points can be obtained. Here, the optimized 2D projection key points are obtained by projecting the optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter. A fourth loss is obtained, which represents the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter. The initial value of the key point rotation parameter and the initial value of the shape parameter are optimized based on the third loss and the fourth loss.

The third loss may also be referred to as the 2D key point projection loss, and the fourth loss may also be referred to as the pose plausibility loss. The third loss may be determined through Equation (4).

Here, Equation (4) defines the third loss from the optimized 2D projection key points and the initial 2D key points. A second target loss may be determined based on the third loss and the fourth loss; for example, the second target loss may be determined as the sum of the third loss and the fourth loss, as given by Equation (5). The fourth loss may be obtained using a Gaussian mixture model (GMM). It is used to judge whether the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter is plausible, and it outputs a large loss for an implausible pose.
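One common way to turn a Gaussian mixture model into a plausibility loss is to use the negative log-likelihood of the pose under the mixture. The sketch below assumes that form and, for brevity, diagonal covariances; the document only states that a GMM is used, so the function name, the diagonal covariance, and the toy mixture are all assumptions.

```python
import numpy as np


def gmm_pose_prior_loss(pose, weights, means, variances):
    """Negative log-likelihood of a pose vector under a diagonal-covariance GMM.

    pose:      (D,) pose vector.
    weights:   (K,) mixture weights summing to 1.
    means:     (K, D) component means.
    variances: (K, D) per-dimension variances of each component.
    """
    diff = pose[None, :] - means
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum(diff ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp
    # Log-sum-exp for numerical stability.
    peak = log_comp.max()
    return -(peak + np.log(np.sum(np.exp(log_comp - peak))))


# Toy two-component mixture in two dimensions.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.ones((2, 2))

loss_plausible = gmm_pose_prior_loss(np.array([0.0, 0.0]), weights, means, variances)
loss_implausible = gmm_pose_prior_loss(np.array([5.0, 5.0]), weights, means, variances)
```

A pose far from every mixture component receives a much larger loss than one near a component mean, which matches the behavior described for the fourth loss.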

After the initial value of the key point rotation parameter and the initial value of the shape parameter have been optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, joint optimization may further be performed on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter; that is, a three-stage optimization method may be used. When the supervision information includes information on the 3D point cloud of the target object surface, the three-stage optimization method, comprising a camera optimization stage, a pose optimization stage, and a point cloud optimization stage, can be used.

In the camera optimization stage, among the 2D projection key points corresponding to the 3D key points of the target object, the target 2D projection key points belonging to a preset part of the target object can be obtained. Here, the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter, and the 2D projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target 2D projection key points and the initial 2D key points is obtained. A second loss between the initial value of the displacement parameter and the current value of the displacement parameter is obtained. A fifth loss between a first 3D point cloud of the target object surface and the initial 3D point cloud is obtained, where the first 3D point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter. The current value of the displacement parameter and the initial value of the global rotation parameter are optimized based on the first loss, the second loss, and the fifth loss. The fifth loss may also be referred to as the Iterative Closest Point (ICP) point cloud registration loss and may be determined through Equation (6).

In Equation (6), which defines the fifth loss, the initial 3D point cloud is regarded as point cloud P and the first 3D point cloud is regarded as point cloud Q. One set of point pairs is formed by pairing each point in point cloud P with its nearest point in point cloud Q, and a second set of point pairs is formed by pairing each point in point cloud Q with its nearest point in point cloud P. The first loss and the second loss are given by Equations (7) and (8), respectively.

Here, Equations (7) and (8) define the first loss and the second loss, the latter from the current value and the initial value of the displacement parameter. The first target loss is determined based on the sum of the first loss, the second loss, and the fifth loss, and the current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first target loss, as given by Equation (9).
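The two point-pair sets used by the point cloud registration loss can be illustrated with a short sketch. It assumes the loss is the sum of squared distances over both nearest-neighbor directions; the exact form and weighting in the document's equation are not reproduced here, and the function names are hypothetical. The brute-force nearest-neighbor search is quadratic and intended only for small clouds.

```python
import numpy as np


def nearest_indices(src, dst):
    """For each point in src, return the index of its nearest point in dst."""
    dists = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    return dists.argmin(axis=1)


def icp_registration_loss(cloud_p, cloud_q):
    """Bidirectional closest-point loss between an observed and a model cloud."""
    idx_pq = nearest_indices(cloud_p, cloud_q)   # pairs from P to Q
    idx_qp = nearest_indices(cloud_q, cloud_p)   # pairs from Q to P
    loss_pq = np.sum((cloud_p - cloud_q[idx_pq]) ** 2)
    loss_qp = np.sum((cloud_q - cloud_p[idx_qp]) ** 2)
    return loss_pq + loss_qp


# P: stand-in for the cloud from the depth map; Q: stand-in for the model surface.
cloud_p = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
cloud_q = cloud_p + np.array([0.1, 0.0, 0.0])

loss_same = icp_registration_loss(cloud_p, cloud_p)
loss_shifted = icp_registration_loss(cloud_p, cloud_q)
```

Using both directions penalizes model points that have no nearby observation as well as observed points that the model fails to cover.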

The pose optimization stage of the three-stage optimization process uses the same optimization method as the pose optimization stage of the two-stage optimization process, so it is not described again here.

In the point cloud optimization stage, a sixth loss between the optimized 2D projection key points of the target object and the initial 2D key points is obtained. Here, the optimized 2D projection key points are obtained by projecting the optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter. A seventh loss is obtained, which represents the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter. An eighth loss between a second 3D point cloud of the target object surface and the initial 3D point cloud is obtained, where the second 3D point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter. Based on the sixth loss, the seventh loss, and the eighth loss, joint optimization is performed on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter, using Equations (10) and (11).

In Equations (10) and (11), the sixth loss is defined from the optimized 2D projection key points and the initial 2D key points. The seventh loss may be obtained using a Gaussian mixture model; it is used to judge whether the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter is plausible, and it outputs a large loss for an implausible pose. For the eighth loss, the initial 3D point cloud is regarded as point cloud P and is compared with the second 3D point cloud: one set of point pairs is formed by pairing each point in point cloud P with its nearest point in the second 3D point cloud, and another set is formed by pairing each point in the second 3D point cloud with its nearest point in point cloud P. Furthermore, the sum of the sixth loss, the seventh loss, and the eighth loss may be determined as a third target loss, and joint optimization may be performed on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter based on the third target loss, as given by Equation (12).

When the image of the target object is an RGB image, parameter optimization can be performed with the two-stage optimization method described above, comprising the camera optimization stage and the pose optimization stage. When the image of the target object is an RGBD image, parameter optimization can be performed with the three-stage optimization method described above, comprising the camera optimization stage, the pose optimization stage, and the point cloud optimization stage.
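The choice between the two-stage and three-stage schedules can be expressed as a small dispatch on whether depth is available. The stage labels and the function name below are hypothetical and serve only to illustrate the selection rule.

```python
from typing import List


def select_stages(has_depth: bool) -> List[str]:
    """Return the optimization stages to run for an RGB or RGBD input."""
    stages = ["camera", "pose"]
    if has_depth:
        # The point cloud stage needs the 3D point cloud from the depth map.
        stages.append("point_cloud")
    return stages


rgb_stages = select_stages(has_depth=False)
rgbd_stages = select_stages(has_depth=True)
```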

This solution can be used in a variety of scenarios. In scenarios such as virtual dressing rooms, virtual anchors, and video motion migration, it can provide a natural, plausible, and accurate human body reconstruction model.

Fig. 4A is a schematic diagram of a virtual dressing room application scenario according to an embodiment of the present invention. An image of the user 401 can be captured by the camera 403 and sent to a processor (not shown), which performs 3D human body reconstruction to obtain a human body reconstruction model 404 corresponding to the user 401. The human body reconstruction model 404 is shown on the display interface 402 so that the user 401 can see it. The user 401 can also select desired apparel 405, which includes, but is not limited to, clothes 4051 and a hat 4052. The apparel 405 is displayed on the display interface 402 based on the human body reconstruction model 404 so that the user 401 can see how the apparel 405 looks when worn.

Fig. 4B is a schematic diagram of a virtual live room application scenario according to an embodiment of the present invention. During a live broadcast, an image of the anchor user 406 can be captured through the anchor client 407 and sent to the server 408, which performs 3D reconstruction to obtain a human body reconstruction model of the anchor user, that is, a virtual anchor. The server 408 can return the human body reconstruction model of the anchor user to the anchor client 407 for display, as shown by the model 4071 in the figure. The anchor client 407 can also collect the voice information of the anchor user and send it to the server 408, so that the server 408 fuses the human body reconstruction model with the voice information. The server 408 can send the fused human body reconstruction model and voice information to the viewer client 409 that is watching the live broadcast, for display and playback; the displayed human body reconstruction model is shown as the model 4091 in the figure. In this way, the viewer client 409 can display a live broadcast presented by the virtual anchor.

A person skilled in the art will understand that, in the methods of the specific implementations described above, the order in which the steps are written does not imply a strict order of execution and does not limit the implementation process in any way; the actual order in which the steps are executed should be determined by their functions and possible internal logic.

As shown in FIG. 5, the present invention further provides a three-dimensional reconstruction apparatus, which includes a first three-dimensional reconstruction module 501, an optimization module 502, and a second three-dimensional reconstruction module 503.

상기 제1 3차원 재구성 모듈(501)은 3차원 재구성 네트워크를 통해 이미지 내의 목표 오브젝트에 대해 3차원 재구성을 수행하여 상기 목표 오브젝트의 파라미터의 초기값을 얻는 데에 사용되고, 상기 파라미터의 초기값은 상기 목표 오브젝트의 3차원 모델을 구축하는 데에 사용된다;The first 3D reconstruction module 501 is used to obtain initial values of parameters of the target object by performing 3D reconstruction on a target object in an image through a 3D reconstruction network, and the initial values of the parameters are used to build a three-dimensional model of a target object;

상기 최적화 모듈(502)은 미리 얻은, 목표 오브젝트의 특징을 나타내는 감독 정보에 기반하여 상기 파라미터의 초기값을 최적화하여 상기 파라미터의 최적화된 값을 얻는 데에 사용된다;the optimization module 502 is used to optimize the initial value of the parameter according to previously obtained supervision information representing the characteristics of the target object to obtain an optimized value of the parameter;

상기 제2 3차원 재구성 모듈(503)은 상기 파라미터의 최적화된 값에 기반하여 골격 스키닝 처리를 수행하여 상기 목표 오브젝트의 3차원 모델을 구축하는 데에 사용된다.The second 3D reconstruction module 503 is used to construct a 3D model of the target object by performing skeletal skinning processing based on the optimized values of the parameters.
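The text does not say which skinning formulation the second three-dimensional reconstruction module uses. As an illustration only, the following minimal sketch assumes linear blend skinning, a common form of skeletal skinning; the function name, array layouts, and the use of NumPy are assumptions and are not taken from the patent.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, skin_weights, joint_transforms):
    """Pose a mesh by blending per-joint rigid transforms (illustrative sketch).

    rest_vertices:    (V, 3) vertex positions in the rest pose
    skin_weights:     (V, J) weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous transform of each joint
    """
    num_vertices = rest_vertices.shape[0]
    # Append a 1 to every vertex so that 4x4 transforms can be applied.
    homogeneous = np.hstack([rest_vertices, np.ones((num_vertices, 1))])
    # Blend the joint transforms per vertex: (V, J) x (J, 4, 4) -> (V, 4, 4).
    blended = np.einsum("vj,jab->vab", skin_weights, joint_transforms)
    # Apply each vertex's blended transform to that vertex.
    posed = np.einsum("vab,vb->va", blended, homogeneous)
    return posed[:, :3]
```

In such a sketch, the joint transforms would be derived from the optimized global rotation and key point rotation parameters, and the rest-pose vertices from the optimized shape parameter.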

일부 실시예에서 상기 감독 정보는 제1 감독 정보를 포함하거나 또는 상기 감독 정보는 제1 감독 정보와 제2 감독 정보를 포함하고; 상기 제1 감독 정보는 상기 목표 오브젝트의 초기 2차원 키 포인트, 상기 이미지 내의 상기 목표 오브젝트의 복수 개의 픽셀 포인트의 시멘틱 정보 중 적어도 하나를 포함하며; 상기 제2 감독 정보는 상기 목표 오브젝트 표면의 초기 3차원 포인트 클라우드를 포함한다. 본 발명의 실시예에서 목표 오브젝트의 초기 2차원 키 포인트 또는 픽셀 포인트의 시멘틱 정보만 감독 정보로 사용하여 상기 파라미터의 초기값을 최적화하여 최적화 효율이 높고 최적화 복잡도가 낮거나; 또는 목표 오브젝트 표면의 초기 3차원 포인트 클라우드와 앞에서 언급한 초기 2차원 키 포인트 또는 픽셀 포인트의 시멘틱 정보를 함께 감독 정보로 사용하여 획득된 파라미터의 최적화된 값의 정확성을 향상시킬 수 있다.In some embodiments, the supervisory information includes first supervisory information, or the supervisory information includes first supervisory information and second supervisory information; the first supervision information includes at least one of an initial two-dimensional key point of the target object and semantic information of a plurality of pixel points of the target object in the image; The second supervision information includes an initial 3D point cloud of the surface of the target object. In an embodiment of the present invention, only semantic information of an initial two-dimensional key point or pixel point of a target object is used as supervision information to optimize the initial values of the parameters so that optimization efficiency is high and optimization complexity is low; Alternatively, the initial 3D point cloud of the target object surface and the aforementioned initial 2D key point or pixel point semantic information are used together as supervision information to improve the accuracy of the optimized value of the acquired parameter.

일부 실시예에서, 상기 장치는 키 포인트 추출 네트워크를 통해 상기 이미지로부터 상기 목표 오브젝트의 초기 2차원 키 포인트의 정보를 추출하는 데에 사용되는 2차원 키 포인트 추출 모듈을 더 포함한다. 키 포인트 추출 네트워크에 의해 추출된 초기 2차원 키 포인트의 정보를 감독 정보로 이용하여 3차원 모델을 위해 자연스럽고 합리적인 동작을 생성할 수 있다.In some embodiments, the device further includes a 2D key point extraction module, which is used to extract information of an initial 2D key point of the target object from the image via a key point extraction network. A natural and reasonable motion can be created for a 3D model by using the initial 2D key point information extracted by the key point extraction network as supervision information.

일부 실시예에서, 상기 이미지는 상기 목표 오브젝트의 깊이 맵을 포함하고; 상기 장치는 깊이 정보 추출 모듈, 역투영 모듈을 더 포함한다. 상기 깊이 정보 추출 모듈은 상기 깊이 맵으로부터 상기 목표 오브젝트의 복수 개의 픽셀 포인트의 깊이 정보를 추출하는 데에 사용되고; 상기 역투영 모듈은 상기 깊이 정보에 기반하여 상기 깊이 맵 내의 상기 목표 오브젝트의 복수 개의 픽셀 포인트를 3차원 공간에 역투영하여 상기 목표 오브젝트 표면의 초기 3차원 포인트 클라우드를 얻는 데에 사용된다. 깊이 정보를 추출하고, 깊이 정보에 기반하여 2차원 이미지 내의 픽셀 포인트를 3차원 공간에 역투영하여 목표 오브젝트 표면의 초기 3차원 포인트 클라우드를 얻음으로써, 당해 초기 3차원 포인트 클라우드를 감독 정보로 이용하여 파라미터의 초기값을 최적화하며, 나아가 파라미터 최적화의 정확성을 향상시킨다.In some embodiments, the image includes a depth map of the target object; The device further includes a depth information extraction module and a back-projection module. the depth information extraction module is used to extract depth information of a plurality of pixel points of the target object from the depth map; The back-projection module is used to back-project a plurality of pixel points of the target object in the depth map into a 3-dimensional space based on the depth information to obtain an initial 3-dimensional point cloud of a surface of the target object. Depth information is extracted, and based on the depth information, pixel points in the 2D image are back-projected onto a 3D space to obtain an initial 3D point cloud of the surface of the target object, and the initial 3D point cloud is used as supervision information. Optimize the initial values of parameters, and further improve the accuracy of parameter optimization.
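The back-projection step can be sketched as follows, assuming a pinhole camera whose intrinsics (`fx`, `fy`, `cx`, `cy`) are known from calibration and a depth map that stores distance along the optical axis, with 0 marking missing measurements. These conventions and all names are assumptions for illustration; the patent does not specify a camera model.

```python
import numpy as np

def back_project(depth_map, fx, fy, cx, cy):
    """Lift valid depth-map pixels into camera-space 3D points.

    depth_map: (H, W) depth along the optical axis; 0 marks missing depth
    fx, fy:    focal lengths in pixels (assumed known)
    cx, cy:    principal point in pixels (assumed known)
    """
    rows, cols = np.nonzero(depth_map > 0)   # pixel coordinates with valid depth
    z = depth_map[rows, cols].astype(float)
    x = (cols - cx) * z / fx                 # invert u = fx * x / z + cx
    y = (rows - cy) * z / fy                 # invert v = fy * y / z + cy
    return np.stack([x, y, z], axis=1)       # (N, 3) point cloud
```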

일부 실시예에서, 상기 이미지는 상기 목표 오브젝트의 RGB 이미지를 더 포함한다; 상기 깊이 정보 추출 모듈은 이미지 분할 유닛, 이미지 영역 결정 유닛, 깊이 정보 획득 유닛을 포함한다. 상기 이미지 분할 유닛은 상기 RGB 이미지에 대해 이미지 분할을 수행하는 데에 사용되고; 상기 이미지 영역 결정 유닛은, 이미지 분할의 결과에 기반하여 상기 RGB 이미지 내의 목표 오브젝트가 소재하는 이미지 영역을 결정하며, 상기 RGB 이미지 내의 목표 오브젝트가 소재하는 이미지 영역에 기반하여 상기 깊이 맵 내의 목표 오브젝트가 소재하는 이미지 영역을 결정하는 데에 사용되고; 상기 깊이 정보 획득 유닛은, 상기 깊이 맵 내의 상기 목표 오브젝트가 소재하는 이미지 영역의 복수 개의 픽셀 포인트의 깊이 정보를 획득하는 데에 사용된다. RGB 이미지에 대해 이미지 분할을 수행하여 목표 오브젝트의 위치를 정확하게 결정할 수 있음으로써, 추출 목표 오브젝트의 깊이 정보를 정확하게 추출할 수 있다.In some embodiments, the image further includes an RGB image of the target object; The depth information extraction module includes an image segmentation unit, an image area determination unit, and a depth information acquisition unit. the image segmentation unit is used to perform image segmentation on the RGB image; The image area determination unit determines an image area where a target object in the RGB image is located based on a result of image segmentation, and a target object in the depth map is determined based on the image area where the target object in the RGB image is located. used to determine the region of the image to be located; The depth information acquisition unit is used to acquire depth information of a plurality of pixel points of an image area where the target object in the depth map is located. By performing image segmentation on the RGB image to accurately determine the location of the target object, depth information of the extraction target object can be accurately extracted.
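A minimal sketch of how a segmentation result might be used to pick out the target object's depth values is given below. It assumes that the RGB image and the depth map are pixel-aligned (registered) and that segmentation yields a per-pixel label map; the segmentation network itself, the label convention, and the function name are hypothetical.

```python
import numpy as np

def target_depth_samples(label_map, depth_map, target_label):
    """Return pixel coordinates and depths that belong to the target object.

    label_map:    (H, W) per-pixel labels from segmenting the RGB image
    depth_map:    (H, W) depth map, assumed aligned with the RGB image
    target_label: label value assigned to the target object
    """
    if label_map.shape != depth_map.shape:
        raise ValueError("label map and depth map must be pixel-aligned")
    # Region of the target object in the RGB image, carried over to the
    # depth map and restricted to pixels with a valid depth reading.
    region = (label_map == target_label) & (depth_map > 0)
    rows, cols = np.nonzero(region)
    return rows, cols, depth_map[rows, cols]
```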

일부 실시예에서, 상기 장치는 상기 초기 3차원 포인트 클라우드 중 이상치를 필터링하고, 필터링 후의 상기 초기 3차원 포인트 클라우드를 상기 제2 감독 정보로 이용하는 데에 사용되는 필터링 모듈을 더 포함한다. 이상치를 필터링하는 것을 통해 이상치의 간섭이 완화되고, 나아가 파라미터 최적화 과정의 정확성이 더욱 향상된다.In some embodiments, the device further includes a filtering module used for filtering outliers in the initial 3D point cloud and using the filtered initial 3D point cloud as the second supervisory information. Through outlier filtering, the interference of outliers is mitigated, and furthermore, the accuracy of the parameter optimization process is further improved.
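The patent does not state how outliers are detected. One simple possibility, shown here purely as an assumption, is to discard points whose distance from the cloud's median center is far above the typical distance, measured with the median absolute deviation; the threshold factor `k` is a hypothetical tuning parameter.

```python
import numpy as np

def filter_outliers(points, k=3.0):
    """Drop points that lie unusually far from the median center.

    points: (N, 3) initial point cloud
    k:      how many median absolute deviations beyond the median distance
            a point may lie before it is treated as an outlier
    """
    center = np.median(points, axis=0)
    dist = np.linalg.norm(points - center, axis=1)
    median_dist = np.median(dist)
    mad = np.median(np.abs(dist - median_dist))
    # The small floor keeps the threshold usable when all distances coincide.
    keep = dist <= median_dist + k * max(mad, 1e-12)
    return points[keep]
```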

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a shape parameter of the target object, and a displacement parameter of the image acquisition device. The optimization module includes a first optimization unit and a second optimization unit. The first optimization unit is configured to, while the initial value of the shape parameter and the initial value of the key point rotation parameter are kept unchanged, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter. The second optimization unit is configured to optimize the initial value of the key point rotation parameter and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the shape parameter. During optimization, changing either the position of the image acquisition device or the positions of the three-dimensional key points changes the two-dimensional projections of the three-dimensional key points, which makes the optimization unstable. The two-stage approach first fixes the initial values of the key point rotation parameter and the shape parameter while optimizing the displacement parameter of the image acquisition device and the global rotation parameter, and then holds the displacement parameter and the global rotation parameter fixed while optimizing the initial values of the key point rotation parameter and the shape parameter, which improves the stability of the optimization process.
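The two-stage scheme can be illustrated with a small sketch that optimizes one subset of parameters while the others are held fixed. Everything here is an assumption made for illustration: the parameter keys, the use of plain gradient descent with numerical gradients, and the step size are not taken from the patent, which does not name an optimizer.

```python
import numpy as np

def minimize_subset(loss_fn, params, free_keys, lr=0.1, steps=200, eps=1e-5):
    """Gradient descent on params[k] for k in free_keys; other entries stay fixed."""
    params = {k: np.array(v, dtype=float) for k, v in params.items()}  # work on copies
    for _ in range(steps):
        grads = {}
        for key in free_keys:
            grad = np.zeros_like(params[key])
            for idx in np.ndindex(params[key].shape):
                original = params[key][idx]
                params[key][idx] = original + eps
                up = loss_fn(params)
                params[key][idx] = original - eps
                down = loss_fn(params)
                params[key][idx] = original
                grad[idx] = (up - down) / (2 * eps)  # central difference
            grads[key] = grad
        for key in free_keys:
            params[key] -= lr * grads[key]
    return params

def two_stage_fit(loss_stage1, loss_stage2, initial):
    # Stage 1: camera displacement and global rotation; pose and shape fixed.
    stage1 = minimize_subset(loss_stage1, initial, ["translation", "global_rotation"])
    # Stage 2: key point rotations and shape; stage-1 results fixed.
    return minimize_subset(loss_stage2, stage1, ["joint_rotations", "shape"])
```

A practical implementation would more likely rely on automatic differentiation; numerical gradients are used here only to keep the sketch self-contained.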

In some embodiments, the supervision information includes initial two-dimensional key points of the target object. The first optimization unit is configured to: obtain, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be, for example, the torso. Because different motions have little effect on the key points of the torso, determining the first loss from torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. The two-dimensional key points are supervision information in the two-dimensional plane, whereas the displacement parameter of the image acquisition device is a parameter in three-dimensional space; the second loss therefore reduces the chance that the optimization result falls into a local optimum in the two-dimensional plane and deviates from the true solution.
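The first and second losses can be sketched as a reprojection error over torso key points plus a penalty on how far the displacement drifts from its initial value. The squared-error form, the pinhole projection, the weight `reg_weight`, and all names are assumptions for illustration; the patent does not give the exact loss expressions. The three-dimensional key points passed in are assumed to have already been rotated by the global rotation.

```python
import numpy as np

def project(points_3d, translation, focal, center):
    """Pinhole projection of (K, 3) points after shifting by the camera displacement."""
    cam = points_3d + translation
    return focal * cam[:, :2] / cam[:, 2:3] + center

def stage_one_loss(joints_3d, detected_2d, torso_ids, translation,
                   initial_translation, focal, center, reg_weight=1.0):
    projected = project(joints_3d, translation, focal, center)
    # First loss: only key points of the preset part (here the torso) are compared.
    first = np.sum((projected[torso_ids] - detected_2d[torso_ids]) ** 2)
    # Second loss: keeps the displacement close to the network's initial estimate.
    second = np.sum((translation - initial_translation) ** 2)
    return float(first + reg_weight * second)
```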

In some embodiments, the supervision information includes initial two-dimensional key points of the target object. The second optimization unit is configured to: obtain a third loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter; obtain a fourth loss that represents the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter; and optimize the initial value of the key point rotation parameter and the initial value of the shape parameter based on the third loss and the fourth loss. By optimizing the initial values of the key point rotation parameter and the shape parameter based on the optimized values of the displacement parameter and the global rotation parameter, this embodiment improves the stability of the optimization process, while the fourth loss ensures that the posture corresponding to the optimized parameters is plausible.

In some embodiments, the apparatus further includes a joint optimization module configured to, after the initial value of the key point rotation parameter and the initial value of the shape parameter have been optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, jointly optimize the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter. Building on the preceding optimization, this embodiment jointly optimizes all of the optimized parameters, which further improves the accuracy of the optimization result.

In some embodiments, the supervision information includes initial two-dimensional key points of the target object and an initial three-dimensional point cloud of the surface of the target object. The first optimization unit is configured to: obtain, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; obtain a fifth loss between a first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, where the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss. By adding the three-dimensional point cloud to the supervision information used to optimize the initial parameters, this embodiment improves the accuracy of the optimization result.
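A loss between the model's surface point cloud and the observed point cloud could, for example, be the mean squared distance from each observed point to its nearest model point. This one-directional nearest-neighbor form is an assumption chosen for illustration; the patent does not specify how the fifth loss is computed.

```python
import numpy as np

def point_cloud_loss(model_points, observed_points):
    """Mean squared distance from each observed point to its nearest model point.

    model_points:    (M, 3) points on the reconstructed model surface
    observed_points: (N, 3) initial point cloud obtained from the depth map
    """
    # Pairwise differences, shape (N, M, 3); fine for small clouds, while a
    # spatial index would be preferable for large ones.
    diff = observed_points[:, None, :] - model_points[None, :, :]
    squared = np.sum(diff ** 2, axis=2)
    return float(np.mean(np.min(squared, axis=1)))
```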

In some embodiments, the joint optimization module includes a first obtaining unit, a second obtaining unit, a third obtaining unit, and a joint optimization unit. The first obtaining unit is configured to obtain a sixth loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter. The second obtaining unit is configured to obtain a seventh loss, which represents the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter. The third obtaining unit is configured to obtain an eighth loss between a second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter. The joint optimization unit is configured to jointly optimize the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss. By adding the three-dimensional point cloud to the supervision information used in the optimization, this embodiment improves the accuracy of the optimization result.

In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present invention may be used to perform the methods described in the method embodiments above. For their specific implementation, refer to the description of the method embodiments; for brevity, details are not repeated here.

도 6에 도시된 바와 같이, 본 발명은 3차원 재구성 시스템을 더 제공하며, 상기 시스템은 이미지 수집 장치(601), 처리 유닛(602)을 포함한다.As shown in FIG. 6 , the present invention further provides a 3D reconstruction system, which includes an image acquisition device 601 and a processing unit 602 .

이미지 수집 장치(601)는 목표 오브젝트의 이미지를 수집하는 데에 사용된다.The image collecting device 601 is used to collect an image of a target object.

상기 이미지 수집 장치(601)와 통신 연결된 처리 유닛(602)은 3차원 재구성 네트워크를 통해 상기 이미지 내의 목표 오브젝트에 대해 3차원 재구성을 수행하여 상기 목표 오브젝트의 파라미터의 초기값을 얻는 데에 사용되며, 상기 파라미터의 초기값은 상기 목표 오브젝트의 3차원 모델을 구축하는 데에 사용되고; 미리 얻은, 목표 오브젝트의 특징을 나타내는 감독 정보에 기반하여 상기 파라미터의 초기값을 최적화하여 상기 파라미터의 최적화된 값을 얻으며; 상기 파라미터의 최적화된 값에 기반하여 골격 스키닝 처리를 수행하여 상기 목표 오브젝트의 3차원 모델을 구축한다.A processing unit 602 communicatively connected to the image acquisition device 601 is used to obtain initial values of parameters of the target object by performing 3D reconstruction on a target object in the image through a 3D reconstruction network, the initial values of the parameters are used to build a three-dimensional model of the target object; optimize an initial value of the parameter according to previously obtained supervision information representing characteristics of a target object to obtain an optimized value of the parameter; A skeletal skinning process is performed based on the optimized values of the parameters to build a 3D model of the target object.

The image acquisition device 601 of the embodiment of the present invention may be a device with an image capture function, such as a still camera or a video camera. Images captured by the image acquisition device 601 may be sent to the processing unit 602 in real time, or stored and later sent from storage to the processing unit 602 when needed. The processing unit 602 may be a single server or a server cluster composed of a plurality of servers. For details of the method performed by the processing unit 602, refer to the embodiments of the three-dimensional reconstruction method described above; they are not repeated here.

본 명세서 실시예는 컴퓨터 기기를 더 제공하는 바, 상기 컴퓨터 기기는 적어도 메모리, 프로세서 및 메모리에 저장되고 프로세서에서 실행 가능한 컴퓨터 프로그램을 포함하며, 여기서, 상기 프로그램이 프로세서에 의해 실행될 경우 상술한 임의의 실시예에 따른 방법이 구현된다.Embodiments of the present specification further provide a computer device, wherein the computer device includes at least a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein, when the program is executed by the processor, any one of the above A method according to an embodiment is implemented.

도 7은 본 명세서의 실시예에서 제공하는 더욱 구체적인 컴퓨팅 기기의 하드웨어 구조 개략도를 나타내는 바, 상기 기기는 프로세서(701), 메모리(702), 입/출력 인터페이스(703), 통신 인터페이스(704) 및 버스(705)를 포함할 수 있다. 여기서 프로세서(701), 메모리(702), 입/출력 인터페이스(703) 및 통신 인터페이스(704)는 버스(705)를 통해 서로간의 기기 내부의 통신 연결을 구현한다. 7 is a schematic diagram of a hardware structure of a more specific computing device provided by an embodiment of the present specification, which includes a processor 701, a memory 702, an input/output interface 703, a communication interface 704, and bus 705. Here, the processor 701, the memory 702, the input/output interface 703, and the communication interface 704 implement a communication connection inside the device through a bus 705.

The processor 701 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of this specification. The processor 701 may further include a graphics card, such as an Nvidia Titan X graphics card or a 1080Ti graphics card.

The memory 702 may be implemented as read-only memory (ROM), random access memory (RAM), a static storage device, a dynamic storage device, or the like. The memory 702 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 702 and is called and executed by the processor 701.

The input/output interface 703 is used to connect an input/output module in order to implement the input and output of information. The input/output module may be built into the device as a component (not shown in the figure) or may be connected to the device externally to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

통신 인터페이스(704)는 통신 모듈(도면에 도시되지 않음)을 연결하여 본 기기와 다른 기기 간의 통신 상호 작용을 구현하는 데에 사용된다. 통신 모듈은 유선 수단(예를 들면 USB, 네트워크 케이블 등)을 통해 통신을 구현할 수도 있고, 무선 수단(예를 들면 모바일 네트워크, WIFI, 블루투스 등)을 통해 통신을 구현할 수도 있다.The communication interface 704 is used to connect communication modules (not shown) to realize communication interaction between this device and other devices. The communication module may implement communication through wired means (eg, USB, network cable, etc.) or through wireless means (eg, mobile network, WIFI, Bluetooth, etc.).

버스(705)는 장치의 다양한 구성 요소(예를 들어, 프로세서(701), 메모리(702), 입/출력 인터페이스(703) 및 통신 인터페이스(704)) 사이에서 정보를 전송하기 위한 통로를 포함한다.Bus 705 includes pathways for transferring information between the various components of the device (e.g., processor 701, memory 702, input/output interface 703, and communication interface 704). .

설명해야 할 것은, 위에서 언급한 장치는 프로세서(701), 메모리(702), 입/출력 인터페이스(703), 통신 인터페이스(704) 및 버스(705)만을 보여주지만, 특정 구현 프로세스에서 당해 기기는 정상 작동에 필요한 기타 구성 요소를 포함할 수도 있다. 또한, 상술한 기기는 도면에 도시된 모든 구성 요소를 포함할 필요 없이 본 명세서의 실시예들의 방안을 구현하는 데에 필요한 구성 요소들만을 포함할 수 있음을 당업자는 이해할 수 있을 것이다.It should be noted that the device mentioned above only shows the processor 701, memory 702, input/output interface 703, communication interface 704 and bus 705, but in the specific implementation process, the device is normal It may also contain other components required for operation. In addition, those skilled in the art will understand that the above-described device may include only components necessary for implementing the schemes of the embodiments herein without needing to include all components shown in the drawings.

본 발명의 실시예는 컴퓨터 프로그램이 저장된 컴퓨터 판독 가능 저장 매체를 더 제공하며, 당해 프로그램은 프로세서에 의해 실행될 경우 상술한 임의의 실시예에 따른 방법을 구현한다.Embodiments of the present invention further provide a computer readable storage medium storing a computer program, and the program implements the method according to any of the above embodiments when executed by a processor.

Computer-readable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, and any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable storage media do not include transitory media such as modulated data signals and carrier waves.

From the above description of the implementations, a person skilled in the art will clearly understand that the embodiments of this specification may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solutions of the embodiments of this specification, that is, the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this specification or in certain parts of those embodiments.

The systems, apparatuses, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having specific functions. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail transceiver device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

The embodiments of this specification are described in a progressive manner: identical or similar parts of the embodiments may be understood by reference to one another, and the description of each embodiment focuses on its differences from the other embodiments. In particular, because the apparatus embodiments are basically similar to the method embodiments, their description is simplified, and the corresponding parts of the description of the method embodiments may be consulted. The apparatus embodiments described above are merely schematic: modules described as separate components may or may not be physically separate, and when the embodiments of this specification are implemented, the functions of the modules may be realized in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual requirements to achieve the purpose of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement the present disclosure without creative effort.

Claims

A three-dimensional reconstruction method, comprising:
obtaining initial values of parameters of a target object by performing 3D reconstruction on the target object in an image through a 3D reconstruction network, wherein the initial values of the parameters are used to build a 3D model of the target object;
obtaining an optimized value of the parameter by optimizing an initial value of the parameter based on previously obtained supervision information representing characteristics of the target object;
and constructing a 3D model of the target object by performing skeletal skinning based on the optimized values of the parameters.

The 3D reconstruction method according to claim 1, wherein:
the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information;
the first supervision information includes at least one of an initial two-dimensional key point of the target object and semantic information of a plurality of pixel points of the target object in the image; and
the second supervision information includes an initial 3D point cloud of the surface of the target object.

The 3D reconstruction method according to claim 2, further comprising:
extracting information of the initial 2D key point of the target object from the image through a key point extraction network.

The 3D reconstruction method according to claim 2 or 3, wherein:
the image includes a depth map of the target object; and
the 3D reconstruction method further comprises:
extracting depth information of the plurality of pixel points of the target object from the depth map;
and back-projecting the plurality of pixel points of the target object in the depth map into three-dimensional space based on the depth information to obtain the initial 3D point cloud of the surface of the target object.
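The back-projection step in the claim above can be illustrated with a minimal sketch. The claim does not specify a camera model; the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the function name `back_project`, and the convention that a depth of zero marks a missing measurement are assumptions made only for illustration.

```python
import numpy as np

def back_project(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into 3D camera coordinates.

    depth: (H, W) array of depth values along the optical axis.
    mask:  (H, W) boolean array, True where a pixel belongs to the target.
    fx, fy, cx, cy: pinhole intrinsics (assumed known from calibration).
    Returns an (N, 3) array of 3D points.
    """
    valid = mask & (depth > 0)          # assumption: depth 0 means "no measurement"
    v, u = np.nonzero(valid)            # row (v) and column (u) indices
    z = depth[v, u]
    x = (u - cx) * z / fx               # invert u = fx * x / z + cx
    y = (v - cy) * z / fy               # invert v = fy * y / z + cy
    return np.stack([x, y, z], axis=1)

# Tiny example: a flat surface 2 units from the camera, target in the centre.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
points = back_project(depth, mask, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

Here `points` holds one 3D point per masked pixel, all at depth 2.0; in practice the intrinsics would come from the depth sensor's calibration.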

The 3D reconstruction method according to claim 4, wherein:
the image further includes an RGB image of the target object; and extracting depth information of the plurality of pixel points of the target object from the depth map includes:
performing image segmentation on the RGB image;
determining an image area in which the target object is located in the RGB image based on a result of the image segmentation;
determining an image area in which the target object is located in the depth map based on the image area in which the target object is located in the RGB image;
and obtaining depth information of the plurality of pixel points in the image area of the depth map in which the target object is located.
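As a rough sketch of how a segmentation result on the RGB image might be carried over to the depth map: the example below assumes the two images are pixel-aligned (registered), so the target's image area can be transferred directly. The segmentation itself is not shown; `label_map`, `target_label`, and `target_depth` are hypothetical names, and the claim does not prescribe this particular form.

```python
import numpy as np

def target_depth(label_map, depth, target_label):
    """Select the depth pixels that fall inside the target's image area.

    label_map:    (H, W) integer segmentation result for the RGB image.
    depth:        (H, W) depth map, assumed registered to the RGB image.
    target_label: label value assigned to the target object (hypothetical).
    Returns (region, pixels, values): the boolean area in the depth map,
    the (N, 2) pixel coordinates as (u, v), and the N depth values.
    """
    region = label_map == target_label      # area of the target in the RGB image
    valid = region & (depth > 0)            # same area in the depth map, minus holes
    v, u = np.nonzero(valid)
    return valid, np.stack([u, v], axis=1), depth[v, u]

label_map = np.array([[0, 1],
                      [1, 1]])
depth = np.array([[1.0, 0.0],
                  [2.0, 3.0]])
region, pixels, values = target_depth(label_map, depth, target_label=1)
```

If the RGB and depth sensors are not registered, the area would first have to be warped using the extrinsic calibration between them, which this sketch omits.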

The 3D reconstruction method according to any one of claims 2 to 5, further comprising:
filtering outliers from the initial 3D point cloud and using the filtered initial 3D point cloud as the second supervision information.
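The claim above does not state which outlier criterion is used. One simple possibility, shown here purely as an assumption, is to discard points whose distance from the cloud's median centre is far larger than typical, measured with the median absolute deviation (MAD). The function name `filter_outliers` and the threshold factor `k` are illustrative.

```python
import numpy as np

def filter_outliers(points, k=3.0):
    """Drop points that lie unusually far from the median centre of the cloud.

    points: (N, 3) array. A point is kept if its distance to the per-axis
    median is at most median(distance) + k * MAD(distance).
    This criterion is an assumed stand-in, not the patent's stated filter.
    """
    center = np.median(points, axis=0)
    dist = np.linalg.norm(points - center, axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    keep = dist <= med + k * max(mad, 1e-9)   # guard against a zero MAD
    return points[keep]

# Eight cube corners plus one stray point far from the surface.
corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                   dtype=float)
cloud = np.vstack([corners, [[50.0, 50.0, 50.0]]])
filtered = filter_outliers(cloud)
```

Neighbourhood-based filters (for example, removing points with few neighbours within a radius) are a common alternative when the target is not compact.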

The 3D reconstruction method according to any one of claims 1 to 6, wherein:
the image of the target object is collected by an image collecting device, and the parameters include a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a shape parameter of the target object, and a displacement parameter of the image collecting device; and
optimizing the initial value of the parameter based on the previously obtained supervision information representing characteristics of the target object includes:
with the initial value of the shape parameter and the initial value of the key point rotation parameter held unchanged, optimizing the current value of the displacement parameter of the image collecting device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter;
and optimizing the initial value of the key point rotation parameter and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the shape parameter.

The 3D reconstruction method according to claim 7, wherein:
the supervision information includes an initial two-dimensional key point of the target object; and
optimizing the current value of the displacement parameter of the image collecting device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes:
obtaining, from among 2D projection key points corresponding to the 3D key points of the target object, a target 2D projection key point belonging to a preset portion of the target object, wherein the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter, and the 2D projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter;
obtaining a first loss between the target 2D projection key point and the initial 2D key point;
obtaining a second loss between the initial value of the displacement parameter and the current value of the displacement parameter;
and optimizing a current value of the displacement parameter and an initial value of the global rotation parameter based on the first loss and the second loss.
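The first and second losses of the claim above can be sketched as follows. The claim does not fix a camera model or the functional form of either loss; the pinhole projection with focal length `f` and principal point `c`, the squared-error form, the weight `w_reg`, and the index list `part_idx` selecting the preset portion are all assumptions for illustration.

```python
import numpy as np

def project(joints_3d, R, t, f, c):
    """Rotate 3D key points by R, shift by displacement t, apply a pinhole projection."""
    cam = joints_3d @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3] + c

def stage1_loss(joints_3d, R, t, t_init, kp_2d, part_idx, f, c, w_reg=1.0):
    """First loss (reprojection on a preset portion) plus second loss (displacement prior)."""
    proj = project(joints_3d, R, t, f, c)
    first = np.sum((proj[part_idx] - kp_2d[part_idx]) ** 2)   # projected vs. detected key points
    second = np.sum((t - t_init) ** 2)                        # keep t near its initial value
    return first + w_reg * second

joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R = np.eye(3)
t_init = np.array([0.0, 0.0, 5.0])
f, c = 500.0, np.array([320.0, 240.0])
kp_2d = project(joints, R, t_init, f, c)       # synthetic "detected" key points
part_idx = [0, 1]                              # hypothetical preset portion

loss_at_init = stage1_loss(joints, R, t_init, t_init, kp_2d, part_idx, f, c)
loss_shifted = stage1_loss(joints, R, t_init + np.array([0.1, 0.0, 0.0]),
                           t_init, kp_2d, part_idx, f, c)
```

In this synthetic setup the loss is zero at the true displacement and grows once the displacement is perturbed, which is the behaviour an optimizer would exploit.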

The 3D reconstruction method according to claim 7 or 8, wherein:
the supervision information includes an initial two-dimensional key point of the target object; and optimizing the initial value of the key point rotation parameter and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter includes:
obtaining a third loss between an optimized 2D projection key point of the target object and the initial 2D key point, wherein the optimized 2D projection key point is obtained by projecting an optimized 3D key point of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key point is obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter;
obtaining a fourth loss representing the rationality of a posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter;
and optimizing an initial value of the key point rotation parameter and an initial value of the shape parameter based on the third loss and the fourth loss.

The 3D reconstruction method according to any one of claims 7 to 9, wherein, after optimizing the initial value of the key point rotation parameter and the initial value of the shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, the 3D reconstruction method further comprises:
performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter.
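The staged schedule described in the claims (displacement and global rotation first, key point rotation and shape second, then all four jointly) can be sketched with a toy optimizer. Everything here is an assumed stand-in: the quadratic `toy_loss` replaces the real losses, the parameter names and sizes are hypothetical, and plain gradient descent with finite differences replaces whatever optimizer is actually used.

```python
import numpy as np

def numeric_grad(loss, params, name, eps=1e-6):
    """Central-difference gradient of `loss` with respect to params[name]."""
    grad = np.zeros_like(params[name])
    for i in range(grad.size):
        old = params[name].flat[i]
        params[name].flat[i] = old + eps
        up = loss(params)
        params[name].flat[i] = old - eps
        down = loss(params)
        params[name].flat[i] = old
        grad.flat[i] = (up - down) / (2 * eps)
    return grad

def minimize_block(loss, params, free, steps=300, lr=0.1):
    """Gradient descent over the parameters named in `free`; the rest stay fixed."""
    params = {k: np.array(v, dtype=float) for k, v in params.items()}
    for _ in range(steps):
        grads = {n: numeric_grad(loss, params, n) for n in free}
        for n in free:
            params[n] -= lr * grads[n]
    return params

# Toy quadratic loss with a known minimum (stand-in for the real losses).
target = {
    "global_rot": np.array([0.1, -0.2, 0.3]),
    "keypoint_rot": np.array([[0.0, 0.1, 0.0], [0.2, 0.0, -0.1]]),
    "shape": np.array([0.5, -0.5]),
    "displacement": np.array([0.0, 0.0, 5.0]),
}

def toy_loss(p):
    return sum(np.sum((p[k] - target[k]) ** 2) for k in target)

init = {k: np.zeros_like(v) for k, v in target.items()}
stage1 = minimize_block(toy_loss, init, free=["displacement", "global_rot"])
stage2 = minimize_block(toy_loss, stage1, free=["keypoint_rot", "shape"])
final = minimize_block(toy_loss, stage2, free=list(target))
```

Because the toy loss is separable, the staging has no effect on the final result here; the sketch only shows how parameter groups are frozen and released in turn.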

The 3D reconstruction method according to claim 10, wherein:
the supervision information includes an initial two-dimensional key point of the target object and an initial three-dimensional point cloud of the surface of the target object; and optimizing the current value of the displacement parameter of the image collecting device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes:
obtaining, from among 2D projection key points corresponding to the 3D key points of the target object, a target 2D projection key point belonging to a preset portion of the target object, wherein the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter, and the 2D projection key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter;
obtaining a first loss between the target 2D projection key point and the initial 2D key point;
obtaining a second loss between the initial value of the displacement parameter and the current value of the displacement parameter;
obtaining a fifth loss between a first 3D point cloud of the surface of the target object and the initial 3D point cloud, wherein the first 3D point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the shape parameter;
and optimizing a current value of the displacement parameter and an initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss.

The 3D reconstruction method according to claim 10 or 11, wherein performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter includes:
obtaining a sixth loss between an optimized 2D projection key point of the target object and the initial 2D key point, wherein the optimized 2D projection key point is obtained by projecting an optimized 3D key point of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key point is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter;
obtaining a seventh loss representing the rationality of a posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter;
obtaining an eighth loss between a second 3D point cloud of the surface of the target object and the initial 3D point cloud, wherein the second 3D point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the shape parameter;
and performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss.
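The claims do not say how the loss between a model-derived point cloud and the initial 3D point cloud (the fifth and eighth losses) is computed. A symmetric Chamfer distance is one common choice for comparing unordered point sets and is used below only as an assumed example; `chamfer_distance` is an illustrative name.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    For each point, take the squared distance to its nearest neighbour in the
    other set, then average in both directions. Assumed form, not the
    patent's stated loss.
    """
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)   # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

model_cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
sensor_cloud = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
same = chamfer_distance(model_cloud, model_cloud)
diff = chamfer_distance(model_cloud, sensor_cloud)
```

The dense pairwise matrix is fine for small clouds; for large clouds a spatial index such as a k-d tree is the usual replacement.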

A three-dimensional reconstruction device, comprising:
a first 3D reconstruction module for performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, wherein the initial values of the parameters are used to build a 3D model of the target object;
an optimization module for obtaining an optimized value of the parameter by optimizing an initial value of the parameter based on previously obtained supervision information representing characteristics of the target object;
and a second 3D reconstruction module for constructing a 3D model of the target object by performing skeletal skinning based on the optimized values of the parameters.

A three-dimensional reconstruction system, comprising:
an image collecting device for collecting an image of a target object; and
a processing unit communicatively connected with the image collecting device and configured to: obtain initial values of parameters of the target object by performing 3D reconstruction on the target object in the image through a 3D reconstruction network, wherein the initial values of the parameters are used to build a 3D model of the target object; optimize the initial values of the parameters based on previously obtained supervision information representing characteristics of the target object to obtain optimized values of the parameters; and construct a 3D model of the target object by performing skeletal skinning based on the optimized values of the parameters.

A computer-readable storage medium storing a computer program, wherein the method according to any one of claims 1 to 12 is implemented when the computer program is executed by a processor.

A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the method according to any one of claims 1 to 12 is implemented when the computer program is executed by the processor.