TW202244853A

TW202244853A - 3D reconstruction method, apparatus and system, storage medium and computer device

Info

Publication number: TW202244853A
Application number: TW111111578A
Authority: TW
Inventors: 曹智杰; 汪旻; 劉文韜; 錢晨; 馬利莊
Original assignee: 大陸商上海商湯智能科技有限公司
Priority date: 2021-05-10
Filing date: 2022-03-28
Publication date: 2022-11-16
Also published as: CN113160418A; KR20230078777A; JP2023547888A; WO2022237249A1

Abstract

The present disclosure provides a 3D reconstruction method, apparatus and system, a medium, and a computer device. The method includes: performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a 3D model of the target object; optimizing the initial values of the parameters based on pre-acquired supervision information that represents features of the target object, to obtain optimized values of the parameters; and performing skeletal skinning based on the optimized values of the parameters to build the 3D model of the target object.

Description

Three-dimensional reconstruction method, device and system, media and computer equipment

The present disclosure relates to the technical field of computer vision, and in particular to a 3D reconstruction method, apparatus and system, a medium, and a computer device.

3D reconstruction is one of the important technologies in computer vision and has many potential applications in augmented reality, virtual reality, and other fields. Performing 3D reconstruction on a target object makes it possible to reconstruct the body shape and limb rotations of the target object. However, traditional 3D reconstruction approaches cannot achieve both accuracy and reliability of the reconstruction result.

The present disclosure provides a 3D reconstruction method, apparatus and system, a medium, and a computer device.

According to a first aspect of the embodiments of the present disclosure, a 3D reconstruction method is provided. The method includes: performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a 3D model of the target object; optimizing the initial values of the parameters based on pre-acquired supervision information that represents features of the target object, to obtain optimized values of the parameters; and performing skeletal skinning based on the optimized values of the parameters to build the 3D model of the target object.

In some embodiments, the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information. The first supervision information includes at least one of the following: initial 2D key points of the target object, and semantic information of multiple pixels on the target object in the image. The second supervision information includes an initial 3D point cloud of the surface of the target object. In the embodiments of the present disclosure, the initial values of the parameters may be optimized using only the initial 2D key points of the target object or the semantic information of the pixels as supervision information, which gives high optimization efficiency and low optimization complexity. Alternatively, the initial 3D point cloud of the surface of the target object may be used as supervision information together with the aforementioned initial 2D key points or pixel semantic information, which improves the accuracy of the optimized values of the parameters.

In some embodiments, the method further includes: extracting information of the initial 2D key points of the target object from the image through a key point extraction network. Using the information of the initial 2D key points extracted by the key point extraction network as supervision information makes it possible to generate relatively natural and plausible motion for the 3D model.

In some embodiments, the image includes a depth image of the target object, and the method further includes: extracting depth information of multiple pixels on the target object from the depth image; and back-projecting, based on the depth information, the multiple pixels on the target object in the depth image into 3D space to obtain the initial 3D point cloud of the surface of the target object. Extracting the depth information and using it to back-project the pixels of the 2D image into 3D space yields the initial 3D point cloud of the surface of the target object, so that this point cloud can serve as supervision information for optimizing the initial values of the parameters, which further improves the accuracy of the parameter optimization.
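The back-projection step can be sketched as follows, assuming a pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy`, the function name `back_project`, and the convention that a non-positive depth means "no measurement" are illustrative assumptions; the disclosure does not specify a camera model.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def back_project(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[Sequence[Sequence[bool]]] = None,
) -> List[Point3D]:
    """Back-project depth pixels into camera-space 3D points.

    Assumes a pinhole model (hypothetical intrinsics fx, fy, cx, cy).
    Pixels with depth <= 0, or outside the optional mask, are skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index
            if z <= 0:
                continue                     # no valid depth measurement
            if mask is not None and not mask[v][u]:
                continue                     # pixel is not on the target object
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image; the top-left pixel has no measurement.
cloud = back_project([[0.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

The optional `mask` argument is where a per-pixel target-object region, if available, would restrict the point cloud to the target object.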

In some embodiments, the image further includes an RGB image of the target object. Extracting the depth information of multiple pixels on the target object from the depth image includes: performing image segmentation on the RGB image; determining, based on the result of the image segmentation, the image region where the target object is located in the RGB image; determining, based on that region, the image region where the target object is located in the depth image; and acquiring the depth information of multiple pixels in the image region where the target object is located in the depth image. Performing image segmentation on the RGB image makes it possible to locate the target object accurately and therefore to extract its depth information accurately.
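A minimal sketch of carrying the segmented region from the RGB image over to the depth image is given below. It assumes the two images are pixel-aligned and have the same resolution, and that segmentation yields an integer label map; neither assumption is stated in the disclosure, and `target_depth_pixels` and `target_label` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def target_depth_pixels(
    label_map: Sequence[Sequence[int]],
    depth: Sequence[Sequence[float]],
    target_label: int = 1,
) -> List[Tuple[int, int, float]]:
    """Return (u, v, depth) for every valid depth pixel on the target object.

    label_map is an assumed per-pixel segmentation of the RGB image, and the
    depth image is assumed to be registered to it pixel for pixel.
    """
    if len(label_map) != len(depth) or any(
        len(a) != len(b) for a, b in zip(label_map, depth)
    ):
        raise ValueError("label map and depth image must have the same shape")

    samples: List[Tuple[int, int, float]] = []
    for v, (label_row, depth_row) in enumerate(zip(label_map, depth)):
        for u, (label, z) in enumerate(zip(label_row, depth_row)):
            if label == target_label and z > 0:
                samples.append((u, v, z))
    return samples


labels = [[0, 1], [1, 1]]
depths = [[1.0, 2.0], [0.0, 3.0]]
pixels = target_depth_pixels(labels, depths)
```

If the RGB and depth sensors are not registered, an extra alignment step would be needed before this lookup.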

In some embodiments, the method further includes: filtering outliers out of the initial 3D point cloud, and using the filtered initial 3D point cloud as the second supervision information. Filtering the outliers reduces their interference and further improves the accuracy of the parameter optimization process.
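The disclosure does not say how outliers are identified. One common choice, shown here purely as an assumption, is statistical outlier removal: a point is dropped when its mean distance to its `k` nearest neighbours is much larger than the average over the cloud. The names `filter_outliers`, `k`, and `std_ratio` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def filter_outliers(
    points: Sequence[Point3D], k: int = 3, std_ratio: float = 1.0
) -> List[Point3D]:
    """Drop points whose mean k-nearest-neighbour distance is unusually large.

    Brute-force O(n^2) version for illustration; real point clouds would
    normally use a spatial index.
    """
    n = len(points)
    if n <= k:
        return list(points)  # too few points to estimate neighbourhoods

    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)

    mu = sum(mean_knn) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_knn) / n)
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_knn) if d <= threshold]


# Eight corners of a unit cube plus one far-away point.
cube = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
filtered = filter_outliers(cube + [(100.0, 100.0, 100.0)])
```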

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include: a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device. Optimizing the initial values of the parameters based on the pre-acquired supervision information that represents features of the target object includes: with the initial value of the body shape parameter and the initial value of the key point rotation parameter held unchanged, optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the body shape parameter. During optimization, changing the position of the image acquisition device and changing the positions of the 3D key points can both change the 2D projections of the 3D key points, which makes the optimization process very unstable. The two-stage approach first fixes the initial value of the key point rotation parameter and the initial value of the body shape parameter while optimizing the initial value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter, and then fixes the displacement parameter and the global rotation parameter (at the values obtained in the first stage) while optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter, which improves the stability of the optimization process.
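The two-stage schedule can be illustrated with a deliberately tiny toy problem. Everything below is an assumption made for illustration: each parameter is reduced to a single scalar, the loss is an invented quadratic in which `translation` and `joint_rot` both influence the same "reprojection" term (mimicking the ambiguity described above), and `minimize` is a plain finite-difference gradient descent, not the optimizer used in the disclosure.

```python
from typing import Callable, Dict, Iterable

Params = Dict[str, float]


def minimize(
    loss: Callable[[Params], float],
    params: Params,
    free: Iterable[str],
    lr: float = 0.1,
    steps: int = 200,
    eps: float = 1e-6,
) -> Params:
    """Gradient descent that only updates the parameters named in `free`."""
    p = dict(params)
    free = list(free)
    for _ in range(steps):
        base = loss(p)
        grads = {}
        for name in free:
            bumped = dict(p)
            bumped[name] = p[name] + eps
            grads[name] = (loss(bumped) - base) / eps  # forward difference
        for name in free:
            p[name] -= lr * grads[name]
    return p


def toy_loss(p: Params) -> float:
    # Toy "reprojection" term: translation and joint_rot are entangled.
    reprojection = (p["translation"] + p["joint_rot"] - 3.0) ** 2
    # Toy terms that pull the other parameters toward fixed targets.
    others = (p["global_rot"] - 1.0) ** 2 + (p["joint_rot"] - 0.5) ** 2 + (p["shape"] - 0.2) ** 2
    return reprojection + others


initial = {"translation": 0.0, "global_rot": 0.0, "joint_rot": 0.0, "shape": 0.0}
# Stage 1: joint rotation and body shape stay fixed.
stage1 = minimize(toy_loss, initial, free=["translation", "global_rot"])
# Stage 2: translation and global rotation stay at their stage-1 values.
stage2 = minimize(toy_loss, stage1, free=["joint_rot", "shape"])
```

A further pass with all four names in `free` would correspond to a joint refinement of every parameter.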

In some embodiments, the supervision information includes the initial 2D key points of the target object. Optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes: acquiring, among the 2D projected key points corresponding to the 3D key points of the target object, target 2D projected key points that belong to a preset part of the target object, where the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the 2D projected key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target 2D projected key points and the initial 2D key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be, for example, the torso. Because different motions have little effect on the key points of the torso, determining the first loss from the torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. Because the 2D key points are supervision information on a 2D plane, whereas the displacement parameter of the image acquisition device is a parameter in 3D space, acquiring the second loss reduces the cases in which the optimization result falls into a local optimum on the 2D plane and thereby deviates from the true value.
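The first and second losses can be sketched as below. The disclosure does not give their exact form, so squared Euclidean distances, a pinhole projection with a single hypothetical focal length `focal`, and the weight `reg_weight` are all assumptions. The 3D joints passed in are assumed to have already been posed with the current global rotation.

```python
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def project(point: Vec3, translation: Vec3, focal: float = 1.0) -> Vec2:
    """Pinhole projection of a 3D point shifted by the camera displacement."""
    x, y, z = (p + t for p, t in zip(point, translation))
    return (focal * x / z, focal * y / z)


def stage_one_loss(
    joints_3d: Sequence[Vec3],
    keypoints_2d: Sequence[Vec2],
    torso_ids: Sequence[int],
    translation: Vec3,
    translation_init: Vec3,
    reg_weight: float = 1.0,
) -> float:
    """First loss (torso reprojection error) plus weighted second loss."""
    first = 0.0
    for i in torso_ids:  # only key points of the preset part contribute
        u, v = project(joints_3d[i], translation)
        du, dv = u - keypoints_2d[i][0], v - keypoints_2d[i][1]
        first += du * du + dv * dv
    # Second loss: keeps the displacement near the network's initial estimate.
    second = sum((a - b) ** 2 for a, b in zip(translation, translation_init))
    return first + reg_weight * second


joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 0.0)]  # last one is a limb joint
detected = [(0.0, 0.0), (0.5, 0.0), (9.0, 9.0)]
loss_at_init = stage_one_loss(joints, detected, [0, 1], (0.0, 0.0, 2.0), (0.0, 0.0, 2.0))
```

In this example the limb joint at index 2 is badly mismatched, but it does not affect the loss because only the torso indices are used.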

In some embodiments, the supervision information includes the initial 2D key points of the target object. Optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter includes: acquiring a third loss between optimized 2D projected key points of the target object and the initial 2D key points, where the optimized 2D projected key points are obtained by projecting optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; acquiring a fourth loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the third loss and the fourth loss. This embodiment optimizes the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized values of the displacement parameter and the global rotation parameter, which improves the stability of the optimization process; at the same time, the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.
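The disclosure states only that the fourth loss characterizes the plausibility of the pose. One simple stand-in, shown here as an assumption and not as the disclosed formulation, is a joint-limit penalty that is zero inside an allowed angle range and grows quadratically outside it. The joint names and limits below are invented for the example.

```python
from typing import Dict, Tuple


def pose_plausibility_loss(
    joint_angles: Dict[str, float],
    limits: Dict[str, Tuple[float, float]],
) -> float:
    """Quadratic penalty for joint angles (radians) outside assumed limits."""
    loss = 0.0
    for name, angle in joint_angles.items():
        low, high = limits[name]
        if angle < low:
            loss += (low - angle) ** 2
        elif angle > high:
            loss += (angle - high) ** 2
    return loss


# A knee bent 0.5 rad the wrong way is penalized; the elbow is within range.
penalty = pose_plausibility_loss(
    {"knee": -0.5, "elbow": 1.0},
    {"knee": (0.0, 2.5), "elbow": (0.0, 2.6)},
)
```

Learned pose priors are another option for this role; the hand-written limits here are just the easiest version to read.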

In some embodiments, the method further includes: after the initial value of the key point rotation parameter and the initial value of the body shape parameter have been optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter. Building on the preceding optimization, this embodiment jointly optimizes all of the optimized parameters, which further improves the accuracy of the optimization result.

In some embodiments, the supervision information includes the initial 2D key points of the target object and the initial 3D point cloud of the surface of the target object. Optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter includes: acquiring, among the 2D projected key points corresponding to the 3D key points of the target object, target 2D projected key points that belong to a preset part of the target object, where the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the 2D projected key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquiring a first loss between the target 2D projected key points and the initial 2D key points; acquiring a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; acquiring a fifth loss between a first 3D point cloud of the surface of the target object and the initial 3D point cloud, where the first 3D point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss. This embodiment adds the 3D point cloud to the supervision information used to optimize the initial parameters, which improves the accuracy of the optimization result.
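A point-cloud loss of this kind is often computed as a nearest-neighbour distance between the two clouds. The sketch below uses a one-sided version (each observed point to its closest model point); this choice, the squared-distance form, and the name `point_cloud_loss` are assumptions, since the disclosure does not define the fifth loss numerically.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def point_cloud_loss(
    model_points: Sequence[Point3D], observed_points: Sequence[Point3D]
) -> float:
    """Mean squared distance from each observed point to its nearest model point.

    One-sided on purpose: a depth camera only sees the front of the body, so
    model points on the unseen side have no counterpart in the observed cloud.
    """
    if not model_points or not observed_points:
        raise ValueError("both point clouds must be non-empty")
    total = 0.0
    for q in observed_points:
        nearest = min(math.dist(q, p) for p in model_points)
        total += nearest ** 2
    return total / len(observed_points)


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
observed = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
fifth = point_cloud_loss(model, observed)
```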

In some embodiments, jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter includes: acquiring a sixth loss between optimized 2D projected key points of the target object and the initial 2D key points, where the optimized 2D projected key points are obtained by projecting optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; acquiring a seventh loss, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; acquiring an eighth loss between a second 3D point cloud of the surface of the target object and the initial 3D point cloud, where the second 3D point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; and jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter, and the optimized value of the displacement parameter based on the sixth loss, the seventh loss, and the eighth loss. This embodiment adds the 3D point cloud to the supervision information used to optimize the parameters, which improves the accuracy of the optimization result.

According to a second aspect of the embodiments of the present disclosure, a 3D reconstruction apparatus is provided. The apparatus includes: a first 3D reconstruction module, configured to perform 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to build a 3D model of the target object; an optimization module, configured to optimize the initial values of the parameters based on pre-acquired supervision information that represents features of the target object, to obtain optimized values of the parameters; and a second 3D reconstruction module, configured to perform skeletal skinning based on the optimized values of the parameters to build the 3D model of the target object.

In some embodiments, the apparatus further includes: a 2D key point extraction module, configured to extract information of the initial 2D key points of the target object from the image through a key point extraction network. Using the information of the initial 2D key points extracted by the key point extraction network as supervision information makes it possible to generate relatively natural and plausible motion for the 3D model.

In some embodiments, the image includes a depth image of the target object, and the apparatus further includes: a depth information extraction module, configured to extract depth information of multiple pixels on the target object from the depth image; and a back-projection module, configured to back-project, based on the depth information, the multiple pixels on the target object in the depth image into 3D space to obtain the initial 3D point cloud of the surface of the target object. Extracting the depth information and using it to back-project the pixels of the 2D image into 3D space yields the initial 3D point cloud of the surface of the target object, so that this point cloud can serve as supervision information for optimizing the initial values of the parameters, which further improves the accuracy of the parameter optimization.

In some embodiments, the image further includes an RGB image of the target object, and the depth information extraction module includes: an image segmentation unit, configured to perform image segmentation on the RGB image; an image region determination unit, configured to determine, based on the result of the image segmentation, the image region where the target object is located in the RGB image, and to determine, based on that region, the image region where the target object is located in the depth image; and a depth information acquisition unit, configured to acquire the depth information of multiple pixels in the image region where the target object is located in the depth image. Performing image segmentation on the RGB image makes it possible to locate the target object accurately and therefore to extract its depth information accurately.

In some embodiments, the apparatus further includes: a filtering module, configured to filter outliers out of the initial 3D point cloud and to use the filtered initial 3D point cloud as the second supervision information. Filtering the outliers reduces their interference and further improves the accuracy of the parameter optimization process.

In some embodiments, the image of the target object is captured by an image acquisition device, and the parameters include: a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device. The optimization module includes: a first optimization unit, configured to optimize, with the initial value of the body shape parameter and the initial value of the key point rotation parameter held unchanged, the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and a second optimization unit, configured to optimize the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the body shape parameter. During optimization, changing the position of the image acquisition device and changing the positions of the 3D key points can both change the 2D projections of the 3D key points, which makes the optimization process very unstable. The two-stage approach first fixes the initial value of the key point rotation parameter and the initial value of the body shape parameter while optimizing the initial value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter, and then fixes the displacement parameter and the global rotation parameter (at the values obtained in the first stage) while optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter, which improves the stability of the optimization process.

In some embodiments, the supervision information includes the initial 2D key points of the target object, and the first optimization unit is configured to: acquire, among the 2D projected key points corresponding to the 3D key points of the target object, target 2D projected key points that belong to a preset part of the target object, where the 3D key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter, and the 2D projected key points are obtained by projecting the 3D key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; acquire a first loss between the target 2D projected key points and the initial 2D key points; acquire a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be, for example, the torso. Because different motions have little effect on the key points of the torso, determining the first loss from the torso key points reduces the influence of different motions on key point positions and improves the accuracy of the optimization result. Because the 2D key points are supervision information on a 2D plane, whereas the displacement parameter of the image acquisition device is a parameter in 3D space, acquiring the second loss reduces the cases in which the optimization result falls into a local optimum on the 2D plane and thereby deviates from the true value.

在一些實施例中，所述監督資訊包括所述目標對象的初始二維關鍵點；所述第二優化單元用於：獲取所述目標對象的優化二維投影關鍵點與所述初始二維關鍵點之間的第三損失，所述優化二維投影關鍵點基於所述位移參數的優化值和全域旋轉參數的優化值對所述目標對象的優化三維關鍵點進行投影得到，所述優化三維關鍵點基於所述全域旋轉參數的優化值、關鍵點旋轉參數的初始值和體態參數的初始值得到；獲取第四損失，所述第四損失用於表徵所述全域旋轉參數的優化值、關鍵點旋轉參數的初始值和體態參數的初始值對應的姿態的合理性；基於所述第三損失和所述第四損失對所述關鍵點旋轉參數的初始值和所述體態參數的初始值進行優化。本實施例基於位移參數的優化值和全域旋轉參數的優化值對關鍵點旋轉參數的初始值和體態參數的初始值進行優化，提高了優化過程的穩定性，同時，通過第四損失保證了優化後的參數對應的姿態的合理性。In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object; the second optimization unit is configured to: obtain the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points The third loss between points, the optimized two-dimensional projection key point is obtained by projecting the optimized three-dimensional key point of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key point The point is obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter, and the initial value of the body shape parameter; obtain the fourth loss, and the fourth loss is used to characterize the optimized value of the global rotation parameter, the key point The rationality of the attitude corresponding to the initial value of the rotation parameter and the initial value of the posture parameter; based on the third loss and the fourth loss, the initial value of the key point rotation parameter and the initial value of the posture parameter are optimized . This embodiment optimizes the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, which improves the stability of the optimization process. 
At the same time, the fourth loss ensures that the pose corresponding to the optimized parameters is plausible.

在一些實施例中，所述裝置還包括：聯合優化模組，用於在基於所述位移參數的優化值和全域旋轉參數的優化值，對所述關鍵點旋轉參數的初始值和所述體態參數的初始值進行優化之後，對所述全域旋轉參數的優化值，所述關鍵點旋轉參數的優化值，體態參數的優化值以及所述位移參數的優化值進行聯合優化。本實施例在前述優化的基礎上，對優化後的各項參數進行聯合優化，從而進一步提高了優化結果的準確性。In some embodiments, the device further includes: a joint optimization module, which is used to optimize the initial value of the key point rotation parameter and the body posture based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter. After the initial value of the parameter is optimized, the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter are jointly optimized. In this embodiment, on the basis of the aforementioned optimization, the optimized parameters are jointly optimized, thereby further improving the accuracy of the optimization result.

在一些實施例中，所述監督資訊包括所述目標對象的初始二維關鍵點和所述目標對象表面的初始三維點雲；所述第一優化單元用於：獲取所述目標對象的三維關鍵點對應的二維投影關鍵點中屬於所述目標對象的預設部位的目標二維投影關鍵點；其中，所述目標對象的三維關鍵點基於所述全域旋轉參數的初始值、關鍵點旋轉參數的初始值和體態參數的初始值得到，所述二維投影關鍵點基於所述位移參數的當前值和全域旋轉參數的初始值對所述目標對象的三維關鍵點進行投影得到；獲取所述目標二維投影關鍵點與所述初始二維關鍵點之間的第一損失；獲取所述位移參數的初始值與所述位移參數的當前值之間的第二損失；獲取所述目標對象表面的第一三維點雲與所述初始三維點雲之間的第五損失；所述第一三維點雲基於所述全域旋轉參數的初始值、關鍵點旋轉參數的初始值和體態參數的初始值得到；基於所述第一損失、第二損失和第五損失對所述位移參數的當前值和全域旋轉參數的初始值進行優化。本實施例將三維點雲加入到監督資訊中對初始的各項參數進行優化，從而提高了優化結果的準確性。In some embodiments, the supervisory information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object; the first optimization unit is configured to: acquire the three-dimensional key points of the target object Among the two-dimensional projection key points corresponding to the point, the target two-dimensional projection key points belonging to the preset part of the target object; wherein, the three-dimensional key points of the target object are based on the initial value of the global rotation parameter, the key point rotation parameter The initial value of the initial value and the initial value of the posture parameter are obtained, and the key points of the two-dimensional projection are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtaining the target The first loss between the two-dimensional projection key point and the initial two-dimensional key point; obtain the second loss between the initial value of the displacement parameter and the current value of the displacement parameter; obtain the target object surface The fifth loss between the first 3D point cloud and the initial 3D point cloud; the first 3D point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body 
shape parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss and the fifth loss. In this embodiment, the three-dimensional point cloud is added to the supervisory information to optimize the initial parameters, thereby improving the accuracy of the optimization result.

在一些實施例中，所述聯合優化模組包括：第一獲取單元，用於獲取所述目標對象的優化二維投影關鍵點與所述初始二維關鍵點之間的第六損失，所述優化二維投影關鍵點基於所述位移參數的優化值和全域旋轉參數的優化值對所述目標對象的優化三維關鍵點進行投影得到，所述優化三維關鍵點基於所述全域旋轉參數的優化值、關鍵點旋轉參數的優化值和體態參數的優化值得到；第二獲取單元，用於獲取第七損失，所述第七損失用於表徵所述全域旋轉參數的優化值、關鍵點旋轉參數的優化值和體態參數的優化值對應的姿態的合理性；第三獲取單元，用於獲取所述目標對象表面的第二三維點雲與所述初始三維點雲之間的第八損失；所述第二三維點雲基於所述全域旋轉參數的優化值、關鍵點旋轉參數的優化值和體態參數的優化值得到；聯合優化單元，用於基於所述第六損失、第七損失和第八損失對所述全域旋轉參數的優化值，所述關鍵點旋轉參數的優化值，體態參數的優化值以及所述位移參數的優化值進行聯合優化。本實施例將三維點雲加入到監督資訊中對初始的各項參數進行優化，從而提高了優化結果的準確性。In some embodiments, the joint optimization module includes: a first acquisition unit, configured to acquire a sixth loss between the optimized 2D projection keypoint of the target object and the initial 2D keypoint, the The optimized two-dimensional projection key point is obtained by projecting the optimized three-dimensional key point of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key point is based on the optimized value of the global rotation parameter , the optimized value of the key point rotation parameter and the optimized value of the posture parameter are obtained; the second acquisition unit is used to obtain the seventh loss, and the seventh loss is used to characterize the optimized value of the global rotation parameter and the key point rotation parameter The optimal value and the rationality of the posture corresponding to the optimal value of the body parameters; the third acquisition unit is used to acquire the eighth loss between the second 3D point cloud on the surface of the target object and the initial 3D point cloud; the The second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, and the optimized value of the body shape parameter; a joint optimization unit is used for based on the sixth loss, the seventh loss and the eighth loss Jointly optimize the optimized 
value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter. In this embodiment, the three-dimensional point cloud is added to the supervisory information to optimize the initial parameters, thereby improving the accuracy of the optimization result.

根據本公開實施例的第三方面，提供一種三維重建系統，所述系統包括：圖像採集裝置，用於採集目標對象的圖像；以及與所述圖像採集裝置通訊連接的處理單元，用於通過三維重建網路對所述圖像中的所述目標對象進行三維重建，得到所述目標對象的參數的初始值，所述參數的初始值用於建立所述目標對象的三維模型；基於預先獲取的用於表示目標對象特徵的監督資訊對所述參數的初始值進行優化，得到所述參數的優化值；基於所述參數的優化值進行骨骼蒙皮處理，建立所述目標對象的三維模型。According to a third aspect of the embodiments of the present disclosure, there is provided a three-dimensional reconstruction system, the system comprising: an image acquisition device for acquiring an image of a target object; and a processing unit communicatively connected to the image acquisition device and configured to: perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, the initial value of the parameter being used to establish a three-dimensional model of the target object; optimize the initial value of the parameter based on pre-acquired supervisory information representing the characteristics of the target object to obtain the optimized value of the parameter; and perform bone skinning processing based on the optimized value of the parameter to establish the three-dimensional model of the target object.

根據本公開實施例的第四方面，提供一種計算機可讀儲存媒體，其上儲存有計算機程式，該計算機程式被處理器執行時實現任一實施例所述的方法。According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method described in any embodiment is implemented.

根據本公開實施例的第五方面，提供一種計算機設備，包括儲存器、處理器及儲存在儲存器上並可在處理器上運行的計算機程式，所述處理器執行所述計算機程式時實現任一實施例所述的方法。According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer device, including a memory, a processor, and a computer program stored in the memory and operable on the processor. When the processor executes the computer program, the method described in any embodiment is implemented.

根據本公開實施例的第六方面，提供一種計算機程式產品，該計算機程式產品儲存於儲存媒體中並包括可在處理器上運行的計算機程式，所述處理器執行所述計算機程式時實現任一實施例所述的方法。According to a sixth aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product is stored in a storage medium and includes a computer program that can run on a processor, and when the processor executes the computer program, the method described in any embodiment is implemented.

本公開實施例通過將三維重建網路對目標對象的圖像進行三維重建，從而得到參數的初始值，再基於監督資訊對所述參數的初始值進行優化，基於參數優化得到的參數的優化值來建立目標對象的三維模型。參數優化的方法優點在於能夠給出較為精確的，符合圖像二維觀察特徵的三維重建結果，但往往會給不自然的，不合理的動作結果，可靠性較低。而通過三維重建網路進行網路回歸則能夠給出較為自然合理的動作結果，因此，將三維重建網路的輸出結果作為參數的初始值來進行優化，能夠在保證三維重建結果可靠性的基礎上，兼顧三維重建的準確性。In the embodiments of the present disclosure, a three-dimensional reconstruction network performs three-dimensional reconstruction on the image of the target object to obtain the initial values of the parameters; the initial values are then optimized based on the supervisory information, and the three-dimensional model of the target object is established from the resulting optimized values. The advantage of parameter optimization is that it gives relatively accurate three-dimensional reconstruction results that conform to the two-dimensional observation features of the image, but it often gives unnatural and unreasonable action results, so its reliability is low. Network regression through the three-dimensional reconstruction network, by contrast, gives more natural and reasonable action results. Therefore, using the output of the three-dimensional reconstruction network as the initial values of the parameters for optimization ensures the reliability of the three-dimensional reconstruction results while also taking their accuracy into account.

應當理解的是，以上的一般描述和後文的細節描述僅是示例性和解釋性的，而非限制本公開。It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

本公開要求於2021年05月10日提交的、申請號為202110506464X、發明名稱為“三維重建方法、裝置和系統、媒體及計算機設備”的中國專利申請的優先權，該申請以引用的方式併入本文中。This disclosure claims priority to the Chinese patent application with application number 202110506464X, titled "3D reconstruction method, device and system, media and computer equipment" and filed on May 10, 2021, which is incorporated herein by reference.

這裡將詳細地對示例性實施例進行說明，其示例表示在附圖中。下面的描述涉及附圖時，除非另有表示，不同附圖中的相同數字表示相同或相似的要素。以下示例性實施例中所描述的實施方式並不代表與本公開相一致的所有實施方式。相反，它們僅是與如所附申請專利範圍中所詳述的、本公開的一些方面相一致的裝置和方法的例子。Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as detailed in the appended claims.

在本公開使用的術語是僅僅出於描述特定實施例的目的，而非旨在限制本公開。在本公開和所附申請專利範圍中所使用的單數形式的“一種”、“所述”和“該”也旨在包括多數形式，除非上下文清楚地表示其他含義。還應當理解，本文中使用的術語“和/或”是指並包含一個或多個相關聯的列出項目的任何或所有可能組合。另外，本文中術語“至少一種”表示多種中的任意一種或多種中的至少兩種的任意組合。The terminology used in the present disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "the", and "the" are also intended to include the plural forms unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.

應當理解，儘管在本公開可能採用術語第一、第二、第三等來描述各種資訊，但這些資訊不應限於這些術語。這些術語僅用來將同一類型的資訊彼此區分開。例如，在不脫離本公開範圍的情況下，第一資訊也可以被稱為第二資訊，類似地，第二資訊也可以被稱為第一資訊。取決於語境，如在此所使用的詞語“如果”可以被解釋成為“在……時”或“當……時”或“響應於確定”。It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, these pieces of information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon", "when" or "in response to a determination."

為了使本技術領域的人員更好的理解本公開實施例中的技術方案，並使本公開實施例的上述目的、特徵和優點能夠更加明顯易懂，下面結合附圖對本公開實施例中的技術方案作進一步詳細的說明。In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above-mentioned purposes, features and advantages of the embodiments of the present disclosure more obvious and understandable, the technical solutions in the embodiments of the present disclosure are described in further detail below in conjunction with the accompanying drawings.

對目標對象進行三維重建需要重建出目標對象的體態和肢體旋轉，通常使用參數化模型來表達目標對象的體態和肢體旋轉，而不僅僅是三維關鍵點。例如，對不同的人進行三維重建，分別重建出了體態較瘦的人的三維模型（如圖1A所示）和體態較胖的人的三維模型（如圖1B所示），由於圖1A所示的人和圖1B所示的人處於相同的姿態下，關鍵點資訊相同，僅通過關鍵點資訊則無法表示出二者體態上的差異。Three-dimensional reconstruction of the target object needs to reconstruct the body shape and limb rotation of the target object. Usually, a parametric model, rather than three-dimensional key points alone, is used to express the body shape and limb rotation of the target object. For example, when three-dimensional reconstruction is performed on different people, a three-dimensional model of a thinner person (as shown in FIG. 1A) and a three-dimensional model of a heavier person (as shown in FIG. 1B) are reconstructed. Because the person shown in FIG. 1A is in the same posture as the person shown in FIG. 1B, their key point information is the same, and the difference in body shape between the two cannot be represented through the key point information alone.

在相關技術中，一般通過參數優化和網路回歸兩種方式進行三維重建。參數優化的方法通常選擇一套標準參數，依據目標對象的圖像的二維視覺特徵，採用梯度下降法來對目標對象的三維模型的參數的初始值進行迭代優化，其中圖像的二維視覺特徵可以選擇二維關鍵點等。參數優化的方法優點在於能夠給出較為準確的、符合圖像二維視覺特徵的參數估計結果，但往往會給出不自然、不合理的動作結果，並且參數優化的最終性能非常依賴參數的初始值，導致基於參數優化的三維重建方式可靠性較低。In the related art, three-dimensional reconstruction is generally carried out in one of two ways: parameter optimization or network regression. The parameter optimization method usually selects a set of standard parameters and uses gradient descent to iteratively optimize the initial values of the parameters of the three-dimensional model of the target object according to the two-dimensional visual features of the image of the target object, where the two-dimensional visual features may be, for example, two-dimensional key points. The advantage of the parameter optimization method is that it gives relatively accurate parameter estimates that conform to the two-dimensional visual features of the image, but it often gives unnatural and unreasonable action results, and its final performance depends heavily on the initial values of the parameters, so three-dimensional reconstruction based on parameter optimization has low reliability.

網路回歸的方法通常訓練一個端到端的神經網路來學習從圖像到三維模型參數的映射。網路回歸的方法優點在於能夠給出較為自然合理的動作結果，但由於缺乏大量的訓練數據，三維重建結果可能與圖像中的二維視覺特徵不符，因此，基於網路回歸的三維重建方式準確度較低。相關技術中的三維重建方式無法兼顧三維重建結果的準確性和可靠性。The network regression method usually trains an end-to-end neural network to learn the mapping from images to three-dimensional model parameters. The advantage of the network regression method is that it gives more natural and reasonable action results. However, due to the lack of a large amount of training data, the three-dimensional reconstruction results may not match the two-dimensional visual features in the image, so three-dimensional reconstruction based on network regression is less accurate. The three-dimensional reconstruction methods in the related art therefore cannot achieve both accuracy and reliability in the reconstruction results.

基於此，本公開實施例提供一種三維重建方法，如圖2所示，所述方法包括：Based on this, an embodiment of the present disclosure provides a three-dimensional reconstruction method, as shown in FIG. 2 , the method includes:

步驟201：通過三維重建網路對圖像中的目標對象進行三維重建，得到所述目標對象的參數的初始值，其中，所述參數的初始值用於建立所述目標對象的三維模型；Step 201: Perform 3D reconstruction on the target object in the image through the 3D reconstruction network to obtain the initial value of the parameter of the target object, wherein the initial value of the parameter is used to establish the 3D model of the target object;

步驟202：基於預先獲取的用於表示目標對象的特徵的監督資訊對所述參數的初始值進行優化，得到參數的優化值；Step 202: Optimizing the initial value of the parameter based on the pre-acquired supervision information representing the characteristics of the target object to obtain the optimized value of the parameter;

步驟203：基於所述參數的優化值進行骨骼蒙皮處理，建立所述目標對象的三維模型。Step 203: Perform bone skinning processing based on the optimized values of the parameters, and establish a 3D model of the target object.

在步驟201中，目標對象可以是三維對象，例如物理空間中的人、動物、機器人等，或者是所述三維對象上的一個或多個區域，例如，人臉或者肢體等。為了便於描述，下文以目標對象是人，對目標對象進行的三維重建為人體重建為例進行說明。所述目標對象的圖像可以是單張圖像，也可以包括從多個不同視角對目標對象進行拍攝得到的多張圖像。基於單張圖像的三維人體重建稱為單目三維人體重建，基於不同視角的多張圖像的三維人體重建稱為多目三維人體重建。每張圖像都可以是灰度圖、RGB圖像或者RGBD圖像。所述圖像可以是目標對象周圍的圖像採集裝置（例如，相機或者攝像頭）即時採集的圖像，也可以是預先採集並儲存的圖像。In step 201, the target object may be a three-dimensional object, such as a person, an animal or a robot in physical space, or one or more regions on the three-dimensional object, such as a human face or a limb. For convenience, the description below takes the case where the target object is a person and the three-dimensional reconstruction is human body reconstruction as an example. The image of the target object may be a single image, or may include multiple images obtained by shooting the target object from multiple different viewing angles. Three-dimensional human body reconstruction based on a single image is called monocular three-dimensional human body reconstruction, and three-dimensional human body reconstruction based on multiple images from different viewing angles is called multi-view three-dimensional human body reconstruction. Each image can be a grayscale image, an RGB image or an RGBD image. The image may be captured in real time by an image acquisition device (for example, a still camera or a video camera) near the target object, or may be an image acquired and stored in advance.

可以通過三維重建網路對目標對象的圖像進行三維重建，其中，三維重建網路可以是一個預先訓練的神經網路。三維重建網路可以基於圖像進行三維重建，並估計出自然合理的參數的初始值，這裡的參數的初始值可以通過一個向量來表示，所述向量的維度例如可以是85維，所述向量中包含人體的運動肢體旋轉資訊（即姿態參數的初始值，包括人體的全域旋轉參數的初始值和23個關鍵點的關鍵點旋轉參數的初始值）、體態參數的初始值以及攝像機的參數的初始值這三部分資訊。人體可以由關鍵點和連接這些關鍵點的肢體骨骼表示，人體關鍵點可包括頭頂、鼻子、脖子、左右眼、左右耳、胸部、左右肩膀、左右手肘、左右手腕、左右髖部、左右臀、左右膝蓋、左右腳踝等關鍵點中的一個或多個，姿態參數的初始值用於確定人體的關鍵點在三維空間中的位置。體態參數的初始值用於確定人體的高矮胖瘦等身材資訊。所述攝像機的參數的初始值用於確定人體在攝像機坐標系下在三維空間中的絕對位置，攝像機的參數包括攝像機與人體之間的位移參數以及攝像機的姿態參數，其中，攝像機的姿態參數的初始值可以用人體的全域旋轉參數的初始值來代替。可以使用多人線性蒙皮（Skinned Multi-Person Linear，SMPL）模型的參數形式（稱為SMPL參數）來表示所述人體參數。在獲取SMPL參數的值之後，可以基於SMPL參數的值進行骨骼蒙皮處理，即使用一個映射函數

將體態參數的初始值和姿態參數的初始值映射為人體表面的三維模型，該三維模型包括6890個頂點，頂點之間通過固定的連接關係構成三角面片。可以使用一個預訓練的回歸器W，從人體表面模型的頂點進一步回歸出人體的三維關鍵點

，即：

。 The image of the target object can be reconstructed in three dimensions through a three-dimensional reconstruction network, which can be a pre-trained neural network. The three-dimensional reconstruction network performs three-dimensional reconstruction based on the image and estimates natural and reasonable initial values of the parameters. These initial values can be represented by a vector whose dimension can be, for example, 85. The vector contains three parts: the limb rotation information of the human body (that is, the initial values of the posture parameters, including the initial value of the global rotation parameter of the human body and the initial values of the key point rotation parameters of 23 key points), the initial values of the body shape parameters, and the initial values of the camera parameters. The human body can be represented by key points and the limb bones connecting these key points. The key points of the human body can include one or more of the top of the head, nose, neck, left and right eyes, left and right ears, chest, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right buttocks, left and right knees, left and right ankles, and so on. The initial values of the posture parameters are used to determine the positions of the key points of the human body in three-dimensional space. The initial values of the body shape parameters are used to determine build information, such as whether the body is tall or short, heavy or thin. The initial values of the camera parameters are used to determine the absolute position of the human body in three-dimensional space in the camera coordinate system. The camera parameters include a displacement parameter between the camera and the human body and a posture parameter of the camera, where the initial value of the posture parameter of the camera can be replaced by the initial value of the global rotation parameter of the human body. The human body parameters can be expressed in the parametric form of the Skinned Multi-Person Linear (SMPL) model (referred to as SMPL parameters). After the values of the SMPL parameters are obtained, bone skinning can be performed based on them, that is, a mapping function is used to map the initial values of the body shape parameters and the initial values of the posture parameters to a three-dimensional model of the human body surface. The three-dimensional model includes 6890 vertices, which form triangular patches through fixed connection relationships. A pre-trained regressor W can be used to further regress the three-dimensional key points of the human body from the vertices of the human body surface model.
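The mapping and the regression described above can be written compactly. The symbols below follow notation that is common for SMPL and are assumptions made for illustration (β for the body shape parameters, θ for the posture parameters, M for the mapping function, J for the three-dimensional key points, K for the number of key points); only the regressor name W comes from the text:

\[
M(\beta, \theta) \in \mathbb{R}^{6890 \times 3}, \qquad J = W \, M(\beta, \theta), \qquad W \in \mathbb{R}^{K \times 6890}
\]

Under this reading, each three-dimensional key point is a fixed linear combination of the 6890 surface vertices.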

在步驟202中，監督資訊可以是圖像的二維視覺特徵（也被稱為二維觀察特徵），例如，圖像中目標對象的二維關鍵點和所述目標對象上的多個像素點的語意資訊中的至少一者。一個像素點的語意資訊用於表徵所述像素點處於所述目標對象上的哪個區域，所述區域例如可以是頭部、手臂、軀幹、腿等所在區域。在採用二維關鍵點資訊作為監督資訊的情況下，可以使用二維關鍵點提取網路對圖像中的人體關鍵點位置進行估計，此處可以選用任意的二維姿態估計方法，例如OpenPose。除了採用二維視覺特徵作為監督資訊之外，還可以將二維視覺特徵和目標對象表面的初始三維點雲共同作為監督資訊，從而進一步提高三維重建的準確性。In step 202, the supervisory information may be two-dimensional visual features of the image (also called two-dimensional observation features), for example, at least one of the two-dimensional key points of the target object in the image and the semantic information of multiple pixels on the target object. The semantic information of a pixel indicates which region of the target object the pixel lies in, such as the head, an arm, the torso or a leg. When two-dimensional key point information is used as supervisory information, a two-dimensional key point extraction network can be used to estimate the positions of the human body key points in the image; any two-dimensional pose estimation method can be used here, such as OpenPose. In addition to using two-dimensional visual features as supervisory information, the two-dimensional visual features and the initial three-dimensional point cloud of the surface of the target object can be used together as supervisory information to further improve the accuracy of three-dimensional reconstruction.

在所述圖像包括深度圖像（例如，所述圖像為RGBD圖像）的情況下，可以從所述深度圖像中提取所述目標對象上多個像素點的深度資訊，基於所述深度資訊將所述深度圖像中所述目標對象上的多個像素點投影到三維空間，得到所述目標對象表面的初始三維點雲。In the case that the image includes a depth image (for example, the image is an RGBD image), depth information of multiple pixels on the target object may be extracted from the depth image, based on the Depth information projects a plurality of pixel points on the target object in the depth image to a three-dimensional space to obtain an initial three-dimensional point cloud on the surface of the target object.
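The projection of depth pixels into three-dimensional space described above can be sketched as follows. This is a minimal illustration that assumes a pinhole camera model; the function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and do not appear in the disclosure, which does not specify the projection model.

```python
def backproject_depth(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to 3D camera-space points (assumed pinhole model).

    depth: 2D list of depth values, indexed as depth[v][u]
    mask:  2D list of booleans marking pixels that lie on the target object
    fx, fy, cx, cy: assumed intrinsics of the depth camera
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Skip background pixels and pixels without a valid depth reading.
            if not mask[v][u] or z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 example: one invalid depth, one background pixel, two object pixels.
depth = [[0.0, 2.0], [2.0, 4.0]]
mask = [[True, True], [False, True]]
cloud = backproject_depth(depth, mask, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

The mask argument stands in for the image region of the target object, so background pixels never enter the initial point cloud.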

所述多個像素點可以是圖像中目標對象上的部分或全部像素點。例如，可以包括目標對象上需要進行三維重建的各個區域的像素點，且每個區域中像素點的數量應大於或等於進行三維重建所需的數量。The plurality of pixels may be part or all of the pixels on the target object in the image. For example, it may include pixel points of various areas on the target object that need to be three-dimensionally reconstructed, and the number of pixel points in each area should be greater than or equal to the number required for three-dimensional reconstruction.

由於圖像中一般既包括目標對象，又包括背景區域。因此，可以對所述圖像中包括的RGB圖像進行圖像分割，獲取所述RGB圖像中目標對象所在的圖像區域，基於所述RGB圖像中目標對象所在的圖像區域確定所述深度圖像中目標對象所在的圖像區域；獲取所述深度圖像中所述目標對象所在的圖像區域中多個像素點的深度資訊。通過進行圖像分割，可以從圖像中提取出需要進行三維重建的目標對象所在的圖像區域，避免圖像中的背景區域對三維重建的影響。在一些實施例中，所述深度圖像中的像素點與所述RGB圖像中的像素點一一對應。例如，所述圖像也可以為RGBD圖像。An image generally includes both the target object and a background area. Therefore, image segmentation can be performed on the RGB image included in the image to obtain the image area where the target object is located in the RGB image; the image area where the target object is located in the depth image is then determined based on the image area where the target object is located in the RGB image, and the depth information of multiple pixels in that image area of the depth image is acquired. By performing image segmentation, the image area containing the target object to be reconstructed can be extracted from the image, which avoids the influence of the background area on the three-dimensional reconstruction. In some embodiments, the pixels in the depth image correspond one-to-one to the pixels in the RGB image. For example, the image may be an RGBD image.

進一步地，還可以從三維點雲（即，初始三維點雲）中過濾掉離群點，監督資訊可包括過濾後的三維點雲。所述過濾可以採用點雲過濾器實現。通過過濾掉離群點，能夠得到更加精細的目標對象表面的三維點雲，從而進一步提高三維重建的準確性。對三維點雲中的每一個目標三維點，獲取與該目標三維點距離最近的n個三維點到該目標三維點的平均距離，假設各個目標三維點對應的平均距離服從一個統計分佈（例如，高斯分佈），可以計算該統計分佈的均值和方差，並基於所述均值和方差設定一個閾值s，那麼平均距離在閾值s範圍之外的三維點，可以被視為離群點並從三維點雲中過濾掉。Further, outliers can be filtered out of the three-dimensional point cloud (that is, the initial three-dimensional point cloud), and the supervisory information can include the filtered three-dimensional point cloud. The filtering can be implemented with a point cloud filter. Filtering out outliers yields a finer three-dimensional point cloud of the surface of the target object, which further improves the accuracy of three-dimensional reconstruction. For each target three-dimensional point in the point cloud, the average distance from its n nearest three-dimensional points to the target point is obtained. Assuming that these average distances follow a statistical distribution (for example, a Gaussian distribution), the mean and variance of the distribution can be calculated and a threshold s can be set based on them. Three-dimensional points whose average distance falls outside the range of the threshold s can then be regarded as outliers and filtered out of the point cloud.
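The outlier filtering described above can be sketched in plain Python. The text only says that the threshold s is set based on the mean and variance; the specific form used here (mean plus k standard deviations), the function name `filter_outliers` and the parameter `k` are assumptions made for illustration. The brute-force neighbour search is quadratic in the number of points and is meant only to show the logic.

```python
import math


def filter_outliers(points, n=3, k=1.0):
    """Drop points whose mean distance to their n nearest neighbours is unusually large."""
    if len(points) < 2:
        return list(points)
    mean_dists = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n]
        mean_dists.append(sum(nearest) / len(nearest))
    mu = sum(mean_dists) / len(mean_dists)
    var = sum((d - mu) ** 2 for d in mean_dists) / len(mean_dists)
    s = mu + k * math.sqrt(var)  # assumed threshold: mean + k * standard deviation
    return [p for p, d in zip(points, mean_dists) if d <= s]


# Four points on a unit square plus one far-away point.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (10, 10, 10)]
kept = filter_outliers(points, n=2, k=1.0)
```

In this example the four square corners each have a mean neighbour distance of 1, while the distant point has a mean neighbour distance above 16, so only the distant point is removed.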

在實際應用中，如果所述圖像為RGB圖像，可以將二維觀察特徵作為監督資訊對所述參數的初始值進行迭代優化。如果所述圖像為RGBD圖像，可以將二維觀察特徵和目標對象表面的三維點雲共同作為監督資訊對所述參數的初始值進行迭代優化。優化方式例如可以採用梯度下降法，本公開對此不做限制。In practical applications, if the image is an RGB image, the initial values of the parameters can be iteratively optimized using the two-dimensional observation features as supervisory information. If the image is an RGBD image, the two-dimensional observation features and the three-dimensional point cloud on the surface of the target object can be used as supervisory information to iteratively optimize the initial value of the parameter. The optimization method may, for example, use a gradient descent method, which is not limited in the present disclosure.

在步驟203中，可以基於所述參數的優化值進行骨骼蒙皮處理，得到所述目標對象的三維模型。In step 203, bone skinning processing may be performed based on the optimized values of the parameters to obtain a three-dimensional model of the target object.

如圖3所示，是本公開實施例的整體流程圖。在輸入為RGB圖像的情況下，可以通過三維重建網路對RGB圖像進行三維重建，得到圖像中人的人體參數值，並採用關鍵點提取網路對圖像中的人進行關鍵點提取，得到人體二維關鍵點。然後，將人體參數值作為參數的初始值，將人體二維關鍵點作為監督資訊，通過參數優化模組對人體參數初始值進行優化，得到人體參數的優化值，並基於人體參數的優化值進行骨骼蒙皮處理，得到人體重建模型。FIG. 3 is an overall flowchart of an embodiment of the present disclosure. When the input is an RGB image, the three-dimensional reconstruction network can perform three-dimensional reconstruction on the RGB image to obtain the human body parameter values of the person in the image, and a key point extraction network can extract key points of the person in the image to obtain the two-dimensional key points of the human body. Then, with the human body parameter values as the initial values of the parameters and the two-dimensional key points of the human body as the supervisory information, the parameter optimization module optimizes the initial values of the human body parameters to obtain their optimized values, and bone skinning processing is performed based on the optimized values to obtain the human body reconstruction model.

在輸入為RGBD圖像的情況下，可以將圖像分解為RGB圖像和TOF（Time of Flight，飛行時間）深度圖，TOF深度圖中包括RGB圖像中各個像素點的深度資訊。可以通過三維重建網路對RGB圖像進行三維重建，得到圖像中人的人體參數值，並採用關鍵點提取網路對圖像中的人進行關鍵點提取，得到人體二維關鍵點。還可以採用點雲重建模組來基於TOF深度圖中的深度資訊重建出人體表麵點雲。然後，將人體參數值作為參數的初始值，將人體二維關鍵點和人體表麵點雲共同作為監督資訊，通過參數優化模組對人體參數初始值進行優化，得到人體參數的優化值，並基於人體參數的優化值進行骨骼蒙皮處理，得到人體重建模型。When the input is an RGBD image, the image can be decomposed into an RGB image and a TOF (Time of Flight) depth map, where the TOF depth map includes the depth information of each pixel in the RGB image. The three-dimensional reconstruction network can perform three-dimensional reconstruction on the RGB image to obtain the human body parameter values of the person in the image, and a key point extraction network can extract key points of the person in the image to obtain the two-dimensional key points of the human body. A point cloud reconstruction module can also be used to reconstruct the surface point cloud of the human body based on the depth information in the TOF depth map. Then, with the human body parameter values as the initial values of the parameters and the two-dimensional key points and the surface point cloud of the human body together as the supervisory information, the parameter optimization module optimizes the initial values of the human body parameters to obtain their optimized values, and bone skinning processing is performed based on the optimized values to obtain the human body reconstruction model.

進一步地，在得到人體重建模型之後，還可以基於RGB圖像或者RGBD圖像中的顏色資訊，對人體重建模型進行色彩處理，以使人體重建模型與圖像中的人物的顏色資訊相匹配。Further, after obtaining the human body reconstruction model, color processing may be performed on the human body reconstruction model based on the color information in the RGB image or the RGBD image, so that the human body reconstruction model matches the color information of the person in the image.

本公開實施例中，通過三維重建網路對圖像中的目標對象進行三維重建，從而得到參數的初始值，再基於監督資訊對所述參數的初始值進行優化，基於參數的優化值來建立目標對象的三維模型。參數優化的方法優點在於能夠給出較為精確的，符合圖像二維觀察特徵的三維重建結果，但往往會給不自然的、不合理的動作結果，可靠性較低。而通過三維重建網路進行網路回歸則能夠給出較為自然合理的動作結果，因此，將三維重建網路的輸出結果作為參數的初始值來進行參數優化，能夠在保證三維重建結果可靠性的基礎上，兼顧三維重建的準確性。In the embodiments of the present disclosure, the target object in the image is reconstructed in three dimensions through the three-dimensional reconstruction network to obtain the initial values of the parameters; the initial values are then optimized based on the supervisory information, and the three-dimensional model of the target object is established based on the optimized values of the parameters. The advantage of parameter optimization is that it gives relatively accurate three-dimensional reconstruction results that conform to the two-dimensional observation features of the image, but it often gives unnatural and unreasonable action results, so its reliability is low. Network regression through the three-dimensional reconstruction network, by contrast, gives more natural and reasonable action results. Therefore, using the output of the three-dimensional reconstruction network as the initial values of the parameters for parameter optimization ensures the reliability of the three-dimensional reconstruction results while also taking their accuracy into account.

在一些實施例中，在參數優化階段，可以採用多階段優化方法。所述多階段優化方法可包括攝像機優化階段與姿態優化階段。在攝像機優化階段，優化目標為全域旋轉參數的值R以及所述圖像採集裝置與所述目標對象之間的位移參數的當前值t。其中，t和R都是三維向量，R使用軸角形式表達。在姿態優化階段，優化目標為關鍵點旋轉參數的值與體態參數的值。In some embodiments, a multi-stage optimization method may be used in the parameter optimization stage. The multi-stage optimization method may include a camera optimization stage and a pose optimization stage. In the camera optimization stage, the quantities being optimized are the value R of the global rotation parameter and the current value t of the displacement parameter between the image acquisition device and the target object. Both t and R are three-dimensional vectors, and R is expressed in axis-angle form. In the pose optimization stage, the quantities being optimized are the values of the key point rotation parameters and the body shape parameters.
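Because R is a three-dimensional vector in axis-angle form, it usually has to be converted to a rotation matrix before it can be applied to key points. The sketch below uses Rodrigues' formula for that conversion; the function name `axis_angle_to_matrix` is hypothetical, and the disclosure does not state which conversion is used.

```python
import math


def axis_angle_to_matrix(r):
    """Convert an axis-angle vector r (direction = axis, norm = angle in radians)
    into a 3x3 rotation matrix using Rodrigues' formula."""
    theta = math.sqrt(sum(a * a for a in r))
    if theta < 1e-12:
        # A zero vector means no rotation.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (a / theta for a in r)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


# A quarter turn about the z axis maps the x axis onto the y axis.
R = axis_angle_to_matrix((0.0, 0.0, math.pi / 2))
```

Keeping R as a three-component vector during optimization avoids having to constrain nine matrix entries to stay orthonormal.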

Because changing the camera position and changing the positions of the body's 3D keypoints can both change the 2D projections of the 3D keypoints, optimizing them together makes the optimization process very unstable. Therefore, the body pose is held fixed in the camera optimization stage and the camera position is held fixed in the pose optimization stage, which improves the stability of the optimization. That is, with the initial values of the body shape parameter and the keypoint rotation parameter held unchanged, the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter are optimized based on the supervision information and the initial value of the displacement parameter, yielding an optimized value of the displacement parameter and an optimized value of the global rotation parameter. Then, with these two optimized values held unchanged, the initial value of the keypoint rotation parameter and the initial value of the body shape parameter are optimized based on them, yielding an optimized value of the keypoint rotation parameter and an optimized value of the body shape parameter.
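The staged scheme above can be sketched on a toy problem. This is only an illustration under stated assumptions: the 1-D "projection" model, the two loss functions, the weights, and the finite-difference gradient descent are all hypothetical stand-ins chosen to keep the example self-contained; the disclosure does not specify an optimizer.

```python
def numeric_grad(loss, params, key, eps=1e-6):
    """Central-difference gradient of loss with respect to params[key]."""
    grad = []
    for i in range(len(params[key])):
        saved = params[key][i]
        params[key][i] = saved + eps
        up = loss(params)
        params[key][i] = saved - eps
        down = loss(params)
        params[key][i] = saved
        grad.append((up - down) / (2 * eps))
    return grad


def optimize(loss, params, free_keys, steps=200, lr=0.1):
    """Gradient descent that updates only the parameter groups in free_keys."""
    params = {k: list(v) for k, v in params.items()}  # leave the input untouched
    for _ in range(steps):
        grads = {k: numeric_grad(loss, params, k) for k in free_keys}
        for k in free_keys:
            params[k] = [p - lr * g for p, g in zip(params[k], grads[k])]
    return params


# Toy 1-D stand-in: a torso keypoint projects to t, a limb keypoint to t + pose.
T_INIT, TORSO_OBS, LIMB_OBS = 0.8, 1.0, 1.5


def camera_loss(p):
    # Torso reprojection term plus a small pull towards the initial displacement.
    return (p["t"][0] - TORSO_OBS) ** 2 + 0.01 * (p["t"][0] - T_INIT) ** 2


def pose_loss(p):
    # Limb reprojection term plus a small stand-in prior on the pose.
    return (p["t"][0] + p["pose"][0] - LIMB_OBS) ** 2 + 0.01 * p["pose"][0] ** 2


init = {"t": [T_INIT], "pose": [0.3]}
stage1 = optimize(camera_loss, init, free_keys=["t"])     # pose held fixed
stage2 = optimize(pose_loss, stage1, free_keys=["pose"])  # camera held fixed
```

Because each stage touches only one parameter group, a change in the projected keypoints can be attributed unambiguously to the camera in the first stage and to the pose in the second.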

Further, from the 2D projected keypoints corresponding to the 3D keypoints of the target object, target 2D projected keypoints belonging to a preset part of the target object may be obtained. The 3D keypoints of the target object are derived from the initial values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter; the 2D projected keypoints are obtained by projecting those 3D keypoints based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target 2D projected keypoints and the initial 2D keypoints is obtained. A second loss between the initial value and the current value of the displacement parameter is obtained. The current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first loss and the second loss.

The preset part may be the torso, and the target 2D projected keypoints may include keypoints such as the left and right shoulders, the left and right hips, and the center of the spine. Because different motions have relatively little effect on torso keypoints, building the first loss from torso keypoints reduces the influence of the motion on keypoint positions and improves the accuracy of the optimization result. The first loss may also be called the torso keypoint projection loss, and the second loss may also be called the camera displacement regularization loss. The first loss may be obtained by formula (1), which is defined over the target 2D projected keypoints and the initial 2D keypoints. The second loss may be obtained by formula (2), which is defined over the current value of the displacement parameter between the image acquisition device and the target object and the initial value of that displacement parameter. A first target loss may be determined based on the first loss and the second loss; for example, it may be determined as the sum of the first loss and the second loss. Writing $L_{\text{torso}}$ for the first loss, $L_{\text{cam}}$ for the second loss, and $L_{1}$ for the first target loss, formula (3) reads:

$$L_{1} = L_{\text{torso}} + L_{\text{cam}} \tag{3}$$
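The first and second losses and their sum can be sketched as follows. The exact forms of formulas (1) and (2) are not available in this text, so the squared Euclidean distance used for both terms is an assumption, as are the pinhole projection, the focal length, the function names, and the sample data. The 3D keypoints are assumed to be already rotated by the global rotation, so only the displacement t is applied.

```python
def project(points3d, t, focal=1000.0):
    """Pinhole projection of 3D points shifted by the displacement t (assumed model)."""
    projected = []
    for x, y, z in points3d:
        xc, yc, zc = x + t[0], y + t[1], z + t[2]
        projected.append((focal * xc / zc, focal * yc / zc))
    return projected


def torso_projection_loss(points3d, keypoints2d, t, torso_idx):
    """First loss: squared 2D distance summed over the torso keypoints only."""
    proj = project(points3d, t)
    return sum(
        (proj[i][0] - keypoints2d[i][0]) ** 2 + (proj[i][1] - keypoints2d[i][1]) ** 2
        for i in torso_idx
    )


def camera_reg_loss(t, t_init):
    """Second loss: squared distance between current and initial displacement."""
    return sum((a - b) ** 2 for a, b in zip(t, t_init))


def first_target_loss(points3d, keypoints2d, t, t_init, torso_idx):
    """Sum of the first and second losses, following formula (3)."""
    return (torso_projection_loss(points3d, keypoints2d, t, torso_idx)
            + camera_reg_loss(t, t_init))


# Hypothetical data: four torso keypoints (shoulders, hips) and one wrist.
points3d = [(-0.2, -0.5, 0.0), (0.2, -0.5, 0.0),
            (-0.15, 0.0, 0.0), (0.15, 0.0, 0.0), (0.6, -0.3, 0.1)]
torso_idx = [0, 1, 2, 3]
t_init = (0.0, 0.0, 3.0)
keypoints2d = project(points3d, t_init)
keypoints2d[4] = (500.0, 500.0)  # a badly placed wrist does not enter the torso term

loss_at_init = first_target_loss(points3d, keypoints2d, t_init, t_init, torso_idx)
loss_shifted = first_target_loss(points3d, keypoints2d, (0.1, 0.0, 3.0), t_init, torso_idx)
```

Restricting the reprojection term to `torso_idx` mirrors the reasoning in the text: limb keypoints move a great deal with the pose, so leaving them out keeps the camera estimate from compensating for pose errors.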

A third loss between optimized 2D projected keypoints of the target object and the initial 2D keypoints may be obtained, where the optimized 2D projected keypoints are obtained by projecting optimized 3D keypoints of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized 3D keypoints are derived from the optimized value of the global rotation parameter, the initial value of the keypoint rotation parameter, and the initial value of the body shape parameter. A fourth loss is obtained, which characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial value of the keypoint rotation parameter, and the initial value of the body shape parameter. The initial values of the keypoint rotation parameter and the body shape parameter are then optimized based on the third loss and the fourth loss.

The third loss may also be called the 2D keypoint projection loss, and the fourth loss may also be called the pose plausibility loss. The third loss may be determined by formula (4), which is defined over the optimized 2D projected keypoints and the initial 2D keypoints. A second target loss may be determined based on the third loss and the fourth loss; for example, it may be determined as the sum of the third loss and the fourth loss. Writing $L_{\text{2d}}$ for the third loss, $L_{\text{prior}}$ for the fourth loss, and $L_{2}$ for the second target loss, formula (5) reads:

$$L_{2} = L_{\text{2d}} + L_{\text{prior}} \tag{5}$$

The fourth loss may be obtained with a Gaussian mixture model (GMM). It judges whether the pose corresponding to the optimized value of the global rotation parameter, the initial value of the keypoint rotation parameter, and the initial value of the body shape parameter is plausible, and outputs a larger loss for an implausible pose.
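One common way to turn a Gaussian mixture model into a plausibility loss is to use the negative log-likelihood of the pose under the mixture. The sketch below takes that approach; the diagonal covariances, the function name `gmm_neg_log_likelihood`, and the two-component mixture are assumptions for illustration, since the disclosure does not describe how the GMM is parameterized or trained.

```python
import math


def gmm_neg_log_likelihood(x, weights, means, variances):
    """Negative log-likelihood of x under a diagonal-covariance Gaussian mixture."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over the components, shifted by the maximum for stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(l - top) for l in log_terms)))


# Hypothetical two-component prior over a 2-D pose vector.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[0.1, 0.1], [0.1, 0.1]]

loss_near = gmm_neg_log_likelihood([0.05, 0.0], weights, means, variances)
loss_far = gmm_neg_log_likelihood([3.0, -3.0], weights, means, variances)
```

A pose close to one of the mixture centers receives a small loss, while a pose far from every center receives a large one, which matches the behavior the text asks of the fourth loss.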

After the initial values of the keypoint rotation parameter and the body shape parameter have been optimized based on the optimized values of the displacement parameter and the global rotation parameter, the optimized values of the global rotation parameter, the keypoint rotation parameter, the body shape parameter, and the displacement parameter may further be jointly optimized; that is, a three-stage optimization is used. Where the supervision information includes a 3D point cloud of the surface of the target object, the three-stage optimization may be used, comprising a camera optimization stage, a pose optimization stage, and a point cloud optimization stage.

In the camera optimization stage, target 2D projected keypoints belonging to a preset part of the target object may be obtained from the 2D projected keypoints corresponding to the 3D keypoints of the target object. The 3D keypoints of the target object are derived from the initial values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter; the 2D projected keypoints are obtained by projecting those 3D keypoints based on the current value of the displacement parameter and the initial value of the global rotation parameter. A first loss between the target 2D projected keypoints and the initial 2D keypoints is obtained. A second loss between the initial value and the current value of the displacement parameter is obtained. A fifth loss between a first 3D point cloud of the surface of the target object and the initial 3D point cloud is obtained, where the first 3D point cloud is derived from the initial values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter. The current value of the displacement parameter and the initial value of the global rotation parameter are optimized based on the first loss, the second loss, and the fifth loss.

The fifth loss may also be called the iterative closest point (ICP) point cloud registration loss and may be determined by formula (6). In formula (6), the initial 3D point cloud is regarded as point cloud P and the first 3D point cloud as point cloud Q; one set of point pairs is formed by pairing each point in P with its nearest point in Q, and a second set is formed by pairing each point in Q with its nearest point in P. The first loss and the second loss are given by formula (7) and formula (8), respectively; formula (7) is defined over the target 2D projected keypoints and the initial 2D keypoints, and formula (8) over the current value and the initial value of the displacement parameter. The first target loss may be determined as the sum of the first loss, the second loss, and the fifth loss, and the current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first target loss. Writing $L_{\text{torso}}$, $L_{\text{cam}}$, and $L_{\text{icp}}$ for the first, second, and fifth losses and $L_{1}$ for the first target loss, formula (9) reads:

$$L_{1} = L_{\text{torso}} + L_{\text{cam}} + L_{\text{icp}} \tag{9}$$
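The two nearest-neighbor pair sets described for the fifth loss can be sketched as below. The body of formula (6) is not available in this text, so the way the pairs are scored here (mean squared distance over each set, then summed) is an assumption, as are the function names; the brute-force search is used only to keep the example dependency-free.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_pairs(src, dst):
    """Pair each point in src with its nearest point in dst (brute force)."""
    return [(p, min(dst, key=lambda q: sq_dist(p, q))) for p in src]


def icp_loss(cloud_p, cloud_q):
    """Bidirectional closest-point loss between point clouds P and Q."""
    pairs_pq = closest_pairs(cloud_p, cloud_q)  # each point of P to its nearest in Q
    pairs_qp = closest_pairs(cloud_q, cloud_p)  # each point of Q to its nearest in P
    term_pq = sum(sq_dist(p, q) for p, q in pairs_pq) / len(pairs_pq)
    term_qp = sum(sq_dist(q, p) for q, p in pairs_qp) / len(pairs_qp)
    return term_pq + term_qp


# Hypothetical clouds: P stands for the observed cloud, Q for the model surface.
P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
Q = [(0.0, 0.0, 0.0)]
asymmetric = icp_loss(P, Q)
```

Using both directions matters when the clouds cover different extents: in the sample above, the second point of P has no counterpart in Q, and only the P-to-Q term registers that mismatch. For clouds of realistic size, a spatial index such as a k-d tree would replace the brute-force search.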

The pose optimization stage in the three-stage optimization is performed in the same way as the pose optimization stage in the two-stage optimization and is not described again here.

In the point cloud optimization stage, a sixth loss between the optimized 2D projected keypoints of the target object and the initial 2D keypoints may be obtained, where the optimized 2D projected keypoints are obtained by projecting the optimized 3D keypoints of the target object based on the optimized values of the displacement parameter and the global rotation parameter, and the optimized 3D keypoints are derived from the optimized values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter. A seventh loss is obtained, which characterizes the plausibility of the pose corresponding to the optimized values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter. An eighth loss between a second 3D point cloud of the surface of the target object and the initial 3D point cloud is obtained, where the second 3D point cloud is derived from the optimized values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter. Based on the sixth loss, the seventh loss, and the eighth loss, the optimized values of the global rotation parameter, the keypoint rotation parameter, the body shape parameter, and the displacement parameter are jointly optimized, which may use formula (10) and formula (11).

In these formulas, the sixth loss is defined over the optimized 2D projected keypoints and the initial 2D keypoints. The seventh loss may be obtained with a Gaussian mixture model; it judges whether the pose corresponding to the optimized values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter is plausible, and outputs a larger loss for an implausible pose. The eighth loss is defined over the initial 3D point cloud, regarded as point cloud P, and the second 3D point cloud, using one set of point pairs formed by pairing each point in P with its nearest point in the second 3D point cloud and another set formed by pairing each point in the second 3D point cloud with its nearest point in P. Further, the sum of the sixth loss, the seventh loss, and the eighth loss may be determined as a third target loss, and the optimized values of the global rotation parameter, the keypoint rotation parameter, the body shape parameter, and the displacement parameter are jointly optimized based on the third target loss. Writing $L_{\text{2d}}$, $L_{\text{prior}}$, and $L_{\text{icp}}$ for the sixth, seventh, and eighth losses and $L_{3}$ for the third target loss, formula (12) reads:

$$L_{3} = L_{\text{2d}} + L_{\text{prior}} + L_{\text{icp}} \tag{12}$$

Where the image of the target object is an RGB image, parameter optimization may use the two-stage method described above, comprising the camera optimization stage and the pose optimization stage. Where the image of the target object is an RGBD image, parameter optimization may use the three-stage method described above, comprising the camera optimization stage, the pose optimization stage, and the point cloud optimization stage.
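The choice between the two-stage and three-stage methods, together with the losses each stage uses, can be summarized in a small lookup. The function name `optimization_plan`, the `has_depth` flag, and the string labels are hypothetical; the stage and loss assignments follow the description above.

```python
def optimization_plan(has_depth):
    """Return (stage, losses) pairs for an RGB image or an RGBD image.

    has_depth is True for RGBD input, where a surface point cloud is available.
    """
    camera_losses = ["first", "second"]
    if has_depth:
        camera_losses.append("fifth")  # ICP point cloud registration loss
    plan = [
        ("camera", camera_losses),
        ("pose", ["third", "fourth"]),
    ]
    if has_depth:
        plan.append(("point_cloud", ["sixth", "seventh", "eighth"]))
    return plan


rgb_plan = optimization_plan(has_depth=False)
rgbd_plan = optimization_plan(has_depth=True)
```

The only differences for RGBD input are the extra point cloud term in the camera stage and the additional joint stage at the end; the pose stage is the same in both plans.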

The solution applies to a wide range of scenarios and can provide natural, plausible, and accurate human body reconstruction models in scenarios such as virtual fitting rooms, virtual streamers, and video motion transfer.

FIG. 4A is a schematic diagram of a virtual fitting room application scenario according to an embodiment of the present disclosure. An image of a user 401 may be captured by a camera 403 and sent to a processor (not shown) for 3D human body reconstruction, so as to obtain a human body reconstruction model 404 corresponding to the user 401, which is presented on a display interface 402 for the user 401 to view. The user 401 may also select desired apparel 405, including but not limited to clothing 4051 and a hat 4052; the apparel 405 may be displayed on the display interface 402 based on the human body reconstruction model 404, so that the user 401 can see how the apparel 405 looks when worn.

FIG. 4B is a schematic diagram of a virtual live-streaming room application scenario according to an embodiment of the present disclosure. During a live stream, a streamer client 407 may capture an image of a streamer user 406 and send it to a server 408 for 3D reconstruction, yielding a human body reconstruction model of the streamer user, that is, a virtual streamer. The server 408 may return the human body reconstruction model to the streamer client 407 for display, as shown by model 4071 in the figure. The streamer client 407 may also capture voice information of the streamer user and send it to the server 408, so that the server 408 fuses the human body reconstruction model with the voice information. The server 408 may send the fused human body reconstruction model and voice information to a viewer client 409 watching the live stream for display and playback, where the displayed model is shown as model 4091 in the figure. In this way, the viewer client 409 can display the virtual streamer conducting the live stream.

Those skilled in the art will understand that, in the methods of the specific embodiments above, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.

As shown in FIG. 5, the present disclosure further provides a 3D reconstruction apparatus, comprising:

a first 3D reconstruction module 501, configured to perform 3D reconstruction of a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, the initial values of the parameters being used to build a 3D model of the target object;

an optimization module 502, configured to optimize the initial values of the parameters based on pre-obtained supervision information representing features of the target object, to obtain optimized values of the parameters; and

a second 3D reconstruction module 503, configured to perform skinned-mesh processing based on the optimized values of the parameters to build the 3D model of the target object.
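The division of work among modules 501, 502, and 503 can be sketched as three pluggable callables chained in order. The class name `ReconstructionApparatus`, the field names, the `Params` type, and the stub functions in the usage lines are all hypothetical; real modules would wrap a trained network, an optimizer, and a skinning routine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Params = Dict[str, List[float]]


@dataclass
class ReconstructionApparatus:
    """Chains the three roles described for modules 501, 502, and 503."""
    regress: Callable[[object], Params]           # role of module 501
    optimize: Callable[[Params, object], Params]  # role of module 502
    skin: Callable[[Params], object]              # role of module 503

    def run(self, image, supervision):
        initial = self.regress(image)                    # initial parameter values
        optimized = self.optimize(initial, supervision)  # optimized parameter values
        return self.skin(optimized)                      # 3D model from optimized values


# Stub callables that only demonstrate the data flow.
apparatus = ReconstructionApparatus(
    regress=lambda image: {"t": [0.0, 0.0, 3.0]},
    optimize=lambda params, supervision: {
        k: [x + supervision["shift"] for x in v] for k, v in params.items()
    },
    skin=lambda params: {"mesh_from": params},
)
result = apparatus.run(image=None, supervision={"shift": 0.5})
```

Keeping the three roles separate means the optimization step can be swapped between the two-stage and three-stage variants without touching the network or the skinning step.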

In some embodiments, the supervision information includes initial 2D keypoints of the target object, and the first optimization unit is configured to: obtain, from the 2D projected keypoints corresponding to the 3D keypoints of the target object, target 2D projected keypoints belonging to a preset part of the target object, where the 3D keypoints are derived from the initial values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter, and the 2D projected keypoints are obtained by projecting the 3D keypoints based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target 2D projected keypoints and the initial 2D keypoints; obtain a second loss between the initial value and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss. The preset part may be the torso or a similar part. Because different motions have relatively little effect on torso keypoints, determining the first loss from torso keypoints reduces the influence of the motion on keypoint positions and improves the accuracy of the optimization result. Because the 2D keypoints are supervision information in a 2D plane, whereas the displacement parameter of the image acquisition device is a parameter in 3D space, the second loss reduces the chance that the optimization settles into a local optimum in the 2D plane and drifts away from the true solution.

In some embodiments, the supervision information includes initial 2D keypoints of the target object and an initial 3D point cloud of the surface of the target object, and the first optimization unit is configured to: obtain, from the 2D projected keypoints corresponding to the 3D keypoints of the target object, target 2D projected keypoints belonging to a preset part of the target object, where the 3D keypoints are derived from the initial values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter, and the 2D projected keypoints are obtained by projecting the 3D keypoints based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target 2D projected keypoints and the initial 2D keypoints; obtain a second loss between the initial value and the current value of the displacement parameter; obtain a fifth loss between a first 3D point cloud of the surface of the target object and the initial 3D point cloud, where the first 3D point cloud is derived from the initial values of the global rotation parameter, the keypoint rotation parameter, and the body shape parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss, and the fifth loss. This embodiment adds the 3D point cloud to the supervision information used to optimize the initial parameters, which improves the accuracy of the optimization result.

In some embodiments, the functions or modules of the apparatus provided by embodiments of the present disclosure may be used to perform the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments, which is not repeated here for brevity.

As shown in FIG. 6, the present disclosure further provides a 3D reconstruction system, comprising:

an image acquisition device 601, configured to capture an image of a target object; and

a processing unit 602 communicatively connected to the image acquisition device 601, configured to: perform 3D reconstruction of the target object in the image through a 3D reconstruction network to obtain initial values of parameters of the target object, the initial values of the parameters being used to build a 3D model of the target object; optimize the initial values of the parameters based on pre-obtained supervision information representing features of the target object, to obtain optimized values of the parameters; and perform skinned-mesh processing based on the optimized values of the parameters to build the 3D model of the target object.

The image acquisition device 601 in embodiments of the present disclosure may be any device with an image capture function, such as a camera or a webcam. Images captured by the image acquisition device 601 may be transmitted to the processing unit 602 in real time, or stored and then transmitted from storage to the processing unit 602 when needed. The processing unit 602 may be a single server or a server cluster composed of multiple servers. For the method performed by the processing unit 602, see the embodiments of the 3D reconstruction method above; details are not repeated here.

Embodiments of this specification further provide a computer device comprising at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of any of the foregoing embodiments when executing the program.

FIG. 7 is a more detailed schematic diagram of the hardware structure of a computing device provided by embodiments of this specification. The device may include a processor 701, a memory 702, an input/output interface 703, a communication interface 704, and a bus 705. The processor 701, the memory 702, the input/output interface 703, and the communication interface 704 are communicatively connected to one another inside the device through the bus 705.

The processor 701 may be implemented as a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by embodiments of this specification. The processor 701 may also include a graphics card, such as an NVIDIA Titan X graphics card or a 1080 Ti graphics card.

The memory 702 may be implemented as read-only memory (ROM), random access memory (RAM), a static storage device, a dynamic storage device, or the like. The memory 702 may store an operating system and other application programs. When the technical solutions provided by embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 702 and is called and executed by the processor 701.

The input/output interface 703 is used to connect an input/output module for information input and output. The input/output module may be built into the device as a component (not shown) or attached externally to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

The communication interface 704 is used to connect a communication module (not shown) so that the device can communicate with other devices. The communication module may communicate over a wired connection (for example, USB or a network cable) or a wireless connection (for example, a mobile network, Wi-Fi, or Bluetooth).

The bus 705 includes a path that carries information between the components of the device (for example, the processor 701, the memory 702, the input/output interface 703, and the communication interface 704).

It should be noted that although only the processor 701, the memory 702, the input/output interface 703, the communication interface 704, and the bus 705 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.

An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this specification.

The systems, devices, modules or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a notebook computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments. The device embodiments described above are only illustrative. The modules described as separate components may or may not be physically separate, and when the solutions of the embodiments of this specification are implemented, the functions of the modules may be implemented in the same one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

201: Perform three-dimensional reconstruction on the target object in the image through the three-dimensional reconstruction network to obtain the initial values of the parameters of the target object, the initial values of the parameters being used to establish the three-dimensional model of the target object
202: Optimize the initial values of the parameters based on pre-acquired supervision information representing features of the target object to obtain the optimized values of the parameters
203: Perform skeleton skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object
401: User
402: Display interface
403: Camera
404: Human body reconstruction model
405: Apparel
4051: Clothes
4052: Hat
406: Anchor user
407: Anchor client
4071: Model
408: Server
409: Audience client
4091: Model
501: First three-dimensional reconstruction module
502: Optimization module
503: Second three-dimensional reconstruction module
601: Image acquisition device
602: Processing unit
701: Processor
702: Memory
703: Input/output interface
704: Communication interface
705: Bus
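As an illustration of step 202, the claims describe a first loss between projected key points of a preset body part and the detected two-dimensional key points, and a second loss between the initial and current values of the displacement parameter. The following is a minimal sketch of those two terms only. The pinhole camera model, the squared-error form of both losses, the weighting `reg_weight`, and every function and argument name are assumptions made for illustration and do not come from the source.

```python
import numpy as np


def project_keypoints(keypoints_3d, global_rotation, displacement, focal, center):
    """Project 3D key points to the image plane (assumed pinhole model)."""
    # Apply the global rotation, then shift by the camera displacement.
    cam = keypoints_3d @ global_rotation.T + displacement
    # Perspective division followed by focal scaling and principal-point offset.
    return focal * cam[:, :2] / cam[:, 2:3] + center


def stage_one_loss(keypoints_3d, keypoints_2d, part_idx, global_rotation,
                   displacement, displacement_init,
                   focal=1000.0, center=(0.0, 0.0), reg_weight=1.0):
    """Sum of the first loss (reprojection) and the second loss (displacement prior)."""
    projected = project_keypoints(keypoints_3d, global_rotation, displacement,
                                  focal, np.asarray(center, dtype=float))
    # First loss: only key points of the preset part (part_idx) are compared.
    first_loss = np.sum((projected[part_idx] - keypoints_2d[part_idx]) ** 2)
    # Second loss: keeps the current displacement close to its initial value.
    second_loss = np.sum((displacement - displacement_init) ** 2)
    return first_loss + reg_weight * second_loss
```

In a first optimization stage, a loss of this kind would be minimized over the displacement and the global rotation while the key point rotation and body shape parameters stay fixed. The choice of optimizer is left open here.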

FIG. 1A and FIG. 1B are schematic diagrams of three-dimensional models of some embodiments.
FIG. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present disclosure.
FIG. 3 is an overall flowchart of an embodiment of the present disclosure.
FIG. 4A and FIG. 4B are schematic diagrams of application scenarios of embodiments of the present disclosure.
FIG. 5 is a block diagram of a three-dimensional reconstruction device according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of a three-dimensional reconstruction system according to an embodiment of the present disclosure.
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
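The last step of the flowchart in FIG. 2 (step 203) builds the three-dimensional model by skeleton skinning. The text here does not name a specific skinning algorithm, so the sketch below assumes linear blend skinning, a common choice in which each vertex is moved by a weighted blend of per-joint rigid transforms. The function name, the array layouts and the use of 4x4 homogeneous transforms are illustrative assumptions.

```python
import numpy as np


def linear_blend_skinning(rest_vertices, skin_weights, joint_transforms):
    """Pose a mesh with linear blend skinning (assumed technique).

    rest_vertices:    (V, 3) vertices in the rest pose
    skin_weights:     (V, J) per-vertex joint weights, each row summing to 1
    joint_transforms: (J, 4, 4) transforms mapping the rest pose to the target pose
    """
    ones = np.ones((rest_vertices.shape[0], 1))
    homogeneous = np.concatenate([rest_vertices, ones], axis=1)         # (V, 4)
    # Blend the joint transforms per vertex using the skinning weights.
    blended = np.einsum("vj,jab->vab", skin_weights, joint_transforms)  # (V, 4, 4)
    # Apply each vertex's blended transform to that vertex.
    posed = np.einsum("vab,vb->va", blended, homogeneous)               # (V, 4)
    return posed[:, :3]
```

In such a sketch, the joint transforms would be derived from the optimized global rotation and key point rotation parameters, and the rest-pose vertices from the optimized body shape parameter.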

Claims

1. A three-dimensional reconstruction method, comprising: performing three-dimensional reconstruction on a target object in an image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, wherein the initial values of the parameters are used to establish a three-dimensional model of the target object; optimizing the initial values of the parameters based on pre-acquired supervision information representing features of the target object to obtain optimized values of the parameters; and performing skeleton skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object.

2. The method according to claim 1, wherein the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of the following: initial two-dimensional key points of the target object, and semantic information of a plurality of pixel points on the target object in the image; and the second supervision information includes an initial three-dimensional point cloud of the surface of the target object.

3. The method according to claim 2, wherein the image includes a depth image of the target object, and the method further comprises: extracting depth information of the plurality of pixel points on the target object from the depth image; and back-projecting the plurality of pixel points on the target object in the depth image into three-dimensional space based on the depth information to obtain the initial three-dimensional point cloud of the surface of the target object.

4. The method according to claim 3, wherein the image further includes an RGB image of the target object, and extracting the depth information of the plurality of pixel points on the target object from the depth image comprises: performing image segmentation on the RGB image; determining, based on the image segmentation result, the image area where the target object is located in the RGB image; determining the image area where the target object is located in the depth image based on the image area where the target object is located in the RGB image; and obtaining the depth information of the plurality of pixel points in the image area where the target object is located in the depth image.

5. The method according to any one of claims 2 to 4, wherein the method further comprises: filtering out outliers from the initial three-dimensional point cloud, and using the filtered initial three-dimensional point cloud as the second supervision information.

6. The method according to any one of claims 1 to 5, wherein the image of the target object is collected by an image acquisition device, and the parameters include: a global rotation parameter of the target object, a key point rotation parameter of each key point of the target object, a body shape parameter of the target object, and a displacement parameter of the image acquisition device; and optimizing the initial values of the parameters based on the pre-acquired supervision information representing features of the target object comprises: with the initial value of the body shape parameter and the initial value of the key point rotation parameter kept unchanged, optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter; and optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain an optimized value of the key point rotation parameter and an optimized value of the body shape parameter.

7. The method according to claim 6, wherein the supervision information includes initial two-dimensional key points of the target object, and optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter comprises: obtaining, from among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtaining a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtaining a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss.

8. The method according to claim 6 or 7, wherein the supervision information includes the initial two-dimensional key points of the target object, and optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter comprises: obtaining a third loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, wherein the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter; obtaining a fourth loss, wherein the fourth loss characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter; and optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the third loss and the fourth loss.

9. The method according to any one of claims 6 to 8, wherein, after the initial value of the key point rotation parameter and the initial value of the body shape parameter are optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, the method further comprises: performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter.

10. The method according to claim 9, wherein the supervision information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object, and optimizing the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial value of the displacement parameter comprises: obtaining, from among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points that belong to a preset part of the target object, wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtaining a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtaining a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; obtaining a fifth loss between a first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, wherein the first three-dimensional point cloud is obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter; and optimizing the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss, the second loss and the fifth loss.

11. The method according to claim 9 or 10, wherein performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter comprises: obtaining a sixth loss between optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, wherein the optimized two-dimensional projection key points are obtained by projecting optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter and the optimized value of the body shape parameter; obtaining a seventh loss, wherein the seventh loss characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter and the optimized value of the body shape parameter; obtaining an eighth loss between a second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud, wherein the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter and the optimized value of the body shape parameter; and jointly optimizing the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter based on the sixth loss, the seventh loss and the eighth loss.

12. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the method according to any one of claims 1 to 11 is implemented.

13. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 11.
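For illustration only, the point-cloud supervision recited in the method claims (restricting the depth image to the segmented object area, back-projecting those pixels into three-dimensional space, and filtering outliers) can be sketched as follows. The sketch assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a segmentation mask already aligned with the depth image, and a median-based distance threshold for the outlier filter. The source does not specify any of these choices.

```python
import numpy as np


def backproject_depth(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-space 3D points."""
    # Keep only pixels inside the object area that have a valid depth.
    valid = mask.astype(bool) & (depth > 0)
    rows, cols = np.nonzero(valid)
    z = depth[rows, cols]
    # Invert the assumed pinhole projection for each pixel.
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def filter_outliers(points, k=3.0):
    """Drop points far from the median point (assumed filtering rule)."""
    center = np.median(points, axis=0)
    dist = np.linalg.norm(points - center, axis=1)
    med = np.median(dist)
    # Median absolute deviation, scaled to be comparable to a standard deviation.
    mad = 1.4826 * np.median(np.abs(dist - med)) + 1e-9
    return points[dist <= med + k * mad]
```

The filtered points would then play the role of the initial three-dimensional point cloud used as the second supervision information. A neighborhood-based filter, such as statistical outlier removal, would be an equally valid reading of the outlier-filtering step.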