WO2022237249A1 - Three-dimensional reconstruction method, apparatus and system, medium and computer device - Google Patents

Three-dimensional reconstruction method, apparatus and system, medium and computer device

Info

Publication number
WO2022237249A1
WO2022237249A1 · PCT/CN2022/075636
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
value
target object
optimized
dimensional
Prior art date
Application number
PCT/CN2022/075636
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
曹智杰
汪旻
刘文韬
钱晨
马利庄
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Priority to KR1020237014677A priority Critical patent/KR20230078777A/ko
Priority to JP2023525021A priority patent/JP2023547888A/ja
Publication of WO2022237249A1 publication Critical patent/WO2022237249A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/50 Depth or shape recovery
    • G06T 7/97 Determining parameters from multiple pictures
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20036 Morphological image processing
    • G06T 2207/20044 Skeletonization; Medial axis transform

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular to a three-dimensional reconstruction method, device and system, a medium, and computer equipment.
  • 3D reconstruction is one of the important technologies in computer vision, and has many potential applications in fields such as augmented reality and virtual reality. By performing three-dimensional reconstruction on the target object, the posture and limb rotation of the target object can be reconstructed. However, traditional 3D reconstruction methods cannot balance the accuracy and reliability of reconstruction results.
  • the present disclosure provides a three-dimensional reconstruction method, device and system, medium and computer equipment.
  • a 3D reconstruction method, comprising: performing 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object; optimizing the initial values of the parameters based on pre-acquired supervision information representing characteristics of the target object to obtain optimized values of the parameters; and performing bone skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object.
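The three-stage pipeline described above (network regression for initial values, supervision-driven optimization, then skinning) can be sketched as follows. This is a toy illustration, not the disclosure's implementation: the function names and the L2 optimization target are hypothetical placeholders.

```python
import numpy as np

def regress_parameters(image):
    """Stage 1 (hypothetical): a 3D reconstruction network would regress
    initial model parameters from the image; here a toy fixed vector."""
    return np.zeros(10)

def optimize_parameters(params, supervision, steps=100, lr=0.1):
    """Stage 2: refine the initial values by gradient descent so the
    parameters better match the supervision (toy squared-error target)."""
    for _ in range(steps):
        grad = 2.0 * (params - supervision)  # d/dp of ||p - target||^2
        params = params - lr * grad
    return params

def skin_mesh(params):
    """Stage 3 (hypothetical): bone skinning would pose a template mesh
    with the optimized parameters; here we just return a tagged result."""
    return {"params": params}

image = None                   # placeholder input
supervision = np.ones(10)      # toy supervision target
init = regress_parameters(image)
opt = optimize_parameters(init, supervision)
model = skin_mesh(opt)
```

The key design point mirrored here is that stage 2 starts from the network's output rather than from arbitrary standard parameters.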
  • the supervision information includes first supervision information, or both first and second supervision information; the first supervision information includes at least one of the following: the initial two-dimensional key points of the target object, and semantic information of multiple pixels on the target object in the image; the second supervision information includes an initial three-dimensional point cloud of the target object's surface.
  • the initial two-dimensional key points of the target object or the semantic information of its pixels can be used as supervision information to optimize the initial values of the parameters, offering high optimization efficiency and low optimization complexity; alternatively, the initial 3D point cloud of the target object's surface can be used as supervision information together with the aforementioned initial 2D key points or pixel semantics, improving the accuracy of the resulting optimized parameter values.
  • the method further includes: extracting information of initial two-dimensional key points of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information can generate more natural and reasonable actions for the three-dimensional model.
  • the image includes a depth image of the target object; the method further includes: extracting depth information of a plurality of pixels on the target object from the depth image; and back-projecting, based on the depth information, the plurality of pixels on the target object in the depth image into three-dimensional space to obtain an initial three-dimensional point cloud of the target object's surface.
  • in this way, the initial three-dimensional point cloud of the target object's surface can be obtained, so that it can be used as supervision information to optimize the initial values of the parameters, further improving the accuracy of the parameter optimization.
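The back-projection step above can be sketched with the standard pinhole camera model: a pixel (u, v) with depth d maps to X = (u - cx)·d/fx, Y = (v - cy)·d/fy, Z = d. The disclosure does not specify the camera model or intrinsics; the values below are illustrative.

```python
import numpy as np

def backproject(pixels, depths, fx, fy, cx, cy):
    """Back-project pixels (N, 2) with per-pixel depths (N,) into a
    3D point cloud (N, 3) using the pinhole camera model."""
    u, v = pixels[:, 0], pixels[:, 1]
    x = (u - cx) * depths / fx
    y = (v - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

# A pixel at the principal point back-projects onto the optical axis.
pts = backproject(np.array([[320.0, 240.0]]), np.array([2.0]),
                  fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```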
  • the method further includes: filtering outliers from the initial three-dimensional point cloud, and using the filtered initial three-dimensional point cloud as the second supervisory information. By filtering the outliers, the interference of the outliers is reduced, and the accuracy of the parameter optimization process is further improved.
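The disclosure does not specify how outliers are filtered; one common choice (an assumption here, not the disclosure's method) is statistical outlier removal: drop points whose mean distance to their k nearest neighbours is far above the cloud-wide average.

```python
import numpy as np

def filter_outliers(points, k=3, std_ratio=1.0):
    """Statistical outlier removal: for each point, compute the mean
    distance to its k nearest neighbours; keep points whose mean
    distance is within std_ratio standard deviations of the average."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    knn = np.sort(dists, axis=1)[:, :k]      # k smallest distances
    mean_d = knn.mean(axis=1)
    keep = mean_d <= mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

# A tight cluster plus one stray point: the stray point is removed.
cloud = np.array([[0, 0, 0], [0.1, 0, 0], [0, 0.1, 0],
                  [0.1, 0.1, 0], [10, 10, 10]], dtype=float)
clean = filter_outliers(cloud)
```

Libraries such as Open3D ship an equivalent routine; the O(N²) distance matrix here is only suitable for small clouds.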
  • the image of the target object is acquired by an image acquisition device.
  • the parameters include: a global rotation parameter of the target object, key point rotation parameters of each key point of the target object, body shape parameters of the target object, and displacement parameters of the image acquisition device; optimizing the initial values of the parameters based on the pre-acquired supervision information representing characteristics of the target object includes: with the initial values of the body shape parameters and the key point rotation parameters held fixed, optimizing the current value of the displacement parameters of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial values of the displacement parameters, to obtain optimized values of the displacement parameters and the global rotation parameter; and optimizing the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameters and the global rotation parameter, to obtain optimized values of the key point rotation parameters and the body shape parameters.
  • the supervision information includes the initial two-dimensional key points of the target object; optimizing the current value of the displacement parameters of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial values of the displacement parameters includes: obtaining, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameters and the initial value of the global rotation parameter; obtaining a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtaining a second loss between the initial value and the current value of the displacement parameters; and optimizing the current value of the displacement parameters and the initial value of the global rotation parameter based on the first loss and the second loss.
  • the preset part may be the torso or a similar part. Because different actions have little influence on the torso key points, determining the first loss using the torso key points reduces the influence of action variation on key point positions and improves the accuracy of the optimization result. Because the two-dimensional key points are supervision information on the two-dimensional plane while the displacement parameters of the image acquisition device are three-dimensional quantities, introducing the second loss reduces the risk that the optimization falls into a local optimum on the two-dimensional plane and deviates from the real situation.
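The two losses of this first stage can be sketched as a mean reprojection error over the torso key points plus a penalty on how far the displacement drifts from its initial value. The projection below is a simplified rotate-translate-drop-z model, an illustrative assumption; the disclosure does not fix the camera model or the loss forms.

```python
import numpy as np

def project(points3d, rotation, translation):
    """Toy projection: rotate, translate, then drop the z coordinate."""
    cam = points3d @ rotation.T + translation
    return cam[:, :2]

def stage_one_losses(kp3d, torso_idx, kp2d_init, rotation,
                     t_current, t_init):
    """First loss: reprojection error over torso key points only.
    Second loss: drift of the displacement from its initial value."""
    proj = project(kp3d, rotation, t_current)
    loss1 = np.mean(np.linalg.norm(proj[torso_idx] - kp2d_init[torso_idx],
                                   axis=1))
    loss2 = np.linalg.norm(t_current - t_init)
    return loss1, loss2

# Both losses vanish when the projection matches the observed key
# points and the displacement has not moved from its initial value.
kp3d = np.array([[0.0, 1.0, 2.0], [0.0, -1.0, 2.0]])
kp2d = np.array([[0.0, 1.0], [0.0, -1.0]])
l1, l2 = stage_one_losses(kp3d, [0, 1], kp2d, np.eye(3),
                          t_current=np.zeros(3), t_init=np.zeros(3))
```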
  • the supervision information includes the initial two-dimensional key points of the target object; optimizing the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameters and the global rotation parameter includes: obtaining a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameters and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; obtaining a fourth loss that characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; and optimizing the initial values of the key point rotation parameters and the body shape parameters based on the third loss and the fourth loss.
  • this embodiment optimizes the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameters and the global rotation parameter, which improves the stability of the optimization process, and the fourth loss ensures the plausibility of the pose corresponding to the optimized parameters.
  • the method further includes: after optimizing the initial value of the key point rotation parameter and the initial value of the body shape parameter based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter , performing joint optimization on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter.
  • in this embodiment, on the basis of the aforementioned optimization, the optimized parameters are jointly optimized, further improving the accuracy of the optimization result.
  • the supervision information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the target object's surface; optimizing the current value of the displacement parameters of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial values of the displacement parameters includes: obtaining, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points belonging to a preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameters and the initial value of the global rotation parameter; obtaining a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtaining a second loss between the initial value and the current value of the displacement parameters; and obtaining a fifth loss between a first three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud.
  • the joint optimization of the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameters includes: obtaining a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameters and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; obtaining a seventh loss that characterizes the plausibility of the pose corresponding to the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; and obtaining an eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized values of the global rotation parameter and the key point rotation parameters.
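The joint-optimization objective combines a 2D reprojection term, a pose-plausibility term, and a point-cloud term. A hedged sketch of such a weighted sum follows; the weights, the prior being an externally supplied score, and the one-sided chamfer-style point-cloud distance are all illustrative assumptions, not forms fixed by the disclosure.

```python
import numpy as np

def joint_loss(proj2d, kp2d_init, pose_prior, cloud2, cloud_init,
               w6=1.0, w7=0.1, w8=1.0):
    """Weighted sum of: sixth loss (2D reprojection error), seventh
    loss (pose plausibility, here an externally supplied prior score),
    and eighth loss (mean nearest-neighbour distance from the model
    surface point cloud to the initial point cloud)."""
    loss6 = np.mean(np.linalg.norm(proj2d - kp2d_init, axis=1))
    loss7 = pose_prior
    d = np.linalg.norm(cloud2[:, None, :] - cloud_init[None, :, :], axis=-1)
    loss8 = np.mean(d.min(axis=1))  # chamfer-style one-sided distance
    return w6 * loss6 + w7 * loss7 + w8 * loss8

# When projections and point clouds coincide and the prior score is
# zero, the joint loss is zero.
kp = np.array([[1.0, 2.0]])
pc = np.array([[0.0, 0.0, 1.0]])
total = joint_loss(kp, kp, 0.0, pc, pc)
```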
  • a 3D reconstruction device, comprising: a first 3D reconstruction module, configured to perform 3D reconstruction on a target object in an image through a 3D reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish the three-dimensional model of the target object;
  • an optimization module, configured to optimize the initial values of the parameters based on pre-acquired supervision information representing characteristics of the target object, to obtain optimized values of the parameters;
  • a second three-dimensional reconstruction module, configured to perform bone skinning processing based on the optimized values of the parameters and establish a three-dimensional model of the target object.
  • the supervision information includes first supervision information, or both first and second supervision information; the first supervision information includes at least one of the following: the initial two-dimensional key points of the target object, and semantic information of multiple pixels on the target object in the image; the second supervision information includes an initial three-dimensional point cloud of the target object's surface.
  • the initial two-dimensional key points of the target object or the semantic information of its pixels can be used as supervision information to optimize the initial values of the parameters, offering high optimization efficiency and low optimization complexity; alternatively, the initial 3D point cloud of the target object's surface can be used as supervision information together with the aforementioned initial 2D key points or pixel semantics, improving the accuracy of the resulting optimized parameter values.
  • the device further includes: a two-dimensional key point extraction module, configured to extract initial two-dimensional key point information of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information can generate more natural and reasonable actions for the three-dimensional model.
  • the image includes a depth image of the target object; the device further includes: a depth information extraction module, configured to extract depth information of multiple pixels on the target object from the depth image; and a back-projection module, configured to back-project, based on the depth information, multiple pixels on the target object in the depth image into three-dimensional space to obtain an initial three-dimensional point cloud of the target object's surface.
  • the image further includes an RGB image of the target object;
  • the depth information extraction module includes: an image segmentation unit, configured to perform image segmentation on the RGB image; an image area determination unit, configured to determine, based on the result of the image segmentation, the image area where the target object is located in the RGB image, and to determine the image area where the target object is located in the depth image based on the image area where the target object is located in the RGB image; and a depth information acquisition unit, configured to acquire depth information of multiple pixels in the image area where the target object is located in the depth image.
  • the device further includes: a filtering module, configured to filter out outliers from the initial 3D point cloud, and use the filtered initial 3D point cloud as the second supervisory information. By filtering the outliers, the interference of the outliers is reduced, and the accuracy of the parameter optimization process is further improved.
  • the image of the target object is acquired by an image acquisition device.
  • the parameters include: a global rotation parameter of the target object, key point rotation parameters of each key point of the target object, body shape parameters of the target object, and displacement parameters of the image acquisition device; the optimization module includes: a first optimization unit, configured to, with the initial values of the body shape parameters and the key point rotation parameters held fixed, optimize the current value of the displacement parameters of the image acquisition device and the initial value of the global rotation parameter based on the supervision information and the initial values of the displacement parameters, to obtain optimized values of the displacement parameters and the global rotation parameter;
  • a second optimization unit, configured to optimize the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameters and the global rotation parameter, to obtain optimized values of the key point rotation parameters and the body shape parameters.
  • the supervision information includes the initial two-dimensional key points of the target object; the first optimization unit is configured to: obtain, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points belonging to the preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameters and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value and the current value of the displacement parameters; and optimize the current value of the displacement parameters and the initial value of the global rotation parameter based on the first loss and the second loss.
  • the preset part may be the torso or a similar part. Because different actions have little influence on the torso key points, determining the first loss using the torso key points reduces the influence of action variation on key point positions and improves the accuracy of the optimization result. Because the two-dimensional key points are supervision information on the two-dimensional plane while the displacement parameters of the image acquisition device are three-dimensional quantities, introducing the second loss reduces the risk that the optimization falls into a local optimum on the two-dimensional plane and deviates from the real situation.
  • the supervision information includes the initial two-dimensional key points of the target object; the second optimization unit is configured to: obtain a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameters and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; obtain a fourth loss that characterizes the plausibility of the pose corresponding to the optimized value of the global rotation parameter and the initial values of the key point rotation parameters and the body shape parameters; and optimize the initial values of the key point rotation parameters and the body shape parameters based on the third loss and the fourth loss.
  • this embodiment optimizes the initial values of the key point rotation parameters and the body shape parameters based on the optimized values of the displacement parameters and the global rotation parameter, which improves the stability of the optimization process, and the fourth loss ensures the plausibility of the pose corresponding to the optimized parameters.
  • the device further includes: a joint optimization module, configured to, after the initial values of the key point rotation parameters and the body shape parameters are optimized based on the optimized values of the displacement parameters and the global rotation parameter, jointly optimize the optimized values of the global rotation parameter, the key point rotation parameters, the body shape parameters and the displacement parameters. In this embodiment, on the basis of the aforementioned optimization, the optimized parameters are jointly optimized, further improving the accuracy of the optimization result.
  • the supervision information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the target object's surface; the first optimization unit is configured to: obtain, among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, target two-dimensional projection key points belonging to the preset part of the target object, where the three-dimensional key points of the target object are obtained based on the initial values of the global rotation parameter, the key point rotation parameters and the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points based on the current value of the displacement parameters and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value and the current value of the displacement parameters; and obtain a fifth loss between a first three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud.
  • the joint optimization module includes: a first acquisition unit, configured to obtain a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized values of the displacement parameters and the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; a second acquisition unit, configured to obtain a seventh loss that characterizes the plausibility of the pose corresponding to the optimized values of the global rotation parameter, the key point rotation parameters and the body shape parameters; and a third acquisition unit, configured to obtain an eighth loss between a second three-dimensional point cloud of the target object's surface and the initial three-dimensional point cloud, where the second three-dimensional point cloud is obtained based on the optimized values of the global rotation parameter and the key point rotation parameters.
  • a three-dimensional reconstruction system, comprising: an image acquisition device, configured to acquire an image of a target object; and a processing unit communicatively connected to the image acquisition device, configured to: perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of parameters of the target object, where the initial values of the parameters are used to establish the three-dimensional model of the target object; optimize the initial values of the parameters based on supervision information representing characteristics of the target object to obtain optimized values of the parameters; and perform bone skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method described in any embodiment is implemented.
  • a computer device, including a memory, a processor, and a computer program stored in the memory and operable on the processor; when the processor executes the computer program, the method described in any embodiment is implemented.
  • a computer program product, stored in a storage medium and including a computer program that can run on a processor; when the processor executes the computer program, the method described in any embodiment is implemented.
  • the initial values of the parameters are obtained by three-dimensionally reconstructing the image of the target object through the 3D reconstruction network; the initial values are then optimized based on the supervision information, and the 3D model of the target object is established based on the optimized values. The advantage of the parameter-optimization approach is that it can give more accurate 3D reconstruction results that conform to the two-dimensional observed characteristics of the image, but it often produces unnatural and implausible actions, so its reliability is low. Network regression through the 3D reconstruction network can give more natural and plausible action results. Therefore, taking the output of the 3D reconstruction network as the initial values for optimization ensures both the reliability and the accuracy of the 3D reconstruction results.
  • FIG. 1A and FIG. 1B are schematic diagrams of three-dimensional models according to some embodiments.
  • Fig. 2 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present disclosure.
  • FIG. 3 is an overall flowchart of an embodiment of the present disclosure.
  • FIG. 4A and FIG. 4B are schematic diagrams of application scenarios of embodiments of the present disclosure, respectively.
  • FIG. 5 is a block diagram of a three-dimensional reconstruction device according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a three-dimensional reconstruction system according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word “if” as used herein may be interpreted as “at” or “when” or “in response to a determination.”
  • the 3D reconstruction of the target object needs to reconstruct the body posture and limb rotation of the target object.
  • a parametric model is used to express the body posture and limb rotation of the target object, not just the 3D key points.
  • a 3D model of a thinner person as shown in Figure 1A
  • a 3D model of a fatter person as shown in Figure 1B
  • the person shown in Figure 1B is in the same posture as the person shown in Figure 1A, and their key point information is the same; the difference in body shape between the two cannot be represented through key point information alone.
  • 3D reconstruction is generally carried out by means of parameter optimization and network regression.
  • the parameter optimization method usually selects a set of standard parameters, and uses the gradient descent method to iteratively optimize the initial values of the parameters of the 3D model of the target object according to the 2D visual features of the image of the target object, where the 2D visual features may be, for example, 2D key points.
  • the advantage of the parameter optimization method is that it can give more accurate parameter estimation results that conform to the two-dimensional visual features of the image, but it often gives unnatural and unreasonable pose results, and the final performance of parameter optimization depends heavily on the initial values of the parameters, leading to low reliability of the 3D reconstruction method based on parameter optimization.
  • Methods for network regression typically train an end-to-end neural network to learn the mapping from images to 3D model parameters.
  • the advantage of the network regression method is that it can give more natural and reasonable action results.
  • the 3D reconstruction results may not match the 2D visual features in the image; therefore, the accuracy of the 3D reconstruction method based on network regression is relatively low.
  • the 3D reconstruction method in the related art cannot take into account the accuracy and reliability of the 3D reconstruction results.
  • an embodiment of the present disclosure provides a three-dimensional reconstruction method, as shown in FIG. 2 , the method includes:
  • Step 201 Perform 3D reconstruction on the target object in the image through a 3D reconstruction network to obtain initial values of parameters of the target object, wherein the initial values of the parameters are used to establish a 3D model of the target object;
  • Step 202 Optimizing the initial value of the parameter based on the pre-acquired supervisory information representing the characteristics of the target object to obtain the optimized value of the parameter;
  • Step 203 Perform bone skinning processing based on the optimized values of the parameters, and establish a 3D model of the target object.
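As an illustrative sketch of steps 201–203 (the function name `refine` and the plain coordinate-wise numeric gradient descent are hypothetical stand-ins, not part of the disclosed embodiments), the regress-then-refine loop can be written as:

```python
def refine(initial_params, loss_fn, lr=0.05, steps=200, eps=1e-5):
    """Step 202 as coordinate-wise gradient descent with numeric gradients.

    initial_params: network-regressed parameter vector (step 201 output).
    loss_fn: disagreement with the supervisory information (2D features).
    Returns the optimized parameter values used for skinning (step 203).
    """
    params = list(initial_params)
    for _ in range(steps):
        for i in range(len(params)):
            # central finite difference for d(loss)/d(params[i])
            params[i] += eps
            hi = loss_fn(params)
            params[i] -= 2 * eps
            lo = loss_fn(params)
            params[i] += eps  # restore the parameter
            params[i] -= lr * (hi - lo) / (2 * eps)
    return params

# toy check: a quadratic "reprojection" loss pulls the network's initial
# values toward the supervised target
target = [1.0, -2.0, 0.5]
loss = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
optimized = refine([0.0, 0.0, 0.0], loss)
```

In a real system the loss would compare projected model key points against detected 2D key points, and an autodiff optimizer would replace the finite differences.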
  • the target object may be a three-dimensional object, such as a person, an animal, a robot, etc. in a physical space, or one or more regions on the three-dimensional object, such as a human face or a limb.
  • the target object is a human being
  • the three-dimensional reconstruction performed on the target object is a human body reconstruction as an example for description.
  • the image of the target object may be a single image, or may include multiple images obtained by shooting the target object from multiple different angles of view.
  • 3D human body reconstruction based on a single image is called monocular 3D human body reconstruction, and 3D human body reconstruction based on multiple images from different perspectives is called multi-eye 3D human body reconstruction.
  • Each image can be a grayscale image, RGB image or RGBD image.
  • the image may be an image collected in real time by an image acquisition device (for example, a camera or a camera) around the target object, or an image collected and stored in advance.
  • the image of the target object can be reconstructed in 3D through a 3D reconstruction network, wherein the 3D reconstruction network can be a pre-trained neural network.
  • the 3D reconstruction network can perform 3D reconstruction based on images, and estimate the initial values of natural and reasonable parameters.
  • the initial values of the parameters here can be represented by a vector.
  • the dimension of the vector can be 85 dimensions, for example, and the vector contains
  • the rotation information of the movable limbs of the human body, that is, the initial values of the pose parameters, including the initial values of the global rotation parameter of the human body and the initial values of the rotation parameters of 23 key points
  • the human body can be represented by key points and limb bones connecting these key points.
  • the key points of the human body may include one or more of the top of the head, nose, neck, left and right eyes, left and right ears, chest, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right buttocks, left and right knees, left and right ankles, etc.; the initial values of the pose parameters are used to determine the positions of the key points of the human body in three-dimensional space.
  • the initial value of the body shape parameter is used to determine body shape information such as height, shortness, fatness, and thinness of the human body.
  • the initial values of the camera parameters are used to determine the absolute position of the human body in three-dimensional space under the camera coordinate system. The camera parameters include a displacement parameter between the camera and the human body and a pose parameter of the camera, wherein the initial value of the camera pose parameter can be replaced by the initial value of the global rotation parameter of the human body.
  • the parameters of the human body can be expressed using a parametric form of a Skinned Multi-Person Linear (SMPL) model (referred to as SMPL parameters).
  • the bone skinning process can be performed based on the values of the SMPL parameters; that is, a mapping function M(β, θ) is used to map the body shape parameter values and the pose parameter values to a three-dimensional model of the human body surface. The 3D model includes 6890 vertices, and the vertices form triangular patches through a fixed connection relationship.
  • a pre-trained regressor W can be used to further regress the 3D key points of the human body from the vertices of the human surface model.
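The linear key point regression just mentioned can be sketched as J = W·V, where each 3D joint is a fixed weighted combination of mesh vertices. The tiny 4-vertex mesh and single-joint regressor below are illustrative only; a real SMPL-style model uses 6890 vertices and a pre-trained W:

```python
def regress_joints(W, vertices):
    """J = W · V: each 3D joint is a fixed weighted sum of mesh vertices.

    W: (num_joints x num_vertices) regressor weights (rows typically sum to 1).
    vertices: list of (x, y, z) surface vertices of the skinned model.
    """
    joints = []
    for row in W:
        j = [0.0, 0.0, 0.0]
        for w, v in zip(row, vertices):
            for k in range(3):
                j[k] += w * v[k]
        joints.append(tuple(j))
    return joints

# toy mesh with 4 vertices and one joint regressed at their centroid
verts = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
W = [[0.25, 0.25, 0.25, 0.25]]
print(regress_joints(W, verts))  # [(0.5, 0.5, 0.5)]
```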
  • the supervisory information can be two-dimensional visual features of the image (also called two-dimensional observation features), for example, at least one of the two-dimensional key points of the target object in the image and the semantic information of multiple pixels on the target object.
  • the semantic information of a pixel is used to represent which area the pixel is located on the target object, and the area may be, for example, the area where the head, arm, torso, leg, etc. are located.
  • the two-dimensional key point extraction network can be used to estimate the position of human key points in the image.
  • any two-dimensional pose estimation method can be used, such as OpenPose.
  • 2D visual features and the initial 3D point cloud of the target object surface can also be used as supervision information to further improve the accuracy of 3D reconstruction.
  • the depth information of multiple pixels on the target object can be extracted from the depth image, and the multiple pixels on the target object in the depth image can be projected into three-dimensional space to obtain an initial three-dimensional point cloud of the target object surface.
  • the plurality of pixels may be part or all of the pixels on the target object in the image.
  • the plurality of pixels may include pixels of each area on the target object that needs to be three-dimensionally reconstructed, and the number of pixels in each area should be greater than or equal to the number required for three-dimensional reconstruction.
  • the image generally includes both the target object and a background area. Therefore, image segmentation can be performed on the RGB image included in the image to obtain the image area where the target object is located in the RGB image, and the image area where the target object is located in the depth image is determined based on the image area where the target object is located in the RGB image; the depth information of multiple pixels in the image area where the target object is located in the depth image is then acquired.
  • the pixels in the depth image correspond one-to-one to the pixels in the RGB image.
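The back-projection of segmented depth pixels into an initial 3D point cloud can be sketched with a pinhole camera model; the dictionary-based depth layout and the intrinsic values below are assumptions for illustration, not the disclosed implementation:

```python
def backproject(depth, intrinsics, mask):
    """Lift masked depth pixels to a 3D point cloud in the camera frame.

    depth: dict {(u, v): z} metric depth per pixel (hypothetical layout).
    intrinsics: (fx, fy, cx, cy) pinhole camera parameters.
    mask: set of (u, v) pixels segmented as lying on the target object.
    """
    fx, fy, cx, cy = intrinsics
    cloud = []
    for (u, v) in mask:
        z = depth[(u, v)]
        # invert the pinhole projection u = fx*X/Z + cx, v = fy*Y/Z + cy
        cloud.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return cloud

pts = backproject({(320, 240): 2.0}, (500.0, 500.0, 320.0, 240.0), {(320, 240)})
# a pixel at the principal point back-projects onto the optical axis: [(0.0, 0.0, 2.0)]
```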
  • the image may also be an RGBD image.
  • outliers can also be filtered out from the 3D point cloud (ie, the initial 3D point cloud), and the supervision information can include the filtered 3D point cloud.
  • the filtering can be implemented using a point cloud filter. By filtering out outliers, a finer 3D point cloud of the surface of the target object can be obtained, thereby further improving the accuracy of 3D reconstruction.
  • for each target 3D point in the 3D point cloud, the average distance from the n 3D points nearest to the target 3D point to that point is obtained. Assuming that the average distances corresponding to the target 3D points obey a statistical distribution (for example, a Gaussian distribution), the mean and variance of the distribution can be calculated and a threshold s set based on the mean and variance; 3D points whose average distance falls outside the range defined by the threshold s are regarded as outliers and filtered from the 3D point cloud.
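The statistical outlier filtering just described can be sketched as follows; the neighbour count n and the threshold scale s are free choices, and the values used below are illustrative:

```python
import math

def filter_outliers(cloud, n=2, s=2.0):
    """Statistical outlier removal: for each point take the mean distance to
    its n nearest neighbours, fit a Gaussian over those means, and drop
    points whose mean distance exceeds mean + s * std."""
    avg = []
    for p in cloud:
        d = sorted(math.dist(p, q) for q in cloud if q is not p)
        avg.append(sum(d[:n]) / n)
    mu = sum(avg) / len(avg)
    var = sum((a - mu) ** 2 for a in avg) / len(avg)
    thr = mu + s * math.sqrt(var)
    return [p for p, a in zip(cloud, avg) if a <= thr]

# a tight cluster plus one stray point far from the body surface
cloud = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.1, 0.1, 0), (5, 5, 5)]
kept = filter_outliers(cloud, n=2, s=1.0)  # the stray point (5, 5, 5) is dropped
```

This is O(n²); a production implementation would use a k-d tree for the neighbour search.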
  • the initial values of the parameters can be iteratively optimized using the two-dimensional observation features as supervisory information.
  • the image is an RGBD image
  • the two-dimensional observation features and the three-dimensional point cloud of the surface of the target object can be used as supervisory information to iteratively optimize the initial value of the parameter.
  • the optimization method may, for example, use a gradient descent method, which is not limited in the present disclosure.
  • bone skinning processing may be performed based on the optimized values of the parameters to obtain a three-dimensional model of the target object.
  • the RGB image can be reconstructed three-dimensionally through the three-dimensional reconstruction network to obtain the human body parameter values of the person in the image, and the key point extraction network can be used to extract the key points of the person in the image to obtain the two-dimensional key points of the human body.
  • the human body parameter value is used as the initial value of the parameter
  • the two-dimensional key points of the human body are used as the supervision information
  • the initial values of the human body parameters are optimized through the parameter optimization module to obtain the optimized values of the human body parameters, and bone skinning is performed based on the optimized values to obtain the human body reconstruction model.
  • the image can be decomposed into an RGB image and a TOF (Time of Flight, time of flight) depth map.
  • the TOF depth map includes the depth information of each pixel in the RGB image.
  • the RGB image can be reconstructed three-dimensionally through the three-dimensional reconstruction network to obtain the human body parameter value of the person in the image, and the key point extraction network can be used to extract the key point of the person in the image to obtain the two-dimensional key point of the human body.
  • the point cloud reconstruction module can also be used to reconstruct the surface point cloud of the human body based on the depth information in the TOF depth map.
  • the human body parameter value is used as the initial value of the parameter, and the two-dimensional key points of the human body and the point cloud of the human body surface are jointly used as supervision information.
  • bone skinning is then performed based on the optimized values of the parameters to obtain the human body reconstruction model.
  • color processing may be performed on the human body reconstruction model based on the color information in the RGB image or the RGBD image, so that the human body reconstruction model matches the color information of the person in the image.
  • the target object in the image is reconstructed three-dimensionally through the three-dimensional reconstruction network to obtain the initial values of the parameters; the initial values are then optimized based on the supervision information, and a 3D model of the target object is established based on the optimized values of the parameters.
  • the advantage of the parameter optimization method is that it can give more accurate 3D reconstruction results that conform to the 2D observation features of the image, but it often gives unnatural and unreasonable pose results with low reliability.
  • network regression through the 3D reconstruction network can give more natural and reasonable pose results. Therefore, using the output of the 3D reconstruction network as the initial values for parameter optimization can ensure the reliability of the 3D reconstruction results while taking the accuracy of the 3D reconstruction into account.
  • a multi-stage optimization method may be used in the parameter optimization stage.
  • the multi-stage optimization method may include a camera optimization stage and a pose optimization stage.
  • the optimization targets are the value R of the global rotation parameter and the current value t of the displacement parameter between the image acquisition device and the target object.
  • t and R are three-dimensional vectors, and R is expressed in axis-angle form.
  • the optimization targets are the values of the key point rotation parameters and the body shape parameters.
  • the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter are first optimized to obtain the optimized value of the displacement parameter and the optimized value of the global rotation parameter; then, keeping the optimized value of the displacement parameter and the optimized value of the global rotation parameter unchanged, the initial value of the key point rotation parameter and the initial value of the body shape parameter are optimized based on them to obtain the optimized value of the key point rotation parameter and the optimized value of the body shape parameter.
  • among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to preset parts of the target object can be acquired; the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter.
  • a first loss between the target 2D projection keypoint and the initial 2D keypoint is obtained.
  • a second loss between an initial value of the displacement parameter and a current value of the displacement parameter is obtained.
  • the current value of the displacement parameter and the initial value of the global rotation parameter are optimized based on the first loss and the second loss.
  • the preset part may be a trunk part
  • the target two-dimensional projection key points may include the left and right shoulder points, left and right hip points, spine center point and other key points. Since different actions have little influence on the torso key points, establishing the first loss with the torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result.
  • the first loss can also be called torso key point projection loss
  • the second loss can also be called camera displacement regularization loss.
  • the first loss can be obtained by the following formula (1), and the second loss by the following formula (2):

    L_torso = ||x_torso − x̂_torso||²  (1)

    L_cam = ||t − t_net||²  (2)

  • x_torso and x̂_torso denote the target two-dimensional projection key points and the corresponding initial two-dimensional key points, respectively
  • L torso and L cam denote the first loss and the second loss respectively
  • t and t_net represent the current value and the initial value, respectively, of the displacement parameter between the image acquisition device and the target object.
  • the first target loss L 1 can be determined based on the first loss and the second loss.
  • the first target loss can be determined as the sum of the first loss and the second loss, as in the following formula (3):

    L_1 = L_torso + L_cam  (3)
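Formulas (1)–(3) can be sketched as a squared-L2 objective; the exact norms and any loss weights of the original formulas were lost in this rendering, so the form below is an assumption:

```python
def l2sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def camera_stage_loss(proj_torso, obs_torso, t, t_net):
    """First target loss L1 = L_torso + L_cam (assumed form of formulas (1)-(3)).

    proj_torso: projected 2D torso key points of the current estimate.
    obs_torso:  torso key points among the initial 2D detections.
    t, t_net:   current and network-regressed camera displacement.
    """
    # torso key point projection loss: squared 2D reprojection error
    l_torso = sum(l2sq(p, o) for p, o in zip(proj_torso, obs_torso))
    # camera displacement regularisation: keep t near the regressed t_net
    l_cam = l2sq(t, t_net)
    return l_torso + l_cam

loss = camera_stage_loss(
    proj_torso=[(100.0, 200.0), (140.0, 200.0)],
    obs_torso=[(101.0, 201.0), (140.0, 202.0)],
    t=(0.0, 0.1, 2.0), t_net=(0.0, 0.0, 2.0),
)
# (1 + 1) + (0 + 4) + 0.01 = 6.01
```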
  • a third loss between the optimized 2D projection key points of the target object and the initial 2D key points may be obtained, wherein the optimized 2D projection key points are obtained by projecting the optimized 3D key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized 3D key points are obtained based on the optimized value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter.
  • a fourth loss is obtained, which is used to characterize the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter; the initial value of the key point rotation parameter and the initial value of the body shape parameter are then optimized based on the third loss and the fourth loss.
  • the third loss can also be called the two-dimensional key point projection loss
  • the fourth loss can also be called the attitude rationality loss
  • the third loss can be determined by the following formula (4):

    L_2d = ||x − x̂||²  (4)
  • L 2d is the third loss
  • x and x̂ represent the optimized two-dimensional projection key points and the initial two-dimensional key points, respectively.
  • the second target loss may be determined based on the third loss and the fourth loss.
  • the second target loss may be determined as the sum of the third loss and the fourth loss, as in the following formula (5):

    L_2 = L_2d + L_prior  (5)
  • L 2 is the second target loss
  • L_prior is the fourth loss, which can be obtained using a Gaussian Mixture Model (GMM); the model judges whether the posture corresponding to the optimized value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter is reasonable, and outputs a large loss for unreasonable postures.
  • the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter may also be jointly optimized; that is, a three-stage optimization method is adopted.
  • the supervision information includes the information of the 3D point cloud on the surface of the target object
  • the three-stage optimization method can be adopted, including the camera optimization stage, the attitude optimization stage and the point cloud optimization stage.
  • among the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to preset parts of the target object can be obtained; the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial value of the key point rotation parameter and the initial value of the body shape parameter, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter.
  • a first loss between the target 2D projection keypoint and the initial 2D keypoint is obtained.
  • a second loss between an initial value of the displacement parameter and a current value of the displacement parameter is obtained.
  • the fifth loss can also be called the Iterative Closest Point (ICP) point cloud registration loss, which can be determined by the following formula (6):

    L_icp = Σ_{(p,q)∈K_1} ||p − q||² + Σ_{(p,q)∈K_2} ||p − q||²  (6)
  • L icp is the fifth loss
  • the initial 3D point cloud is regarded as point cloud P
  • the first 3D point cloud is regarded as point cloud Q
  • K_1 = {(p, q)} is the set of point pairs formed by each point in point cloud P and its closest point in point cloud Q
  • K_2 = {(p, q)} is the set of point pairs formed by each point in point cloud Q and its closest point in point cloud P
  • the first loss and the second loss are given by the following formula (7) and formula (8), respectively:

    L_torso = ||x_torso − x̂_torso||²  (7)

    L_cam = ||t − t_net||²  (8)
  • L torso and L cam denote the first loss and the second loss respectively
  • x_torso and x̂_torso respectively represent the target two-dimensional projection key points and the initial two-dimensional key points
  • t and t net represent the current value of the displacement parameter and the initial value of the displacement parameter respectively.
  • the first target loss L_1 can be determined as the sum of the first loss, the second loss and the fifth loss, and the current value of the displacement parameter and the initial value of the global rotation parameter are then optimized based on the first target loss, as in the following formula (9):

    L_1 = L_torso + L_cam + L_icp  (9)
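The symmetric ICP registration loss of formula (6), as used in the camera-stage objective of formula (9), can be sketched as follows; squared Euclidean distances are an assumed form, and the brute-force nearest-point search is for illustration only:

```python
def icp_loss(P, Q):
    """Fifth loss L_icp: symmetric nearest-point registration loss.

    K1 pairs each point of P with its closest point in Q; K2 does the
    reverse; both sets of squared distances are summed (formula (6)).
    """
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def nearest_sq(p, cloud):
        return min(d2(p, q) for q in cloud)

    return (sum(nearest_sq(p, Q) for p in P)    # K1 direction
            + sum(nearest_sq(q, P) for q in Q)) # K2 direction

P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]                    # e.g. initial point cloud
Q = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]   # e.g. model surface samples
# P→Q: 0.01 + 0;  Q→P: 0.01 + 0 + 16  →  16.02
print(icp_loss(P, Q))
```

The symmetric form penalises both unexplained observed points and model points far from the observation, which is why the stray point in Q contributes most of the loss above.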
  • the attitude optimization stage in the three-stage optimization process is the same as the attitude optimization stage in the two-stage optimization process, and will not be repeated here.
  • the sixth loss between the optimized 2D projection key points of the target object and the initial 2D key points can be obtained, wherein the optimized 2D projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter and the optimized value of the body shape parameter.
  • a seventh loss is obtained, which is used to characterize the plausibility of the posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter and the optimized value of the body shape parameter.
  • in the sixth loss, x is the optimized two-dimensional projection key point and x̂ is the initial two-dimensional key point.
  • the seventh loss can be obtained using a Gaussian mixture model, which judges whether the posture corresponding to the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter and the optimized value of the body shape parameter is reasonable, and outputs a large loss for unreasonable postures.
  • the initial 3D point cloud is regarded as point cloud P, and the second 3D point cloud is regarded as point cloud Q′; K_1 is the set of point pairs formed by each point in point cloud P and its closest point in point cloud Q′, and K_2 is the set of point pairs formed by each point in point cloud Q′ and its closest point in point cloud P.
  • the sum of the sixth loss, the seventh loss and the eighth loss can be determined as the third target loss L_3, and based on the third target loss, the optimized value of the global rotation parameter, the optimized value of the key point rotation parameter, the optimized value of the body shape parameter and the optimized value of the displacement parameter are jointly optimized, as in the following formula (12):

    L_3 = L_2d + L_prior + L_icp  (12)
  • parameter optimization can be performed based on the aforementioned two-stage optimization method including the camera optimization stage and the pose optimization stage; alternatively, the parameters are optimized by the three-stage optimization method including the camera optimization stage, the pose optimization stage and the point cloud optimization stage.
  • This solution can be used in a wide range of scenarios, and can provide natural, reasonable and accurate human body reconstruction models in scenarios such as virtual fitting rooms, virtual anchors, and video action migration.
  • FIG. 4A it is a schematic diagram of an application scene of a virtual fitting room according to an embodiment of the present disclosure.
  • the image of the user 401 can be collected by the camera 403, and the collected image is sent to a processor (not shown in the figure) for three-dimensional human body reconstruction to obtain the human body reconstruction model 404 corresponding to the user 401; the human body reconstruction model 404 is displayed on the display interface 402 for the user 401 to view.
  • the user 401 can select the required clothing 405, including but not limited to clothing 4051 and hat 4052, etc., and the clothing 405 can be displayed on the display interface 402 based on the human body reconstruction model 404, so that the user 401 can watch the wearing effect of the clothing 405.
  • FIG. 4B it is a schematic diagram of an application scenario of a virtual live broadcast room according to an embodiment of the present disclosure.
  • the image of the anchor user 406 can be collected through the anchor client 407, and the image of the anchor user 406 can be sent to the server 408 for three-dimensional reconstruction to obtain the human body reconstruction model of the anchor user, that is, the virtual anchor.
  • the server 408 can return the human body reconstruction model of the host user to the host client 407 for display, as shown in the model 4071 in the figure.
  • the host client 407 can also collect the voice information of the host user, and send the voice information to the server 408, so that the server 408 can fuse the reconstruction model of the human body and the voice information.
  • the server 408 can send the fused human body reconstruction model and voice information to the viewer client 409 watching the live program for display and playback, wherein the displayed human body reconstruction model is shown as model 4091 in the figure.
  • the live broadcast screen of the virtual anchor can be displayed on the viewer client 409 .
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the present disclosure also provides a three-dimensional reconstruction device, which includes:
  • the first three-dimensional reconstruction module 501 is configured to perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain an initial value of a parameter of the target object, and the initial value of the parameter is used to establish a three-dimensional model of the target object ;
  • An optimization module 502 configured to optimize the initial value of the parameter based on the pre-acquired supervisory information used to represent the characteristics of the target object, to obtain the optimized value of the parameter;
  • the second three-dimensional reconstruction module 503 is configured to perform bone skinning processing based on the optimized values of the parameters, and establish a three-dimensional model of the target object.
  • the supervision information includes first supervision information, or the supervision information includes first supervision information and second supervision information; the first supervision information includes at least one of the following: the initial two-dimensional key points of the target object in the image, and the semantic information of multiple pixels on the target object in the image; the second supervision information includes an initial three-dimensional point cloud of the target object surface.
  • the initial two-dimensional key points or the pixel semantic information of the target object can be used as supervisory information to optimize the initial values of the parameters, which gives high optimization efficiency and low optimization complexity; alternatively, the initial 3D point cloud of the target object surface together with the aforementioned initial 2D key points or pixel semantic information can be used as supervisory information, thereby improving the accuracy of the obtained optimized parameter values.
  • the device further includes: a two-dimensional key point extraction module, configured to extract initial two-dimensional key point information of the target object from the image through a key point extraction network. Using the information of the initial two-dimensional key points extracted by the key point extraction network as supervision information can generate more natural and reasonable actions for the three-dimensional model.
  • the image includes a depth image of the target object; the device further includes: a depth information extraction module, configured to extract depth information of multiple pixels on the target object from the depth image; and a back-projection module, configured to back-project the multiple pixels on the target object in the depth image into three-dimensional space based on the depth information, to obtain an initial three-dimensional point cloud of the target object surface.
  • the image further includes an RGB image of the target object;
  • the depth information extraction module includes: an image segmentation unit for performing image segmentation on the RGB image; an image area determination unit for determining, based on the segmentation result, the image area where the target object is located in the RGB image, and determining the image area where the target object is located in the depth image based on that area; and a depth information acquisition unit for acquiring depth information of multiple pixels in the image area where the target object is located in the depth image.
  • the device further includes: a filtering module, configured to filter out outliers from the initial 3D point cloud, and use the filtered initial 3D point cloud as the second supervisory information. By filtering the outliers, the interference of the outliers is reduced, and the accuracy of the parameter optimization process is further improved.
  • the image of the target object is acquired by an image acquisition device
  • the parameters include: the global rotation parameter of the target object, the key point rotation parameters of each key point of the target object, the body shape parameters of the target object, and the displacement parameter of the image acquisition device;
  • the optimization module includes: a first optimization unit, configured to, while keeping the initial values of the body shape parameters and the initial values of the key point rotation parameters unchanged, optimize the current value of the displacement parameter of the image acquisition device and the initial value of the global rotation parameter based on the supervisory information and the initial value of the displacement parameter, to obtain an optimized value of the displacement parameter and an optimized value of the global rotation parameter;
  • a second optimization unit, configured to optimize the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, to obtain the optimized values of the key point rotation parameters and the optimized values of the body shape parameters.
  • the supervisory information includes the initial two-dimensional key points of the target object; the first optimization unit is configured to: obtain, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to a preset part of the target object, wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain a first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain a second loss between the initial value of the displacement parameter and the current value of the displacement parameter; and optimize the current value of the displacement parameter and the initial value of the global rotation parameter based on the first loss and the second loss.
  • the preset part may be, for example, the torso. Since different actions have little influence on the torso key points, determining the first loss from the torso key points reduces the influence of different actions on key point positions and improves the accuracy of the optimization result. Since the two-dimensional key points are supervisory information on the two-dimensional plane while the displacement parameter of the image acquisition device is a three-dimensional quantity, obtaining the second loss reduces the risk that the optimization result falls into a local optimum on the two-dimensional plane and deviates from the real situation.
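The two losses above can be sketched as follows. The projection model, focal length `f`, principal point `c` and regularization weight `reg_w` are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def project(points3d, t, f=1000.0, c=(0.0, 0.0)):
    """Pinhole projection of 3D key points after applying the camera
    displacement t; f and c are assumed intrinsics for illustration."""
    p = points3d + t
    return np.stack([f * p[:, 0] / p[:, 2] + c[0],
                     f * p[:, 1] / p[:, 2] + c[1]], axis=-1)

def stage1_loss(t, t_init, torso_kp3d, torso_kp2d, reg_w=0.1):
    """First loss: torso reprojection error. Second loss: keep the optimized
    displacement close to its initial (3D) value so the 2D supervision does
    not pull it into an implausible local optimum."""
    first = ((project(torso_kp3d, t) - torso_kp2d) ** 2).sum()
    second = reg_w * ((t - t_init) ** 2).sum()
    return first + second
```

Minimizing `stage1_loss` over `t` (and, in the patent's scheme, the global rotation) realizes the first optimization unit.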
  • the supervisory information includes the initial two-dimensional key points of the target object; the second optimization unit is configured to: obtain a third loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters; obtain a fourth loss, which characterizes the rationality of the pose corresponding to the optimized value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters; and optimize the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the third loss and the fourth loss.
  • This embodiment optimizes the initial values of the key point rotation parameters and the initial values of the body shape parameters based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, which improves the stability of the optimization process. The fourth loss ensures the rationality of the pose corresponding to the optimized parameters.
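The patent does not specify the functional form of the fourth loss. In body-fitting pipelines a pose prior is common; a minimal stand-in penalizes joint angles outside plausible limits (real systems often use learned priors such as a Gaussian mixture over poses):

```python
import numpy as np

def pose_plausibility_loss(joint_angles, lower, upper):
    """Quadratic penalty on joint rotation angles (radians) that fall
    outside per-joint [lower, upper] limits; zero for a plausible pose."""
    below = np.clip(lower - joint_angles, 0.0, None)
    above = np.clip(joint_angles - upper, 0.0, None)
    return float((below ** 2 + above ** 2).sum())
```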
  • the device further includes: a joint optimization module, configured to, after the initial values of the key point rotation parameters and the initial values of the body shape parameters are optimized based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, jointly optimize the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters, the optimized values of the body shape parameters and the optimized value of the displacement parameter. In this embodiment, jointly optimizing the already-optimized parameters on the basis of the aforementioned optimization further improves the accuracy of the optimization result.
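The joint refinement can be sketched as gradient descent over all parameters flattened into one vector. The finite-difference gradient stands in for the automatic differentiation a real implementation would use; the learning rate and step count are illustrative:

```python
import numpy as np

def numeric_grad(f, x, eps=1e-5):
    """Central-difference gradient of scalar loss f at x (autodiff stand-in)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def joint_optimize(total_loss, params0, lr=0.1, steps=200):
    """Refine all parameters together (global rotation, key point rotations,
    body shape, displacement, flattened into one vector) by gradient
    descent on the summed loss."""
    params = params0.astype(float).copy()
    for _ in range(steps):
        params -= lr * numeric_grad(total_loss, params)
    return params
```

Here `total_loss` would sum the reprojection, plausibility, and point-cloud terms described in this section.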
  • the supervisory information includes the initial two-dimensional key points of the target object and the initial three-dimensional point cloud of the surface of the target object;
  • the first optimization unit is configured to: obtain, from the two-dimensional projection key points corresponding to the three-dimensional key points of the target object, the target two-dimensional projection key points belonging to the preset part of the target object, wherein the three-dimensional key points of the target object are obtained based on the initial value of the global rotation parameter, the initial values of the key point rotation parameters and the initial values of the body shape parameters, and the two-dimensional projection key points are obtained by projecting the three-dimensional key points of the target object based on the current value of the displacement parameter and the initial value of the global rotation parameter; obtain the first loss between the target two-dimensional projection key points and the initial two-dimensional key points; obtain the second loss between the initial value of the displacement parameter and the current value of the displacement parameter; obtain the fifth loss between the first three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud; the first three-dimensional point
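The fifth (and later the eighth) loss compares a model-surface point cloud with the initial point cloud. The patent does not name the metric; a one-sided Chamfer distance is a common realization and is sketched here as an assumption:

```python
import numpy as np

def point_cloud_loss(model_points, observed_points):
    """Mean squared distance from each observed point to its nearest model
    point (one-sided Chamfer distance)."""
    d2 = ((observed_points[:, None, :] - model_points[None, :, :]) ** 2).sum(-1)
    return float(d2.min(axis=1).mean())
```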
  • the joint optimization module includes: a first acquisition unit, configured to acquire a sixth loss between the optimized two-dimensional projection key points of the target object and the initial two-dimensional key points, where the optimized two-dimensional projection key points are obtained by projecting the optimized three-dimensional key points of the target object based on the optimized value of the displacement parameter and the optimized value of the global rotation parameter, and the optimized three-dimensional key points are obtained based on the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized values of the body shape parameters; a second acquisition unit, configured to obtain a seventh loss, which characterizes the rationality of the pose corresponding to the optimized value of the global rotation parameter, the optimized values of the key point rotation parameters and the optimized values of the body shape parameters; a third acquisition unit, configured to acquire an eighth loss between the second three-dimensional point cloud of the surface of the target object and the initial three-dimensional point cloud; the second three-dimensional point cloud is obtained based on the optimized value of the global rotation parameter, the optimized value of the key point
  • the functions or modules included in the device provided by the embodiments of the present disclosure can be used to execute the methods described in the method embodiments above; for specific implementations, refer to the descriptions of those embodiments, which are not repeated here for brevity.
  • the present disclosure also provides a three-dimensional reconstruction system, which includes:
  • An image acquisition device 601 configured to acquire an image of a target object
  • the processing unit 602, in communication with the image acquisition device 601, is configured to perform three-dimensional reconstruction on the target object in the image through a three-dimensional reconstruction network to obtain initial values of the parameters of the target object, where the initial values of the parameters are used to establish a three-dimensional model of the target object; optimize the initial values of the parameters based on pre-acquired supervisory information used to represent features of the target object, to obtain optimized values of the parameters; and perform skeletal skinning processing based on the optimized values of the parameters to establish the three-dimensional model of the target object.
  • the image acquisition device 601 in the embodiments of the present disclosure may be a device with an image acquisition function, such as a camera or a video camera; the images collected by the image acquisition device 601 may be transmitted to the processing unit 602 in real time, or stored and transmitted from the storage space to the processing unit 602 when needed.
  • the processing unit 602 may be a single server or a server cluster composed of multiple servers. For the method executed by the processing unit 602, refer to the foregoing embodiments of the three-dimensional reconstruction method; details are not repeated here.
  • the embodiments of this specification also provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method described in any of the preceding embodiments when executing the program.
  • FIG. 7 shows a more specific schematic diagram of the hardware structure of a computing device provided by an embodiment of this specification.
  • the device may include: a processor 701 , a memory 702 , an input/output interface 703 , a communication interface 704 and a bus 705 .
  • the processor 701 , the memory 702 , the input/output interface 703 and the communication interface 704 are connected to each other within the device through the bus 705 .
  • the processor 701 may be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of this specification.
  • the processor 701 may also include a graphics card, which may be an NVIDIA Titan X or a 1080 Ti graphics card.
  • the memory 702 can be implemented in the form of ROM (Read Only Memory, read-only memory), RAM (Random Access Memory, random access memory), static storage device, dynamic storage device, etc.
  • the memory 702 can store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 702 and invoked by the processor 701 for execution.
  • the input/output interface 703 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 704 is used to connect with a communication module (not shown in the figure), so as to realize communication interaction between the device and other devices.
  • the communication module can communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, Wi-Fi, or Bluetooth).
  • Bus 705 includes a path for transferring information between the various components of the device (eg, processor 701, memory 702, input/output interface 703, and communication interface 704).
  • although the above device only shows the processor 701, the memory 702, the input/output interface 703, the communication interface 704 and the bus 705, in a specific implementation the device may also include other components.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of this specification, and does not necessarily include all the components shown in the figure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • Computer-readable media, including both permanent and non-permanent, removable and non-removable media, can store information using any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media exclude transitory computer-readable media, such as modulated data signals and carrier waves.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
  • each embodiment in this specification is described in a progressive manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments.
  • as the device embodiments are substantially similar to the method embodiments, the description is relatively simple; for relevant parts, refer to the corresponding description of the method embodiments.
  • the device embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Image Generation (AREA)
PCT/CN2022/075636 2021-05-10 2022-02-09 三维重建方法、装置和系统、介质及计算机设备 WO2022237249A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020237014677A KR20230078777A (ko) 2021-05-10 2022-02-09 3차원 재구성 방법, 장치와 시스템, 매체 및 컴퓨터 기기
JP2023525021A JP2023547888A (ja) 2021-05-10 2022-02-09 三次元再構成方法、装置、システム、媒体及びコンピュータデバイス

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110506464.XA CN113160418A (zh) 2021-05-10 2021-05-10 三维重建方法、装置和系统、介质及计算机设备
CN202110506464.X 2021-05-10

Publications (1)

Publication Number Publication Date
WO2022237249A1 true WO2022237249A1 (zh) 2022-11-17

Family

ID=76874172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075636 WO2022237249A1 (zh) 2021-05-10 2022-02-09 三维重建方法、装置和系统、介质及计算机设备

Country Status (5)

Country Link
JP (1) JP2023547888A (ja)
KR (1) KR20230078777A (ja)
CN (1) CN113160418A (ja)
TW (1) TW202244853A (ja)
WO (1) WO2022237249A1 (ja)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160418A (zh) * 2021-05-10 2021-07-23 上海商汤智能科技有限公司 三维重建方法、装置和系统、介质及计算机设备
CN113724378B (zh) * 2021-11-02 2022-02-25 北京市商汤科技开发有限公司 三维建模方法和装置、计算机可读存储介质及计算机设备
KR20230087750A (ko) 2021-12-10 2023-06-19 삼성전자주식회사 3차원 모델링 장치 및 방법
CN115375856B (zh) * 2022-10-25 2023-02-07 杭州华橙软件技术有限公司 三维重建方法、设备以及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840939A (zh) * 2019-01-08 2019-06-04 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN110288696A (zh) * 2019-06-13 2019-09-27 南京航空航天大学 一种完备一致生物体三维特征表征模型的建立方法
CN111862299A (zh) * 2020-06-15 2020-10-30 上海非夕机器人科技有限公司 人体三维模型构建方法、装置、机器人和存储介质
CN112037320A (zh) * 2020-09-01 2020-12-04 腾讯科技(深圳)有限公司 一种图像处理方法、装置、设备以及计算机可读存储介质
CN112419454A (zh) * 2020-11-25 2021-02-26 北京市商汤科技开发有限公司 一种人脸重建方法、装置、计算机设备及存储介质
CN112509144A (zh) * 2020-12-09 2021-03-16 深圳云天励飞技术股份有限公司 人脸图像处理方法、装置、电子设备及存储介质
CN113160418A (zh) * 2021-05-10 2021-07-23 上海商汤智能科技有限公司 三维重建方法、装置和系统、介质及计算机设备

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7925049B2 (en) * 2006-08-15 2011-04-12 Sri International Stereo-based visual odometry method and system
US9189886B2 (en) * 2008-08-15 2015-11-17 Brown University Method and apparatus for estimating body shape
CN103236082B (zh) * 2013-04-27 2015-12-02 南京邮电大学 面向捕获静止场景的二维视频的准三维重建方法
DE102015208929B3 (de) * 2015-05-13 2016-06-09 Friedrich-Alexander-Universität Erlangen-Nürnberg Verfahren zur 2D-3D-Registrierung, Recheneinrichtung und Computerprogramm
US9928663B2 (en) * 2015-07-27 2018-03-27 Technische Universiteit Delft Skeletal joint optimization for linear blend skinning deformations utilizing skeletal pose sampling
US10546433B2 (en) * 2017-08-03 2020-01-28 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for modeling garments using single view images
CN107945269A (zh) * 2017-12-26 2018-04-20 清华大学 基于多视点视频的复杂动态人体对象三维重建方法及系统
CN110298916B (zh) * 2019-06-21 2022-07-01 湖南大学 一种基于合成深度数据的三维人体重建方法
CN112136137A (zh) * 2019-10-29 2020-12-25 深圳市大疆创新科技有限公司 一种参数优化方法、装置及控制设备、飞行器
CN111383333B (zh) * 2020-04-02 2024-02-20 西安因诺航空科技有限公司 一种分段式sfm三维重建方法

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20240078717A (ko) * 2022-11-28 2024-06-04 주식회사 인공지능연구원 다시점 카메라 기반 다관절 객체의 입체영상캡쳐 장치 및 방법
KR102705610B1 (ko) 2022-11-28 2024-09-11 주식회사 인공지능연구원 다시점 카메라 기반 다관절 객체의 입체영상캡쳐 장치 및 방법
CN116030189A (zh) * 2022-12-20 2023-04-28 中国科学院空天信息创新研究院 一种基于单视角遥感图像的目标三维重建方法
CN116030189B (zh) * 2022-12-20 2023-07-04 中国科学院空天信息创新研究院 一种基于单视角遥感图像的目标三维重建方法

Also Published As

Publication number Publication date
KR20230078777A (ko) 2023-06-02
CN113160418A (zh) 2021-07-23
TW202244853A (zh) 2022-11-16
JP2023547888A (ja) 2023-11-14

Similar Documents

Publication Publication Date Title
WO2022237249A1 (zh) 三维重建方法、装置和系统、介质及计算机设备
US11238606B2 (en) Method and system for performing simultaneous localization and mapping using convolutional image transformation
US10846903B2 (en) Single shot capture to animated VR avatar
CN113012282B (zh) 三维人体重建方法、装置、设备及存储介质
WO2022001236A1 (zh) 三维模型生成方法、装置、计算机设备及存储介质
CN110264509A (zh) 确定图像捕捉设备的位姿的方法、装置及其存储介质
WO2023071964A1 (zh) 数据处理方法, 装置, 电子设备及计算机可读存储介质
WO2022205762A1 (zh) 三维人体重建方法、装置、设备及存储介质
WO2023109753A1 (zh) 虚拟角色的动画生成方法及装置、存储介质、终端
JP7387202B2 (ja) 3次元顔モデル生成方法、装置、コンピュータデバイス及びコンピュータプログラム
KR20160098560A (ko) 동작 분석 장치 및 방법
CN111710035B (zh) 人脸重建方法、装置、计算机设备及存储介质
CN109242950A (zh) 多人紧密交互场景下的多视角人体动态三维重建方法
WO2023071790A1 (zh) 目标对象的姿态检测方法、装置、设备及存储介质
CN115496864B (zh) 模型构建方法、重建方法、装置、电子设备及存储介质
KR20220149717A (ko) 단안 카메라로부터 전체 골격 3d 포즈 복구
KR20230071588A (ko) 디오라마 적용을 위한 다수 참여 증강현실 콘텐츠 제공 장치 및 그 방법
CN116342782A (zh) 生成虚拟形象渲染模型的方法和装置
US20230290101A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN115775300B (zh) 人体模型的重建方法、人体重建模型的训练方法及装置
WO2023160074A1 (zh) 一种图像生成方法、装置、电子设备以及存储介质
KR20240128015A (ko) 실시간 의복 교환
CN110689602A (zh) 三维人脸重建方法、装置、终端及计算机可读存储介质
US12051168B2 (en) Avatar generation based on driving views
CN116229008B (zh) 图像处理方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806226

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023525021

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 20237014677

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22806226

Country of ref document: EP

Kind code of ref document: A1