WO2023273093A1 - 一种人体三维模型获取方法、装置、智能终端及存储介质 - Google Patents

一种人体三维模型获取方法、装置、智能终端及存储介质 Download PDF

Info

Publication number
WO2023273093A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
dimensional
joint points
depth
coordinate information
Prior art date
Application number
PCT/CN2021/130104
Other languages
English (en)
French (fr)
Inventor
张敏
潘哲
钱贝贝
王飞
Original Assignee
奥比中光科技集团股份有限公司
深圳奥芯微视科技有限公司
Priority date
Filing date
Publication date
Application filed by 奥比中光科技集团股份有限公司, 深圳奥芯微视科技有限公司
Publication of WO2023273093A1 publication Critical patent/WO2023273093A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present invention relates to the technical field of image processing, in particular to a method, device, intelligent terminal and storage medium for obtaining a three-dimensional model of a human body.
  • the 3D model of the human body is very important for describing the posture of the human body and predicting the behavior of the human body.
  • the 3D human body model has been widely used in various fields, such as abnormal behavior monitoring, automatic driving and monitoring and other fields.
  • the reconstruction quality of three-dimensional human body models has gradually improved.
  • color images are usually used to obtain a three-dimensional model of a human body through a convolutional neural network.
  • the problem with the prior art is that a color image cannot provide effective three-dimensional space information, so the obtained three-dimensional human body model has low accuracy and cannot accurately reflect the three-dimensional posture of the human body.
  • the main purpose of the present invention is to provide a method, device, intelligent terminal and storage medium for obtaining a three-dimensional model of the human body, aiming at solving the prior-art problem that a three-dimensional human body model obtained from color images through a convolutional neural network has low accuracy.
  • the first aspect of the present invention provides a method for acquiring a three-dimensional model of a human body, wherein the above method includes:
  • the acquisition of the color image and the depth image corresponding to the above color image include:
  • Aligning the depth image to be processed with the color image is used as a depth image corresponding to the color image.
  • the above-mentioned two-dimensional coordinate information of human body joint points and human body segmentation regions are obtained based on the above-mentioned color images, including:
  • the target single-person pose estimation frame is obtained through the human pose estimation algorithm
  • the above-mentioned two-dimensional coordinate information of the joint points of the human body and the above-mentioned human body segmentation area are obtained based on the above-mentioned target single-person pose estimation framework.
  • the above-mentioned two-dimensional coordinate information of the above-mentioned human body joint points and the above-mentioned human body segmentation area are obtained based on the above-mentioned target single-person pose estimation framework, including:
  • a plurality of human body segmentation regions are obtained based on the pedestrian detection frame and each of the human body joint points, wherein each of the above human body segmentation regions is a human body region obtained by dividing the human body edge contour based on each of the above human body joint points.
  • the above preset loss function is used to iteratively fit all the above three-dimensional coordinate information of the joint points of the human body and all the above human body segmentation depth regions to obtain the three-dimensional model of the human body, including:
  • the three-dimensional model of the human body is obtained based on the position information of each of the above-mentioned target human body joint points and each target point cloud, wherein the above-mentioned target point cloud includes point cloud three-dimensional coordinates of points in the human body segmentation depth region corresponding to the above-mentioned target human body joint points.
  • the preset loss functions include a reprojection loss function, a three-dimensional joint point loss function, an angle loss function, and a surface point depth loss function.
  • the above method further includes:
  • the three-dimensional skeleton points of the human body are obtained based on the above-mentioned three-dimensional model of the human body.
  • the second aspect of the present invention provides a device for obtaining a three-dimensional model of a human body, wherein the device includes:
  • An image acquisition module configured to acquire a color image and a depth image corresponding to the color image
  • the human body segmentation area acquisition module is used to acquire the two-dimensional coordinate information of the human body joint points and the human body segmentation area based on the above-mentioned color image;
  • the human body segmentation depth area acquisition module is used to obtain the three-dimensional coordinate information of human body joint points corresponding to the two-dimensional coordinate information of each of the above-mentioned human body joint points and the human body segmentation depth area corresponding to each of the above-mentioned human body segmentation areas based on the above-mentioned depth image;
  • the three-dimensional model reconstruction module of the human body is used to iteratively fit all the above-mentioned three-dimensional coordinate information of the joint points of the human body and all the above-mentioned segmentation depth regions of the human body based on a preset loss function to obtain a three-dimensional model of the human body.
  • the third aspect of the present invention provides an intelligent terminal.
  • the above-mentioned intelligent terminal includes a memory, a processor, and a human body three-dimensional model acquisition program that is stored in the above-mentioned memory and can run on the above-mentioned processor.
  • when the above-mentioned human body three-dimensional model acquisition program is executed by the above-mentioned processor, the steps of any one of the methods for obtaining a three-dimensional human body model described above are implemented.
  • the fourth aspect of the present invention provides a computer-readable storage medium.
  • the computer-readable storage medium stores a human body three-dimensional model acquisition program.
  • when the human body three-dimensional model acquisition program is executed by a processor, the steps of any one of the above-mentioned human body three-dimensional model acquisition methods are implemented.
  • the scheme of the present invention acquires a color image and a depth image corresponding to the color image; obtains two-dimensional coordinate information of human body joint points and human body segmentation areas based on the color image; obtains, based on the depth image, the three-dimensional coordinate information of the human body joint points corresponding to each piece of two-dimensional coordinate information, and the human body segmentation depth area corresponding to each human body segmentation area; and iteratively fits all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth areas based on a preset loss function to obtain a three-dimensional model of the human body.
  • the scheme of the present invention combines the depth image that can provide the corresponding three-dimensional space information of the human body to obtain the three-dimensional human body model, which is conducive to improving the accuracy of the obtained three-dimensional human body model.
  • the obtained three-dimensional model of the human body can better reflect the three-dimensional posture of the human body.
  • FIG. 1 is a schematic flowchart of a method for acquiring a human body three-dimensional model provided by an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of the present invention implementing step S100 in FIG. 1;
  • FIG. 3 is a schematic flow chart of the present invention implementing step S200 in FIG. 1;
  • FIG. 4 is a schematic flow chart of the present invention implementing step S203 in FIG. 3;
  • Fig. 5 is a schematic diagram of a target single-person pose estimation framework provided by an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of a human body segmentation region provided by an embodiment of the present invention.
  • FIG. 7 is a schematic flow chart of the present invention implementing step S400 in FIG. 1;
  • Fig. 8 is a schematic flowchart of another method for acquiring a human body three-dimensional model provided by an embodiment of the present invention.
  • Fig. 9 is a schematic structural diagram of a human body three-dimensional model acquisition device provided by an embodiment of the present invention.
  • Fig. 10 is a functional block diagram of an internal structure of a smart terminal provided by an embodiment of the present invention.
  • the term “if” may be construed as “when” or “once” or “in response to determining” or “in response to detecting” depending on the context.
  • the phrases "if determined" or "if [the described condition or event] is detected" may be construed, depending on the context, to mean "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • as a result, the obtained three-dimensional human body model cannot be applied to demanding scenarios such as human-computer interaction, which limits the application of the three-dimensional human body model.
  • an embodiment of the present invention provides a method for acquiring a three-dimensional model of a human body. Specifically, the above method includes the following steps:
  • Step S100 acquiring a color image and a depth image corresponding to the color image.
  • the above-mentioned color image and the above-mentioned depth image are images containing a target object
  • the target object is an object that needs to be reconstructed from a three-dimensional model of a human body.
  • the above-mentioned color image and depth image may include multiple target objects.
  • the existence of one target object is taken as an example for specific description.
  • when there are multiple target objects, the method in this embodiment may be used to reconstruct a three-dimensional human body model for each target object respectively.
  • the depth image is an image in which depth information (distance) is used as a pixel value, and can provide effective three-dimensional space information corresponding to a target object, thereby improving the accuracy of the acquired three-dimensional human body model.
  • Step S200 acquiring two-dimensional coordinate information of joint points of the human body and segmented regions of the human body based on the above color image.
  • target detection and human body pose estimation can be performed on the target object in the above color image, and corresponding two-dimensional coordinate information of human body joint points and human body segmentation regions can be obtained.
  • the two-dimensional coordinate information of each human body joint point is the position coordinate of the human body joint point of the target object in the color image
  • the above human body segmentation area is a human body area obtained by dividing the human body edge contour based on each human body joint point.
  • Step S300 based on the above-mentioned depth image, respectively acquire the three-dimensional coordinate information of human body joint points corresponding to the two-dimensional coordinate information of each of the above-mentioned human body joint points, and the human body segmentation depth area corresponding to each of the above-mentioned human body segmentation areas.
  • the above-mentioned three-dimensional coordinate information of human body joint points is depth information corresponding to each of the above-mentioned two-dimensional coordinate information of human body joint points in the above-mentioned depth image
  • the above-mentioned human body segmentation depth area is an area in the above-mentioned depth image corresponding to each of the above-mentioned human body segmentation areas.
  • the joint points of the human body are located inside the human body, but the depth image cannot capture depth information inside the human body. Therefore, in this embodiment, the three-dimensional coordinate information of the skin surface corresponding to each human body joint point is used as that joint point's three-dimensional coordinate information; that is, the depth information in the depth image corresponding to the two-dimensional coordinate information of each human body joint point is used directly as the three-dimensional coordinate information of that joint point.
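As an illustrative sketch of this step (the function and array names below are assumptions, not from the patent), reading the aligned depth map at each 2-D joint location to lift the joint to 3-D might look like:

```python
import numpy as np

def lift_joints_to_3d(joints_2d, depth_image):
    """Lift 2-D joints to 3-D using an aligned depth map.

    joints_2d   : iterable of (u, v) integer pixel coordinates
    depth_image : (H, W) array of depth values, aligned to the color image
    Returns an (N, 3) array of (u, v, z) rows -- the surface depth at the
    joint's pixel is used as its z coordinate, as described in the text.
    """
    joints_2d = list(joints_2d)
    joints_3d = np.empty((len(joints_2d), 3), dtype=np.float32)
    for i, (u, v) in enumerate(joints_2d):
        z = depth_image[v, u]  # pixel value = distance to the camera plane
        joints_3d[i] = (u, v, z)
    return joints_3d
```

In practice a real system would also handle invalid (zero) depth pixels, e.g. by sampling a small neighbourhood around the joint.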
  • Step S400 iteratively fitting all the 3D coordinate information of the joint points of the human body and all the segmentation depth regions of the human body based on a preset loss function to obtain a 3D model of the human body.
  • the method for acquiring a three-dimensional human body model acquires a color image and a depth image corresponding to the color image; obtains two-dimensional coordinate information of human body joint points and human body segmentation areas based on the color image; obtains, based on the depth image, the three-dimensional coordinate information of the human body joint points corresponding to each piece of two-dimensional coordinate information, and the human body segmentation depth area corresponding to each human body segmentation area; and iteratively fits all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth areas based on a preset loss function to obtain a three-dimensional model of the human body.
  • the scheme of the present invention combines the depth image that can provide the corresponding three-dimensional space information of the human body to obtain the three-dimensional human body model, which is conducive to improving the accuracy of the obtained three-dimensional human body model.
  • the obtained three-dimensional model of the human body can better reflect the three-dimensional posture of the human body.
  • the video stream may also be processed based on the above method for obtaining a three-dimensional human body model, so as to obtain a three-dimensional human body model in the video stream.
  • the video stream to be processed is obtained, and the video stream to be processed includes color images and depth images of multiple consecutive frames that are frame-synchronized and aligned.
  • for each frame, the processing from step S100 to step S400 above is performed to obtain the three-dimensional human body model of that frame.
  • each frame can be processed in parallel or sequentially, which is not specifically limited here.
  • a smoothing loss function can also be added to the preset loss function to keep the three-dimensional human body models fitted in adjacent frames as smooth as possible: by computing the L2 loss between the joint points of the three-dimensional human body models fitted in adjacent frames, large joint-point jumps between frames that would harm the visual effect are avoided.
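A minimal sketch of such an inter-frame smoothing term (assuming joints are given as N×3 arrays; the exact formulation and weighting are not disclosed in the patent):

```python
import numpy as np

def smoothing_loss(joints_prev, joints_curr):
    """L2 loss between the fitted 3-D joints of two consecutive frames.

    Penalising large per-joint displacements keeps the fitted human
    body model temporally smooth, as described in the text.
    """
    diff = np.asarray(joints_curr, dtype=np.float64) - np.asarray(joints_prev, dtype=np.float64)
    return float(np.sum(diff ** 2))
```

In the full objective this term would be added, with some weight, to the per-frame fitting losses.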
  • a frame of color image and its corresponding depth image are taken as an example for specific description, but no specific limitation is made.
  • the above step S100 includes:
  • Step S101 acquiring the color image collected by the acquisition device and the depth image to be processed synchronously with the color image.
  • Step S102 aligning the depth image to be processed with the color image as a depth image corresponding to the color image.
  • the acquisition device may include at least one depth camera and at least one color camera. Further, the above acquisition device may also include other components, such as corresponding camera fixing components, lighting sources, etc., which may be set and adjusted according to actual needs. In another application scenario, the above acquisition device may also be a binocular camera or a multi-eye camera, which is not specifically limited here.
  • the above-mentioned depth camera and the above-mentioned color camera are controlled to perform synchronous shooting, so as to obtain a synchronized color image and a depth image to be processed. The method of synchronous control can be set according to actual needs.
  • the timing can be set through the controller or other control devices, so as to realize synchronous control of the depth camera and the color camera.
  • multiple frame-synchronized color images and depth images to be processed are continuously collected by the color camera and the depth camera respectively.
  • the processing in this embodiment is performed on each frame of image respectively, and the corresponding three-dimensional human body model in each frame of image is obtained.
  • the depth image to be processed is directly acquired by the depth camera, and is synchronized with the color image frame but not aligned, and the depth image corresponding to the color image is obtained by aligning the depth image to be processed with the color image.
  • the depth image is an image with depth information (distance) as the pixel value; the pixel value of a point in the depth image is the distance from that point to the plane where the acquisition module (such as the acquisition module composed of the above-mentioned depth camera and color camera) is located.
  • the illumination source projects a structured light beam to the target area
  • the acquisition module receives the beam reflected by the target area and forms an electrical signal, which is then transmitted to the processor.
  • the processor processes the electrical signal, calculates the intensity information of the reflected light beam to form a structured light pattern, and finally performs matching calculation or triangulation based on the structured light pattern to obtain the depth image to be processed.
  • the illumination source projects an infrared beam to the target area, and the acquisition module receives the beam reflected by the target area and forms an electrical signal, which is then transmitted to the processor.
  • the processor processes the electrical signal to calculate the phase difference, and based on the phase difference indirectly calculates the time of flight of the light beam from emission by the illumination source to reception by the camera. The depth image is then calculated from the time of flight.
  • the above-mentioned infrared light beam may include pulse type and/or continuous wave type, which is not limited here.
  • the illumination source projects an infrared pulse beam to the target area, and the acquisition module receives the beam reflected by the target area and forms an electrical signal, which is transmitted to the processor.
  • the processor counts the electrical signals to obtain the waveform histogram, and directly calculates the time-of-flight of the light beam from the illumination source to the camera according to the waveform histogram, and obtains the depth image based on the time-of-flight calculation.
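For illustration, the basic time-of-flight relation underlying both the indirect (phase-difference) and direct (histogram) variants is that the measured round trip covers the distance twice, so the depth is half the light's travel (a generic formula, not code from the patent):

```python
# Speed of light expressed in millimetres per nanosecond.
C_MM_PER_NS = 299.792458

def depth_from_time_of_flight(tof_ns):
    """Convert a round-trip time of flight (in nanoseconds) to depth (mm).

    The beam travels from the illumination source to the target and back
    to the camera, so the one-way distance is half the round-trip path.
    """
    return C_MM_PER_NS * tof_ns / 2.0
```

A 10 ns round trip therefore corresponds to roughly 1.5 m of depth.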
  • the above-mentioned depth camera and color camera are calibrated in advance, and the internal and external parameters of the depth camera and color camera are obtained respectively.
  • based on these parameters, the conversion relationship between the pixel coordinate systems of the two cameras is obtained, so that the pixels of the depth image to be processed correspond one-to-one with the pixels of the color image, thereby realizing the alignment of the depth image to be processed with the color image.
  • the internal and external parameters of the camera include the internal parameters of the camera and the external parameters of the camera.
  • the internal parameters of the camera are parameters related to the characteristics of the camera itself, such as focal length, pixel size, etc.
  • the external parameters of the camera are parameters in the world coordinate system, such as the camera's position and rotation direction.
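As a hedged sketch of the alignment described above (the matrix names and the exact pipeline are assumptions; a real implementation would also handle lens distortion and occlusion), one depth pixel can be mapped into the color image using the calibrated intrinsics and extrinsics:

```python
import numpy as np

def align_depth_pixel_to_color(u, v, z, K_depth, K_color, R, t):
    """Map one depth pixel (u, v) with depth z into the color image.

    K_depth, K_color : 3x3 intrinsic matrices of the two cameras
    R (3x3), t (3,)  : extrinsics from the depth frame to the color frame
    Returns the corresponding sub-pixel (u', v') in the color image.
    """
    # Back-project the pixel to a 3-D point in the depth camera frame.
    p_depth = z * np.linalg.inv(K_depth) @ np.array([u, v, 1.0])
    # Transform the point into the color camera frame.
    p_color = R @ p_depth + t
    # Project with the color camera's intrinsics.
    uvw = K_color @ p_color
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

Applying this mapping to every valid depth pixel yields the depth image aligned to the color image used in step S102.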
  • the above step S200 includes:
  • Step S201 performing target detection on the above color image to obtain a pedestrian detection frame.
  • Step S202 based on the above-mentioned pedestrian detection frame, obtain a target single-person pose estimation frame through a human pose estimation algorithm.
  • Step S203 acquiring the above-mentioned two-dimensional coordinate information of the joint points of the human body and the above-mentioned human body segmentation region based on the above-mentioned target single-person pose estimation framework.
  • a target detection algorithm may be used to perform target detection on the above color image to obtain a pedestrian detection frame.
  • specific target detection algorithms and human body pose estimation algorithms can be selected and adjusted according to actual needs, and are not specifically limited here.
  • the above-mentioned human body pose estimation algorithm can be the AlphaPose 2D model algorithm; preferably, the RMPE pose estimation model in the AlphaPose algorithm is used to perform human body pose estimation.
  • the RMPE pose estimation model includes a symmetric spatial transformer network unit (SSTN, Symmetric Spatial Transformer Network), a parametric pose non-maximum suppression unit (NMS, Parametric Pose Non-Maximum Suppression), and a pose-guided proposals generator unit (PGPG, Pose-Guided Proposals Generator).
  • the above-mentioned symmetric spatial transformer network unit is used to obtain single-person pose estimation frames based on the pedestrian detection frame;
  • the parametric pose non-maximum suppression unit is used to remove redundant frames from the current single-person pose estimation frames by means of a pose distance measure, so as to obtain the target single-person pose estimation framework;
  • the pose-guided proposals generator unit is used to generate new training samples from the single-person pose estimation frames and the target single-person pose estimation framework, further train the RMPE pose estimation model, and augment the data to improve the model's performance.
  • the above-mentioned RMPE pose estimation model can be used for both multi-person detection and single-person detection
  • the above-mentioned target single-person pose estimation framework is a pose estimation framework corresponding to a target object that needs to obtain a three-dimensional model of a human body.
  • the above-mentioned human body pose estimation algorithm may also be any one or more combinations of 2D model algorithms such as openpose and ppn, which are not limited here.
  • the above step S203 includes:
  • Step S2031 acquiring a plurality of human body joint points based on the above-mentioned target single-person pose estimation framework, and obtaining the corresponding two-dimensional coordinate information of the human body joint points, that is, their position coordinates in the color image.
  • Step S2032 obtaining a plurality of human body segmentation regions based on the pedestrian detection frame and each of the above human body joint points, wherein each of the above human body segmentation regions is a human body region obtained by dividing the human body edge contour based on each of the above human body joint points.
  • At least 15 human body joint points are obtained based on the above-mentioned target single-person pose estimation framework, and corresponding two-dimensional coordinate information of the human body joint points is obtained.
  • the two-dimensional information of each human body joint point is the position coordinate of each pixel point corresponding to each human body joint point in the color image.
  • the above-mentioned 15 human body joint points are preferably the head, neck, middle hip, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, left knee, left ankle, right hip, right knee, and right ankle, as shown in Figure 5.
  • specific human body joint points and the number of human body joint points can be set and adjusted according to actual needs, and are not specifically limited here.
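For illustration, the 15 joints named above could be indexed as follows (the ordering is an assumption; the patent fixes only the set of joints and notes that it is adjustable):

```python
# Illustrative indexing of the 15 joints listed in the text.
JOINT_NAMES = [
    "head", "neck", "mid_hip",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
]

# Map from name to index, useful when reading per-joint predictions.
JOINT_INDEX = {name: i for i, name in enumerate(JOINT_NAMES)}
```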
  • FIG. 6 is a schematic diagram of a human body segmentation region provided by an embodiment of the present invention. As shown in FIG. 6 , 14 human body segmentation regions are obtained by dividing in this embodiment. Optionally, there may be other methods for obtaining human body segmentation regions, and the number of human body segmentation regions obtained by dividing can be set and adjusted according to actual needs, which is not specifically limited here.
  • the above-mentioned human body joint points, the two-dimensional information of the human body joint points, and the human body segmentation regions are all information about the human body in the color image; using the alignment relationship between the color image and the depth image, the corresponding three-dimensional information of the human body joint points and the human body segmentation depth areas can be obtained from the depth image, so as to obtain the three-dimensional space information corresponding to the target object and realize the reconstruction of the three-dimensional human body model.
  • step S400 includes:
  • Step S401: acquiring the point cloud three-dimensional coordinates corresponding to each point in the human body segmentation depth regions.
  • Step S402: iteratively fitting the above-mentioned human body joint points based on the above-mentioned loss functions to obtain position information of the target human body joint points.
  • Step S403: acquiring a three-dimensional human body model based on the position information of each target human body joint point and each target point cloud, where the target point cloud includes the point cloud three-dimensional coordinates of each point in the human body segmentation depth region corresponding to the target human body joint points.
  • the point cloud three-dimensional coordinates corresponding to each point in the human body segmentation area in the depth image can be obtained by the following formula (1):

    x_s = (u − u_0) · dx · z / f′
    y_s = (v − v_0) · dy · z / f′
    z_s = z        (1)
  • (x_s, y_s, z_s) are the point cloud three-dimensional coordinates to be obtained, that is, the coordinates of each point in the depth camera coordinate system.
  • z is the pixel value of each point in the depth image, that is, the depth (distance) corresponding to each point.
  • (u, v) are the pixel coordinates of each point in the depth image.
  • (u_0, v_0) are the coordinates of the image principal point.
  • dx and dy are the physical sizes of the depth camera's sensor pixels in the two directions.
  • f′ is the focal length of the depth camera in millimeters.
  • the image principal point is the intersection of the image plane with the perpendicular from the projection center to the image plane.
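For illustration only, the back-projection of formula (1) can be sketched in a few lines of Python. The function name and the example intrinsic values used below (principal point, pixel size, focal length) are assumptions chosen for the sketch, not values from the disclosure:

```python
def pixel_to_camera(u, v, z, u0, v0, dx, dy, f_mm):
    """Back-project a depth pixel (u, v) with measured depth z into the
    depth camera coordinate system, following formula (1).

    u0, v0: image principal point; dx, dy: physical sensor pixel sizes;
    f_mm: focal length in millimeters. Returns (x_s, y_s, z_s).
    """
    x_s = (u - u0) * dx * z / f_mm
    y_s = (v - v0) * dy * z / f_mm
    return (x_s, y_s, z)  # z_s equals the measured depth z
```

For example, with a hypothetical 640x480 sensor (u0 = 320, v0 = 240), 0.01 mm pixels, and a 4 mm lens, the pixel (420, 240) at depth 1000 mm back-projects to (250.0, 0.0, 1000).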
  • the parameterized human body model and the preset loss function are used to iteratively fit the joint points of the target human body and the point clouds corresponding to each point in each human body segmentation depth region to obtain a three-dimensional model of the human body. Specifically, in the process of iterative fitting to obtain the three-dimensional model of the human body, constraints are performed by a preset loss function.
  • the above parametric human body model is a pre-set model for reconstructing a three-dimensional human body model.
  • the above parametric human body model is preferably an SMPL model.
  • the traditional SMPL model is trained to obtain a three-dimensional human body model composed of 24 human body joints, 6890 vertices, and 13776 patches, which requires a large amount of calculation.
  • in this embodiment, 15 of the 24 joint points are preferably used; the position information of the target human body joint points is obtained by iteratively fitting the human body joint points through a plurality of preset loss functions, and the three-dimensional human body model is then obtained by iteratively fitting those positions with the point cloud three-dimensional coordinates of each point in the corresponding human body segmentation depth regions, with the model constrained by the loss functions during the iteration.
  • on the basis of improving accuracy, this reduces the amount of computation and improves the efficiency of obtaining the three-dimensional human body model.
  • the above preset loss function includes one or more of a reprojection loss function, a three-dimensional joint point loss function, an angle loss function, and a surface point depth loss function.
  • the various preset loss functions include the above-mentioned reprojection loss function, three-dimensional joint point loss function, angle loss function, and surface point depth loss function.
  • in step S402, the human body joint points are iteratively fitted based on the above reprojection loss function, three-dimensional joint point loss function, and angle loss function; in step S403, constraints are applied based on the above surface point depth loss function, and the position information of the target human body joint points is iteratively fitted with each target point cloud to obtain a three-dimensional model of the human body.
  • the above reprojection loss function is used to reflect the reprojection position loss between the obtained target human joint points projected onto a two-dimensional plane (color image plane) and the corresponding human joint points obtained in the plane.
  • the obtained 15 target human body joint points are projected onto the color image plane to obtain the two-dimensional pixel position of each target joint point in the color image; the GM (Geman-McClure) loss between this two-dimensional pixel position and the corresponding joint point position output by the two-dimensional model used to identify human body joint points in the color image is computed and used as the above reprojection loss function.
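As an illustrative sketch (not part of the disclosure), the GM penalty and the reprojection loss it induces can be written as follows; the scale parameter `sigma` and its default value are assumptions, since the disclosure does not specify the GM scale:

```python
import math

def gm(residual, sigma=1.0):
    """Geman-McClure robust penalty: quadratic near zero, saturating
    at sigma**2 for large residuals, which damps outlier joints."""
    r2 = residual * residual
    return (sigma * sigma * r2) / (sigma * sigma + r2)

def reprojection_loss(projected_2d, detected_2d, sigma=1.0):
    """Sum of GM penalties over per-joint pixel distances between the
    model joints projected into the color image and the detected joints."""
    total = 0.0
    for (px, py), (qx, qy) in zip(projected_2d, detected_2d):
        total += gm(math.hypot(px - qx, py - qy), sigma)
    return total
```

The saturation of the GM penalty is the reason it is preferred over a plain squared loss here: a badly mis-detected 2D joint cannot dominate the fit.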
  • the above three-dimensional joint point loss function is used to reflect the loss of the three-dimensional distance between the obtained position of the target human body joint point and the corresponding human body joint point observed based on the depth image.
  • the depth corresponding to each human body joint can be obtained in the aligned depth image.
  • ideally, using the pixel-to-camera coordinate conversion formula, i.e., formula (1), the observation coordinates of the 15 human body joint points in the camera coordinate system can be obtained; however, because joint points may be occluded or self-occluded, the observation coordinates of all joint points cannot always be obtained.
  • moreover, the observation coordinates obtained in this way are the three-dimensional positions of the skin surface corresponding to the joint points, not the three-dimensional coordinates of the actual joints in the human skeleton. Therefore, only the distances between the validly observed three-dimensional joint positions and the corresponding target joint points in the reconstructed three-dimensional human body model are computed. If a distance is greater than a set threshold (which can be set and adjusted according to actual needs), the GM loss is computed as the above three-dimensional joint point loss function; otherwise the position of that target joint point in the three-dimensional skeleton is considered reasonable and the three-dimensional joint point loss is recorded as 0.
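A minimal sketch of this thresholded, occlusion-aware loss follows; the threshold and `sigma` values are illustrative assumptions (the disclosure only says the threshold is adjustable), and occluded joints are represented here as `None`:

```python
import math

def joint3d_loss(observed, fitted, threshold=50.0, sigma=100.0):
    """Three-dimensional joint point loss: a GM penalty on the distance
    between each validly observed joint (skin-surface 3D position) and the
    fitted target joint, applied only when the distance exceeds the
    threshold. Occluded joints, passed as None, contribute nothing."""
    total = 0.0
    for obs, fit in zip(observed, fitted):
        if obs is None:  # occluded or self-occluded: no valid observation
            continue
        d = math.dist(obs, fit)
        if d > threshold:  # otherwise the fitted position is deemed reasonable
            total += (sigma * sigma * d * d) / (sigma * sigma + d * d)
    return total
```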
  • angle loss function is used to constrain the angles between the joint points of each target human body.
  • the range of motion of human joints is limited by the anatomy of the human body; for example, rotating the upper body backward 180 degrees while the lower body remains stationary is unreasonable. Therefore, during fitting, an angle constraint is applied to each joint point to accelerate convergence and avoid deformed target joint configurations. Specifically, a joint angle range is set for each joint point in advance; it is judged whether the currently fitted target joint angle lies within its range, and if it exceeds the range, the squared loss of the excess part is computed as the angle loss function; otherwise the angle loss is recorded as 0.
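The angle constraint just described can be sketched as a one-sided squared penalty; the per-joint ranges below are placeholders, since the disclosure leaves the actual anatomical limits to be preset per joint:

```python
def angle_loss(joint_angles, angle_ranges):
    """Squared penalty on the part of each fitted joint angle that falls
    outside its preset anatomical range; in-range angles cost 0."""
    total = 0.0
    for theta, (lo, hi) in zip(joint_angles, angle_ranges):
        if theta < lo:
            total += (lo - theta) ** 2
        elif theta > hi:
            total += (theta - hi) ** 2
    return total
```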
  • the above-mentioned surface point depth loss function is used to constrain the depth value loss of the surface point cloud of each region of the human body three-dimensional model obtained by fitting each iteration.
  • the surface point depth loss is the GM loss between the standard depth value (Z value), in the depth direction, of the surface point cloud of each region of the three-dimensional human body model and the depth value of the pixel to which that point projects in the depth image.
  • 6890 vertices of the SMPL model are divided into 14 regions corresponding to the above-mentioned human body segmentation regions in advance.
  • in each iteration of fitting based on each frame's color image and its corresponding depth image, the loss between the surface points of the 14 regions of the SMPL model and the depth values of the 14 human body segmentation depth regions segmented from the depth image is computed.
  • specifically, taking the surface loss of the right thigh region as an example: all point clouds of the right thigh region can be obtained from the three-dimensional human body model fitted with the SMPL model; the normal vectors of the point cloud can be obtained through the connectivity of the mesh faces, and from the normal directions the surface point cloud of the right thigh facing the camera can be obtained. The standard depth value (Z value) of this surface point cloud in the fitted model is then compared, via the GM loss, with the depth values read at the pixels to which these points project in the depth image; the smaller this loss, the closer the SMPL surface is to the surface of the corresponding region in the depth image, i.e., the more accurate the fitted joint positions.
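A minimal sketch of this surface point depth loss follows. The forward projection inverts formula (1); the intrinsic values, `sigma`, and the convention that a zero depth pixel is a hole are assumptions made for the sketch:

```python
def surface_depth_loss(surface_points, depth_image, u0, v0, dx, dy, f_mm,
                       sigma=100.0):
    """GM penalty between the model depth (Z value) of each camera-facing
    surface point and the measured depth at the pixel it projects to.
    depth_image is a row-major list of rows; zero pixels are holes."""
    h, w = len(depth_image), len(depth_image[0])
    total = 0.0
    for x, y, z in surface_points:
        # forward projection: the inverse of formula (1)
        u = round(x * f_mm / (dx * z) + u0)
        v = round(y * f_mm / (dy * z) + v0)
        if 0 <= u < w and 0 <= v < h and depth_image[v][u] > 0:
            r = z - depth_image[v][u]
            total += (sigma * sigma * r * r) / (sigma * sigma + r * r)
    return total
```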
  • when the three-dimensional human body model is used to process a video stream of consecutive frames, the preset loss functions may also include a smoothing loss function, to ensure that the models fitted in adjacent frames are as smooth as possible. Specifically, the L2 loss of the target human body joint points fitted in adjacent frames is computed and used as the smoothing loss function, to avoid large inter-frame jumps in joint positions that would affect the visual effect.
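The inter-frame smoothing term reduces to a sum of squared coordinate differences; this is a minimal sketch under the assumption that joints are given as coordinate tuples:

```python
def smooth_loss(prev_joints, cur_joints):
    """L2 loss (sum of squared differences) between the target joint
    positions fitted in consecutive frames, penalising large jumps."""
    return sum((a - b) ** 2
               for p, c in zip(prev_joints, cur_joints)
               for a, b in zip(p, c))
```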
  • in one application scenario, the above loss functions are combined and summed to obtain a total loss value, which is compared with a preset threshold range (which can be set and adjusted according to actual needs). If the total loss is not within the preset threshold range, the iterative fitting of the human body joint points and the target point clouds of the corresponding human body segmentation depth regions continues, yielding a new three-dimensional human body model, until the total loss falls within the preset threshold range.
  • the combined summation of the above loss functions may be direct summation or summation according to weight distribution, which is not specifically limited here.
  • the above loss functions may be GM loss, L1 loss, L2 loss or other loss functions, which are not specifically limited here.
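The combination-and-iterate scheme described above can be sketched as follows; the threshold range, iteration cap, and the callback names `compute_losses` and `update_model` are all assumptions introduced for this sketch:

```python
def total_loss(terms, weights=None):
    """Combine individual loss terms by direct summation or, if weights
    are given, by weighted summation."""
    if weights is None:
        return sum(terms)
    return sum(w * t for w, t in zip(weights, terms))

def fit(compute_losses, update_model, lo=0.0, hi=1.0, max_iters=100,
        weights=None):
    """Iterate until the combined loss falls inside the preset
    threshold range [lo, hi] or the iteration budget is exhausted."""
    total = total_loss(compute_losses(), weights)
    for _ in range(max_iters):
        if lo <= total <= hi:
            break
        update_model()  # refit joints and segmentation-region point clouds
        total = total_loss(compute_losses(), weights)
    return total
```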
  • after step S400, the method further includes: step S500, obtaining three-dimensional human body skeleton points based on the three-dimensional model of the human body.
  • the iteratively fitted three-dimensional model of the human body is used to further calculate and obtain the three-dimensional skeleton points of the human body.
  • the reconstruction effect of the three-dimensional human body model after iterative fitting is equivalent to that of an ideal three-dimensional human body model, and based on this, further calculation and acquisition of three-dimensional human body skeleton points can improve the accuracy of the three-dimensional human body skeleton points.
  • the method of obtaining the three-dimensional human skeleton points from the iterated three-dimensional human body model may be to directly take the coordinate information of the skeleton points used when the final optimal three-dimensional human body model was obtained during the iterative fitting process.
  • an embodiment of the present invention also provides a device for obtaining a three-dimensional human body model.
  • the above-mentioned device for obtaining a three-dimensional human body model includes:
  • the image acquiring module 610 is configured to acquire a color image and a depth image corresponding to the color image.
  • the above-mentioned color image and the above-mentioned depth image are images containing a target object
  • the target object is an object that needs to be reconstructed from a three-dimensional model of a human body.
  • the above-mentioned color image and depth image may contain multiple target objects; in this embodiment, the case of one target object is used as an example for specific description. When there are multiple target objects, the device in this embodiment can be used to reconstruct a three-dimensional human body model for each target object separately.
  • the human body segmentation area acquisition module 620 is configured to acquire the two-dimensional coordinate information of the human body joint points and the human body segmentation area based on the above color image.
  • target detection and human body pose estimation can be performed on the target object in the above color image, and corresponding two-dimensional coordinate information of human body joint points and human body segmentation regions can be obtained.
  • the two-dimensional coordinate information of each human body joint point is the position coordinate of the human body joint point of the target object in the color image
  • the above human body segmentation area is a human body area obtained by dividing the human body edge contour based on each human body joint point.
  • the human body segmentation depth area acquisition module 630 is configured to acquire the three-dimensional coordinate information of human body joint points corresponding to the two-dimensional coordinate information of each of the above-mentioned human body joint points and the human body segmentation depth area corresponding to each of the above-mentioned human body segmentation areas based on the above-mentioned depth image.
  • the above-mentioned three-dimensional coordinate information of human body joint points is depth information corresponding to each of the above-mentioned two-dimensional coordinate information of human body joint points in the above-mentioned depth image
  • the above-mentioned human body segmentation depth area is an area in the above-mentioned depth image corresponding to each of the above-mentioned human body segmentation areas.
  • the joint points of the human body should be inside the human body, but the depth image cannot collect the depth information inside the human body. Therefore, in this embodiment, the three-dimensional coordinate information of the skin surface corresponding to each joint point of the human body is used as the three-dimensional coordinate information of the joint points of the human body. That is, the depth information in the depth image corresponding to the two-dimensional coordinate information of the above-mentioned human joint points is directly used as the three-dimensional coordinate information of the human joint points.
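The lookup just described (using the aligned skin-surface depth as each joint's 3D coordinate, then back-projecting with formula (1)) can be sketched as follows; the convention that a zero depth pixel yields `None` is an assumption of this sketch:

```python
def joints_3d_from_depth(joints_2d, depth_image, u0, v0, dx, dy, f_mm):
    """Read the aligned depth at each joint's 2D pixel and back-project it
    with formula (1); the skin-surface point stands in for the internal
    joint. Pixels with no measured depth (value 0) yield None."""
    result = []
    for u, v in joints_2d:
        z = depth_image[v][u]
        if z <= 0:
            result.append(None)  # no depth observed at this joint
            continue
        result.append(((u - u0) * dx * z / f_mm,
                       (v - v0) * dy * z / f_mm,
                       z))
    return result
```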
  • the human body three-dimensional model reconstruction module 640 is configured to iteratively fit all the above-mentioned three-dimensional coordinate information of human body joint points and all the above-mentioned human body segmentation depth regions based on a preset loss function to obtain a three-dimensional human body model.
  • the device for obtaining a three-dimensional human body model acquires a color image and a depth image corresponding to the color image through the image acquisition module 610; acquires the two-dimensional coordinate information of the human body joint points and the human body segmentation areas based on the color image through the human body segmentation area acquisition module 620; respectively acquires, through the human body segmentation depth area acquisition module 630 and based on the depth image, the three-dimensional coordinate information of the human body joint points corresponding to each piece of two-dimensional coordinate information and the human body segmentation depth area corresponding to each human body segmentation area; and, through the human body three-dimensional model reconstruction module 640, iteratively fits all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth areas based on preset loss functions to obtain a three-dimensional human body model.
  • the scheme of the present invention combines the depth image that can provide the corresponding three-dimensional space information of the human body to obtain the three-dimensional human body model, which is conducive to improving the accuracy of the obtained three-dimensional human body model.
  • the obtained three-dimensional model of the human body can better reflect the three-dimensional posture of the human body.
  • the video stream may also be processed based on the above-mentioned apparatus for acquiring a three-dimensional human body model, so as to obtain a three-dimensional human body model in the video stream.
  • the video stream to be processed is obtained, and the video stream to be processed includes color images and depth images of multiple consecutive frames that are frame-synchronized and aligned.
  • each frame-synchronized and aligned pair of color image and depth image is processed by the above device for acquiring a three-dimensional human body model, so as to obtain the three-dimensional human body model of each frame.
  • the frames may be processed in parallel or sequentially, which is not specifically limited here.
  • in this embodiment, one frame of color image and its corresponding depth image are taken as an example for specific description, without specific limitation.
  • the specific functions of the above-mentioned human body three-dimensional model acquisition device and its modules can also refer to the corresponding description in the above-mentioned human body three-dimensional model acquisition method, and will not be repeated here.
  • the present invention also provides an intelligent terminal, the functional block diagram of which may be shown in FIG. 10 .
  • the above intelligent terminal includes a processor, a memory, a network interface and a display screen connected through a system bus.
  • the processor of the smart terminal is used to provide calculation and control capabilities.
  • the memory of the smart terminal includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a human body three-dimensional model acquisition program.
  • the internal memory provides an environment for the operation of the operating system and the human body three-dimensional model acquisition program in the non-volatile storage medium.
  • the network interface of the smart terminal is used to communicate with external terminals through a network connection. When the human body three-dimensional model acquisition program is executed by the processor, the steps of any one of the above-mentioned human body three-dimensional model acquisition methods are realized.
  • the display screen of the smart terminal may be a liquid crystal display screen or an electronic ink display screen.
  • FIG. 10 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation on the smart terminal to which the solution of the present invention is applied.
  • a specific smart terminal may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • an intelligent terminal includes a memory, a processor, and a human body three-dimensional model acquisition program stored on the above-mentioned memory and operable on the above-mentioned processor.
  • when the above-mentioned human body three-dimensional model acquisition program is executed by the above-mentioned processor, the following operations are implemented:
  • the three-dimensional coordinate information of the human body joint points corresponding to the two-dimensional coordinate information of each of the above-mentioned human body joint points, and the human body segmentation depth area corresponding to each of the above-mentioned human body segmentation areas are respectively obtained;
  • the embodiment of the present invention also provides a computer-readable storage medium.
  • the above-mentioned computer-readable storage medium stores a human body three-dimensional model acquisition program which, when executed by a processor, implements the steps of any one of the above methods for obtaining a three-dimensional model of a human body.
  • the disclosed apparatus/terminal device and method may be implemented in other ways.
  • the device/terminal device embodiments described above are only illustrative.
  • the division of the above-mentioned modules or units is only a logical function division; in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • when the above-mentioned integrated modules/units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the present invention can implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments.
  • the above-mentioned computer program includes computer program code, and the above-mentioned computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the above-mentioned computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the above computer-readable storage medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction.


Abstract

The present invention discloses a method, apparatus, intelligent terminal, and storage medium for acquiring a three-dimensional human body model. The method includes: acquiring a color image and a depth image corresponding to the color image; acquiring two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image; based on the depth image, respectively acquiring three-dimensional coordinate information of the human body joint points corresponding to each piece of the two-dimensional coordinate information, and human body segmentation depth regions corresponding to each of the human body segmentation regions; and iteratively fitting all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth regions based on preset loss functions to acquire a three-dimensional human body model. Compared with the prior art, the solution of the present invention helps improve the accuracy of the obtained three-dimensional human body model, so that the model better reflects the three-dimensional posture of the human body.

Description

Method, apparatus, intelligent terminal, and storage medium for acquiring a three-dimensional human body model
This application claims priority to Chinese patent application No. 202110744388.6, filed with the China National Intellectual Property Administration on June 30, 2021 and entitled "Method, apparatus, intelligent terminal, and storage medium for acquiring a three-dimensional human body model", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of image processing, and in particular to a method, apparatus, intelligent terminal, and storage medium for acquiring a three-dimensional human body model.
Background Art
A three-dimensional human body model is essential for describing human posture and predicting human behavior. At present, three-dimensional human body models are widely used in various fields, such as abnormal behavior monitoring, autonomous driving, and surveillance. In recent years, with the development of science and technology, especially deep learning, the reconstruction quality of three-dimensional human body models has gradually improved.
In the prior art, however, a three-dimensional human body model is usually acquired from a color image through a convolutional neural network. The problem with the prior art is that a color image cannot provide effective three-dimensional spatial information, so the obtained three-dimensional human body model has low accuracy and cannot accurately reflect the three-dimensional posture of the human body.
Therefore, the prior art still awaits improvement and development.
Summary of the Invention
The main purpose of the present invention is to provide a method, apparatus, intelligent terminal, and storage medium for acquiring a three-dimensional human body model, aiming to solve the prior-art problem that a three-dimensional human body model acquired from a color image through a convolutional neural network has low accuracy.
To achieve the above purpose, a first aspect of the present invention provides a method for acquiring a three-dimensional human body model, the method including:
acquiring a color image and a depth image corresponding to the color image;
acquiring two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image;
based on the depth image, respectively acquiring three-dimensional coordinate information of the human body joint points corresponding to each piece of the two-dimensional coordinate information of the human body joint points, and human body segmentation depth regions corresponding to each of the human body segmentation regions; and
iteratively fitting all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth regions based on preset loss functions to acquire a three-dimensional human body model.
Optionally, acquiring the color image and the depth image corresponding to the color image includes:
acquiring a color image captured by a capture device and a to-be-processed depth image synchronized with the color image; and
aligning the to-be-processed depth image with the color image to obtain the depth image corresponding to the color image.
Optionally, acquiring the two-dimensional coordinate information of the human body joint points and the human body segmentation regions based on the color image includes:
performing target detection on the color image to acquire a pedestrian detection box;
acquiring a target single-person pose estimation framework through a human body pose estimation algorithm based on the pedestrian detection box; and
acquiring the two-dimensional coordinate information of the human body joint points and the human body segmentation regions based on the target single-person pose estimation framework.
Optionally, acquiring the two-dimensional coordinate information of the human body joint points and the human body segmentation regions based on the target single-person pose estimation framework includes:
acquiring a plurality of human body joint points based on the target single-person pose estimation framework, and acquiring the corresponding two-dimensional coordinate information, where the two-dimensional coordinate information of each human body joint point is the position coordinate of that joint point in the color image; and
acquiring a plurality of human body segmentation regions based on the pedestrian detection box and the human body joint points, where each human body segmentation region is a body region obtained by dividing the human body edge contour based on the joint points.
Optionally, iteratively fitting all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth regions based on the preset loss functions to acquire the three-dimensional human body model includes:
acquiring point cloud three-dimensional coordinates corresponding to each point in the human body segmentation depth regions;
iteratively fitting the human body joint points based on the loss functions to acquire position information of target human body joint points; and
acquiring the three-dimensional human body model based on the position information of each target human body joint point and each target point cloud, where the target point cloud includes the point cloud three-dimensional coordinates of each point in the human body segmentation depth region corresponding to the target human body joint points.
Optionally, the preset loss functions include a reprojection loss function, a three-dimensional joint point loss function, an angle loss function, and a surface point depth loss function.
Optionally, after the three-dimensional human body model is acquired by iteratively fitting all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth regions based on the preset loss functions, the method further includes:
acquiring three-dimensional human body skeleton points based on the three-dimensional human body model.
A second aspect of the present invention provides an apparatus for acquiring a three-dimensional human body model, the apparatus including:
an image acquisition module, configured to acquire a color image and a depth image corresponding to the color image;
a human body segmentation region acquisition module, configured to acquire two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image;
a human body segmentation depth region acquisition module, configured to, based on the depth image, respectively acquire three-dimensional coordinate information of the human body joint points corresponding to each piece of the two-dimensional coordinate information, and human body segmentation depth regions corresponding to each of the human body segmentation regions; and
a human body three-dimensional model reconstruction module, configured to iteratively fit all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth regions based on preset loss functions to acquire a three-dimensional human body model.
A third aspect of the present invention provides an intelligent terminal, including a memory, a processor, and a human body three-dimensional model acquisition program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of any one of the above methods for acquiring a three-dimensional human body model.
A fourth aspect of the present invention provides a computer-readable storage medium storing a human body three-dimensional model acquisition program which, when executed by a processor, implements the steps of any one of the above methods for acquiring a three-dimensional human body model.
As can be seen from the above, the solution of the present invention acquires a color image and a corresponding depth image; acquires two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image; based on the depth image, respectively acquires the corresponding three-dimensional coordinate information of the joint points and the corresponding human body segmentation depth regions; and iteratively fits all of them based on preset loss functions to acquire a three-dimensional human body model. Compared with prior-art solutions that acquire a three-dimensional human body model from a color image alone, the solution of the present invention additionally uses a depth image, which provides the three-dimensional spatial information corresponding to the human body, helping improve the accuracy of the obtained model so that it better reflects the three-dimensional posture of the human body.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a method for acquiring a three-dimensional human body model provided by an embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S100 in FIG. 1;
FIG. 3 is a detailed flowchart of step S200 in FIG. 1;
FIG. 4 is a detailed flowchart of step S203 in FIG. 3;
FIG. 5 is a schematic diagram of a target single-person pose estimation framework provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of human body segmentation regions provided by an embodiment of the present invention;
FIG. 7 is a detailed flowchart of step S400 in FIG. 1;
FIG. 8 is a schematic flowchart of another method for acquiring a three-dimensional human body model provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an apparatus for acquiring a three-dimensional human body model provided by an embodiment of the present invention;
FIG. 10 is a block diagram of the internal structure of an intelligent terminal provided by an embodiment of the present invention.
Detailed Description of the Embodiments
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it should be clear to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present invention.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention, but the present invention can also be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from the essence of the present invention. Therefore, the present invention is not limited by the specific embodiments disclosed below.
A three-dimensional human body model is essential for describing human posture and predicting human behavior. At present, three-dimensional human body models are widely used in various fields, such as abnormal behavior monitoring, autonomous driving, and surveillance. In recent years, with the development of science and technology, especially deep learning, the reconstruction quality of three-dimensional human body models has gradually improved.
In the prior art, however, a three-dimensional human body model is usually acquired from a color image through a convolutional neural network. The problem is that a color image cannot provide effective three-dimensional spatial information, so the obtained model has low accuracy and cannot accurately reflect the three-dimensional posture of the human body. As a result, the obtained model cannot be applied to demanding scenarios such as human-computer interaction, which limits the application of three-dimensional human body models.
To solve the problems of the prior art, the solution of the present invention acquires a color image and a corresponding depth image; acquires two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image; based on the depth image, respectively acquires the corresponding three-dimensional coordinate information of the joint points and the corresponding human body segmentation depth regions; and iteratively fits all of them based on preset loss functions to acquire a three-dimensional human body model. Compared with prior-art solutions that acquire a three-dimensional human body model from a color image alone, the solution of the present invention additionally uses a depth image that provides the three-dimensional spatial information corresponding to the human body, helping improve the accuracy of the obtained model so that it better reflects the three-dimensional posture of the human body.
Exemplary Method
As shown in FIG. 1, an embodiment of the present invention provides a method for acquiring a three-dimensional human body model. Specifically, the method includes the following steps:
Step S100: acquiring a color image and a depth image corresponding to the color image.
The color image and the depth image are images containing a target object, where the target object is the object for which a three-dimensional human body model needs to be reconstructed. Further, the color image and the depth image may contain multiple target objects. In this embodiment, the case of one target object is used as an example for specific description; when there are multiple target objects, the method in this embodiment can be used to reconstruct a three-dimensional human body model for each target object separately. Specifically, a depth image is an image whose pixel values are depth information (distances); it can provide effective three-dimensional spatial information corresponding to the target object, thereby improving the accuracy of the acquired three-dimensional human body model.
Step S200: acquiring two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image.
Specifically, in this embodiment, target detection and human body pose estimation can be performed on the target object in the color image to obtain the corresponding two-dimensional coordinate information of the human body joint points and the human body segmentation regions. The two-dimensional coordinate information of each joint point is the position coordinate of that joint point of the target object in the color image, and each human body segmentation region is a body region obtained by dividing the human body edge contour based on the joint points.
Step S300: based on the depth image, respectively acquiring three-dimensional coordinate information of the human body joint points corresponding to each piece of the two-dimensional coordinate information, and human body segmentation depth regions corresponding to each of the human body segmentation regions.
The three-dimensional coordinate information of the human body joint points is the depth information in the depth image corresponding to each piece of two-dimensional coordinate information, and each human body segmentation depth region is the region in the depth image corresponding to a human body segmentation region. Specifically, the joint points should lie inside the human body, but a depth image cannot capture depth information inside the body. Therefore, in this embodiment, the three-dimensional coordinate information of the skin surface corresponding to each joint point is used as the three-dimensional coordinate information of that joint point; that is, the depth information in the depth image corresponding to the two-dimensional coordinate information of the joint points is directly used as their three-dimensional coordinate information.
Step S400: iteratively fitting all the three-dimensional coordinate information of the human body joint points and all the human body segmentation depth regions based on preset loss functions to acquire a three-dimensional human body model.
As can be seen from the above, the method for acquiring a three-dimensional human body model provided by the embodiment of the present invention acquires a color image and a corresponding depth image; acquires two-dimensional coordinate information of human body joint points and human body segmentation regions based on the color image; based on the depth image, respectively acquires the corresponding three-dimensional coordinate information of the joint points and the corresponding human body segmentation depth regions; and iteratively fits all of them based on preset loss functions to acquire a three-dimensional human body model. Compared with prior-art solutions that acquire a three-dimensional human body model from a color image alone, the solution of the present invention additionally uses a depth image that provides the three-dimensional spatial information corresponding to the human body, helping improve the accuracy of the obtained model so that it better reflects the three-dimensional posture of the human body.
In one application scenario, a video stream can also be processed based on the above method to obtain the three-dimensional human body models in the video stream. When processing a video stream, a to-be-processed video stream is obtained, which includes multiple consecutive frame-synchronized and aligned color images and depth images. For each frame-synchronized and aligned pair of color image and depth image, the processing of steps S100 to S400 is performed to obtain the three-dimensional human body model of each frame; the frames may be processed in parallel or sequentially, which is not specifically limited here. Further, a smoothing loss function can also be set among the preset loss functions to ensure that the models fitted in adjacent frames are as smooth as possible, by computing the L2 loss of the joint points in the models fitted in adjacent frames, so as to avoid large inter-frame jumps in joint positions that would affect the visual effect. In this embodiment, one frame of color image and its corresponding depth image are taken as an example for specific description, without specific limitation.
Specifically, in this embodiment, as shown in FIG. 2, step S100 includes:
Step S101: acquiring a color image captured by a capture device and a to-be-processed depth image synchronized with the color image.
Step S102: aligning the to-be-processed depth image with the color image to obtain the depth image corresponding to the color image.
In one application scenario, the capture device may include at least one depth camera and at least one color camera. Further, the capture device may also include other components, such as corresponding camera mounts and illumination light sources, which can be set and adjusted according to actual needs. In another application scenario, the capture device may also be a binocular or multi-view camera, which is not specifically limited here. In this embodiment, the depth camera and the color camera are controlled to shoot synchronously to obtain a synchronized color image and to-be-processed depth image. The synchronization method can be set according to actual needs; for example, in one application scenario, timing can be set by a controller or other control device to synchronously control the depth camera and the color camera, so that the color camera and the depth camera continuously capture multiple frame-synchronized color images and to-be-processed depth images. In this embodiment, one captured frame is taken as an example for specific description; when multiple frames are captured, the processing in this embodiment is performed on each frame to obtain the corresponding three-dimensional human body model in each frame.
The to-be-processed depth image is obtained directly by the depth camera and is frame-synchronized with the color image but not yet aligned; the depth image corresponding to the color image is obtained by aligning the to-be-processed depth image with the color image. Specifically, a depth image is an image whose pixel values are depth information (distances); the pixel value of a point in the depth image is the distance from that point to the plane of the capture module (such as the capture module formed by the above depth camera and color camera).
There are multiple methods for acquiring the to-be-processed depth image in step S101, which can be selected and adjusted according to actual needs. In one application scenario, the illumination light source projects a structured light beam onto the target area; the capture module receives the beam reflected back from the target area, forms an electrical signal, and transmits it to the processor. The processor processes the electrical signal, computes the intensity information of the beam to form a structured light pattern, and finally performs matching computation or triangulation on the structured light pattern to obtain the to-be-processed depth image. In another application scenario, the illumination light source projects an infrared beam onto the target area; the capture module receives the reflected beam, forms an electrical signal, and transmits it to the processor. The processor processes the electrical signal to compute the phase difference and, based on the phase difference, indirectly computes the time of flight of the beam from emission by the light source to reception by the camera; the depth image is then computed from this time of flight. The infrared beam may be pulsed and/or continuous-wave, which is not limited here. In yet another application scenario, the illumination light source projects an infrared pulse beam onto the target area; the capture module receives the reflected beam, forms an electrical signal, and transmits it to the processor. The processor counts the electrical signal to obtain a waveform histogram, directly computes from the histogram the time of flight of the beam from emission to reception, and computes the depth image based on this time of flight.
In this embodiment, the depth camera and the color camera are calibrated in advance to obtain their intrinsic and extrinsic parameters; further, using these parameters, the transformation relationship between the pixel coordinate systems of the images obtained by the depth camera and the color camera is obtained, so that the pixels of the to-be-processed depth image correspond one-to-one with those of the color image, thereby aligning the two images. The intrinsic and extrinsic parameters include camera intrinsic parameters and camera extrinsic parameters: intrinsic parameters are related to the camera's own characteristics, such as focal length and pixel size; extrinsic parameters are parameters in the world coordinate system, such as the camera's position and rotation.
Specifically, in this embodiment, as shown in FIG. 3, step S200 includes:
Step S201: performing target detection on the color image to acquire a pedestrian detection box.
Step S202: acquiring a target single-person pose estimation framework through a human body pose estimation algorithm based on the pedestrian detection box.
Step S203: acquiring the two-dimensional coordinate information of the human body joint points and the human body segmentation regions based on the target single-person pose estimation framework.
Specifically, a target detection algorithm can be used to perform target detection on the color image to acquire the pedestrian detection box. The specific target detection algorithm and human body pose estimation algorithm can be selected and adjusted according to actual needs and are not specifically limited here. In one application scenario, the human body pose estimation algorithm may be the alphapose 2D model algorithm; preferably, the alphapose algorithm uses the RMPE pose estimation model for human body pose estimation. Specifically, the RMPE pose estimation model includes a Symmetric Spatial Transformer Network (SSTN), a Parametric Pose Non-Maximum Suppression (NMS) unit, and a Pose-Guided Proposals Generator (PGPG). The symmetric spatial transformer network unit is used to obtain a single-person pose estimation framework based on the pedestrian detection box; the parametric pose non-maximum suppression unit removes redundant boxes from the current single-person pose estimation framework by pose distance measurement to obtain the target single-person pose estimation framework; and the pose-guided proposals generator generates new training samples from the single-person pose estimation framework and the target single-person pose estimation framework to further train the RMPE model, augmenting the data to improve its performance. The RMPE pose estimation model can be used for both multi-person and single-person detection, and the target single-person pose estimation framework is the pose estimation framework corresponding to the target object whose three-dimensional human body model is to be acquired. Besides the alphapose 2D model algorithm, the human body pose estimation algorithm may also be any one or a combination of 2D model algorithms such as openpose and ppn, which is not limited here.
Specifically, in this embodiment, as shown in FIG. 4, step S203 includes:
Step S2031: acquiring a plurality of human body joint points based on the target single-person pose estimation framework, and acquiring the corresponding two-dimensional coordinate information, where the two-dimensional coordinate information of each joint point is its position coordinate in the color image.
Step S2032: acquiring a plurality of human body segmentation regions based on the pedestrian detection box and the human body joint points, where each human body segmentation region is a body region obtained by dividing the human body edge contour based on the joint points.
In this embodiment, at least 15 human body joint points are acquired based on the target single-person pose estimation framework, together with the corresponding two-dimensional coordinate information. Specifically, the two-dimensional information of each joint point is the position coordinate of its corresponding pixel in the color image. In this embodiment, the 15 joint points are preferably the head, neck, mid-hip, left shoulder, left elbow, left wrist, right shoulder, right elbow, right wrist, left hip, left knee, left ankle, right hip, right knee, and right ankle, as shown in FIG. 5. Further, the specific joint points and their number can be set and adjusted according to actual needs, which is not specifically limited here.
Further, an edge detection algorithm is used to obtain the human body edge contour within the pedestrian detection box; the two-dimensional information of the joint points is obtained, and adjacent joint points are used to divide the edge contour, yielding multiple human body segmentation regions. FIG. 6 is a schematic diagram of human body segmentation regions provided by an embodiment of the present invention; as shown in FIG. 6, 14 human body segmentation regions are obtained in this embodiment. Optionally, there may be other methods of obtaining human body segmentation regions, and the number of regions can be set and adjusted according to actual needs, which is not specifically limited here.
Further, in this embodiment, the obtained joint points, their two-dimensional information, and the human body segmentation regions are all information about the human body in the color image; using the alignment relationship between the color image and the depth image, the corresponding three-dimensional information of the joint points and the human body segmentation depth regions can be obtained from the depth image, so as to obtain the three-dimensional spatial information corresponding to the target object and thereby reconstruct the three-dimensional human body model.
Specifically, in this embodiment, as shown in FIG. 7, step S400 includes:
Step S401: acquiring the point cloud three-dimensional coordinates corresponding to each point in the human body segmentation depth regions.
Step S402: iteratively fitting the human body joint points based on the loss functions to acquire position information of the target human body joint points.
Step S403: acquiring a three-dimensional human body model based on the position information of each target human body joint point and each target point cloud, where the target point cloud includes the point cloud three-dimensional coordinates of each point in the human body segmentation depth region corresponding to the target human body joint points.
In this embodiment, the point cloud three-dimensional coordinates corresponding to each point in the human body segmentation regions in the depth image can be obtained by the following formula (1):
x_s = (u − u_0) · dx · z / f′,  y_s = (v − v_0) · dy · z / f′,  z_s = z        (1)
where (x_s, y_s, z_s) are the point cloud three-dimensional coordinates to be obtained, i.e., the coordinates of each point in the depth camera coordinate system; z is the pixel value of each point in the depth image, i.e., its depth (distance); (u, v) are the pixel coordinates of each point in the depth image; (u_0, v_0) are the coordinates of the image principal point; dx and dy are the physical sizes of the depth camera's sensor pixels in the two directions; and f′ is the focal length of the depth camera in millimeters. The image principal point is the intersection of the image plane with the perpendicular from the projection center to the image plane.
Further, a parameterized human body model and the preset loss functions are used to iteratively fit the target human body joint points and the point clouds corresponding to each point in each human body segmentation depth region, so as to acquire the three-dimensional human body model. Specifically, in the iterative fitting process, constraints are applied through the preset loss functions.
The parameterized human body model is a preset model for reconstructing a three-dimensional human body model. In one application scenario, the parameterized human body model is preferably the SMPL model. The traditional SMPL model is trained to produce a three-dimensional human body model composed of 24 joint points, 6890 vertices, and 13776 mesh faces, which requires a large amount of computation. In this embodiment, the above 15 joint points are preferably selected from the 24; the position information of the target joint points is obtained by iteratively fitting the joint points through the preset loss functions, and the three-dimensional human body model is further obtained by iteratively fitting these positions with the point cloud three-dimensional coordinates of each point in the corresponding human body segmentation depth regions, with the model constrained by the loss functions during the iteration. On the basis of improving accuracy, this reduces the amount of computation and improves the efficiency of acquiring the three-dimensional human body model.
在一种应用场景中,上述预设的损失函数包括重投影损失函数、三维关节点损失函数、角度损失函数和表面点深度损失函数中的一种或多种。本实施例中,预设的多种损失函数包括上述重投影损失函数、三维关节点损失函数、角度损失函数和表面点深度损失函数。优选的,本实施例中,在步骤S402中基于上述重投影损失函数、三维关节点损失函数和角度损失函数对人体关节点进行迭代拟合,在步骤S403中,基于上述表面点深度损失函数进行约束,并基于目标人体关节点的位置信息与各目标点云进行迭代拟合获取人体三维模型。
Specifically, the reprojection loss function reflects the reprojection position loss between the obtained target joint points projected onto the two-dimensional plane (the color image plane) and the corresponding joint points obtained in that plane. In this embodiment, the 15 target joint points are projected onto the color image plane, giving the two-dimensional pixel position of each target joint point in the color image; the GM (Geman-McClure) loss between this pixel position and the corresponding joint position output by the 2D model used to identify the joints in the color image is computed as the reprojection loss function.
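The Geman-McClure (GM) robust penalty mentioned here, and its use as a reprojection loss over per-joint 2D distances, can be sketched as follows. The scale parameter `sigma` is an assumption; the patent does not specify one:

```python
def gm_loss(residual, sigma=1.0):
    """Geman-McClure robust penalty: behaves like residual**2 for small
    residuals but saturates toward sigma**2 for large ones, limiting the
    influence of outlier joints."""
    r2 = residual ** 2
    return (sigma ** 2 * r2) / (sigma ** 2 + r2)

def reprojection_loss(projected_2d, detected_2d, sigma=1.0):
    """Sum of GM penalties over the 2D distances between the projected
    target joints and the joints detected in the color image."""
    total = 0.0
    for (up, vp), (ud, vd) in zip(projected_2d, detected_2d):
        dist = ((up - ud) ** 2 + (vp - vd) ** 2) ** 0.5
        total += gm_loss(dist, sigma)
    return total
```

The saturation is the point of the GM choice: a grossly misdetected joint contributes at most about sigma squared rather than dominating the objective.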
The three-dimensional joint loss function reflects the loss in three-dimensional distance between the position of each obtained target joint point and the corresponding joint point observed from the depth image. Specifically, based on the 15 joint points identified in the color image, the depth corresponding to each joint can be read from the aligned depth image. Ideally, the pixel-to-camera coordinate transformation, formula (1) above, then gives the observed coordinates of the 15 joint points in the camera coordinate system; however, because joints may be occluded or self-occluded, observed coordinates cannot be obtained for all of them. Moreover, the observed coordinates obtained this way are the three-dimensional positions of the skin surface over each joint, not the three-dimensional coordinates of the actual joints of the skeleton. Therefore, only the distances between the validly observed three-dimensional joint positions and the corresponding target joint points in the reconstructed three-dimensional human body model are computed. If a distance exceeds a set threshold (adjustable according to actual requirements), its GM loss is computed as part of the three-dimensional joint loss function; otherwise the position of that target joint point in the three-dimensional skeleton is considered reasonable and its three-dimensional joint loss is recorded as 0.
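The thresholded three-dimensional joint loss described above, penalizing only validly observed joints whose fitted position is farther than the threshold from the observed surface position, might look like the sketch below. Representing occluded joints as `None` and the specific threshold value are assumptions:

```python
def joint3d_loss(fitted, observed, threshold, sigma=1.0):
    """For each joint pair: skip joints with no valid depth observation
    (None, e.g. occluded or self-occluded); if the fitted-to-observed
    distance exceeds the threshold, add its Geman-McClure loss; otherwise
    treat the fitted joint position as reasonable (loss 0)."""
    total = 0.0
    for f, o in zip(fitted, observed):
        if o is None:  # no valid observation for this joint
            continue
        d = sum((fi - oi) ** 2 for fi, oi in zip(f, o)) ** 0.5
        if d > threshold:
            total += (sigma ** 2 * d ** 2) / (sigma ** 2 + d ** 2)
    return total
```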
The angle loss function constrains the angles between the target joint points. Specifically, during actual motion, the rotation angles of human joints are limited by human anatomy; for example, with the lower limbs stationary, rotating the upper limbs backward by 180 degrees is implausible. Therefore, during fitting, an angle constraint is applied to each joint point, accelerating convergence and avoiding deformed target joint configurations. Specifically, an angle range is preset for each joint point; if the currently fitted target joint angle falls outside its range, the squared loss of the excess is computed as the angle loss; otherwise the angle loss is recorded as 0.
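The per-joint angle constraint, penalizing the squared amount by which a fitted joint angle exceeds its preset anatomical range, can be sketched as follows; the example ranges in the test are illustrative, not anatomical reference values:

```python
def angle_loss(angles, ranges):
    """Squared penalty on the portion of each joint angle that falls
    outside its preset [lo, hi] range; contributes 0 while the angle
    stays within range."""
    total = 0.0
    for theta, (lo, hi) in zip(angles, ranges):
        if theta < lo:
            total += (lo - theta) ** 2
        elif theta > hi:
            total += (theta - hi) ** 2
    return total
```

Because the penalty is zero inside the range and grows quadratically outside it, the fit is unconstrained for plausible poses but pushed back quickly from implausible ones.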
The surface-point depth loss function constrains the loss in depth values of the surface point clouds of each region of the three-dimensional human body model obtained in each fitting iteration. Specifically, the surface-point depth loss is the GM loss between the reference depth values of the surface point clouds of each model region along the depth direction and the values of the pixels they map to in the depth image. In this embodiment, the 6,890 vertices of the SMPL model are pre-divided into 14 regions corresponding to the body segmentation regions above. In each iteration of fitting against a frame of the color image and its corresponding depth image, the loss between the surface points of the 14 SMPL regions and the depth values of the 14 body segmentation depth regions segmented from the depth image is computed. Taking the surface loss of the right thigh region as an example: all point clouds of the right thigh region can be obtained from the three-dimensional human body model fitted with the SMPL model; the normal vectors of the points can be obtained from the face connectivity, and from the normal directions the camera-facing surface points of the right thigh can be selected. First, the reference depth values (Z values) of these surface points along the depth direction are obtained from the fitted model, i.e., the distances from the surface points to the plane of the acquisition module. These surface points are then projected into the depth image using the camera-to-pixel coordinate formula, i.e., their two-dimensional coordinates in the depth image are computed (this formula can be derived from formula (1) above), and the depth values (pixel values) at the corresponding two-dimensional pixels in the depth image are read. The GM loss between the reference depth values and the depth values at those pixels is computed as the surface-point depth loss function. The smaller this loss, the closer the SMPL model surface is to the surface of the corresponding joint region in the depth image, i.e., the more accurate the fitted joint positions.
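The surface-point depth loss for one region can be sketched as below: project each model surface point into the depth image with the inverse of formula (1) and accumulate the GM loss between the point's model depth and the depth value stored at that pixel. The camera parameters, the GM scale, and the list-of-lists depth image are assumptions; real code would additionally use the face normals to keep only camera-facing points, as described above:

```python
def surface_depth_loss(surface_points, depth_image, u0, v0, dx, dy, f_mm, sigma=50.0):
    """For each camera-facing model surface point (x, y, z): project it to
    pixel coordinates (inverse of formula (1)), read the depth value at
    that pixel, and accumulate the GM loss between the model depth z and
    the measured depth. Points outside the image or on invalid (zero)
    depth pixels are skipped."""
    h, w = len(depth_image), len(depth_image[0])
    total = 0.0
    for x, y, z in surface_points:
        u = int(round(x * f_mm / (dx * z) + u0))  # camera -> pixel coords
        v = int(round(y * f_mm / (dy * z) + v0))
        if 0 <= v < h and 0 <= u < w and depth_image[v][u] > 0:
            r2 = (z - depth_image[v][u]) ** 2
            total += (sigma ** 2 * r2) / (sigma ** 2 + r2)
    return total
```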
Further, when the three-dimensional human body model is used to process a video stream of consecutive frames, the preset loss functions may also include a smoothness loss function, ensuring that the three-dimensional human body models fitted in adjacent frames are as smooth as possible. Specifically, the L2 loss between the target joint points fitted in adjacent frames is computed as the smoothness loss function, avoiding large inter-frame jumps in joint positions that would harm the visual result.
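The inter-frame smoothness term, an L2 loss between the target joints fitted in consecutive frames, can be sketched as:

```python
def smoothness_loss(joints_prev, joints_curr):
    """L2 (sum of squared differences) loss between the target joint
    positions fitted in the previous and current frames, discouraging
    large inter-frame jumps in joint positions."""
    return sum((a - b) ** 2
               for jp, jc in zip(joints_prev, joints_curr)
               for a, b in zip(jp, jc))
```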
In one application scenario, the loss functions are combined and summed, and the summed value is compared with a preset threshold range (adjustable according to actual requirements). If the summed value is not within the preset threshold range, the joint points and the target point clouds of the corresponding body segmentation depth regions continue to be iteratively fitted to obtain a new three-dimensional human body model, until the summed value falls within the range. The combination may be a direct sum or a weighted sum, which is not specifically limited here. Further, each loss function may be a GM loss, L1 loss, L2 loss, or another loss function, which is not specifically limited here.
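Combining the individual terms into one weighted objective and iterating until it falls within the preset threshold can be expressed as the toy loop below. The weights, threshold, and the "update" (here just a shrink factor standing in for one real fitting step) are placeholders, not the patent's fitting procedure:

```python
def total_loss(terms, weights):
    """Weighted sum of the individual loss terms; equal weights reduce
    to the direct-sum variant mentioned in the text."""
    return sum(w * t for w, t in zip(weights, terms))

def fit(initial_terms, weights, threshold, step, max_iters=100):
    """Toy iteration loop: keep refitting (here: shrinking every term by
    `step`) until the combined loss is at or below the preset threshold."""
    terms = list(initial_terms)
    for i in range(max_iters):
        if total_loss(terms, weights) <= threshold:
            return terms, i
        terms = [t * step for t in terms]  # stand-in for one fitting update
    return terms, max_iters
```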
Specifically, in this embodiment, as shown in FIG. 8, after the above step S400, the method further includes: Step S500, obtaining three-dimensional human skeleton points based on the three-dimensional human body model.
Specifically, the iteratively fitted three-dimensional human body model is used to further compute the three-dimensional human skeleton points. Since the reconstruction quality of the iteratively fitted model is equivalent to an ideal three-dimensional human body model, computing the skeleton points from it improves their accuracy. Preferably, the skeleton points may be obtained directly as the coordinate information of the three-dimensional skeleton points used when the final optimal model was obtained during iterative fitting. Alternatively, the final fitted model may be input into a neural network model to obtain the corresponding three-dimensional skeleton points, further improving accuracy. Other acquisition methods are also possible and are not specifically limited here.
Exemplary Device
As shown in FIG. 9, corresponding to the above method for obtaining a three-dimensional human body model, an embodiment of the present invention further provides an apparatus for obtaining a three-dimensional human body model, the apparatus including:
An image acquisition module 610, configured to obtain a color image and a depth image corresponding to the color image.
The color image and the depth image are images containing a target object, the target object being the object whose three-dimensional human body model is to be reconstructed. Further, the color image and depth image may contain multiple target objects; this embodiment is described with a single target object as an example. When multiple target objects are present, the apparatus of this embodiment can be used to reconstruct a three-dimensional human body model for each target object separately.
A body segmentation region acquisition module 620, configured to obtain two-dimensional joint coordinate information and body segmentation regions based on the color image.
Specifically, in this embodiment, target detection and human pose estimation can be performed on the target object in the color image to obtain the corresponding two-dimensional joint coordinate information and body segmentation regions. The two-dimensional coordinate information of each joint point is the position coordinate of the target object's joint point in the color image, and each body segmentation region is a body region obtained by dividing the body edge contour according to the joint points.
A body segmentation depth region acquisition module 630, configured to obtain, based on the depth image, the three-dimensional joint coordinate information corresponding to each piece of two-dimensional joint coordinate information, and the body segmentation depth region corresponding to each body segmentation region.
The three-dimensional joint coordinate information is the depth information in the depth image corresponding to each piece of two-dimensional joint coordinate information, and each body segmentation depth region is the region in the depth image corresponding to a body segmentation region. Specifically, a joint point lies inside the body, but a depth image cannot capture depth information from inside the body; therefore, in this embodiment, the three-dimensional coordinate information of the skin surface over each joint is taken as the three-dimensional joint coordinate information, i.e., the depth information in the depth image corresponding to the two-dimensional joint coordinates is used directly as the three-dimensional joint coordinate information.
A three-dimensional human body model reconstruction module 640, configured to iteratively fit all the three-dimensional joint coordinate information and all the body segmentation depth regions based on preset loss functions to obtain the three-dimensional human body model.
As can be seen from the above, the apparatus for obtaining a three-dimensional human body model provided by this embodiment obtains a color image and its corresponding depth image through the image acquisition module 610; obtains two-dimensional joint coordinate information and body segmentation regions from the color image through the body segmentation region acquisition module 620; obtains, through the body segmentation depth region acquisition module 630 and based on the depth image, the three-dimensional joint coordinate information corresponding to each piece of two-dimensional joint coordinate information and the body segmentation depth region corresponding to each body segmentation region; and, through the three-dimensional human body model reconstruction module 640, iteratively fits all the three-dimensional joint coordinate information and all the body segmentation depth regions based on preset loss functions to obtain the three-dimensional human body model. Compared with prior-art schemes that obtain the three-dimensional human body model from the color image alone, the present scheme additionally uses the depth image, which provides the body's three-dimensional spatial information, improving the accuracy of the obtained model so that it better reflects the body's three-dimensional pose.
In one application scenario, the apparatus can also process a video stream to obtain three-dimensional human body models from it. When processing a video stream, a to-be-processed video stream is obtained, containing consecutive frame-synchronized and aligned color and depth images. Each synchronized and aligned pair of color and depth images is processed by the apparatus to obtain the three-dimensional human body model for that frame; the frames may be processed in parallel or sequentially, which is not specifically limited here. This embodiment is described, without limitation, with one color image and its corresponding depth image as an example.
Specifically, in this embodiment, for the specific functions of the apparatus for obtaining a three-dimensional human body model and its modules, reference may also be made to the corresponding descriptions in the above method, which are not repeated here.
Based on the above embodiments, the present invention further provides an intelligent terminal, whose functional block diagram may be as shown in FIG. 10. The intelligent terminal includes a processor, a memory, a network interface, and a display screen connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a three-dimensional human body model acquisition program. The internal memory provides an environment for running the operating system and the three-dimensional human body model acquisition program stored in the non-volatile storage medium. The network interface of the intelligent terminal is used to communicate with external terminals over a network. When executed by the processor, the three-dimensional human body model acquisition program implements the steps of any of the above methods for obtaining a three-dimensional human body model. The display screen of the intelligent terminal may be a liquid crystal display or an e-ink display.
Those skilled in the art will understand that the block diagram shown in FIG. 10 is only a block diagram of a partial structure related to the scheme of the present invention and does not limit the intelligent terminals to which the scheme is applied; a specific intelligent terminal may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, an intelligent terminal is provided, including a memory, a processor, and a three-dimensional human body model acquisition program stored in the memory and executable on the processor; when executed by the processor, the program performs the following operations:
obtaining a color image and a depth image corresponding to the color image;
obtaining two-dimensional joint coordinate information and body segmentation regions based on the color image;
obtaining, based on the depth image, the three-dimensional joint coordinate information corresponding to each piece of two-dimensional joint coordinate information, and the body segmentation depth region corresponding to each body segmentation region;
iteratively fitting all the three-dimensional joint coordinate information and all the body segmentation depth regions based on preset loss functions to obtain a three-dimensional human body model.
An embodiment of the present invention further provides a computer-readable storage medium storing a three-dimensional human body model acquisition program which, when executed by a processor, implements the steps of any of the methods for obtaining a three-dimensional human body model provided by the embodiments of the present invention.
It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic and does not constitute any limitation on the implementation of the embodiments of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only illustrative; in practical applications, the above functions may be allocated to different functional units or modules as needed, i.e., the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically separately, or two or more of them may be integrated into one unit; the integrated units may be implemented in hardware or as software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and do not limit the protection scope of the present invention. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
If the integrated modules/units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes of the above embodiment methods by instructing relevant hardware through a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each method embodiment. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

  1. A method for obtaining a three-dimensional human body model, characterized in that the method comprises:
    obtaining a color image and a depth image corresponding to the color image;
    obtaining two-dimensional coordinate information of human joint points and body segmentation regions based on the color image;
    obtaining, based on the depth image, three-dimensional joint coordinate information corresponding to each piece of the two-dimensional joint coordinate information, and a body segmentation depth region corresponding to each of the body segmentation regions;
    iteratively fitting all the three-dimensional joint coordinate information and all the body segmentation depth regions based on preset loss functions to obtain a three-dimensional human body model.
  2. The method for obtaining a three-dimensional human body model according to claim 1, characterized in that obtaining the color image and the depth image corresponding to the color image comprises:
    obtaining a color image captured by an acquisition device and a to-be-processed depth image synchronized with the color image;
    aligning the to-be-processed depth image with the color image to serve as the depth image corresponding to the color image.
  3. The method for obtaining a three-dimensional human body model according to claim 1, characterized in that obtaining the two-dimensional joint coordinate information and the body segmentation regions based on the color image comprises:
    performing target detection on the color image to obtain a pedestrian detection box;
    obtaining a target single-person pose estimation frame through a human pose estimation algorithm based on the pedestrian detection box;
    obtaining the two-dimensional joint coordinate information and the body segmentation regions based on the target single-person pose estimation frame.
  4. The method for obtaining a three-dimensional human body model according to claim 3, characterized in that obtaining the two-dimensional joint coordinate information and the body segmentation regions based on the target single-person pose estimation frame comprises:
    obtaining a plurality of human joint points based on the target single-person pose estimation frame, and obtaining the corresponding two-dimensional joint coordinate information, wherein the two-dimensional coordinate information of each joint point is its position coordinate in the color image;
    obtaining a plurality of body segmentation regions based on the pedestrian detection box and the human joint points, wherein each body segmentation region is a body region obtained by dividing the body edge contour according to the joint points.
  5. The method for obtaining a three-dimensional human body model according to claim 4, characterized in that iteratively fitting all the three-dimensional joint coordinate information and all the body segmentation depth regions based on the preset loss functions to obtain the three-dimensional human body model comprises:
    obtaining the three-dimensional point-cloud coordinates corresponding to each point in the body segmentation depth regions;
    iteratively fitting the human joint points based on the loss functions to obtain position information of target human joint points;
    obtaining the three-dimensional human body model based on the position information of each target human joint point and each target point cloud, wherein the target point cloud comprises the three-dimensional point-cloud coordinates of the points in the body segmentation depth region corresponding to the target joint point.
  6. The method for obtaining a three-dimensional human body model according to claim 1, characterized in that the preset loss functions comprise a reprojection loss function, a three-dimensional joint loss function, an angle loss function, and a surface-point depth loss function.
  7. The method for obtaining a three-dimensional human body model according to claim 1, characterized in that, after iteratively fitting all the three-dimensional joint coordinate information and all the body segmentation depth regions based on the preset loss functions to obtain the three-dimensional human body model, the method further comprises:
    obtaining three-dimensional human skeleton points based on the three-dimensional human body model.
  8. An apparatus for obtaining a three-dimensional human body model, characterized in that the apparatus comprises:
    an image acquisition module, configured to obtain a color image and a depth image corresponding to the color image;
    a body segmentation region acquisition module, configured to obtain two-dimensional joint coordinate information and body segmentation regions based on the color image;
    a body segmentation depth region acquisition module, configured to obtain, based on the depth image, the three-dimensional joint coordinate information corresponding to each piece of the two-dimensional joint coordinate information, and the body segmentation depth region corresponding to each of the body segmentation regions;
    a three-dimensional human body model reconstruction module, configured to iteratively fit all the three-dimensional joint coordinate information and all the body segmentation depth regions based on preset loss functions to obtain the three-dimensional human body model.
  9. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor, and a three-dimensional human body model acquisition program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for obtaining a three-dimensional human body model according to any one of claims 1-7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a three-dimensional human body model acquisition program which, when executed by a processor, implements the steps of the method for obtaining a three-dimensional human body model according to any one of claims 1-7.
PCT/CN2021/130104 2021-06-30 2021-11-11 Method and apparatus for obtaining three-dimensional human body model, intelligent terminal, and storage medium WO2023273093A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110744388.6A CN113610889B (zh) 2021-06-30 2021-06-30 Method and apparatus for obtaining three-dimensional human body model, intelligent terminal, and storage medium
CN202110744388.6 2021-06-30

Publications (1)

Publication Number Publication Date
WO2023273093A1 true WO2023273093A1 (zh) 2023-01-05

Family

ID=78337136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/130104 WO2023273093A1 (zh) 2021-06-30 2021-11-11 一种人体三维模型获取方法、装置、智能终端及存储介质

Country Status (2)

Country Link
CN (1) CN113610889B (zh)
WO (1) WO2023273093A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503958A (zh) * 2023-06-27 2023-07-28 江西师范大学 Human body posture recognition method, system, storage medium, and computer device
CN117726907A (zh) * 2024-02-06 2024-03-19 之江实验室 Modeling model training method, three-dimensional human body modeling method, and apparatus

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610889B (zh) * 2021-06-30 2024-01-16 奥比中光科技集团股份有限公司 Method and apparatus for obtaining three-dimensional human body model, intelligent terminal, and storage medium
CN115177755A (zh) * 2022-07-07 2022-10-14 中国人民解放军军事科学院军事医学研究院 Online intelligent ultraviolet radiation disinfection system and method
CN114973422A (zh) * 2022-07-19 2022-08-30 南京应用数学中心 Gait recognition method based on point-cloud feature encoding for three-dimensional human body modeling
CN116309641B (zh) * 2023-03-23 2023-09-22 北京鹰之眼智能健康科技有限公司 Image region acquisition system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180211399A1 (en) * 2017-01-26 2018-07-26 Samsung Electronics Co., Ltd. Modeling method and apparatus using three-dimensional (3d) point cloud
CN109176512A (zh) * 2018-08-31 2019-01-11 南昌与德通讯技术有限公司 Method for somatosensory control of a robot, robot, and control apparatus
CN110335343A (zh) * 2019-06-13 2019-10-15 清华大学 Method and apparatus for three-dimensional human body reconstruction from a single-view RGBD image
CN110363858A (zh) * 2019-06-18 2019-10-22 新拓三维技术(深圳)有限公司 Three-dimensional face reconstruction method and system
CN111652974A (zh) * 2020-06-15 2020-09-11 腾讯科技(深圳)有限公司 Method, apparatus, device, and storage medium for constructing a three-dimensional face model
CN111739161A (zh) * 2020-07-23 2020-10-02 之江实验室 Method, apparatus, and electronic device for three-dimensional human body reconstruction under occlusion
CN111968169A (zh) * 2020-08-19 2020-11-20 北京拙河科技有限公司 Dynamic three-dimensional human body reconstruction method, apparatus, device, and medium
CN112950668A (zh) * 2021-02-26 2021-06-11 北斗景踪技术(山东)有限公司 Intelligent monitoring method and system based on module position measurement
CN113610889A (zh) * 2021-06-30 2021-11-05 奥比中光科技集团股份有限公司 Method and apparatus for obtaining three-dimensional human body model, intelligent terminal, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787469B (zh) * 2016-03-25 2019-10-18 浩云科技股份有限公司 Method and system for pedestrian monitoring and behavior recognition
CN109636831B (zh) * 2018-12-19 2023-08-04 安徽大学 Method for estimating three-dimensional human pose and hand information
CN109859296B (zh) * 2019-02-01 2022-11-29 腾讯科技(深圳)有限公司 Training method for an SMPL parameter prediction model, server, and storage medium
CN110276768B (zh) * 2019-06-28 2022-04-05 京东方科技集团股份有限公司 Image segmentation method, apparatus, device, and medium
CN111968238A (zh) * 2020-08-22 2020-11-20 晋江市博感电子科技有限公司 Color three-dimensional human body reconstruction method based on a dynamic fusion algorithm
CN112836618B (zh) * 2021-01-28 2023-10-20 清华大学深圳国际研究生院 Three-dimensional human pose estimation method and computer-readable storage medium
CN112819951A (zh) * 2021-02-09 2021-05-18 北京工业大学 Occluded three-dimensional human body reconstruction method based on depth map inpainting


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503958A (zh) * 2023-06-27 2023-07-28 江西师范大学 Human body posture recognition method, system, storage medium, and computer device
CN116503958B (zh) * 2023-06-27 2023-10-03 江西师范大学 Human body posture recognition method, system, storage medium, and computer device
CN117726907A (zh) * 2024-02-06 2024-03-19 之江实验室 Modeling model training method, three-dimensional human body modeling method, and apparatus
CN117726907B (zh) * 2024-02-06 2024-04-30 之江实验室 Modeling model training method, three-dimensional human body modeling method, and apparatus

Also Published As

Publication number Publication date
CN113610889A (zh) 2021-11-05
CN113610889B (zh) 2024-01-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21948019

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE