CN115035551A - Three-dimensional human body posture estimation method, device, equipment and storage medium - Google Patents


Publication number
CN115035551A
CN115035551A (application CN202210956640.4A); granted as CN115035551B
Authority
CN
China
Prior art keywords
human body
heat map
dimensional
information
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210956640.4A
Other languages
Chinese (zh)
Other versions
CN115035551B
Inventor
胡波 (Hu Bo)
胡世卓 (Hu Shizhuo)
周斌 (Zhou Bin)
沈振冈 (Shen Zhengang)
李艳红 (Li Yanhong)
Current Assignee
Wuhan Etah Information Technology Co ltd
Original Assignee
Wuhan Etah Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Etah Information Technology Co ltd
Priority to CN202210956640.4A
Publication of CN115035551A
Application granted
Publication of CN115035551B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a three-dimensional human body posture estimation method, a device, equipment and a storage medium. The method comprises: generating a target heat map corresponding to an input image with a multi-view fusion network, and matching and fusing the human body center point heat map information of the target heat map to obtain fusion information; projecting the fusion information into 3D space to obtain a three-dimensional feature volume; and estimating the three-dimensional human body posture according to the three-dimensional feature volume. Anchoring the multi-view match on the human body center point reduces the inference search space for the remaining human body key points, which lowers the estimation error, improves the posture reconstruction quality, reduces the computation cost and avoids the influence of quantization errors, thereby improving the accuracy of three-dimensional human body posture estimation. The scheme is simple and reliable to implement, is applicable to three-dimensional human body posture estimation in most scenes, and improves the speed and efficiency of the estimation.

Description

Three-dimensional human body posture estimation method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of multi-view fusion, in particular to a three-dimensional human body posture estimation method, device, equipment and storage medium.
Background
In recent years, three-dimensional human posture estimation through multi-view matching has mainly fallen into two categories: multi-stage methods that lift two-dimensional estimates to three dimensions, and direct regression methods. Two-dimensional-to-three-dimensional methods estimate the 2D keypoints of the same person in each view and then lift the matched 2D single-view poses into 3D space; some extend the 2D pictorial structure model to a 3D pictorial structure model to encode the pairwise relations between body joint positions, while others first solve multi-person 2D pose detection, associate the detections across camera views, and then recover the 3D poses by triangulation. These methods are effective in specific scenes, but they depend heavily on the 2D detection results: inaccurate two-dimensional posture estimation, especially under occlusion, greatly degrades the reconstruction quality of the 3D posture.
Direct regression methods, also called end-to-end methods, exploit the ability of deep neural networks to fit complex functions: they usually need no auxiliary algorithms or intermediate data, and predict the three-dimensional posture coordinates directly with a regression network. The VoxelPose model, for example, builds a discretized 3D feature volume from multi-view features; instead of estimating the 2D pose in each view independently, it projects the obtained 2D heat maps directly into 3D space for inference. However, the computational cost of searching for key points over the whole space grows geometrically as the space is divided more finely, and the result is also affected by the quantization error introduced by the spatial discretization.
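The trade-off in the voxel-based approach can be sketched numerically. This is an illustration only, not code from the patent: the 8 m capture-space extent and the grid sizes are assumed example values. Doubling the grid resolution halves the worst-case quantization error but multiplies the number of voxels to search by eight.

```python
# Illustrative sketch of the cost/accuracy trade-off of a discretized 3D
# feature volume; the 8 m space and grid sizes are assumed example values.

def voxel_cost(space_mm, voxels_per_axis):
    """Voxel count to search and worst-case quantization error (mm)."""
    voxel_size = space_mm / voxels_per_axis
    n_voxels = voxels_per_axis ** 3          # cubic growth with grid resolution
    max_quant_err = voxel_size / 2.0         # a point can lie half a voxel off
    return n_voxels, max_quant_err

for n in (16, 32, 64):
    print(n, voxel_cost(8000.0, n))          # doubling n: 8x voxels, 1/2 error
```

This is why a coarse-to-fine scheme that first localizes a center point, then searches a small sub-volume, pays off.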
Disclosure of Invention
The invention mainly aims to provide a three-dimensional human body posture estimation method, device, equipment and storage medium, so as to solve the technical problems in the prior art that dependence on 2D detection results lets inaccurate two-dimensional posture estimation greatly degrade the reconstruction quality of the 3D posture, and that direct regression incurs high computation cost and large errors.
In a first aspect, the present invention provides a three-dimensional human body posture estimation method, including the following steps:
generating a target heat map corresponding to an input image by adopting a multi-view fusion network, and matching and fusing human body central point heat map information of the target heat map to obtain fusion information;
projecting the fusion information to a 3D space to obtain a three-dimensional characteristic volume;
and estimating the three-dimensional human body posture according to the three-dimensional characteristic volume.
Optionally, the generating a target heat map corresponding to the input image by using a multi-view fusion network, matching and fusing the human body central point heat map information of the target heat map to obtain fusion information includes:
inputting an input image into a high-resolution network of a multi-view fusion network to acquire high-resolution characteristic information;
constructing a residual error unit of the high-resolution network according to the high-resolution characteristic information, and performing convolution sampling on the residual error unit to obtain a multi-resolution module;
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a target heat map;
matching and fusing the human body center point heat map information of the target heat map to obtain fused information.
Optionally, the fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain the target heat map includes:
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a fused feature map;
inputting the fused feature map into a deconvolution module, performing convolution and channel conversion to obtain an output result, and concatenating the output result with the fused feature map along the channel dimension to obtain a concatenated feature;
and raising the resolution of the concatenated feature with a deconvolution layer, extracting target feature information from the resolution-raised concatenated feature through the residual units, and generating the target heat map from the target feature information.
Optionally, the matching and fusing the human body central point heat map information of the target heat map to obtain fused information includes:
taking a preset key point between the hip joints of the human body in the target heat map as the human body center point;
and matching and fusing the human body center point heat map information of the target heat map according to the human body center points of the multiple views to obtain fused information.
Optionally, the matching and fusing the human body central point heat map information of the target heat map according to the human body central point of the multiple views to obtain fused information, including:
sampling the epipolar lines corresponding to the center points of the respective views in the target heat map according to the human body center points of the multiple views to obtain a corresponding point set;
generating a probability region of the Gaussian distribution of the target heat map at the corresponding coordinates according to the corresponding point set;
fusing the values of all points on the epipolar line within the probability region through a fully connected layer to obtain the finally fused center point coordinates;
and carrying out coordinate matching fusion on the human body central point heat map information of the target heat map according to the central point coordinate to obtain fusion information.
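The steps above can be sketched as follows. This is a hedged illustration, not the patent's implementation: the toy fundamental matrix F, the sample positions and the Gaussian sigma are made-up values, and a fixed Gaussian-weighted average stands in for the learned fully connected fusion layer.

```python
import math

def epipolar_line(F, x):
    """l = F @ x: epipolar line (a, b, c) of homogeneous point x = (u, v, 1)."""
    return [sum(F[i][j] * x[j] for j in range(3)) for i in range(3)]

def sample_line(l, us):
    """Candidate points (u, v) on the line a*u + b*v + c = 0."""
    a, b, c = l
    return [(u, -(a * u + c) / b) for u in us]

def gaussian_weight(p, mean, sigma):
    """Response of a 2D isotropic Gaussian heat map at point p."""
    d2 = (p[0] - mean[0]) ** 2 + (p[1] - mean[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

def fuse_center(points, heat_mean, sigma):
    """Weighted average of candidates; stands in for the learned FC fusion."""
    w = [gaussian_weight(p, heat_mean, sigma) for p in points]
    s = sum(w)
    return (sum(wi * p[0] for wi, p in zip(w, points)) / s,
            sum(wi * p[1] for wi, p in zip(w, points)) / s)

# A toy fundamental matrix whose epipolar line of (30, 50) is simply v = 50:
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
line = epipolar_line(F, [30.0, 50.0, 1.0])
candidates = sample_line(line, [10.0, 20.0, 30.0, 40.0])
fused = fuse_center(candidates, (25.0, 50.0), 10.0)
print(fused)                                 # center pulled to the heat-map peak
```

The epipolar constraint restricts the search for the matching center to a single line per view, which is what shrinks the inference search space for the remaining key points.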
Optionally, the projecting the fusion information to a 3D space to obtain a three-dimensional feature volume includes:
acquiring camera calibration data of a video camera, and projecting each voxel center in the fusion information to a camera view by using the camera calibration data to obtain camera view projection data;
and constructing a three-dimensional characteristic volume according to the camera view projection data by using a 3D CNN network.
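The voxel-projection step can be illustrated as below. This is a hedged sketch, not the patent's code: the 3 × 4 projection matrix models an identity-rotation camera with an assumed focal length of 1000 and principal point (320, 240), not real calibration data.

```python
# Hedged sketch: project a voxel center into a camera view with a 3x4
# projection matrix P = K[R|t] built from camera calibration data.

def project_voxel(P, X):
    """Pinhole projection of 3D point X = (x, y, z) to pixel (u, v)."""
    x, y, z = X
    h = [P[i][0] * x + P[i][1] * y + P[i][2] * z + P[i][3] for i in range(3)]
    return h[0] / h[2], h[1] / h[2]          # perspective divide by depth

P = [[1000.0, 0.0, 320.0, 0.0],              # fx = 1000, cx = 320 (assumed)
     [0.0, 1000.0, 240.0, 0.0],              # fy = 1000, cy = 240 (assumed)
     [0.0, 0.0, 1.0, 0.0]]
print(project_voxel(P, (0.5, -0.25, 2.0)))   # a voxel center 2 m in front
```

Each voxel's projected pixel is where the heat-map value for that voxel is sampled in each camera view before the 3D CNN aggregates them.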
Optionally, the estimating a three-dimensional human body posture according to the three-dimensional feature volume includes:
dividing the three-dimensional feature volume into a plurality of discrete grids;
and acquiring the 3D heat map space coordinate of each key point in each discrete grid, and regressing the 3D heat map space coordinate to obtain the three-dimensional human body posture.
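The regression over the discrete grid can be sketched with a soft-argmax (shown in 1-D for brevity; the patent regresses per-keypoint 3D heat maps): a softmax-weighted average of grid coordinates yields a continuous coordinate, avoiding the half-voxel error of a hard argmax. The scores and grid below are made-up values.

```python
import math

def soft_argmax(scores, coords):
    """Softmax-weighted average of grid coordinates (integral regression)."""
    m = max(scores)
    e = [math.exp(s - m) for s in scores]    # subtract max for stability
    z = sum(e)
    return sum(w * c for w, c in zip(e, coords)) / z

# The heat-map peak lies between grid nodes 2 and 3; a hard argmax must pick
# one of them, while the soft-argmax lands between them.
coords = [0.0, 1.0, 2.0, 3.0, 4.0]
scores = [0.0, 1.0, 4.0, 4.0, 1.0]
print(soft_argmax(scores, coords))
```

This is how regressing on the 3D heat map yields sub-voxel key-point positions even on a coarse grid.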
In a second aspect, to achieve the above object, the present invention further provides a three-dimensional human body posture estimation device, including:
the fusion module is used for generating a target heat map corresponding to the input image by adopting a multi-view fusion network, and matching and fusing the human body center point heat map information of the target heat map to obtain fusion information;
the projection module is used for projecting the fusion information to a 3D space to obtain a three-dimensional characteristic volume;
and the posture estimation module is used for estimating the three-dimensional human body posture according to the three-dimensional characteristic volume.
In a third aspect, to achieve the above object, the present invention further provides a three-dimensional human body posture estimation device, including: a memory, a processor and a three-dimensional human body posture estimation program stored on the memory and executable on the processor, the program being configured to implement the steps of the three-dimensional human body posture estimation method as described above.
In a fourth aspect, to achieve the above object, the present invention further provides a storage medium having a three-dimensional human body posture estimation program stored thereon, wherein the program, when executed by a processor, implements the steps of the three-dimensional human body posture estimation method as described above.
The invention provides a three-dimensional human body posture estimation method which generates a target heat map corresponding to an input image with a multi-view fusion network, matches and fuses the human body center point heat map information of the target heat map to obtain fusion information, projects the fusion information into 3D space to obtain a three-dimensional feature volume, and estimates the three-dimensional human body posture from that volume. Anchoring the multi-view match on the human body center point reduces the inference search space for the remaining human body key points, which lowers the estimation error, improves the posture reconstruction quality, reduces the computation cost and avoids the influence of quantization errors, thereby improving the accuracy of three-dimensional human body posture estimation. The scheme is simple and reliable to implement, is applicable to most scenes, and improves the speed and efficiency of three-dimensional human body posture estimation.
Drawings
FIG. 1 is a schematic diagram of an apparatus architecture of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for estimating a three-dimensional human body pose according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for estimating a three-dimensional human body pose according to a second embodiment of the present invention;
FIG. 4 is a network structure of key point detection in the three-dimensional human body posture estimation method of the present invention;
FIG. 5 is a flowchart illustrating a method for estimating a three-dimensional human body pose according to a third embodiment of the present invention;
FIG. 6 is a schematic epipolar geometry diagram in the three-dimensional human body pose estimation method of the present invention;
FIG. 7 is a schematic diagram of a multi-view epipolar constraint model in the three-dimensional human body pose estimation method of the present invention;
FIG. 8 is a flowchart illustrating a method for estimating a three-dimensional human body pose according to a fourth embodiment of the present invention;
FIG. 9 is a schematic diagram of a 3D CNN network structure in the three-dimensional human body posture estimation method according to the present invention;
FIG. 10 is a flowchart illustrating a fifth embodiment of a three-dimensional human body posture estimation method according to the present invention;
FIG. 11 is a functional block diagram of a three-dimensional human body posture estimation device according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The solution of the embodiment of the invention is mainly as follows: generating a target heat map corresponding to an input image with a multi-view fusion network, and matching and fusing the human body center point heat map information of the target heat map to obtain fusion information; projecting the fusion information into 3D space to obtain a three-dimensional feature volume; and estimating the three-dimensional human body posture from the three-dimensional feature volume. This reduces the inference search space for the remaining human body key points, lowers the estimation error, improves the posture reconstruction quality, reduces the computation cost, avoids the influence of quantization errors and improves the estimation accuracy; the scheme is simple and reliable to implement, suits most scenes, and improves the speed and efficiency of estimation, thereby solving the prior-art problems that inaccurate two-dimensional posture estimation greatly degrades the reconstruction quality of the 3D posture and that direct regression has high computation cost and large errors.
Referring to fig. 1, fig. 1 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the apparatus may include: a processor 1001 (e.g. a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 implements connection communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 1 is not intended to be limiting of the apparatus and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a three-dimensional human body posture estimation program.
The apparatus of the present invention calls a three-dimensional human body posture estimation program stored in the memory 1005 through the processor 1001 and performs the following operations:
generating a target heat map corresponding to an input image by adopting a multi-view fusion network, and matching and fusing human body center point heat map information of the target heat map to obtain fusion information;
projecting the fusion information to a 3D space to obtain a three-dimensional characteristic volume;
and estimating the three-dimensional human body posture according to the three-dimensional characteristic volume.
The apparatus of the present invention calls the three-dimensional human body posture estimation program stored in the memory 1005 through the processor 1001, and also performs the following operations:
inputting an input image into a high-resolution network of a multi-view fusion network to acquire high-resolution characteristic information;
constructing a residual error unit of the high-resolution network according to the high-resolution characteristic information, and performing convolution sampling on the residual error unit to obtain a multi-resolution module;
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a target heat map;
matching and fusing the human body central point heat map information of the target heat map to obtain fused information.
The apparatus of the present invention calls the three-dimensional human body posture estimation program stored in the memory 1005 through the processor 1001, and also performs the following operations:
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a fused feature map;
inputting the fused feature map into a deconvolution module, performing convolution and channel conversion to obtain an output result, and concatenating the output result with the fused feature map along the channel dimension to obtain a concatenated feature;
and raising the resolution of the concatenated feature with a deconvolution layer, extracting target feature information from the resolution-raised concatenated feature through the residual units, and generating the target heat map from the target feature information.
The apparatus of the present invention calls the three-dimensional human body posture estimation program stored in the memory 1005 through the processor 1001, and also performs the following operations:
taking preset key points between hip joints of the human body in the target heat map as central points of the human body;
matching and fusing the human body central point heat map information of the target heat map according to the human body central point of the multiple views to obtain fused information.
The apparatus of the present invention calls the three-dimensional human body posture estimation program stored in the memory 1005 through the processor 1001, and also performs the following operations:
sampling the epipolar lines corresponding to the center points of the respective views in the target heat map according to the human body center points of the multiple views to obtain a corresponding point set;
generating a probability region of the Gaussian distribution of the target heat map at the corresponding coordinates according to the corresponding point set;
fusing the values of all points on the epipolar line within the probability region through a fully connected layer to obtain the finally fused center point coordinates;
and carrying out coordinate matching fusion on the human body central point heat map information of the target heat map according to the central point coordinate to obtain fusion information.
The apparatus of the present invention calls the three-dimensional human body posture estimation program stored in the memory 1005 through the processor 1001, and also performs the following operations:
acquiring camera calibration data of a video camera, and projecting each voxel center in the fusion information into a camera view by using the camera calibration data to obtain camera view projection data;
and constructing a three-dimensional characteristic volume according to the camera view projection data by using a 3D CNN network.
The apparatus of the present invention calls the three-dimensional human body posture estimation program stored in the memory 1005 by the processor 1001, and also performs the following operations:
dividing the three-dimensional feature volume into a plurality of discrete grids;
and acquiring the 3D heat map space coordinate of each key point in each discrete grid, and regressing the 3D heat map space coordinate to obtain the three-dimensional human body posture.
According to the scheme, a multi-view fusion network generates a target heat map corresponding to the input image, and the human body center point heat map information of the target heat map is matched and fused to obtain fusion information; the fusion information is projected into 3D space to obtain a three-dimensional feature volume; and the three-dimensional human body posture is estimated from the three-dimensional feature volume. This improves the estimation accuracy, reduces the inference search space for the remaining human body key points, lowers the estimation error, improves the posture reconstruction quality, reduces the computation cost and avoids the influence of quantization errors; the scheme is simple and reliable to implement, suits most scenes, and improves the speed and efficiency of three-dimensional human body posture estimation.
Based on the hardware structure, the embodiment of the three-dimensional human body posture estimation method is provided.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for estimating a three-dimensional human body pose according to a first embodiment of the present invention.
In a first embodiment, the three-dimensional human body posture estimation method comprises the following steps:
and S10, generating a target heat map corresponding to the input image by adopting a multi-view fusion network, and matching and fusing the human body central point heat map information of the target heat map to obtain fusion information.
It should be noted that, through a Multi-View Fusion Network (MVFNet) built on the high-resolution network HRNet, a target heat map corresponding to the input image can be obtained, and the human body center point heat map information of the target heat map can be matched and fused to obtain the fused heat map information.
And step S20, projecting the fusion information to a 3D space to obtain a three-dimensional characteristic volume.
It will be appreciated that projecting the fusion information into 3D space enables a three-dimensional feature volume to be constructed from coarse to fine.
And step S30, estimating the three-dimensional human body posture according to the three-dimensional characteristic volume.
It should be appreciated that an accurate three-dimensional body pose can be estimated from the three-dimensional feature volumes.
According to the scheme, a multi-view fusion network generates a target heat map corresponding to the input image, and the human body center point heat map information of the target heat map is matched and fused to obtain fusion information; the fusion information is projected into 3D space to obtain a three-dimensional feature volume; and the three-dimensional human body posture is estimated from the three-dimensional feature volume. This improves the estimation accuracy, reduces the inference search space for the remaining human body key points, lowers the estimation error, improves the posture reconstruction quality, reduces the computation cost and avoids the influence of quantization errors; the scheme is simple and reliable to implement, suits most scenes, and improves the speed and efficiency of three-dimensional human body posture estimation.
Further, fig. 3 is a schematic flow chart of a second embodiment of the three-dimensional human body posture estimation method of the present invention, and as shown in fig. 3, the second embodiment of the three-dimensional human body posture estimation method of the present invention is proposed based on the first embodiment, and in this embodiment, the step S10 specifically includes the following steps:
and step S11, inputting the input image into a high-resolution network of the multi-view fusion network, and acquiring high-resolution characteristic information.
It should be noted that high-resolution feature information can be obtained by feeding the input image into the high-resolution network of the multi-view fusion network.
In a specific implementation, earlier networks obtained high-resolution features by first downsampling a high-resolution feature map to low resolution and then restoring the high resolution to realize multi-scale feature extraction, e.g. U-Net, SegNet and Hourglass. In such architectures the high-resolution features come from two sources: first, the original high-resolution features, which provide only low-level semantics because they pass through few convolutions; second, features recovered by downsampling and then upsampling, where repeated up- and down-sampling loses a large amount of effective feature information. HRNet instead keeps a high-resolution branch throughout while gradually introducing lower-resolution convolutions in parallel branches, and connects the different-resolution convolutions in parallel for information exchange, so that every feature from high to low resolution repeatedly receives information from the other parallel sub-networks, yielding both strong semantic information and accurate position information.
And step S12, constructing a residual error unit of the high-resolution network according to the high-resolution characteristic information, and performing convolution sampling on the residual error unit to obtain a multi-resolution module.
It can be understood that a residual unit of the high-resolution network can be constructed through the high-resolution feature information, and a multi-resolution module can be obtained by performing convolution sampling on the residual unit.
In a specific implementation, the multi-view fusion network MVFNet of this embodiment uses HRNet as the basic framework and adds a deconvolution module to obtain a heat map with higher resolution and richer semantic information. As shown in fig. 4, the key point detection network structure of the three-dimensional human body posture estimation method of the invention is divided into four stages whose main body is four parallel sub-networks: the high-resolution sub-network forms the first stage, sub-networks from high to low resolution are gradually added, and the multi-resolution sub-networks are connected in parallel. The first stage comprises 4 residual units, each, as in ResNet-50, composed of a Bottleneck with 64 channels; it is then downsampled to the second stage by a 3 × 3 convolution with stride 2. The second, third and fourth stages respectively comprise 1, 4 and 3 multi-resolution blocks, so that the network keeps sufficient depth to fully extract feature information; each multi-resolution block has 4 residual units, adopting the BasicBlock of ResNet, i.e. two 3 × 3 convolutions.
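The stage layout above implies some simple bookkeeping, sketched below. The stem output size and the branch channel widths (doubling as resolution halves) follow common HRNet conventions and are assumptions not stated in the text; only the unit counts per stage (4 bottleneck units, then 1, 4 and 3 multi-resolution blocks of 4 basic units) come from the description.

```python
# Hedged bookkeeping sketch for the four-branch, four-stage layout.

def branch_shapes(stem_hw, base_channels, n_branches):
    """(height, width, channels) per parallel branch; each new branch halves
    the spatial resolution and (by assumption) doubles the channel width."""
    h, w = stem_hw
    return [(h >> i, w >> i, base_channels << i) for i in range(n_branches)]

print(branch_shapes((64, 64), 32, 4))        # assumed 64x64 stem output

# Residual units per stage: 4 bottleneck units in stage 1, then 1, 4 and 3
# multi-resolution blocks of 4 basic units each in stages 2-4.
units_per_stage = [4] + [blocks * 4 for blocks in (1, 4, 3)]
print(units_per_stage)
```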
And step S13, fusing the feature maps with different resolutions in each stage in the multi-resolution module to obtain the target heat map.
It should be understood that the target heat map can be obtained by fusing the feature maps of different resolutions at different stages in the multi-resolution module.
Further, the step S13 specifically includes the following steps:
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a fused feature map;
inputting the fusion characteristic graph into a deconvolution module, obtaining an output result through convolution and channel conversion, and performing dimensionality splicing on the output result and the fusion characteristic graph to obtain a splicing characteristic;
and improving the resolution ratio of the splicing features according to the deconvolution layer, extracting target feature information of the splicing features with improved resolution ratio through the residual error unit, and generating a target heat map according to the target feature information.
It can be understood that a fused feature map can be obtained by fusing feature maps of different resolutions at each stage in the multi-resolution module, channel conversion is performed in the deconvolution module, and after dimension splicing is performed on an output result and the fused feature map, spliced features can be obtained, so that the resolution of the spliced features can be improved according to the deconvolution layer, and target feature information of the spliced features after resolution improvement is extracted through the residual error unit, thereby generating a target heat map.
In the specific implementation, feature maps with different resolutions at each stage are fused at the end of a network, the fused feature maps are used as the input of a deconvolution module, channel conversion is carried out through convolution, the result is subjected to dimensional splicing with the input features, the resolution of the feature maps is improved to be 2 times of the original resolution by deconvolution with a convolution kernel of 4 × 4, feature information is further extracted through 4 residual blocks, and finally heatmap is predicted through convolution of 1 × 1; the higher resolution is beneficial to obtaining richer key point information, and further accurate three-dimensional human body posture estimation is realized.
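As a quick sanity check of the resolution arithmetic above, the sketch below assumes padding 1 for the 4 × 4 stride-2 deconvolution (the text states only the kernel size and the 2× result) and also counts the channels produced by the dimensional splicing; the concrete sizes are hypothetical:

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    # output size of a transposed convolution: (in - 1) * stride - 2 * padding + kernel
    return (size - 1) * stride - 2 * padding + kernel

# fused feature map at, e.g., 64 x 48 spatial resolution (hypothetical)
h, w = 64, 48
h2, w2 = deconv_out(h), deconv_out(w)   # doubled by the 4x4, stride-2 deconvolution

# dimensional splicing: channels after concatenating the conv output
# (c_conv channels) with the c_in-channel input feature map (both hypothetical)
c_in, c_conv = 32, 32
c_cat = c_in + c_conv
```

With kernel 4, stride 2 and padding 1 the formula gives exactly twice the input size, matching the "2 times of the original resolution" stated above.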
And step S14, matching and fusing the human body central point heat map information of the target heat map to obtain fused information.
It can be understood that the fused information can be obtained by matching and fusing the human body central point heat map information of the target heat map.
According to the scheme, the high-resolution characteristic information is acquired by inputting the input image into the high-resolution network of the multi-view fusion network; constructing a residual error unit of the high-resolution network according to the high-resolution characteristic information, and performing convolution sampling on the residual error unit to obtain a multi-resolution module; fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a target heat map; matching and fusing the human body central point heat map information of the target heat map to obtain fused information, so that the fused information can be accurately obtained, the accuracy of three-dimensional human body posture estimation is improved, reasoning and searching spaces of other human body key points are reduced, and the error of three-dimensional human body posture estimation is reduced.
Further, fig. 5 is a schematic flow chart of a third embodiment of the three-dimensional human body posture estimation method of the present invention, and as shown in fig. 5, the third embodiment of the three-dimensional human body posture estimation method of the present invention is proposed based on the second embodiment, and in this embodiment, the step S14 specifically includes the following steps:
and step S141, taking preset key points between hip joints of the human body in the target heat map as central points of the human body.
It should be noted that the preset key points between the hip joints of the human body in the target heat map may be used as the center points of the human body.
It will be appreciated that an epipolar geometry exists between the multi-view images; it describes the intrinsic projective relationship between two views, is independent of the external scene, and depends only on the camera intrinsic parameters and the relative pose between views. Fully exploiting the epipolar geometric relationship helps the network acquire more position information, eliminates irrelevant noise during training, and improves the accuracy of network prediction. The principle is shown in fig. 6; fig. 6 is an epipolar geometry diagram in the three-dimensional human body posture estimation method of the invention. Referring to fig. 6, $O_1$ and $O_2$ are the optical centers of the two cameras, $I_1$ and $I_2$ are the image planes, and $e_1$ and $e_2$, the projections of each camera's optical center onto the other image plane, are called the epipoles; if the two cameras cannot capture each other owing to their viewing angles, the epipole does not appear on the imaging plane. The projections of an observed point $P$ on $I_1$ and $I_2$ are $P_1$ and $P_2$; since the depth is unknown, $P$ may lie at any point on the ray $O_1 P_1$, which projects onto a line $L_2$ in the right image called the epipolar line corresponding to $P_1$, so the point $P_2$ corresponding to $P_1$ in the right image must lie on the epipolar line $L_2$. The relative positions of matching points are thus constrained by the geometry of the image plane space; this constraint can be expressed by the fundamental matrix, and according to the literature the epipolar constraint is shown as formula (1):

$$P_2^{\top} F P_1 = 0 \qquad (1)$$

where $F$ is the fundamental matrix, whose calculation formula is shown in (2):

$$F = K_2^{-\top} E K_1^{-1} \qquad (2)$$

where $K_1$ and $K_2$ are the intrinsic parameter matrices of the two cameras, and $E$ is the essential matrix, composed of the camera's extrinsic translation and rotation matrices, $E = [t]_{\times} R$. The network should therefore take full advantage of this geometric constraint relationship between views.
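The epipolar constraint of formulas (1) and (2) can be checked numerically. The sketch below builds a fundamental matrix from assumed intrinsics $K$ and an assumed relative pose $(R, t)$, projects one 3D point into both views, and verifies that the constraint residual vanishes; all camera parameters here are hypothetical:

```python
import numpy as np

def skew(t):
    # cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# hypothetical intrinsics, shared by both cameras
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# hypothetical relative pose: small rotation about y, baseline along x
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.0, 0.0])

E = skew(t) @ R                                  # essential matrix
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)    # fundamental matrix, formula (2)

# project one 3D point into both views (camera 1 at the origin)
X = np.array([0.2, -0.1, 5.0])
p1 = K @ X
p1 = p1 / p1[2]
p2 = K @ (R @ X + t)
p2 = p2 / p2[2]

residual = p2 @ F @ p1                           # epipolar constraint, formula (1)
```

The residual is zero up to floating-point error for any 3D point, which is what lets the network restrict the search for a matching point to the epipolar line.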
In a specific implementation, a multi-view epipolar constraint model is introduced into the MVFNet network provided by this embodiment: the key point between the human hip joints is taken as the central point, and heatmap matching fusion of the multi-view human central points is performed. The high-resolution heatmap is input, the epipolar line corresponding to the central point of each view is solved from the epipolar geometric constraint relation, and it is sampled to obtain a set of corresponding points. According to the characteristics of the heatmap, a probability area of Gaussian distribution is generated at the corresponding coordinate, with a high response only near the corresponding point and values close to 0 everywhere else, so the values of all points on the epipolar line can be fused with a fully-connected layer to improve the accuracy of central point detection. Finally, the difference between the final fused center point coordinates and the labeled center point coordinates is penalized with an L2 loss to impose the training constraint.
And S142, matching and fusing the human body central point heat map information of the target heat map according to the human body central point of the multiple views to obtain fused information.
It can be understood that the human body center point heat map information of the target heat map can be matched and fused through the human body center points of the multiple views, so as to obtain the fused heatmap information.
Further, the step S142 specifically includes the following steps:
sampling polar lines corresponding to the central points of the graphs in the target heat map according to the central points of the human body of the multiple views to obtain a corresponding point set;
generating probability regions of Gaussian distribution of the target heat map at corresponding coordinates according to the corresponding point sets;
fusing values of all points on the epipolar line in the probability region through a full-connection layer to obtain a finally fused central point coordinate;
and carrying out coordinate matching fusion on the human body central point heat map information of the target heat map according to the central point coordinate to obtain fusion information.
It can be understood that polar lines corresponding to central points of the respective graphs in the target heat map are sampled, after corresponding point sets are obtained, probability regions with gaussian distribution can be generated, and then values of all points on the polar lines in the probability regions are fused through the full connection layer, so that finally fused central point coordinates are obtained, and heat map coordinate matching is performed to obtain fusion information.
In a specific implementation, as shown in fig. 7, fig. 7 is a schematic diagram of a multi-view epipolar constraint model in the three-dimensional human body posture estimation method of the present invention, see fig. 7, a high-resolution heatmap is input, epipolar lines corresponding to central points of each graph are solved by an epipolar geometric constraint relationship, and sampling is performed to obtain a set of corresponding points; according to the characteristics of the heatmap, a probability area of Gaussian distribution is generated at a corresponding coordinate, only high response exists near a corresponding point, and other places are close to 0, so that the values of all points on the epipolar line can be fused by using a full-connection layer, and the accuracy of central point detection is improved; finally, the differences between the final fused centroid coordinates and the labeled centroid coordinates are compared using L2 loss to perform training constraints.
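A minimal sketch of this sampling-and-fusion step: points are sampled along a hypothetical epipolar line through a Gaussian center-point heatmap, and a softmax weighting over the sampled responses stands in for the learned fully-connected fusion layer; all coordinates and sizes are illustrative:

```python
import numpy as np

H, W = 96, 128
true_pt = np.array([80.0, 40.0])          # ground-truth center point in this view

# Gaussian center-point heatmap: high response near true_pt, near zero elsewhere
ys, xs = np.mgrid[0:H, 0:W]
heatmap = np.exp(-((xs - true_pt[0])**2 + (ys - true_pt[1])**2) / (2 * 2.0**2))

# hypothetical epipolar line a*x + b*y + c = 0 passing through the true point
a, b = 0.5, -1.0
c = -(a * true_pt[0] + b * true_pt[1])

# sample candidate points along the epipolar line inside the image
xs_s = np.linspace(0, W - 1, 64)
ys_s = (-c - a * xs_s) / b
keep = (ys_s >= 0) & (ys_s <= H - 1)
pts = np.stack([xs_s[keep], ys_s[keep]], axis=1)

# read the heatmap response at each sample (nearest pixel for brevity)
vals = heatmap[np.round(pts[:, 1]).astype(int), np.round(pts[:, 0]).astype(int)]

# fuse the sampled points: softmax weighting stands in for the learned FC layer
w_soft = np.exp(vals * 20.0)
w_soft /= w_soft.sum()
fused = (w_soft[:, None] * pts).sum(axis=0)   # fused center point estimate
```

Because the response is high only near the true corresponding point, the fused coordinate lands near it, which is the behaviour the fully-connected fusion layer is trained to reproduce.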
According to the scheme, the preset key points between hip joints of the human body in the target heat map are used as the central points of the human body; matching and fusing the human body central point heat map information of the target heat map according to the human body central point of the multiple views to obtain fusion information, so that the fusion information can be accurately obtained, the accuracy of three-dimensional human body posture estimation is improved, reasoning and searching spaces of other human body key points are reduced, and the error of three-dimensional human body posture estimation is reduced.
Further, fig. 8 is a schematic flowchart of a fourth embodiment of the three-dimensional human body posture estimation method of the present invention, and as shown in fig. 8, the fourth embodiment of the three-dimensional human body posture estimation method of the present invention is proposed based on the first embodiment, in this embodiment, the step S20 specifically includes the following steps:
and step S21, acquiring camera calibration data of the video camera, and projecting each voxel center in the fusion information into a camera view by using the camera calibration data to obtain camera view projection data.
It should be noted that, after camera calibration data of the video camera is obtained, each voxel center in the fusion information may be projected into a camera view by using the camera calibration data, so as to obtain camera view projection data.
It can be understood that the features of all the obtained views are aggregated into a 3D voxel volume by inverse image projection: a voxel grid containing the whole space observed by the cameras is initialized, the center of each voxel is projected into each camera view using the camera calibration data, and the feature volume is then constructed from coarse to fine by the 3D CNN network, centered on these projections, to estimate the positions of all the key points.
In specific implementation, referring to fig. 9, fig. 9 is a schematic diagram of the 3D CNN network structure in the three-dimensional human body posture estimation method of the present invention. As shown in fig. 9, the input of the 3D CNN network is a 3D feature volume constructed by projecting the 2D heatmaps of all camera views into a common 3D space; because the heatmaps encode the position information of the central point, the resulting 3D feature volume also carries rich information for detecting the 3D posture, and the search area of the other key points in 3D space can be reduced according to human body prior information. Black open arrows represent standard 3D convolutional layers, black solid arrows represent residual blocks of two 3D convolutional layers, linear arrows denote pooling, and dashed arrows denote deconvolution. The three-dimensional space is discretized into $X \times Y \times Z$ discrete locations $\{G^{x,y,z}\}$, and each location can be regarded as an anchor of a detected person; to reduce the quantization error, $X$, $Y$ and $Z$ are adjusted so that the distance between adjacent anchors decreases. On common data sets the space is typically $8\,\mathrm{m} \times 8\,\mathrm{m} \times 2\,\mathrm{m}$, so $X$, $Y$ and $Z$ are set to 80, 80 and 20.
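The discretization and projection described above can be sketched as follows, using the stated 8 m × 8 m × 2 m space and 80 × 80 × 20 anchors; the calibration data K, R, t are hypothetical stand-ins:

```python
import numpy as np

# discretize an 8 m x 8 m x 2 m space into X x Y x Z = 80 x 80 x 20 anchors
space = np.array([8000.0, 8000.0, 2000.0])        # millimetres
bins = np.array([80, 80, 20])
step = space / bins                               # spacing between adjacent anchors

# anchor (voxel) centers along each axis, then the full grid of centers
axes = [np.arange(b) * s + s / 2 for b, s in zip(bins, step)]
gx, gy, gz = np.meshgrid(*axes, indexing="ij")
centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)

# project every voxel center into one camera view with hypothetical
# calibration data: K (intrinsics) and R, t (extrinsics)
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([-4000.0, -4000.0, 6000.0])          # camera placed to see the space
cam = centers @ R.T + t                           # camera-frame coordinates
proj = cam @ K.T
proj = proj[:, :2] / proj[:, 2:3]                 # pixel coordinates of each center
```

Each of the 80 × 80 × 20 = 128 000 anchors thus gets a pixel location in every view, from which its heatmap values can be read.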
And S22, constructing a three-dimensional characteristic volume according to the camera view projection data by using a 3D CNN network.
It will be appreciated that by constructing feature volumes from coarse to fine centered by the 3D CNN network to estimate the location of all keypoints, three-dimensional feature volumes can be constructed from the camera view projection data.
In specific implementation, the 2D heatmap values at the projection position of each anchor in all camera views are fused to calculate the feature vector of each anchor. Let the 2D heatmap in view $a$ be denoted $M_a \in \mathbb{R}^{K \times H \times W}$, where $K$ is the number of body keypoints. For each anchor position $G^{x,y,z}$, its projected position in view $a$ is $P_a^{x,y,z}$, and the heatmap value there is expressed as $M_a(P_a^{x,y,z})$. Then the feature vector of the anchor is calculated as the average heatmap value over all camera views, as shown in formula (3):

$$F^{x,y,z} = \frac{1}{V} \sum_{a=1}^{V} M_a\bigl(P_a^{x,y,z}\bigr) \qquad (3)$$

where $V$ is the number of cameras; it can be seen that $F^{x,y,z}$ actually encodes the likelihood of the $K$ key points lying at $G^{x,y,z}$. Then a 3D bounding box is used to represent the position of the key points of the detected human body, with the size and orientation of the bounding box fixed in the experiment; this is a reasonable simplification because human variation in 3D space is limited. A small network slides over the feature volume $F$; each sliding window centered at an anchor is mapped to a low-dimensional feature that is fed to a fully-connected layer, which regresses a confidence as the output of the 3D CNN network, indicating the likelihood of a person appearing at that location. The ground-truth (GT) heatmap value of each anchor is calculated according to the distance from the anchor to the GT pose: for each pair of GT and anchor, a Gaussian score is computed from the distance between them, decreasing exponentially as the distance increases. If there are $N$ people in the scene, an anchor may have multiple scores, and the $N$ largest, representing the positions of the $N$ people, are retained through non-maximum suppression (NMS).
In this embodiment, by using the above scheme, camera calibration data of a video camera is acquired, and the camera calibration data is used to project each voxel center in the fusion information into a camera view, so as to obtain camera view projection data; the three-dimensional characteristic volume is constructed by utilizing the 3D CNN network according to the camera view projection data, the accuracy of three-dimensional human body posture estimation can be improved, reasoning search spaces of other human body key points are reduced, errors of the three-dimensional human body posture estimation are reduced, the posture reconstruction quality is improved, the calculation cost is reduced, the influence of quantization errors is avoided, the accuracy of the three-dimensional human body posture estimation is improved, and the scheme is simple and reliable to implement.
Further, fig. 10 is a schematic flowchart of a fifth embodiment of the three-dimensional human body posture estimation method according to the present invention, and as shown in fig. 10, the fifth embodiment of the three-dimensional human body posture estimation method according to the present invention is proposed based on the first embodiment, in this embodiment, the step S30 specifically includes the following steps:
and step S31, dividing the three-dimensional characteristic volume into a plurality of discrete grids.
It should be noted that, after the three-dimensional feature volume is divided, a plurality of discrete grids can be obtained.
In specific implementation, the first 3D CNN network cannot accurately estimate the 3D positions of all the key points, so a finer-grained feature volume is constructed in the second 3D CNN network, with its size set to $2000\,\mathrm{mm} \times 2000\,\mathrm{mm} \times 2000\,\mathrm{mm}$, much smaller than $8\,\mathrm{m} \times 8\,\mathrm{m} \times 2\,\mathrm{m}$ but sufficient to cover any pose of a person; the volume is divided into $X_0 = Y_0 = Z_0 = 64$ discrete grids, and its network body structure is the same as that of the first 3D CNN.
And S32, acquiring the 3D heat map space coordinates of each key point in each discrete grid, and performing regression on the 3D heat map space coordinates to obtain the three-dimensional human body posture.
It should be understood that, further, the 3D heat map space coordinates of each key point in each discrete grid are obtained, and then the 3D heat map space coordinates may be regressed, so that the three-dimensional human body posture may be obtained.
It will be appreciated that, based on the constructed feature volume, a 3D heatmap $H_k \in \mathbb{R}^{X_0 \times Y_0 \times Z_0}$ is estimated for each keypoint $k$, from which the accurate three-dimensional human body posture is finally regressed. Taking $H_k$ as a normalized weight over the grid, the centroid of each keypoint $J_k$ is calculated according to equation (4):

$$J_k = \sum_{x,y,z} H_k(x, y, z) \cdot (x, y, z) \qquad (4)$$

The estimated joint position is compared with the true position $J_k^{*}$ to train the network; the loss function $L$ is represented by formula (5):

$$L = \sum_{k=1}^{K} \bigl\| J_k - J_k^{*} \bigr\|_1 \qquad (5)$$
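A small sketch of equation (4) and formula (5), assuming the 3D heatmap is normalized to sum to one; the grid is shrunk to 8³ for illustration and the Gaussian heatmap is synthetic:

```python
import numpy as np

X0 = Y0 = Z0 = 8                          # small grid for illustration (64 in the text)
true_joint = np.array([5.0, 2.0, 6.0])    # ground-truth J_k* in grid coordinates

# synthetic 3D heatmap H_k: Gaussian around the true joint, normalized to sum to 1
gx, gy, gz = np.meshgrid(*[np.arange(n) for n in (X0, Y0, Z0)], indexing="ij")
grid = np.stack([gx, gy, gz], axis=-1).astype(float)
d2 = ((grid - true_joint) ** 2).sum(axis=-1)
Hk = np.exp(-d2 / 2.0)
Hk /= Hk.sum()

# equation (4): joint estimate as the heatmap-weighted centroid of grid coordinates
Jk = (Hk[..., None] * grid).sum(axis=(0, 1, 2))

# formula (5), for this single keypoint: L1 distance to the ground truth
loss = np.abs(Jk - true_joint).sum()
```

Because the centroid is a differentiable function of the heatmap, the loss can be backpropagated through the whole 3D CNN.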
In a specific implementation, the accuracy of the 3D pose on the Campus and Shelf datasets is evaluated using the Percentage of Correct Parts in 3D (PCP3D), i.e., the percentage of correctly estimated joint positions; a detection is considered correct if the distance between the predicted joint position and the true joint position is less than half the limb length. For each frame $f$ and human skeleton $S$, the MPJPE is calculated by equation (6):

$$E_{\mathrm{MPJPE}}(f, S) = \frac{1}{N_S} \sum_{i=1}^{N_S} \bigl\| J_i(f, S) - J_i^{*}(f, S) \bigr\|_2 \qquad (6)$$

where $N_S$ is the number of joints in the skeleton $S$; for a set of frames, the error is the average of the MPJPE over all frames. Meanwhile, Average Precision (AP) and Recall at MPJPE thresholds from 25 mm to 150 mm, with a step of 25 mm, are taken as performance indexes for comprehensively evaluating 3D human body center detection and human body posture estimation. The AP is the area under the PR curve spanned by Recall on the horizontal axis and Precision on the vertical axis; the larger the AP value, the better the overall performance of the detection model.
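Equation (6) can be sketched on synthetic joints; the 15-joint skeleton and the constant 10 mm per-axis offset are illustrative:

```python
import numpy as np

def mpjpe(pred, gt):
    # equation (6): mean Euclidean distance over the N_S joints of a skeleton
    return np.linalg.norm(pred - gt, axis=-1).mean()

rng = np.random.default_rng(1)
gt = rng.random((15, 3)) * 1000.0         # hypothetical 15-joint skeleton, in mm
pred = gt + 10.0                          # every joint off by (10, 10, 10) mm

err = mpjpe(pred, gt)                     # per-joint error is sqrt(300) mm here
```

Averaging this value over all frames gives the dataset-level MPJPE the text describes.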
According to the scheme, the three-dimensional characteristic volume is divided into a plurality of discrete grids; the method comprises the steps of obtaining the 3D heat map space coordinates of each key point in each discrete grid, and regressing the 3D heat map space coordinates to obtain the three-dimensional human body posture, so that the accuracy of three-dimensional human body posture estimation can be improved, reasoning search spaces of other human body key points are reduced, errors of the three-dimensional human body posture estimation are reduced, the reconstruction quality of the posture is improved, the calculation cost is reduced, the influence of quantization errors is avoided, the accuracy of the three-dimensional human body posture estimation is improved, and the scheme is simple and reliable to implement.
Correspondingly, the invention further provides a three-dimensional human body posture estimation device.
Referring to fig. 11, fig. 11 is a functional block diagram of a three-dimensional human body posture estimation device according to a first embodiment of the present invention.
In a first embodiment of the three-dimensional body posture estimation device of the present invention, the three-dimensional body posture estimation device includes:
the fusion module 10 is configured to generate a target heat map corresponding to the input image by using a multi-view fusion network, and perform matching fusion on the human body center point heat map information of the target heat map to obtain fusion information.
And a projection module 20, configured to project the fusion information to a 3D space to obtain a three-dimensional feature volume.
And the posture estimation module 30 is used for estimating the three-dimensional human body posture according to the three-dimensional characteristic volume.
The fusion module 10 is further configured to input the input image into a high-resolution network of the multiview fusion network, and acquire high-resolution feature information; constructing a residual error unit of the high-resolution network according to the high-resolution characteristic information, and performing convolution sampling on the residual error unit to obtain a multi-resolution module; fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a target heat map; matching and fusing the human body central point heat map information of the target heat map to obtain fused information.
The fusion module 10 is further configured to fuse feature maps of different resolutions at different stages in the multi-resolution module to obtain a fusion feature map; inputting the fusion characteristic graph into a deconvolution module, obtaining an output result through convolution and channel conversion, and performing dimensionality splicing on the output result and the fusion characteristic graph to obtain a splicing characteristic; and improving the resolution ratio of the splicing features according to the deconvolution layer, extracting target feature information of the splicing features with improved resolution ratio through the residual error unit, and generating a target heat map according to the target feature information.
The fusion module 10 is further configured to use a preset key point between hip joints of the human body in the target heat map as a central point of the human body; and matching and fusing the human body central point heat map information of the target heat map according to the human body central point of the multiple views to obtain fused information.
The fusion module 10 is further configured to sample epipolar lines corresponding to the central points of the graphs in the target heat map according to the human central points of the multiple views, so as to obtain a corresponding point set; generating a probability region of the Gaussian distribution of the target heat map at the corresponding coordinates according to the corresponding point set; fusing values of all points on the epipolar line in the probability region through a full-connection layer to obtain a finally fused central point coordinate; and carrying out coordinate matching fusion on the human body central point heat map information of the target heat map according to the central point coordinate to obtain fusion information.
The projection module 20 is further configured to acquire camera calibration data of a video camera, and project each voxel center in the fusion information into a camera view by using the camera calibration data to obtain camera view projection data; and constructing a three-dimensional characteristic volume according to the camera view projection data by using a 3D CNN network.
The pose estimation module 30 is further configured to divide the three-dimensional feature volume into a plurality of discrete grids; and acquiring the 3D heat map space coordinate of each key point in each discrete grid, and regressing the 3D heat map space coordinate to obtain the three-dimensional human body posture.
The steps implemented by the functional modules of the three-dimensional human body posture estimation device can refer to the embodiments of the three-dimensional human body posture estimation method of the present invention, and are not described herein again.
In addition, an embodiment of the present invention further provides a storage medium, where a three-dimensional body posture estimation program is stored on the storage medium, and when executed by a processor, the three-dimensional body posture estimation program implements the following operations:
generating a target heat map corresponding to an input image by adopting a multi-view fusion network, and matching and fusing human body central point heat map information of the target heat map to obtain fusion information;
projecting the fusion information to a 3D space to obtain a three-dimensional characteristic volume;
and estimating the three-dimensional human body posture according to the three-dimensional characteristic volume.
Further, the three-dimensional human body posture estimation program when executed by the processor further realizes the following operations:
inputting an input image into a high-resolution network of a multi-view fusion network to acquire high-resolution characteristic information;
constructing a residual error unit of the high-resolution network according to the high-resolution characteristic information, and performing convolution sampling on the residual error unit to obtain a multi-resolution module;
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a target heat map;
matching and fusing the human body central point heat map information of the target heat map to obtain fused information.
Further, the three-dimensional human body posture estimation program when executed by the processor further realizes the following operations:
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a fused feature map;
inputting the fusion characteristic graph into a deconvolution module, obtaining an output result through convolution and channel conversion, and performing dimensionality splicing on the output result and the fusion characteristic graph to obtain a splicing characteristic;
and improving the resolution ratio of the splicing features according to the deconvolution layer, extracting target feature information of the splicing features with improved resolution ratio through the residual error unit, and generating a target heat map according to the target feature information.
Further, the three-dimensional human body posture estimation program further realizes the following operations when being executed by the processor:
taking preset key points between hip joints of the human body in the target heat map as central points of the human body;
and matching and fusing the human body central point heat map information of the target heat map according to the human body central point of the multiple views to obtain fused information.
Further, the three-dimensional human body posture estimation program when executed by the processor further realizes the following operations:
sampling polar lines corresponding to the central points of the graphs in the target heat map according to the central points of the human body of the multiple views to obtain a corresponding point set;
generating a probability region of the Gaussian distribution of the target heat map at the corresponding coordinates according to the corresponding point set;
fusing values of all points on the epipolar line in the probability region through a full-connection layer to obtain a finally fused central point coordinate;
and carrying out coordinate matching fusion on the human body central point heat map information of the target heat map according to the central point coordinate to obtain fusion information.
Further, the three-dimensional human body posture estimation program further realizes the following operations when being executed by the processor:
acquiring camera calibration data of a video camera, and projecting each voxel center in the fusion information into a camera view by using the camera calibration data to obtain camera view projection data;
and constructing a three-dimensional characteristic volume according to the camera view projection data by using a 3D CNN network.
Further, the three-dimensional human body posture estimation program further realizes the following operations when being executed by the processor:
dividing the three-dimensional feature volume into a plurality of discrete grids;
and acquiring the 3D heat map space coordinate of each key point in each discrete grid, and regressing the 3D heat map space coordinate to obtain the three-dimensional human body posture.
According to the scheme, a multi-view fusion network is adopted to generate a target heat map corresponding to an input image, and matching fusion is carried out on human body central point heat map information of the target heat map to obtain fusion information; projecting the fusion information to a 3D space to obtain a three-dimensional characteristic volume; the three-dimensional human body posture is estimated according to the three-dimensional characteristic volume, the accuracy of three-dimensional human body posture estimation can be improved, inference search spaces of other human body key points are reduced, errors of three-dimensional human body posture estimation are reduced, posture reconstruction quality is improved, calculation cost is reduced, quantization error influence is avoided, the accuracy of three-dimensional human body posture estimation is improved, the scheme is simple and reliable to implement, the method can be suitable for three-dimensional human body posture estimation of most scenes, and the speed and the efficiency of three-dimensional human body posture estimation are improved.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element identified by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A three-dimensional human body posture estimation method is characterized by comprising the following steps:
generating a target heat map corresponding to an input image by adopting a multi-view fusion network, and matching and fusing human body center point heat map information of the target heat map to obtain fusion information;
projecting the fusion information into a 3D space to obtain a three-dimensional feature volume;
and estimating the three-dimensional human body posture according to the three-dimensional feature volume.
2. The method for estimating the three-dimensional human body pose according to claim 1, wherein the generating a target heat map corresponding to the input image by using a multi-view fusion network, and performing matching fusion on the human body center point heat map information of the target heat map to obtain fusion information comprises:
inputting an input image into a high-resolution network of the multi-view fusion network to acquire high-resolution feature information;
constructing a residual unit of the high-resolution network according to the high-resolution feature information, and performing convolution sampling on the residual unit to obtain a multi-resolution module;
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a target heat map;
matching and fusing the human body center point heat map information of the target heat map to obtain fusion information.
3. The method for estimating the three-dimensional human body pose according to claim 2, wherein the fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain the target heat map comprises:
fusing the feature maps with different resolutions at each stage in the multi-resolution module to obtain a fused feature map;
inputting the fused feature map into a deconvolution module, obtaining an output result through convolution and channel conversion, and concatenating the output result with the fused feature map along the channel dimension to obtain a concatenated feature;
and increasing the resolution of the concatenated feature through the deconvolution layer, extracting target feature information from the higher-resolution concatenated feature through the residual unit, and generating the target heat map according to the target feature information.
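The concatenate-then-upsample head of claim 3 can be sketched at the shape level. Nearest-neighbour upsampling stands in for the learned deconvolution layer, a random matrix stands in for convolution plus channel conversion, and no refinement is learned; all names and sizes are hypothetical:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling standing in for a learned deconvolution layer.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def deconv_head(fused_feat, num_joints=17):
    """Shape-level sketch of the deconvolution module: channel conversion,
    channel-wise concatenation with the input, then resolution doubling."""
    C = fused_feat.shape[-1]
    w = np.random.rand(C, num_joints)                   # stands in for conv + channel conversion
    out = fused_feat @ w                                # H x W x num_joints
    concat = np.concatenate([out, fused_feat], axis=-1) # concatenation along channels
    up = upsample2x(concat)                             # resolution increased
    return up[..., :num_joints]                         # a residual unit would refine this

feat = np.random.rand(32, 32, 48)   # hypothetical fused feature map
hm = deconv_head(feat)              # 64 x 64 x 17 target heat map
```

The point of the concatenation step is that the converted output keeps access to the original fused features before the resolution is raised.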
4. The method for estimating the three-dimensional human body pose according to claim 2, wherein the matching and fusing the human body center point heat map information of the target heat map to obtain fused information comprises:
taking a preset key point between the hip joints of the human body in the target heat map as the human body center point;
and matching and fusing the human body center point heat map information of the target heat maps according to the human body center points of the multiple views to obtain fusion information.
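The center point of claim 4 reduces to a midpoint computation. The key-point indices below are hypothetical (COCO-style hips at 11 and 12); the patent only requires a preset key point between the hip joints:

```python
import numpy as np

def body_center(keypoints_2d, left_hip=11, right_hip=12):
    """Human body center point as the midpoint of the two hip-joint
    key points (indices are an assumption, not fixed by the patent)."""
    return (keypoints_2d[left_hip] + keypoints_2d[right_hip]) / 2.0

kps = np.zeros((17, 2))
kps[11] = [100.0, 200.0]
kps[12] = [140.0, 208.0]
center = body_center(kps)   # -> [120., 204.]
```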
5. The method according to claim 4, wherein said matching and fusing the human body center point heat map information of the target heat map according to the multi-view human body center point to obtain fused information comprises:
sampling the epipolar lines corresponding to the center points in the target heat maps according to the human body center points of the multiple views to obtain a corresponding point set;
generating a Gaussian-distributed probability region of the target heat map at the corresponding coordinates according to the corresponding point set;
fusing the values of all points on the epipolar line within the probability region through a fully connected layer to obtain final fused center point coordinates;
and performing coordinate matching and fusion on the human body center point heat map information of the target heat maps according to the center point coordinates to obtain fusion information.
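The epipolar fusion of claim 5 can be sketched with a fixed Gaussian weighting in place of the learned fully connected layer; the line sampling, sigma, and coordinates below are illustrative assumptions:

```python
import numpy as np

def fuse_center_on_epipolar(heatmap, line_pts, mu, sigma=2.0):
    """Sample heat-map values at points on an epipolar line, weight them by a
    Gaussian probability region centred at the expected coordinate mu, and
    return the weighted mean as the fused center point. A learned fully
    connected layer would replace this fixed weighting in the claimed method."""
    vals = np.array([heatmap[int(y), int(x)] for x, y in line_pts])
    d2 = np.sum((line_pts - mu) ** 2, axis=1)        # squared distance to mu
    w = vals * np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian-weighted responses
    w = w / (w.sum() + 1e-9)
    return (w[:, None] * line_pts).sum(axis=0)       # fused (x, y) center point

hm = np.ones((64, 64))                               # uniform stand-in heat map
line = np.stack([np.linspace(10, 50, 41),
                 np.full(41, 20.0)], axis=1)         # horizontal epipolar line
center = fuse_center_on_epipolar(hm, line, mu=np.array([30.0, 20.0]))
```

With a uniform heat map and a line symmetric about mu, the fused center falls at mu itself; on a real heat map the response values shift it toward the strongest evidence on the line.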
6. The three-dimensional human pose estimation method of claim 1, wherein said projecting the fusion information into a 3D space to obtain a three-dimensional feature volume comprises:
acquiring camera calibration data of each camera, and projecting each voxel center in the fusion information into the camera views using the camera calibration data to obtain camera view projection data;
and constructing the three-dimensional feature volume from the camera view projection data using a 3D CNN.
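The voxel-center projection of claim 6 is the standard pinhole model. The intrinsics and extrinsics below are made-up calibration values for illustration; the 3D CNN that consumes the projected features is omitted:

```python
import numpy as np

def project_voxels(voxel_centers, K, R, t):
    """Project voxel centers (world coordinates, N x 3) into a camera view
    using calibration data: intrinsics K and extrinsics R, t."""
    cam = voxel_centers @ R.T + t      # world -> camera coordinates
    uvw = cam @ K.T                    # camera -> homogeneous image plane
    return uvw[:, :2] / uvw[:, 2:3]    # pixel coordinates (u, v)

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])        # hypothetical intrinsics
R, t = np.eye(3), np.zeros(3)          # identity extrinsics for the sketch
centers = np.array([[0.0, 0.0, 2.0]])  # one voxel center 2 m in front
uv = project_voxels(centers, K, R, t)  # -> [[320., 240.]]
```

A point on the optical axis projects to the principal point, which is a quick sanity check on any calibration setup.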
7. The method of estimating a three-dimensional body pose according to claim 1, wherein said estimating a three-dimensional body pose from said three-dimensional feature volume comprises:
dividing the three-dimensional feature volume into a plurality of discrete grids;
and acquiring the 3D heat map spatial coordinates of each key point in each discrete grid, and regressing the 3D heat map spatial coordinates to obtain the three-dimensional human body posture.
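The regression of claim 7 can be sketched as an integral ("soft-argmax") over the discrete grid: normalize the per-joint 3D heat map into a probability distribution and take the expectation of the grid-cell coordinates. The 4 x 4 x 4 grid and its metric layout are illustrative assumptions:

```python
import numpy as np

def soft_argmax_3d(heat3d, grid_coords):
    """Turn a per-joint 3D heat map over discrete grid cells into a
    continuous spatial coordinate via a softmax-weighted expectation."""
    p = np.exp(heat3d - heat3d.max())            # numerically stable softmax
    p = p / p.sum()
    return (p[..., None] * grid_coords).reshape(-1, 3).sum(axis=0)

# Cell-centre coordinates of a hypothetical 4 x 4 x 4 grid, in metres.
xs = np.linspace(0, 3, 4)
gx, gy, gz = np.meshgrid(xs, xs, xs, indexing="ij")
coords = np.stack([gx, gy, gz], axis=-1)

hm3d = np.full((4, 4, 4), -1e9)
hm3d[1, 2, 3] = 0.0                              # sharp peak at cell (1, 2, 3)
joint = soft_argmax_3d(hm3d, coords)             # -> approx [1., 2., 3.]
```

Because the expectation interpolates between cell centres, this style of regression avoids the hard quantization error of picking a single voxel, which matches the effect the description attributes to the scheme.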
8. A three-dimensional body posture estimation device, characterized by comprising:
the fusion module is used for generating a target heat map corresponding to the input image by adopting a multi-view fusion network, and matching and fusing the human body center point heat map information of the target heat map to obtain fusion information;
the projection module is used for projecting the fusion information into a 3D space to obtain a three-dimensional feature volume;
and the posture estimation module is used for estimating the three-dimensional human body posture according to the three-dimensional feature volume.
9. A three-dimensional body posture estimation device characterized by comprising: a memory, a processor, and a three-dimensional body pose estimation program stored on the memory and executable on the processor, the three-dimensional body pose estimation program configured to implement the steps of the three-dimensional body pose estimation method of any of claims 1 to 7.
10. A storage medium having stored thereon a three-dimensional body pose estimation program, which when executed by a processor, implements the steps of the three-dimensional body pose estimation method of any of claims 1 to 7.
CN202210956640.4A 2022-08-10 2022-08-10 Three-dimensional human body posture estimation method, device, equipment and storage medium Active CN115035551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210956640.4A CN115035551B (en) 2022-08-10 2022-08-10 Three-dimensional human body posture estimation method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115035551A true CN115035551A (en) 2022-09-09
CN115035551B CN115035551B (en) 2022-12-02

Family

ID=83130421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210956640.4A Active CN115035551B (en) 2022-08-10 2022-08-10 Three-dimensional human body posture estimation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115035551B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643366A (en) * 2021-07-12 2021-11-12 中国科学院自动化研究所 Multi-view three-dimensional object attitude estimation method and device
CN114548224A (en) * 2022-01-19 2022-05-27 南京邮电大学 2D human body pose generation method and device for strong interaction human body motion
CN114613001A (en) * 2022-01-28 2022-06-10 厦门理工学院 3D human body posture estimation method based on high-quality heat map in multiple views
CN114627491A (en) * 2021-12-28 2022-06-14 浙江工商大学 Single three-dimensional attitude estimation method based on polar line convergence
CN114758205A (en) * 2022-04-24 2022-07-15 湖南大学 Multi-view feature fusion method and system for 3D human body posture estimation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BOWEN CHENG et al.: "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation", CVPR 2020 *
HAIBO QIU et al.: "Cross View Fusion for 3D Human Pose Estimation", arXiv:1909.01203v1 [cs.CV] *
HANYUE TU et al.: "VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild Environment", arXiv:2004.06239v4 [cs.CV] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116665309A (en) * 2023-07-26 2023-08-29 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features
CN116665309B (en) * 2023-07-26 2023-11-14 山东睿芯半导体科技有限公司 Method, device, chip and terminal for identifying walking gesture features

Also Published As

Publication number Publication date
CN115035551B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
US11145078B2 (en) Depth information determining method and related apparatus
Walch et al. Image-based localization using lstms for structured feature correlation
JP7177062B2 (en) Depth Prediction from Image Data Using Statistical Model
US10334168B2 (en) Threshold determination in a RANSAC algorithm
CN110705574B (en) Positioning method and device, equipment and storage medium
JP2018520425A (en) 3D space modeling
WO2017110836A1 (en) Method and system for fusing sensed measurements
KR20210025942A (en) Method for stereo matching usiing end-to-end convolutional neural network
CN113762147B (en) Facial expression migration method and device, electronic equipment and storage medium
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
CN113643366B (en) Multi-view three-dimensional object attitude estimation method and device
CN115035235A (en) Three-dimensional reconstruction method and device
CN111709984B (en) Pose depth prediction method, visual odometer device, pose depth prediction equipment and visual odometer medium
CN115035551B (en) Three-dimensional human body posture estimation method, device, equipment and storage medium
Zhou et al. PADENet: An efficient and robust panoramic monocular depth estimation network for outdoor scenes
CN112288813B (en) Pose estimation method based on multi-view vision measurement and laser point cloud map matching
CN113902802A (en) Visual positioning method and related device, electronic equipment and storage medium
KR20220014678A (en) Method and apparatus for estimating depth of images
CN111696167A (en) Single image super-resolution reconstruction method guided by self-example learning
CN109741245B (en) Plane information insertion method and device
Harisankar et al. Unsupervised depth estimation from monocular images for autonomous vehicles
CN114494612A (en) Method, device and equipment for constructing point cloud map
JP4675368B2 (en) Object position estimation apparatus, object position estimation method, object position estimation program, and recording medium recording the program
Ikehata et al. Depth map inpainting and super-resolution based on internal statistics of geometry and appearance
Cho et al. Depth map up-sampling using cost-volume filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Hu Bo

Inventor after: Hu Shizhuo

Inventor after: Zhou Bin

Inventor after: Shen Zhengang

Inventor after: Li Yanhong

Inventor before: Hu Bo

Inventor before: Hu Shizhuo

Inventor before: Zhou Bin

Inventor before: Shen Zhengang

Inventor before: Li Yanhong

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method, device, equipment, and storage medium for three-dimensional human posture estimation

Granted publication date: 20221202

Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.

Pledgor: WUHAN ETAH INFORMATION TECHNOLOGY Co.,Ltd.

Registration number: Y2024980009498
