CN108108722A - Accurate three-dimensional hand and human pose estimation method based on a single depth image - Google Patents
- Publication number
- CN108108722A
- Authority
- CN
- China
- Prior art keywords
- voxel
- dimensional
- network
- volume
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
Abstract
The present invention proposes an accurate three-dimensional hand and human pose estimation method based on a single depth image. Its main components are: a network model, improved target localization, the system input, and a voxel-to-voxel prediction network. The process is as follows: first, the overall network architecture is defined; next, the target position is refined using a method based on convolutional neural networks; then, the system input is constructed by back-projection; finally, a voxel-to-voxel prediction network is built from four types of building blocks (volumetric basic block, volumetric residual block, volumetric down-sampling block and volumetric up-sampling block) arranged as an encoder and a decoder. The invention solves the problems of perspective distortion and nonlinear mapping, obtains highly accurate three-dimensional hand and human pose estimates at low computational cost, and can predict and estimate human behavior in real time.
Description
Technical field
The present invention relates to the field of three-dimensional hand and human pose estimation, and more particularly to an accurate three-dimensional hand and human pose estimation method based on a single depth image.
Background
Human behavior interaction means that a computer locates and recognizes people, tracks the trajectories of their limbs and their facial expression features, thereby understands human actions and behavior, and responds to them. Its applications are very broad, chiefly human-computer interaction, virtual reality, smart homes, intelligent security, intelligent video surveillance, patient monitoring systems and auxiliary training for athletes; many human behavior interaction methods have also been used in content-based video retrieval and intelligent image compression. For example, detecting and estimating suspicious hand movements or postures of people in public places such as train stations and airports can help security personnel judge whether someone is a suspect about to commit theft or another dangerous act, effectively reducing thefts and dangerous incidents. As another example, installing depth cameras in a clinic to monitor patients with serious illnesses, and detecting and estimating their gestures and body poses, can help medical staff judge whether a patient needs help and respond in time.
The main task of human-machine behavior interaction is three-dimensional hand and human pose estimation. With the advent of inexpensive depth cameras, three-dimensional hand and human pose estimation from a single depth image has attracted increasing attention. Recently, methods based on convolutional neural networks have been applied to this problem and achieved high accuracy. However, such methods still have limitations, especially under severe self-occlusion or when the depth image is of poor quality. In addition, traditional three-dimensional hand and human pose estimation methods have two shortcomings: first, the two-dimensional depth image suffers from perspective distortion, which distorts the estimate; second, the mapping between the depth image and three-dimensional coordinates is highly nonlinear, which hinders learning and prevents the network from accurately estimating the three-dimensional coordinates of the target.
The present invention proposes an accurate three-dimensional hand and human pose estimation method based on a single depth image. First, the overall network architecture is defined; next, the target position is refined with a method based on convolutional neural networks; then, the system input is constructed by back-projection; finally, a voxel-to-voxel prediction network is built from four types of building blocks (volumetric basic block, volumetric residual block, volumetric down-sampling block and volumetric up-sampling block) arranged as an encoder and a decoder. The invention solves the problems of perspective distortion and nonlinear mapping, obtains highly accurate three-dimensional hand and human pose estimates at low computational cost, and can predict and estimate human behavior in real time.
Summary of the invention
In view of the problems of perspective distortion and nonlinear mapping, the object of the present invention is to provide an accurate three-dimensional hand and human pose estimation method based on a single depth image, which first defines the overall network architecture, then refines the target position with a method based on convolutional neural networks, then constructs the system input by back-projection, and finally builds a voxel-to-voxel prediction network from four types of building blocks (volumetric basic block, volumetric residual block, volumetric down-sampling block and volumetric up-sampling block) arranged as an encoder and a decoder.
To solve the above problems, the present invention provides an accurate three-dimensional hand and human pose estimation method based on a single depth image, whose main components are:
(1) network model;
(2) improved target localization;
(3) system input;
(4) voxel-to-voxel prediction network.
Wherein, the task of the network model is to estimate the three-dimensional coordinates of all joints. It consists of three main steps: first, the points of the two-dimensional depth map are back-projected into three-dimensional space and the continuous space is discretized, converting the depth map into a three-dimensional volumetric representation; second, the three-dimensional voxel data are fed to the voxel-to-voxel prediction network, which estimates, for each joint, the likelihood of every voxel; third, for each joint, the voxel with the maximum likelihood is found, and the real-world coordinate it represents is taken as the final output of the model.
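The third step (from the maximum-likelihood voxel to a real-world coordinate) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the patent's implementation: it assumes the voxel grid is a cube of edge `cube_size` centred on the reference point, that voxel indices (i, j, k) map to the x, y, z axes, and that each voxel's centre is used as its real-world coordinate. `decode_joints` and its parameters are hypothetical names.

```python
import numpy as np

def decode_joints(heatmaps, ref_point, cube_size, grid_size):
    """Convert per-joint voxel likelihoods into real-world 3D joint coordinates.

    heatmaps:  array of shape (N, G, G, G), one likelihood volume per joint
    ref_point: (3,) real-world centre of the cube (assumed convention)
    cube_size: cube edge length in real-world units (assumed value)
    grid_size: G, number of voxels along each edge (assumed value)
    """
    n_joints = heatmaps.shape[0]
    voxel_len = cube_size / grid_size
    joints = np.empty((n_joints, 3))
    for n in range(n_joints):
        # Index (i, j, k) of the voxel with the maximum likelihood for joint n
        idx = np.unravel_index(np.argmax(heatmaps[n]), heatmaps[n].shape)
        # Voxel centre, measured from the cube centre, in real-world units
        offset = (np.asarray(idx) + 0.5) * voxel_len - cube_size / 2.0
        joints[n] = np.asarray(ref_point, dtype=float) + offset
    return joints
```

Using the voxel centre keeps the quantization error within half a voxel; finer schemes (for example, a weighted average around the peak) are possible but are not described in this document.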
Wherein, the improved target localization requires, as a precondition, a three-dimensional bounding box that contains the hand or human body in three-dimensional space.
Further, the three-dimensional bounding box is usually placed around a reference point. The reference point can be the annotated (ground-truth) joint position, or the center of mass of the hand region obtained after applying a simple depth threshold.
Further, both the annotated joint position and the center of mass have the following limitations:
First, the annotated joint position is generally not available in practical applications;
Second, in complex environments the center of mass may be inaccurate, so there is no guarantee that the target lies entirely inside the resulting bounding box.
Further, these limitations can be overcome by training a simple two-dimensional convolutional neural network to estimate an accurate reference point.
Further, the two-dimensional convolutional neural network works as follows: a simple depth threshold is applied around the hand region and the center of mass of the remaining points is taken as the initial reference point; the network takes a depth image as input and outputs the three-dimensional offset between this computed reference point and the center of the annotated joint positions; adding this offset to the previously computed reference point yields the improved reference point.
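The refinement described above might look like the following sketch. Everything here is an assumption for illustration: the center of mass is computed in camera space by back-projecting the thresholded pixels with pinhole intrinsics `fx, fy, cx, cy`, the depth threshold is a simple `[near, far]` range, and the two-dimensional CNN itself is not implemented; its predicted offset is simply passed in. Both function names are hypothetical.

```python
import numpy as np

def centroid_reference_point(depth, fx, fy, cx, cy, near, far):
    """Center of mass (camera space) of pixels whose depth lies in [near, far]."""
    v, u = np.nonzero((depth >= near) & (depth <= far))  # rows, columns
    if u.size == 0:
        raise ValueError("no pixels inside the depth threshold")
    z = depth[v, u].astype(float)
    # Back-project each selected pixel with the pinhole camera model
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x.mean(), y.mean(), z.mean()])

def refine_reference_point(coarse_ref, predicted_offset):
    """Improved reference point = coarse centroid + 3D offset predicted by the 2D CNN."""
    return np.asarray(coarse_ref, dtype=float) + np.asarray(predicted_offset, dtype=float)
```

During training, the regression target for the CNN would be the difference between the center of the annotated joints and `centroid_reference_point(...)`; at test time only the centroid and the predicted offset are needed.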
Wherein, the system input is constructed as follows: first, each pixel of the two-dimensional depth map is back-projected into three-dimensional space; then, the three-dimensional space is discretized according to a predefined voxel size; next, a three-dimensional bounding box is drawn around the reference point to extract the target; finally, voxels that coincide with the position of a depth point are set to 1 and all other voxels are set to 0.
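A minimal NumPy sketch of this input construction, assuming a pinhole camera (`fx, fy, cx, cy`), depth values in millimetres with 0 marking missing pixels, and a cubic bounding box. The defaults `cube_size=250.0` and `grid_size=88` are illustrative assumptions, not values stated in this document, and `voxelize` is a hypothetical name.

```python
import numpy as np

def voxelize(depth, ref_point, fx, fy, cx, cy, cube_size=250.0, grid_size=88):
    """Binary occupancy grid built from a depth map (assumed units: mm)."""
    v, u = np.nonzero(depth > 0)  # skip missing-depth pixels
    z = depth[v, u].astype(float)
    # 1) Back-project every valid pixel into camera space (pinhole model)
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    # 2) Discretize space with a predefined voxel size
    voxel_len = cube_size / grid_size
    # 3) Cube centred on the reference point: shift so its corner is the origin
    shifted = pts - np.asarray(ref_point, dtype=float) + cube_size / 2.0
    idx = np.floor(shifted / voxel_len).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    idx = idx[inside]
    # 4) Voxels containing a depth point are 1; all others stay 0
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid
```

Points falling outside the cube are discarded, so the quality of the reference point directly decides whether the whole hand or body ends up in the grid.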
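Wherein, the voxel-to-voxel prediction network mainly comprises the following three parts:
First, four types of building blocks are used: the volumetric basic block, volumetric residual block, volumetric down-sampling block and volumetric up-sampling block;
Second, the network is assembled: it starts with a volumetric basic block and a volumetric down-sampling block, then extracts useful local features through three consecutive volumetric residual blocks, and then passes through the encoder and decoder;
Third, a three-dimensional heatmap is constructed to supervise the per-voxel likelihood of each joint, with the mean of the Gaussian peak placed at the annotated joint position, i.e.:
$$H_n^*(i,j,k) = \exp\left(-\frac{(i-i_n)^2 + (j-j_n)^2 + (k-k_n)^2}{2\sigma^2}\right) \qquad (1)$$
Meanwhile,
$$L = \sum_{n=1}^{N} \sum_{i,j,k} \left\| H_n^*(i,j,k) - H_n(i,j,k) \right\|^2 \qquad (2)$$
where $H_n^*$ is the target heatmap of the $n$-th joint, $(i_n, j_n, k_n)$ is the annotated voxel coordinate of that joint, $\sigma$ is the standard deviation of the Gaussian peak, $H_n$ is the heatmap estimated by the network and $N$ is the number of joints. The cost function is the mean squared error given by formula (2).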
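The target heatmap and cost function (formulas (1) and (2) of claim 9) translate directly into array code. The sketch below assumes NumPy, voxel indices starting at 0 and a user-chosen `sigma`; the function names are hypothetical. Formula (2) is a sum of squared errors; dividing it by the number of voxels gives the mean squared error and only rescales the loss.

```python
import numpy as np

def gaussian_heatmap(grid_size, center, sigma):
    """Target heatmap H*_n of formula (1): Gaussian peak at voxel (i_n, j_n, k_n)."""
    r = np.arange(grid_size)
    i, j, k = np.meshgrid(r, r, r, indexing="ij")
    i_n, j_n, k_n = center
    d2 = (i - i_n) ** 2 + (j - j_n) ** 2 + (k - k_n) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def heatmap_loss(target, predicted):
    """Formula (2): squared error summed over all joints n and voxels (i, j, k)."""
    return float(np.sum((target - predicted) ** 2))
```

For N joints, the full target is `np.stack([gaussian_heatmap(G, c, sigma) for c in joint_voxels])`, which has the same (N, G, G, G) shape as the network output it supervises.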
Further, in the encoder, the volumetric down-sampling blocks reduce the spatial size of the feature maps while the volumetric residual blocks increase the number of channels; in the decoder, the volumetric up-sampling blocks increase the spatial size of the feature maps, and the network reduces the number of channels during up-sampling, thereby compressing the extracted features.
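To make the shape bookkeeping of the encoder and decoder concrete, here is a NumPy sketch with deliberately simplified stand-ins: 2x2x2 max pooling for the volumetric down-sampling block, nearest-neighbour x2 repetition for the volumetric up-sampling block, and a random 1x1x1 channel projection for blocks that change the channel count. The actual blocks are learned 3D (transposed) convolutions; the channel widths (16, 32) and all names here are hypothetical.

```python
import numpy as np

def downsample(x):
    """Stand-in for volumetric down-sampling: 2x2x2 max pooling, stride 2."""
    c, d, h, w = x.shape  # assumes d, h, w are even
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))

def upsample(x):
    """Stand-in for volumetric up-sampling: nearest-neighbour x2 along D, H, W."""
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

def change_channels(x, out_channels, rng):
    """Random 1x1x1 projection standing in for a block that changes channel count."""
    weights = rng.standard_normal((out_channels, x.shape[0]))
    return np.einsum("oc,cdhw->odhw", weights, x)

def encoder_decoder(x, rng):
    for ch in (16, 32):  # encoder: spatial size shrinks, channels grow
        x = change_channels(downsample(x), ch, rng)
    for ch in (16, 8):   # decoder: spatial size grows, channels shrink
        x = change_channels(upsample(x), ch, rng)
    return x
```

Feeding a (8, 16, 16, 16) tensor through `encoder_decoder` returns the same shape: the encoder halves each spatial edge twice while widening the channels, and the decoder reverses this while compressing the features back to fewer channels.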
Description of the drawings
Fig. 1 is the overall architecture diagram of the voxel-to-voxel prediction network in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention.
Fig. 2 shows the combinations of different inputs and outputs of three-dimensional pose estimation networks, in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention.
Fig. 3 shows the reference point refinement network of the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention.
Fig. 4 is the encoder structure diagram of the voxel-to-voxel prediction network in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention.
Fig. 5 is the decoder structure diagram of the voxel-to-voxel prediction network in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another. The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the overall architecture diagram of the voxel-to-voxel prediction network in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention. First, the points are back-projected into three-dimensional space and the continuous space is discretized, converting the two-dimensional depth map into a three-dimensional volumetric representation; then, the three-dimensional voxel data are used as the input of the voxel-to-voxel prediction network, which estimates the likelihood of every voxel for each joint; finally, for each joint, the position with the maximum likelihood and the real-world coordinate it represents are found and taken as the final output of the model.
Fig. 2 shows the combinations of different inputs and outputs of three-dimensional pose estimation networks, in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention. To solve the problems of perspective distortion and nonlinear mapping, the present invention provides a voxel-to-voxel prediction network for pose estimation. Unlike previous methods, the voxel-to-voxel prediction network takes a voxelized grid as input and estimates the likelihood of every voxel for each joint.
Converting the two-dimensional depth image into three-dimensional voxels as the network input lets the network see the actual appearance of the target object without distortion. At the same time, estimating the per-voxel likelihood of each joint makes the desired task easier for the network to learn than regressing three-dimensional coordinates directly.
Fig. 3 shows the reference point refinement network of the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention. To localize the joints, a three-dimensional bounding box containing the hand or human body in three-dimensional space is required as a precondition. This bounding box is usually placed around a reference point, which can be the annotated (ground-truth) joint position or the center of mass of the hand region obtained after applying a simple depth threshold. However, both choices have the following limitations:
First, the annotated joint position is generally not available in practical applications;
Second, in complex environments the center of mass may be inaccurate, so there is no guarantee that the target lies entirely inside the resulting bounding box.
Therefore, to overcome these limitations, a simple two-dimensional convolutional neural network can be trained to estimate an accurate reference point. Specifically, a simple depth threshold is applied around the hand region and its center of mass is computed as the reference point; the network takes a depth image as input and outputs the three-dimensional offset between this computed reference point and the center of the annotated joint positions; adding this offset to the previously computed reference point yields the improved reference point.
Fig. 4 is the encoder structure diagram of the voxel-to-voxel prediction network in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention. The voxel-to-voxel prediction network mainly comprises the following three parts:
First, four types of building blocks are used: the volumetric basic block, volumetric residual block, volumetric down-sampling block and volumetric up-sampling block;
Second, the network is assembled: it starts with a volumetric basic block and a volumetric down-sampling block, then extracts useful local features through three consecutive volumetric residual blocks, and then passes through the encoder and decoder;
Third, a three-dimensional heatmap is constructed to supervise the per-voxel likelihood of each joint, with the mean of the Gaussian peak placed at the annotated joint position, i.e.:
$$H_n^*(i,j,k) = \exp\left(-\frac{(i-i_n)^2 + (j-j_n)^2 + (k-k_n)^2}{2\sigma^2}\right) \qquad (1)$$
Meanwhile,
$$L = \sum_{n=1}^{N} \sum_{i,j,k} \left\| H_n^*(i,j,k) - H_n(i,j,k) \right\|^2 \qquad (2)$$
where $H_n^*$ is the target heatmap of the $n$-th joint, $(i_n, j_n, k_n)$ is the annotated voxel coordinate of that joint, $\sigma$ is the standard deviation of the Gaussian peak, $H_n$ is the heatmap estimated by the network and $N$ is the number of joints. The cost function is the mean squared error given by formula (2).
In the encoder, the volumetric down-sampling blocks reduce the spatial size of the feature maps and the volumetric residual blocks increase the number of channels.
Fig. 5 is the decoder structure diagram of the voxel-to-voxel prediction network in the accurate three-dimensional hand and human pose estimation method based on a single depth image according to the present invention. In the decoder, the volumetric up-sampling blocks increase the spatial size of the feature maps, and the network reduces the number of channels during up-sampling, thereby compressing the extracted features.
It will be apparent to those skilled in the art that the present invention is not limited to the details of the above embodiments and can be implemented in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all alterations and modifications falling within the scope of the invention.
Claims (10)
1. An accurate three-dimensional hand and human pose estimation method based on a single depth image, characterized in that it mainly comprises: a network model (1); improved target localization (2); the system input (3); and a voxel-to-voxel prediction network (4).
2. The network model (1) according to claim 1, characterized in that the task of the model is to estimate the three-dimensional coordinates of all joints, mainly in the following three steps:
First, the points are back-projected into three-dimensional space and the continuous space is discretized, converting the two-dimensional depth map into a three-dimensional volumetric representation;
Second, the three-dimensional voxel data are used as the input of the voxel-to-voxel prediction network to estimate the likelihood of every voxel for each joint;
Third, for each joint, the position with the maximum likelihood and the real-world coordinate it represents are found and taken as the final output of the model.
3. The improved target localization (2) according to claim 1, characterized in that its precondition is a three-dimensional bounding box containing the hand or human body in three-dimensional space.
4. The three-dimensional bounding box according to claim 3, characterized in that it is usually located around a reference point; the reference point can be the annotated (ground-truth) joint position, or the center of mass of the hand region obtained after applying a simple depth threshold.
5. The annotated joint position and the center of mass according to claim 4, characterized in that they have the following limitations:
First, the annotated joint position is generally not available in practical applications;
Second, in complex environments the center of mass may be inaccurate, so there is no guarantee that the target lies entirely inside the resulting bounding box.
6. The limitations according to claim 5, characterized in that, to overcome them, a simple two-dimensional convolutional neural network can be trained to estimate an accurate reference point.
7. The two-dimensional convolutional neural network according to claim 6, characterized in that a simple depth threshold is applied around the hand region and the center of mass is computed as the reference point; the network takes a depth image as input and outputs the three-dimensional offset between the computed reference point and the center of the annotated joint positions; this offset is then added to the previously computed reference point to obtain the improved reference point.
8. The system input (3) according to claim 1, characterized in that: first, each pixel of the two-dimensional depth map is back-projected into three-dimensional space; then, the three-dimensional space is discretized according to a predefined voxel size; next, a three-dimensional bounding box is drawn around the reference point to extract the target; finally, voxels that coincide with the position of a depth point are set to 1 and all other voxels are set to 0.
9. The voxel-to-voxel prediction network (4) according to claim 1, characterized in that it mainly comprises the following three parts:
First, four types of building blocks are used: the volumetric basic block, volumetric residual block, volumetric down-sampling block and volumetric up-sampling block;
Second, the network is assembled: it starts with a volumetric basic block and a volumetric down-sampling block, then extracts useful local features through three consecutive volumetric residual blocks, and then passes through the encoder and decoder;
Third, a three-dimensional heatmap is constructed to supervise the per-voxel likelihood of each joint, with the mean of the Gaussian peak placed at the annotated joint position, i.e.:
$$H_n^*(i,j,k) = \exp\left(-\frac{(i-i_n)^2 + (j-j_n)^2 + (k-k_n)^2}{2\sigma^2}\right) \qquad (1)$$
Meanwhile, the cost function is defined as:
$$L = \sum_{n=1}^{N} \sum_{i,j,k} \left\| H_n^*(i,j,k) - H_n(i,j,k) \right\|^2 \qquad (2)$$
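where H_n^* is the target heatmap of the n-th joint, (i_n, j_n, k_n) is the annotated voxel coordinate of that joint, σ is the standard deviation of the Gaussian peak, H_n is the heatmap estimated by the network and N is the number of joints. The cost function is the mean squared error given by the formula above.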
10. The encoder and decoder according to claim 9, characterized in that, in the encoder, the volumetric down-sampling blocks reduce the spatial size of the feature maps and the volumetric residual blocks increase the number of channels; in the decoder, the volumetric up-sampling blocks increase the spatial size of the feature maps, and the network reduces the number of channels during up-sampling, thereby compressing the extracted features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810046261.5A CN108108722A (en) | 2018-01-17 | 2018-01-17 | Accurate three-dimensional hand and human pose estimation method based on a single depth image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810046261.5A CN108108722A (en) | 2018-01-17 | 2018-01-17 | Accurate three-dimensional hand and human pose estimation method based on a single depth image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108108722A true CN108108722A (en) | 2018-06-01 |
Family
ID=62220174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810046261.5A Withdrawn CN108108722A (en) | Accurate three-dimensional hand and human pose estimation method based on a single depth image | 2018-01-17 | 2018-01-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108108722A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111047548A (en) * | 2020-03-12 | 2020-04-21 | 腾讯科技(深圳)有限公司 | Attitude transformation data processing method and device, computer equipment and storage medium |
CN111724414A (en) * | 2020-06-23 | 2020-09-29 | 宁夏大学 | Basketball movement analysis method based on 3D attitude estimation |
CN111932678A (en) * | 2020-08-13 | 2020-11-13 | 北京未澜科技有限公司 | Multi-view real-time human motion, gesture, expression and texture reconstruction system |
CN112446923A (en) * | 2020-11-23 | 2021-03-05 | 中国科学技术大学 | Human body three-dimensional posture estimation method and device, electronic equipment and storage medium |
WO2021129569A1 (en) * | 2019-12-25 | 2021-07-01 | 神思电子技术股份有限公司 | Human action recognition method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8787663B2 (en) * | 2010-03-01 | 2014-07-22 | Primesense Ltd. | Tracking body parts by combined color image and depth processing |
CN105069423A (en) * | 2015-07-29 | 2015-11-18 | 北京格灵深瞳信息技术有限公司 | Human body posture detection method and device |
2018
- 2018-01-17: CN201810046261.5A (CN108108722A), CN; status: not active (withdrawn)
Non-Patent Citations (1)
Title |
---|
GYEONGSIK MOON ET AL: "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", HTTPS://ARXIV.ORG/PDF/1711.07399V1 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021129569A1 (en) * | 2019-12-25 | 2021-07-01 | 神思电子技术股份有限公司 | Human action recognition method |
CN111047548A (en) * | 2020-03-12 | 2020-04-21 | 腾讯科技(深圳)有限公司 | Attitude transformation data processing method and device, computer equipment and storage medium |
CN111047548B (en) * | 2020-03-12 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Attitude transformation data processing method and device, computer equipment and storage medium |
CN111724414A (en) * | 2020-06-23 | 2020-09-29 | 宁夏大学 | Basketball movement analysis method based on 3D attitude estimation |
CN111724414B (en) * | 2020-06-23 | 2024-01-26 | 宁夏大学 | Basketball motion analysis method based on 3D gesture estimation |
CN111932678A (en) * | 2020-08-13 | 2020-11-13 | 北京未澜科技有限公司 | Multi-view real-time human motion, gesture, expression and texture reconstruction system |
CN111932678B (en) * | 2020-08-13 | 2021-05-14 | 北京未澜科技有限公司 | Multi-view real-time human motion, gesture, expression and texture reconstruction system |
CN112446923A (en) * | 2020-11-23 | 2021-03-05 | 中国科学技术大学 | Human body three-dimensional posture estimation method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN108108722A (en) | Accurate three-dimensional hand and human pose estimation method based on a single depth image |
Liu et al. | Tracking-based 3D human skeleton extraction from stereo video camera toward an on-site safety and ergonomic analysis | |
Von Marcard et al. | Sparse inertial poser: Automatic 3d human pose estimation from sparse imus | |
US11436745B1 (en) | Reconstruction method of three-dimensional (3D) human body model, storage device and control device | |
Achilles et al. | Patient MoCap: Human pose estimation under blanket occlusion for hospital monitoring applications | |
CN105787439A (en) | Depth image human body joint positioning method based on convolution nerve network | |
Min et al. | Support vector machine approach to fall recognition based on simplified expression of human skeleton action and fast detection of start key frame using torso angle | |
CN105912985A (en) | Human skeleton joint point behavior motion expression method based on energy function | |
CN103237155B (en) | The tracking of the target that a kind of single-view is blocked and localization method | |
CN105760809A (en) | Method and apparatus for head pose estimation | |
Araújo et al. | Circle: Capture in rich contextual environments | |
Yang et al. | Depth map super-resolution using stereo-vision-assisted model | |
Guðmundsson et al. | Improved 3D reconstruction in smart-room environments using ToF imaging | |
Hou et al. | Handheld 3D reconstruction based on closed-loop detection and nonlinear optimization | |
Luo et al. | Scene semantic reconstruction from egocentric rgb-d-thermal videos | |
Huan et al. | GeoRec: Geometry-enhanced semantic 3D reconstruction of RGB-D indoor scenes | |
He et al. | Volumeter: 3D human body parameters measurement with a single Kinect | |
CN106203350A (en) | A kind of moving target is across yardstick tracking and device | |
Ruget et al. | Pixels2pose: Super-resolution time-of-flight imaging for 3d pose estimation | |
Dai et al. | A novel STSOSLAM algorithm based on strong tracking second order central difference Kalman filter | |
Pintore et al. | Mobile mapping and visualization of indoor structures to simplify scene understanding and location awareness | |
Folgado et al. | A block-based model for monitoring of human activity | |
Raunhardt et al. | Immersive singularity‐free full‐body interactions with reduced marker set | |
Kim et al. | Absolute motion and structure from stereo image sequences without stereo correspondence and analysis of degenerate cases | |
Ruget et al. | Real-time, low-cost multi-person 3D pose estimation |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180601 |