CN109271933A - Method for three-dimensional human pose estimation based on a video stream - Google Patents
Method for three-dimensional human pose estimation based on a video stream
- Publication number: CN109271933A (application CN201811080931.1)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data › G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V20/00—Scenes; Scene-specific elements › G06V20/60—Type of objects › G06V20/64—Three-dimensional objects › G06V20/647—Three-dimensional objects by matching two-dimensional images to three-dimensional objects
Abstract
The present method performs three-dimensional (3D) human pose estimation on a video stream using deep learning. It avoids many of the defects caused by errors in two-dimensional visual analysis, makes full use of the temporal relationship between video frames, and improves the accuracy and real-time performance of the video-stream 3D pose inference results. For the n-th frame of the video (n ≥ 2): 1) input the current-frame two-dimensional image and generate a shallow feature map with a shallow neural network module; 2) input the human 2D joint heatmap generated for frame (n−1), together with the shallow feature map generated for the current frame, into an LSTM module to generate a deep feature map; 3) pass the deep feature map of the current frame to a residual module, which outputs the human 2D joint heatmap of the current frame; 4) pass that heatmap to a 3D joint inference module, which performs the 2D-to-3D spatial mapping. Superimposing the human 3D joint heatmaps generated for every frame produces the video stream of 3D human pose estimates.
Description
Technical field
The present invention relates to a method for performing three-dimensional human pose estimation on a two-dimensional image video stream, and belongs to the field of virtual reality technology.
Background art
3D human pose estimation is the accurate estimation of the 3D positions of several joints of the human body (e.g., head, shoulders, elbows). Because depth information is lost, estimating the positions of the human 3D joints from a two-dimensional RGB video stream is one of the great challenges in the field of computer vision.
With the development of deep neural networks, more and more technical innovation focuses on end-to-end three-dimensional human skeleton detection based on deep neural networks. Existing, relatively common 3D human pose estimation methods follow two main technical routes:
Two-stage 3D joint inference: as shown in Fig. 1, the method is divided into two stages. In the first stage, an existing 2D joint inference model accurately estimates the positions of the human 2D joints, generally represented as 2D joint heatmaps. In the second stage, the mathematical expression of the human 3D joints is generated from the 2D joint heatmaps and the intermediate-layer feature maps produced in the first stage.
End-to-end 3D joint inference: as shown in Fig. 2, the input of the inference model is an RGB image, and the output is the 3D mathematical expression of the human body.
As described above, existing 3D human pose estimation has the following technical deficiencies. A. It generally outputs the 3D coordinates of the human joints directly, which is very difficult for the network to learn, because the mapping from feature space to 3D configuration space is a highly nonlinear learning task. B. When performing 3D joint inference, the intermediate features of the neural network are under-utilized, making it difficult to combine feature information of different scales and dimensions, so the inference results are poor. C. During video-stream-based 3D pose inference, the amount of computation grows substantially, so the final inference cannot meet real-time requirements and the practical effect is poor. D. During video-stream-based 3D pose inference, the spatio-temporal relationship between frames is not exploited, so the problem of joints being occluded or disappearing cannot be solved.
In view of this, the present patent application is specially proposed.
Summary of the invention
The object of the present method for 3D human pose estimation based on a video stream is to solve the above problems of the prior art by performing 3D human pose estimation on a video stream with deep learning. The method mainly comprises generation of the 3D human pose model, establishment of the spatial relationships of the joints, and capture of the temporal correlation between video frames, so as to avoid the many defects caused by errors in two-dimensional visual analysis, make full use of the temporal relationships between video frames, and improve the accuracy and real-time performance of the video-stream 3D pose inference results.
To achieve the above object, the method for 3D human pose estimation based on a video stream comprises the following implementation steps:
For the first frame of the video: 1) input the current-frame 2D image and extract the human 2D pose with the hourglass network module, generating the human 2D joint heatmap of the first frame; 2) output the human 2D joint heatmap of the current frame to the 3D joint inference module, which performs the 2D-to-3D spatial mapping to generate the human 3D joint heatmap.
For the second frame of the video: 1) input the current-frame 2D image and generate a shallow feature map with the shallow neural network module; 2) input the human 2D joint heatmap generated for the first frame, together with the shallow feature map generated for the current frame, into the LSTM module to generate a deep feature map; 3) output the deep feature map generated for the current frame to the residual module, which generates the human 2D joint heatmap of the current frame; 4) output the human 2D joint heatmap of the current frame to the 3D joint inference module, which performs the 2D-to-3D spatial mapping to generate the human 3D joint heatmap.
For the n-th frame of the video (n ≥ 2): 1) input the current-frame 2D image and generate a shallow feature map with the shallow neural network module; 2) input the human 2D joint heatmap generated for frame (n−1), together with the shallow feature map generated for the current frame, into the LSTM module to generate a deep feature map; 3) output the deep feature map generated for the current frame to the residual module, which generates the human 2D joint heatmap of the current frame; 4) output the human 2D joint heatmap of the current frame to the 3D joint inference module, which performs the 2D-to-3D spatial mapping to generate the human 3D joint heatmap.
Superimposing the human 3D joint heatmaps generated for every frame produces the video stream of 3D human pose estimates.
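The per-frame routing just described can be sketched in plain Python (module names here are illustrative placeholders, not the patent's actual implementation):

```python
def frame_modules(n: int) -> list[str]:
    """Return the processing modules applied to frame n (1-indexed),
    following the per-frame steps described above."""
    if n == 1:
        # First frame: full hourglass network, then 2D-to-3D inference.
        return ["hourglass", "3d_inference"]
    # Frames n >= 2: shallow CNN, LSTM (fed the previous frame's 2D joint
    # heatmap), residual module, then 2D-to-3D inference.
    return ["shallow_cnn", "lstm", "residual", "3d_inference"]
```

The key design point is that the expensive hourglass network runs only on the first frame; every later frame reuses its heatmap through the LSTM chain.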
As described above, in order to make full use of the spatio-temporal relationships between frames, the method mainly combines an hourglass network, a shallow neural network, an LSTM (Long Short-Term Memory) module, a residual module and a 3D joint inference module to perform 3D human pose estimation. Wherein:
the hourglass module accurately predicts the human 2D pose and generates the human 2D joint heatmap;
the shallow neural network outputs the feature map of a single-frame image;
the LSTM module takes as input the human 2D joint heatmap generated by the hourglass module and the image feature map generated by the shallow neural network, and generates the deep feature map of the current frame;
the residual module takes as input the current-frame deep feature map generated by the LSTM module and generates the human 2D joints;
the 3D joint inference module uses the 2D joints extracted by the hourglass module and the estimated depth to perform the 2D-to-3D spatial mapping, ultimately generating the human 3D joint coordinates.
To further optimize the hourglass network, the first-order hourglass network comprises the following parallel structure:
an upper half-way with several primary modules having M input channels and N output channels;
a lower half-way with, in series, a 1/2 down-sampling pooling layer, several primary modules, and a nearest-neighbor interpolation up-sampling module.
An n-th order hourglass network (n ≥ 2) has the following structure: any primary module in the lower half-way of the (n−1)-th order hourglass network is replaced with an (n−1)-th order hourglass network; the rest of the upper and lower half-way structure is identical to the (n−1)-th order hourglass network.
Specifically, the upper half-way extracts the data of M channels to obtain data of N channels. Among the several primary modules in series, for two adjacent primary modules, the number of input channels of the latter is always equal to the number of output channels of the former.
The lower half-way likewise extracts the data of M channels to obtain data of N channels, the difference being that this is carried out at half the original input size; that is, the 1/2 down-sampling pooling layer, the primary modules and the nearest-neighbor interpolation up-sampling module are connected in series.
The n-th order hourglass network is thus obtained from the (n−1)-th order hourglass network by replacing one primary module in its lower half-way with a new (n−1)-th order hourglass network, expanding the (n−1)-th order network into the n-th order network.
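Since each additional order nests one more down-sampling into the lower half-way, the scales visited by an n-th order hourglass can be sketched as (pure Python, illustrative):

```python
def hourglass_scales(order: int) -> list[float]:
    """Scales (relative to the input) visited by an n-th order hourglass:
    each nesting level halves the resolution once more, down to 1/2^n."""
    return [1 / 2**k for k in range(order + 1)]

# A second-order hourglass works at full, 1/2 and 1/4 resolution,
# matching the "at most 1/4 down-sampling" described below for order 2.
print(hourglass_scales(2))  # -> [1.0, 0.5, 0.25]
```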
In summary, the method for 3D human pose estimation based on a video stream has the following advantages:
1. It makes full use of the temporal relationship between video frames, improving the accuracy and real-time performance of the video-stream 3D pose inference results.
2. It significantly reduces the degree of nonlinearity of the learning task from "feature space" to "3D configuration space", realizing a sound representation and learning method.
3. It realizes an "end-to-end" deep learning network for human 3D pose estimation, avoiding the accumulation of errors during 3D inference of the human joints.
4. It maximally utilizes the intermediate features of the neural network, combining features of different scales and dimensions to produce the best inference results.
5. It directly reduces the amount of computation, so the final inference meets real-time requirements and is more practical.
Brief description of the drawings
Fig. 1 is a schematic diagram of the two-stage estimation method in the prior art;
Fig. 2 is a schematic diagram of the end-to-end estimation method in the prior art;
Fig. 3 is a flow chart of the described method for 3D human pose estimation based on a video stream;
Fig. 4 is a structural schematic diagram of the primary module (Residual);
Fig. 5 is a structural schematic diagram of the first-order hourglass module;
Fig. 6 is a structural schematic diagram of the second-order hourglass module;
Fig. 7 is a structural schematic diagram of the shallow neural network;
Fig. 8 is a flow chart of the 3D joint inference module.
Specific embodiments
The present invention is described in further detail below with reference to the drawings and an implementation example.
Embodiment 1. As shown in Fig. 3, the method for 3D human pose estimation based on a video stream is as follows:
For the first frame of the video: 1) input the current-frame 2D image and extract the human 2D pose with the hourglass network module, generating the human 2D joint heatmap of the first frame; 2) output the human 2D joint heatmap of the current frame to the 3D joint inference module, which performs the 2D-to-3D spatial mapping to generate the human 3D joint heatmap.
For the second frame of the video: 1) input the current-frame 2D image and generate a shallow feature map with the shallow neural network module; 2) input the human 2D joint heatmap generated for the first frame, together with the shallow feature map generated for the current frame, into the LSTM module to generate a deep feature map; 3) output the deep feature map generated for the current frame to the residual module, which generates the human 2D joint heatmap of the current frame; 4) output the human 2D joint heatmap of the current frame to the 3D joint inference module, which performs the 2D-to-3D spatial mapping to generate the human 3D joint heatmap.
For the third frame of the video: 1) input the current-frame 2D image and generate a shallow feature map with the shallow neural network module; 2) input the human 2D joint heatmap generated for the second frame, together with the shallow feature map generated for the current frame, into the LSTM module to generate a deep feature map; 3) output the deep feature map generated for the current frame to the residual module, which generates the human 2D joint heatmap of the current frame; 4) output the human 2D joint heatmap of the current frame to the 3D joint inference module, which performs the 2D-to-3D spatial mapping to generate the human 3D joint heatmap.
Superimposing the human 3D joint heatmaps generated for every frame produces the video stream of 3D human pose estimates.
For the first frame of the video, the hourglass module extracts the human 2D pose and generates the accurately predicted human 2D joint heatmap, taking 100 ms.
For the second and third frames of the video: the shallow neural network outputs the feature map of the single-frame image, taking 20 ms per frame; the LSTM module generates the deep feature map of the current frame from the human 2D joint heatmap generated by the hourglass network and the image feature map generated by the shallow neural network, taking 10 ms per frame; the residual module takes the current-frame deep feature map generated by the LSTM module as input and generates the human 2D joints, taking 10 ms per frame; the 3D joint inference module uses the 2D joints extracted by the hourglass module and the estimated depth to perform the 2D-to-3D mapping, taking 10 ms per frame.
That is, the 3D joint inference for the first frame of the video needs 120 ms, and every subsequent frame needs only 60 ms, so that the real-time efficiency of the estimation method is ensured while the precision of the 3D human pose estimation is guaranteed.
For human 2D pose estimation, the output structure of the neural network is processed iteratively, producing predictions at multiple processing stages. These intermediate predictions can be improved step by step to yield more accurate estimates. The "hourglass module" is exactly such a design: it cascades multiple predictions and gradually corrects the result.
The "hourglass module" described here is composed of primary modules (Residual Modules).
As shown in Fig. 4, the input of the primary module (Residual Module) is a feature map with M channels, and the output is a feature map with N channels.
The first row is the convolution road, consisting of three convolutional layers with different kernel scales. Each rounded rectangle represents one convolution operation whose parameters are written inside, in three rows: the number of input channels, the size of the convolution kernel, and the number of output channels.
The second row is the skip road, containing only one convolutional layer with kernel size 1. When the numbers of input and output channels of the skip road are the same, this road is an identity mapping.
The stride of all convolutional layers is 1, and the padding is chosen so that the height and width of the data are unchanged; only the depth (number of channels) of the data is changed.
The above primary module (Residual Module) is controlled by two parameters, the input depth M and the output depth N, and can operate on images of arbitrary size.
The primary module (Residual Module) extracts features of a higher level (the convolution road) while retaining the information of the original level (the skip road); it can be regarded as an advanced, size-preserving "convolution" layer.
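A minimal NumPy sketch of the two-road idea follows (1×1 convolutions only, so that shapes stay simple; the patent's module uses three kernel scales on its convolution road):

```python
import numpy as np

def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: mixes channels, leaves H and W untouched.
    x: (C_in, H, W); w: (C_out, C_in)."""
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, h * wd)).reshape(w.shape[0], h, wd)

def primary_module(x: np.ndarray, w_conv: np.ndarray,
                   w_skip: np.ndarray) -> np.ndarray:
    """Two parallel roads: a convolution road extracting new features
    and a skip road preserving the original-level information."""
    return conv1x1(x, w_conv) + conv1x1(x, w_skip)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))         # M = 4 input channels
y = primary_module(x, rng.normal(size=(6, 4)), rng.normal(size=(6, 4)))
print(y.shape)  # -> (6, 8, 8): N = 6 channels, spatial size unchanged
```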
As shown in Fig. 5, the input of the first-order hourglass module is a feature map with M channels and the output is a feature map with N channels. Its upper half-way contains 3 primary modules (Residual) in series; between two adjacent primary modules, the number of input channels of the latter is always equal to the number of output channels of the former, so as to extract features of progressively deeper levels.
The lower half-way likewise extracts the data of M channels to obtain data of N channels, the difference being that this is carried out at half the original input size, with a 1/2 down-sampling pooling layer, 5 primary modules, and a nearest-neighbor interpolation up-sampling module in series.
Specifically, the upper half-way operates at the original scale, while the lower half-way first undergoes down-sampling (the rectangle marked "/2") and then up-sampling (the rectangle marked "*2").
The down-sampling module uses max pooling, and the up-sampling module uses nearest-neighbor interpolation.
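The two resolution operations on the lower half-way can be sketched in NumPy (a minimal illustration, not the patent's implementation):

```python
import numpy as np

def max_pool_half(x: np.ndarray) -> np.ndarray:
    """1/2 down-sampling via 2x2 max pooling. x: (C, H, W), H and W even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample_nearest(x: np.ndarray) -> np.ndarray:
    """x2 up-sampling via nearest-neighbor interpolation."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.arange(16, dtype=float).reshape(1, 4, 4)
down = max_pool_half(x)
up = upsample_nearest(down)
print(down.shape, up.shape)  # -> (1, 2, 2) (1, 4, 4)
```

Note that `upsample_nearest(max_pool_half(x))` restores the original spatial size, which is what lets the lower half-way's output be merged with the full-resolution upper half-way.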
The first-order hourglass network splits the input feature map of M channels into two roads: one branch is processed at the original scale, and the other at a lower scale; after processing on the respective branches is completed, the results are merged. This gives the neural network stronger recognition and expression abilities: it can better select among feature information of different scales and thus extract the essential features that influence the final result.
As shown in Fig. 6, the second-order hourglass network is obtained by replacing the dashed-box part of the first-order hourglass network with a first-order hourglass network (input channels 256, output channels N).
That is, the second-order hourglass network replaces the fourth primary module in the lower half-way of the first-order hourglass network with a first-order hourglass network.
In the second-order hourglass network, the lower half-way thus performs two down-samplings followed by two up-samplings.
On its down-sampling branch, the second-order hourglass network performs down-sampling of at most 1/4 relative to the original data size, highlighting the differences in scale information more than the first-order hourglass network does.
To further integrate information of different scales, the application may adopt an n-th order hourglass network, which undergoes at most n down-samplings; before each down-sampling, a separate half-way retains the information at the current scale; after each up-sampling, the data are added to the data of the previous scale; between two down-samplings, features are extracted with three primary modules; between two additions, features are extracted with one primary module (Residual). That is, an n-th order hourglass network can extract intermediate features from the original scale down to the 1/2^n scale.
An n-th order hourglass network (n ≥ 2) is obtained by replacing one primary module in the lower half-way of the (n−1)-th order hourglass network with an (n−1)-th order hourglass network; the other upper and lower half-way structures are identical to those of the (n−1)-th order hourglass network.
For the n-th order and (n−1)-th order hourglass networks, the position of the replaced primary module in the lower half-way may be the same or different. In the present embodiment, the replaced primary module in the lower half-way of both the n-th and the (n−1)-th order networks is the fourth one.
As shown in Fig. 7, the shallow neural network processes a single-frame image to extract image features. In this application, the shallow neural network uses VGG16 with the final fully connected layers and Soft-max layer removed.
The LSTM module is a particular form of RNN (recurrent neural network), RNN being the general name for a family of neural networks capable of processing sequence data.
In this application, the LSTM module is used to connect frame to frame: its inputs are the heatmap of the previous frame and the feature output by the shallow neural network for the current frame, and its output is the deep feature of the current frame.
As shown in the following formulas,
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)
C_t = f_t * C_{t−1} + i_t * C̃_t
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
f_t denotes the forget gate, which decides what information the LSTM module discards from the cell state. The forget gate reads h_{t−1} and x_t and outputs, for each number in the cell state C_{t−1}, a value between 0 and 1; 1 means "keep completely" and 0 means "discard completely".
i_t denotes the input gate, which decides what new information is stored in the cell state. It comprises two parts: first, a sigmoid layer called the "input gate layer" decides which values are to be updated; second, a tanh layer creates a vector of new candidate values C̃_t that can be added to the state.
o_t denotes the output gate. C_{t−1} is updated to C_t as follows: the old state is multiplied by f_t, discarding the information decided to be discarded, and then i_t * C̃_t is added, i.e., the new candidate values scaled by how much each state value is to be updated. Finally, h_t = o_t * tanh(C_t) gives the output.
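The gate formulas above can be stepped through in a minimal NumPy sketch (a single cell with random weights; shapes and names are illustrative, not the patent's implementation):

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing the gate formulas above.
    W: dict of (hidden, hidden+input) matrices; b: dict of bias vectors."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate f_t
    i = sigmoid(W["i"] @ z + b["i"])         # input gate i_t
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate values C~_t
    c = f * c_prev + i * c_tilde             # C_t = f*C_{t-1} + i*C~_t
    o = sigmoid(W["o"] @ z + b["o"])         # output gate o_t
    h = o * np.tanh(c)                       # h_t = o * tanh(C_t)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape)  # -> (4,)
```

Because o_t lies in (0, 1) and tanh(C_t) in (−1, 1), every component of h_t is bounded in (−1, 1), which keeps the frame-to-frame deep feature numerically stable.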
The residual module is a kind of deep convolutional network; it is easier to optimize and can improve accuracy by adding considerable depth.
The residual module described here is the residual module commonly used in the prior art with its fully connected layers and Soft-max layer removed; the remaining modules carry out the learning of feature combinations.
The input of the residual module described here is the current-frame deep feature map that the LSTM module supplements from the preceding frames, and its output is the mathematical expression of the human 2D joints; it can therefore improve the overall operating efficiency of the estimation method while keeping the precision of the hourglass module.
As shown in Fig. 8, the human 3D joint inference module described here takes as input the 2D heatmap generated by the hourglass module and the intermediate-layer image features extracted by the shallow neural network, and predicts the joint depths. Its output is a P×1 vector representing the predicted depth of each joint; the P×P joint heatmap and the P×1 joint depth map are then combined to form the mathematical expression of the 3D human pose.
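A minimal NumPy sketch of that final combination step (heatmap peaks give (x, y), the depth vector supplies z; array shapes are illustrative assumptions):

```python
import numpy as np

def joints_3d(heatmaps: np.ndarray, depths: np.ndarray) -> np.ndarray:
    """heatmaps: (J, H, W), one 2D joint heatmap per joint;
    depths: (J,), predicted depth per joint.
    Returns a (J, 3) array of (x, y, z) joint coordinates."""
    n_joints, h, w = heatmaps.shape
    flat = heatmaps.reshape(n_joints, -1).argmax(axis=1)
    ys, xs = np.divmod(flat, w)        # peak location in each heatmap
    return np.stack([xs, ys, depths], axis=1).astype(float)

hm = np.zeros((2, 4, 4))
hm[0, 1, 2] = 1.0                      # joint 0 peaks at (x=2, y=1)
hm[1, 3, 0] = 1.0                      # joint 1 peaks at (x=0, y=3)
print(joints_3d(hm, np.array([0.5, 1.5])))
# -> [[2.  1.  0.5]
#     [0.  3.  1.5]]
```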
3D joint inference can obtain depth information from a single RGB picture by deep learning. Such methods are built on large-scale object databases, e.g., face databases and scene databases. First, feature extraction (including brightness, depth, texture, geometry and mutual positions) is carried out by learning on each object in the database; then a probability function is established over the features; finally, the degree of similarity between the object to be reconstructed and similar objects in the database is expressed as a probability, the object depth with the maximum probability is taken to reconstruct the depth of the object, and 3D reconstruction is carried out in combination with texture mapping or interpolation.
The 3D joint inference used in this application takes the features extracted by the preceding modules, predicts the human joint depth information of the 2D picture with a deep learning model, and combines it with the human 2D joints generated by the previous stage to generate the human 3D joints.
Unlike the prior art, the described method for 3D human pose estimation based on a video stream uses deep learning to perform human 3D pose estimation on a video stream. The method mainly comprises the following parts:
1. Generation of the 3D human pose model
Using the hourglass module and the human 3D joint inference module, a 3D human pose estimation model is established. The model is divided into two parts: the first part is a Generator network that generates the 3D pose of the human body; the second part is a Discriminator network that judges the quality of the pose generated by the Generator. Through the interaction of the two networks, their performance is mutually promoted, finally obtaining a highly accurate 3D human pose.
2. Establishment of the spatial relationships of the joints
Using the shallow neural network and the residual module, the above 3D human pose model is optimized through the establishment of spatial relationships, so as to learn the spatial configuration information of the joints.
A Dropout Autoencoder (DAE) component, based on a denoising autoencoder, can be used to learn representations robust to noisy data, extending the architecture to infer the spatial configuration of the human skeleton more clearly. A dropout layer is introduced directly after the input layer; its effect is to remove completely random joints from the skeleton, rather than simply perturbing their positions and angles. The only way to recover the complete pose is then to reconstruct the missing joint angle information by inference from the adjacent joints.
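The joint-dropping behaviour of such a dropout layer can be sketched as follows (NumPy; the NaN-masking scheme here is an illustrative assumption, not the patent's implementation):

```python
import numpy as np

def drop_joints(skeleton: np.ndarray, drop_prob: float,
                rng: np.random.Generator) -> np.ndarray:
    """Remove completely random joints (set to NaN) from a skeleton of
    shape (J, 3); the autoencoder must reconstruct them from neighbours."""
    mask = rng.random(skeleton.shape[0]) < drop_prob
    out = skeleton.astype(float).copy()
    out[mask] = np.nan
    return out

rng = np.random.default_rng(42)
skel = np.arange(15, dtype=float).reshape(5, 3)   # 5 joints, (x, y, z)
noisy = drop_joints(skel, 0.4, rng)
print(np.isnan(noisy).any(axis=1))  # which joints were removed
```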
3. Capture of the temporal correlation between video frames
The LSTM module is used to learn the continuity between video frames, thereby learning the information of the temporal dimension.
Human pose estimation on a single image can be achieved with multi-stage convolutional neural networks (CNNs). Although these models perform excellently on static images, applying them to video is not only computation-intensive but also affected by performance degradation and flicker.
In this application, a new recurrent network is proposed to solve the above problems. By imposing weight sharing, the multi-stage CNN can be rewritten as a recurrent neural network (RNN), dramatically speeding up inference on video. Between video frames, long short-term memory (LSTM) units are used; they are highly effective in enforcing geometric consistency between frames, handle input-quality degradation in video well, and successfully stabilize the sequential output.
It should be understood that those of ordinary skill in the art can make modifications or changes according to the above description, and all such modifications and changes shall fall within the protection scope of the appended claims of the present invention.
Claims (3)
1. A method for three-dimensional human body pose estimation based on a video stream, characterized in that it comprises the following steps:
For the first video frame: 1) input the two-dimensional image of the current frame and extract the two-dimensional human pose with an hourglass network module, generating the two-dimensional human joint heat map of the first frame; 2) output the two-dimensional human joint heat map of the current frame to a three-dimensional joint inference module, which performs a two-dimensional-to-three-dimensional spatial mapping to generate a three-dimensional human joint heat map;
For the second video frame: 1) input the two-dimensional image of the current frame and generate a shallow image feature map with a shallow neural network module; 2) input the two-dimensional human joint heat map generated for the first frame, together with the shallow image feature map of the current frame, into an LSTM module to generate a deep feature map; 3) output the deep image feature map of the current frame to a residual module, generating the two-dimensional human joint heat map of the current frame; 4) output the two-dimensional human joint heat map of the current frame to the three-dimensional joint inference module, which performs the two-dimensional-to-three-dimensional spatial mapping to generate a three-dimensional human joint heat map;
For the n-th video frame (n ≥ 2): 1) input the two-dimensional image of the current frame and generate a shallow image feature map with the shallow neural network module; 2) input the two-dimensional human joint heat map generated for the (n-1)-th frame, together with the shallow image feature map of the current frame, into the LSTM module to generate a deep feature map; 3) output the deep image feature map of the current frame to the residual module, generating the two-dimensional human joint heat map of the current frame; 4) output the two-dimensional human joint heat map of the current frame to the three-dimensional joint inference module, which performs the two-dimensional-to-three-dimensional spatial mapping to generate a three-dimensional human joint heat map;
The three-dimensional human joint heat maps generated for each frame above are stacked in sequence to produce the video stream of the three-dimensional human pose estimate.
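The per-frame control flow recited in claim 1 can be sketched minimally as follows, with the five networks passed in as opaque callables. The function and parameter names (`estimate_pose_stream`, `hourglass`, `shallow_net`, `lstm_cell`, `residual`, `lift_2d_to_3d`) are illustrative placeholders, not names from the patent:

```python
def estimate_pose_stream(frames, hourglass, shallow_net, lstm_cell,
                         residual, lift_2d_to_3d):
    """Run the claimed pipeline over a video stream.

    frames        -- iterable of 2-D images
    hourglass     -- full 2-D pose extractor, used only on the first frame
    shallow_net   -- shallow feature extractor, used on frames n >= 2
    lstm_cell     -- fuses the previous frame's 2-D joint heat map with the
                     current frame's shallow features into a deep feature map
    residual      -- turns the deep feature map into a 2-D joint heat map
    lift_2d_to_3d -- maps a 2-D joint heat map to a 3-D joint heat map
    """
    poses_3d = []
    heatmap_2d = None  # 2-D joint heat map of the previous frame
    for n, frame in enumerate(frames, start=1):
        if n == 1:
            # First frame: the hourglass network extracts the 2-D pose.
            heatmap_2d = hourglass(frame)
        else:
            # Frame n >= 2: shallow features + previous heat map -> LSTM.
            shallow = shallow_net(frame)
            deep = lstm_cell(heatmap_2d, shallow)
            heatmap_2d = residual(deep)
        # Every frame ends with the 2-D -> 3-D spatial mapping.
        poses_3d.append(lift_2d_to_3d(heatmap_2d))
    return poses_3d
```

Note how the hourglass network is paid for only once: from the second frame on, the cheaper shallow network plus the LSTM's memory of the previous heat map replace it, which is what makes the method suitable for streaming input.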
2. The method for three-dimensional human body pose estimation based on a video stream according to claim 1, characterized in that:
the first-order hourglass network comprises the following parallel structure:
an upper half-path comprising several basic modules with M input channels and N output channels;
a lower half-path comprising, in series, a 1/2 down-sampling pooling layer, several basic modules, and a nearest-neighbor interpolation up-sampling module;
the n-th-order hourglass network (n ≥ 2) is obtained by replacing any basic module on the lower half-path of the (n-1)-th-order hourglass network with an (n-1)-th-order hourglass network.
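The recursive construction in claim 2 can be sketched structurally as below. Nested dicts stand in for network modules purely to show the wiring; the names `basic_module` and `hourglass` are illustrative, not the patent's code:

```python
def basic_module(m_in, n_out):
    """A basic module with M input channels and N output channels."""
    return {"type": "basic", "in": m_in, "out": n_out}

def hourglass(order, m_in, n_out):
    """Build an order-n hourglass per the recursive definition of claim 2."""
    # Upper half-path: basic modules kept at full resolution (parallel branch).
    upper = [basic_module(m_in, n_out)]
    # Lower half-path: 1/2 pooling -> inner modules -> nearest-neighbour upsampling.
    if order == 1:
        inner = basic_module(m_in, n_out)
    else:
        # Order n (n >= 2): a basic module on the (n-1)-order lower half-path
        # is replaced by an order-(n-1) hourglass, giving the nesting.
        inner = hourglass(order - 1, m_in, n_out)
    lower = [{"type": "pool_1/2"}, inner, {"type": "upsample_nn"}]
    return {"type": "hourglass", "order": order, "upper": upper, "lower": lower}
```

Unrolling `hourglass(n, ...)` therefore yields n nested pool/upsample pairs, i.e. features are processed at n successively coarser resolutions while the upper half-paths preserve detail at each scale.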
3. The method for three-dimensional human body pose estimation based on a video stream according to claim 2, characterized in that:
the basic module (Residual) has an M-channel input and an N-channel output;
the basic module (Residual) comprises the following parallel structure:
the first branch is a convolution path, formed of three convolutional layers with different kernel scales connected in series;
the second branch is a skip path, comprising a single convolutional layer with a kernel scale of 1 that matches the input channels to the output channel number.
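A structural sketch of the basic (Residual) module of claim 3. The claim fixes only the shape of the two parallel branches; the specific 1-3-1 kernel sizes and the halved bottleneck width in the convolution path are assumptions borrowed from the standard hourglass residual block, not stated in the claim:

```python
def conv(kernel, c_in, c_out):
    return {"type": "conv", "kernel": kernel, "in": c_in, "out": c_out}

def residual_module(m_in, n_out):
    """Basic (Residual) module: M-channel input, N-channel output."""
    mid = n_out // 2  # assumed bottleneck width, as in the standard block
    # Convolution path: three serial convolutions with different kernel scales.
    conv_path = [conv(1, m_in, mid), conv(3, mid, mid), conv(1, mid, n_out)]
    # Skip path: one kernel-scale-1 convolution aligning the channel counts
    # (reading assumed from the standard hourglass residual; the claim's
    # wording on the skip path's channels is ambiguous).
    skip_path = [conv(1, m_in, n_out)]
    # Parallel structure: the module's output is the element-wise sum of the
    # two paths, so both must end with N output channels.
    return {"type": "residual", "in": m_in, "out": n_out,
            "paths": [conv_path, skip_path]}
```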
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811080931.1A CN109271933B (en) | 2018-09-17 | 2018-09-17 | Method for estimating three-dimensional human body posture based on video stream |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109271933A true CN109271933A (en) | 2019-01-25 |
CN109271933B CN109271933B (en) | 2021-11-16 |
Family
ID=65189536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811080931.1A Active CN109271933B (en) | 2018-09-17 | 2018-09-17 | Method for estimating three-dimensional human body posture based on video stream |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109271933B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107392097A (en) * | 2017-06-15 | 2017-11-24 | 中山大学 | Method for locating three-dimensional human body joint points in a monocular color video |
CN108197547A (en) * | 2017-12-26 | 2018-06-22 | 深圳云天励飞技术有限公司 | Face pose estimation method, device, terminal and storage medium |
US20180204111A1 (en) * | 2013-02-28 | 2018-07-19 | Z Advanced Computing, Inc. | System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform |
2018-09-17: CN application CN201811080931.1A granted as patent CN109271933B (status: Active)
Non-Patent Citations (2)
Title |
---|
Yue Luo et al.: "LSTM Pose Machines", https://arxiv.org/abs/1712.06316 * |
Li Runshun: "Research on Three-Dimensional Object Recognition Algorithms Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology series * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109821239B (en) * | 2019-02-20 | 2024-05-28 | 网易(杭州)网络有限公司 | Method, device, equipment and storage medium for realizing somatosensory game |
CN109821239A (en) * | 2019-02-20 | 2019-05-31 | 网易(杭州)网络有限公司 | Method, device, equipment and storage medium for implementing a somatosensory game |
CN109949368A (en) * | 2019-03-14 | 2019-06-28 | 郑州大学 | Human body three-dimensional pose estimation method based on image retrieval |
CN109949368B (en) * | 2019-03-14 | 2020-11-06 | 郑州大学 | Human body three-dimensional attitude estimation method based on image retrieval |
CN110472532A (en) * | 2019-07-30 | 2019-11-19 | 中国科学院深圳先进技术研究院 | Video object behavior recognition method and apparatus |
CN110427877A (en) * | 2019-08-01 | 2019-11-08 | 大连海事大学 | Method for human body three-dimensional posture estimation based on structural information |
CN110427877B (en) * | 2019-08-01 | 2022-10-25 | 大连海事大学 | Human body three-dimensional posture estimation method based on structural information |
CN110751039B (en) * | 2019-09-18 | 2023-07-25 | 平安科技(深圳)有限公司 | Multi-view 3D human body posture estimation method and related device |
CN110751039A (en) * | 2019-09-18 | 2020-02-04 | 平安科技(深圳)有限公司 | Multi-view 3D human body posture estimation method and related device |
CN110619310A (en) * | 2019-09-19 | 2019-12-27 | 北京达佳互联信息技术有限公司 | Human skeleton key point detection method, device, equipment and medium |
CN110826459A (en) * | 2019-10-31 | 2020-02-21 | 上海交通大学 | Migratable campus violent behavior video identification method based on attitude estimation |
CN110826459B (en) * | 2019-10-31 | 2022-09-30 | 上海交通大学 | Migratable campus violent behavior video identification method based on attitude estimation |
CN110991319A (en) * | 2019-11-29 | 2020-04-10 | 广州市百果园信息技术有限公司 | Hand key point detection method, gesture recognition method and related device |
CN110991319B (en) * | 2019-11-29 | 2021-10-19 | 广州市百果园信息技术有限公司 | Hand key point detection method, gesture recognition method and related device |
US20230126178A1 (en) * | 2020-02-13 | 2023-04-27 | Northeastern University | Light-Weight Pose Estimation Network With Multi-Scale Heatmap Fusion |
WO2021163103A1 (en) * | 2020-02-13 | 2021-08-19 | Northeastern University | Light-weight pose estimation network with multi-scale heatmap fusion |
CN111401230A (en) * | 2020-03-13 | 2020-07-10 | 深圳市商汤科技有限公司 | Attitude estimation method and apparatus, electronic device, and storage medium |
CN111401230B (en) * | 2020-03-13 | 2023-11-28 | 深圳市商汤科技有限公司 | Gesture estimation method and device, electronic equipment and storage medium |
CN111695457A (en) * | 2020-05-28 | 2020-09-22 | 浙江工商大学 | Human body posture estimation method based on weak supervision mechanism |
CN111695457B (en) * | 2020-05-28 | 2023-05-09 | 浙江工商大学 | Human body posture estimation method based on weak supervision mechanism |
CN111767847A (en) * | 2020-06-29 | 2020-10-13 | 佛山市南海区广工大数控装备协同创新研究院 | Pedestrian multi-target tracking method integrating target detection and association |
CN111898566B (en) * | 2020-08-04 | 2023-02-03 | 成都井之丽科技有限公司 | Attitude estimation method, attitude estimation device, electronic equipment and storage medium |
CN111898566A (en) * | 2020-08-04 | 2020-11-06 | 成都井之丽科技有限公司 | Attitude estimation method, attitude estimation device, electronic equipment and storage medium |
US11380121B2 (en) | 2020-08-25 | 2022-07-05 | Sony Group Corporation | Full skeletal 3D pose recovery from monocular camera |
CN112215160A (en) * | 2020-10-13 | 2021-01-12 | 厦门大学 | Video three-dimensional human body posture estimation algorithm using long-term and short-term information fusion |
CN112215160B (en) * | 2020-10-13 | 2023-11-24 | 厦门大学 | Video three-dimensional human body posture estimation algorithm utilizing long-short period information fusion |
CN112509123A (en) * | 2020-12-09 | 2021-03-16 | 北京达佳互联信息技术有限公司 | Three-dimensional reconstruction method and device, electronic equipment and storage medium |
CN112767534A (en) * | 2020-12-31 | 2021-05-07 | 北京达佳互联信息技术有限公司 | Video image processing method and device, electronic equipment and storage medium |
CN112767534B (en) * | 2020-12-31 | 2024-02-09 | 北京达佳互联信息技术有限公司 | Video image processing method, device, electronic equipment and storage medium |
CN113469136B (en) * | 2021-07-28 | 2024-05-14 | 大连海事大学 | Method for identifying turbine employee monitoring based on improved LSTM-VGG16 deep neural network structure |
CN113469136A (en) * | 2021-07-28 | 2021-10-01 | 大连海事大学 | Method for identifying work monitoring of turbine crew based on improved LSTM-VGG16 deep neural network structure |
CN114926860A (en) * | 2022-05-12 | 2022-08-19 | 哈尔滨工业大学 | Three-dimensional human body attitude estimation method based on millimeter wave radar |
CN114926860B (en) * | 2022-05-12 | 2024-08-09 | 哈尔滨工业大学 | Three-dimensional human body posture estimation method based on millimeter wave radar |
WO2024083100A1 (en) * | 2022-10-17 | 2024-04-25 | Alibaba Damo (Hangzhou) Technology Co., Ltd. | Method and apparatus for talking face video compression |
Also Published As
Publication number | Publication date |
---|---|
CN109271933B (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109271933A (en) | Method for estimating three-dimensional human body posture based on video stream | |
CN111369681B (en) | Three-dimensional model reconstruction method, device, equipment and storage medium | |
CN110443842B (en) | Depth map prediction method based on visual angle fusion | |
CN105654492B (en) | Robust real-time three-dimensional reconstruction method based on consumer-grade cameras | |
CN111047548B (en) | Attitude transformation data processing method and device, computer equipment and storage medium | |
CN111311685B (en) | Motion scene reconstruction unsupervised method based on IMU and monocular image | |
CN110728219B (en) | 3D face generation method based on multi-column multi-scale graph convolution neural network | |
CN110889343B (en) | Crowd density estimation method and device based on attention type deep neural network | |
CN111275518A (en) | Video virtual fitting method and device based on mixed optical flow | |
CN110140147A (en) | Video frame synthesis with deep learning | |
CN112232134B (en) | Human body posture estimation method based on hourglass network and attention mechanism | |
CN109299685A (en) | Inference network and method for estimating 3D coordinates of human joints | |
CN114339409B (en) | Video processing method, device, computer equipment and storage medium | |
CN114663509B (en) | Self-supervision monocular vision odometer method guided by key point thermodynamic diagram | |
CN107977930A (en) | Image super-resolution method and system | |
JP2023545189A (en) | Image processing methods, devices, and electronic equipment | |
CN111462274A (en) | Human body image synthesis method and system based on the SMPL model | |
CN111738092B (en) | Method for recovering occluded human body posture sequence based on deep learning | |
Zhang et al. | Unsupervised multi-view constrained convolutional network for accurate depth estimation | |
CN113706670A (en) | Method and device for generating dynamic three-dimensional human body mesh model sequence | |
CN116363308A (en) | Human body three-dimensional reconstruction model training method, human body three-dimensional reconstruction method and equipment | |
CN112380764A (en) | End-to-end rapid reconstruction method for gas scene under limited view | |
CN117274446A (en) | Scene video processing method, device, equipment and storage medium | |
CN111311732A (en) | 3D human body grid obtaining method and device | |
CA3177593A1 (en) | Transformer-based shape models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||