CN110009717A - Animated-character binding and recording system based on monocular depth maps - Google Patents
- Publication number
- CN110009717A (application CN201910256680.6A / CN201910256680A)
- Authority
- CN
- China
- Prior art keywords
- value
- animated character
- joint point
- coordinate
- smoothing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
Abstract
The invention discloses an animated-character binding and recording system based on monocular depth maps, belonging to the technical field of video-based human pose estimation. In the system, the data-processing pipeline is built on machine-learning and deep-learning frameworks: starting from a monocular depth map, a three-dimensional-information deep-learning network estimates the coordinates of the human joint points in the image; the estimated joint coordinates are fed into the animated-character binding and recording system and smoothed with a filtering algorithm, thereby binding the joint points to the animated character in the system. Because the joint coordinates are estimated with the three-dimensional-information deep-learning network, the estimates of the human joint-point coordinates in the image are more accurate, so that during binding and recording the human motion in the captured footage is faithfully reproduced on the animated character, achieving an accurate binding between the joint points and the animated character in the system.
Description
Technical field
The present invention relates to an animated-character binding and recording system based on monocular depth maps, and belongs to the technical field of video-based human pose estimation.
Background art
Human pose estimation is the reconstruction of a person's joints and limbs from images. Image-based human pose tracking and joint-point estimation have enormous application potential and market value in human-computer interaction, security monitoring, motion analysis, augmented reality, virtual reality, healthcare, games, and animation. Current methods fall mainly into the following two categories:
(1) Top-down methods, also called model-driven methods, rely on a pre-built model or prior knowledge: the corresponding model variables are computed by matching against the image sequence or solving a posterior probability, and the pre-built model is then updated with those variables so that it approaches the specific pose of the person in the image. The main processing loop of such methods is predict, match, update. Since they start from the model and use the data only to drive it, their final accuracy is jointly limited by both the model and the data, which leads to limited precision.
(2) Bottom-up methods, also called data-driven methods, regress the target joint points directly from the data through large-scale matching computations on image data; their final accuracy fluctuates strongly, is heavily affected by the specific data, and generalizes poorly.
Summary of the invention
To solve the problem that existing human pose estimation methods are insufficiently accurate, so that in monocular-depth-map-based animated-character binding and recording the human motion in the captured footage cannot be accurately reproduced on the animated character, the present invention provides an animated-character binding and recording system based on monocular depth maps.
The present application provides a data processing method in an animated-character binding and recording system, the method comprising the following steps:
Step (1): process the 2D monocular depth map with the DeepPrior++ network and output the spatial offset xyz_Offset;
Step (2), data augmentation: rotate, scale, and translate the 2D monocular depth map and map it into three-dimensional Euclidean space to form a point cloud; the mapping equation is as follows:

z_c · [u, v, 1]^T = [[f/dx, 0, u_0], [0, f/dy, v_0], [0, 0, 1]] · [R | T] · [x_w, y_w, z_w, 1]^T

where (u, v) is an arbitrary coordinate point in the image coordinate system; (u_0, v_0) is the center coordinate of the image; (x_w, y_w, z_w) is the corresponding three-dimensional coordinate point in the world coordinate system; z_c is the z-axis value of the camera coordinate in the world coordinate system, i.e. the distance from the subject in the 2D monocular depth map to the camera; R and T are respectively the 3x3 rotation matrix and 3x1 translation vector of the extrinsic matrix; and f/dx and f/dy are the focal length expressed in pixel units along the x and y axes;
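The mapping above is the standard pinhole camera model. The sketch below inverts it for one depth pixel (pixel plus depth to 3D point) under the simplifying assumption of identity extrinsics (R = I, T = 0); the function name and parameters are illustrative and not from the patent:

```python
def backproject(u, v, z_c, fx, fy, u0, v0):
    """Invert the pinhole mapping for one depth pixel.

    fx = f/dx and fy = f/dy are the focal length in pixel units,
    (u0, v0) is the image center, z_c the measured depth. With
    identity extrinsics the camera frame coincides with the world
    frame, so (x_w, y_w, z_w) is returned directly.
    """
    x_w = (u - u0) * z_c / fx
    y_w = (v - v0) * z_c / fy
    return (x_w, y_w, z_c)
```

Applying this to every valid pixel of the depth map yields the point cloud formed in step (2).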
Step (3): correct the point cloud obtained in step (2) with the spatial offset xyz_Offset obtained in step (1), then trim the corrected point cloud with preset parameters to preliminarily form a set of points, called the voxel set Cubic; the voxel set Cubic is a cube of spatial size 88x88x88 in which positions containing a point are set to 1 and positions containing no point are set to 0;
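The voxelization of step (3) can be sketched as follows; the bounds and the sparse set representation are illustrative assumptions, since the patent only fixes the 88x88x88 size and the 0/1 occupancy:

```python
def voxelize(points, size=88, lo=-1.0, hi=1.0):
    """Map 3D points into a size^3 binary occupancy cube.

    Occupied cells are kept in a set of (i, j, k) indices: a cell is 1
    if it contains at least one point and 0 otherwise. Points outside
    [lo, hi)^3 are trimmed away, mimicking the preset-parameter trim.
    """
    step = (hi - lo) / size
    occupied = set()
    for x, y, z in points:
        i = int((x - lo) // step)
        j = int((y - lo) // step)
        k = int((z - lo) // step)
        if all(0 <= t < size for t in (i, j, k)):
            occupied.add((i, j, k))
    return occupied
```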
Step (4): feed the voxel set Cubic obtained in step (3) into the three-dimensional-information deep-learning network FeSHEN to obtain the maximum-likelihood response positions of the character's joint points; these response positions are then mapped into the world coordinate system, finally predicting 18 joint points of the character and obtaining their spatial coordinates in the world coordinate system;
Step (5): process the world-coordinate positions of the 18 joint points with smoothing methods, including variation limiting and jitter smoothing; variation limiting prevents joint motions that exceed the limits of the human body, while jitter smoothing suppresses joint jitter caused by noise;
The jitter-smoothing algorithm is as follows:
Input: the coordinate input value X_t of the current frame; the input value X_{t-1} of the previous frame.
Output: the smoothed output value X̂_t.
S1: compute the Euclidean distance dis between X_t and X_{t-1};
S2: compare dis against a preset jitter limit Jitter:
if dis > Jitter, then X'_t = X_t;
if dis ≤ Jitter, X_t is judged to be jitter and is smoothed to obtain X'_t;
S3: compute the smoothed value Y_t of the current frame with the Holt double-exponential smoothing formula:
Y_t = X'_t × (1 - Smoothing) + (X_{t-1} + T_{t-1}) × Smoothing
where Smoothing is the smoothing parameter with value range [0, 1], and T_{t-1} is the trend value of the previous frame, computed by the trend formula of the previous frame;
S4: compute the difference Dis between the smoothed value Y_t and the input value X_{t-1}:
Dis = Y_t - X_{t-1}
S5: compute the trend value T_t of the current frame with the Holt double-exponential trend formula:
T_t = Dis × Correction + T_{t-1} × (1 - Correction)
where Correction is the correction parameter with value range [0, 1];
S6: compute the final predicted value X̂_t with the Holt double-exponential prediction formula, where the parameter Prediction takes values in [0, n];
S7: using the maximum filtering distance MaxDist, check the Euclidean distance DisOut between the predicted value X̂_t and the input value X_t; if DisOut > MaxDist, the predicted value is pulled back toward X_t.
The result is the smoothed output value X̂_t for the input X_t: a three-dimensional vector containing the joint's coordinate values on the x, y, and z axes;
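The steps above can be sketched as a runnable filter operating on one joint's 3D coordinate. S3 and S5 follow the formulas given in the text; the S2 smoothing, the S6 Holt forecast (level plus Prediction times trend), and the S7 pull-back toward X_t are hedged assumptions, since those formulas are not reproduced here:

```python
import math

def holt_jitter_smooth(x_t, x_prev, trend_prev,
                       jitter=0.05, smoothing=0.5, correction=0.5,
                       prediction=0.0, max_dist=0.1):
    """One step of the joint-coordinate jitter-smoothing filter (S1-S7).

    x_t, x_prev: current / previous joint coordinates (3-tuples).
    trend_prev: previous trend vector T_{t-1}.
    Returns (x_hat, y_t, t_t): smoothed output, level, and trend.
    The S2 interpolation, S6 forecast, and S7 clamp are assumptions;
    only S3 and S5 appear explicitly in the source.
    """
    dist = math.dist(x_t, x_prev)                          # S1
    if dist > jitter:                                      # S2: not jitter
        x_c = x_t
    else:                                                  # S2: assumed lerp toward x_t
        a = dist / jitter if jitter > 0 else 0.0
        x_c = tuple(p + (c - p) * a for c, p in zip(x_t, x_prev))
    y_t = tuple(c * (1 - smoothing) + (p + tr) * smoothing  # S3: Holt level
                for c, p, tr in zip(x_c, x_prev, trend_prev))
    dis = tuple(y - p for y, p in zip(y_t, x_prev))        # S4
    t_t = tuple(d * correction + tr * (1 - correction)     # S5: Holt trend
                for d, tr in zip(dis, trend_prev))
    x_hat = tuple(y + prediction * tr                      # S6: assumed Holt forecast
                  for y, tr in zip(y_t, t_t))
    d_out = math.dist(x_hat, x_t)                          # S7: assumed clamp to max_dist
    if d_out > max_dist:
        x_hat = tuple(x + (h - x) * max_dist / d_out
                      for h, x in zip(x_hat, x_t))
    return x_hat, y_t, t_t
```

Running the filter frame by frame over a joint trajectory, feeding each returned trend back in as trend_prev, yields the smoothed output sequence X̂_t.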
Step (6): build the animated-character binding and recording system based on monocular depth maps from the smoothed output values X̂_t.
Optionally, the 2D monocular depth map is captured with a Microsoft Kinect V2 or an ASUS Xtion Pro.
Optionally, step (6) builds the animated-character binding and recording system based on monocular depth maps from the smoothed output values X̂_t as follows:
(1) the output values X̂_t are read into the system's human pose estimation module, where the joint coordinates are smoothed with a filtering algorithm; the smoothed joint parameters are then output to the system's editing and recording module;
(2) an animated-character model is built in the model binding module, and the character's joint points are bound to the skeleton model to generate a humanoid animation model, which is output to the system's editing and recording module;
(3) the editing and recording module binds the human footage captured by the camera device and the smoothed joint parameters output by the human pose estimation module to the animated-character model built by the model binding module, completing the animated-character recording and binding task;
the editing and recording module is the main interface of the system: through it the user selects and previews models, and after choosing the animation scene and character, clicks to record the animated video with the embedded video-recording method, finally producing the recorded video.
Optionally, the network structure of the three-dimensional-information deep-learning network FeSHEN in step (4) is as follows:
a convolution block followed by a pooling block, then four ME modules in series, then two residual blocks in succession, and finally a convolution block; the ME module is composed of four sub-modules, a pooling block, a residual block, a deconvolution block, and a supervision block, and is used for three-dimensional voxel estimation;
the convolution block consists of a voxel convolution, a voxel batch-normalization layer, and an activation function, and performs the convolution computation on three-dimensional information; the pooling block consists of voxel down-sampling, a voxel batch-normalization layer, and an activation function, and reduces the spatial size of the three-dimensional feature map; the residual block has a main branch and a shortcut branch, the main branch containing two convolution blocks and the shortcut branch one convolution block, and is used to adjust the channel number of the three-dimensional feature map; the deconvolution block consists of voxel up-sampling, a voxel batch-normalization layer, and an activation function, and fuses three-dimensional feature maps while reducing the channel number to obtain the output features; the supervision block has an upper and a lower branch, each composed of two residual blocks: the upper branch performs a channel transformation on the three-dimensional feature map, the first residual block of the lower branch compresses the feature map and extracts nonlinear features, and the second residual block of the lower branch expands the channels of the compressed supervision features and fuses them with the original output features;
the kernel size of the residual blocks is 3x3x3, and the kernel size of the convolution and deconvolution blocks is 2x2x2 with stride 2; the supervision parameters of the four ME modules are [2, 4, 8, 16] in order, and their output parameters are [8, 16, 32, 64] in order.
Optionally, the rotation of the 2D monocular depth map in step (2) rotates it within the angular range [-40, 40] degrees in the XY plane.
Optionally, the scaling of the 2D monocular depth map in step (2) uses scale factors in the range [0.8, 1.2].
Optionally, the translation of the 2D monocular depth map in step (2) is a translation within [-8, 8] voxels.
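The three augmentation ranges can be illustrated for a single 3D point as follows; applying the transform per point is an illustrative assumption (in practice one sampled transform would be applied to the whole depth map), and the function name is not from the patent:

```python
import math
import random

def augment_point(p, rng=random):
    """Apply the described augmentation ranges to one 3D point:
    rotation in the XY plane by an angle in [-40, 40] degrees,
    uniform scaling in [0.8, 1.2], and translation in [-8, 8] voxels.
    """
    ang = math.radians(rng.uniform(-40, 40))
    s = rng.uniform(0.8, 1.2)
    t = [rng.uniform(-8, 8) for _ in range(3)]
    x, y, z = p
    xr = x * math.cos(ang) - y * math.sin(ang)  # rotate in the XY plane
    yr = x * math.sin(ang) + y * math.cos(ang)
    return (xr * s + t[0], yr * s + t[1], z * s + t[2])
```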
Optionally, the variation limiting in step (5) restricts the frame-to-frame variation of the joint points and comprises three methods:
1. coordinate limiting, which limits the variation range of the coordinate value on each axis and thereby controls the angle change;
2. swing limiting, which limits the left-right and front-back angle changes of a joint point;
3. angle limiting, which limits the angle change of a joint point in all directions.
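Coordinate limiting (method 1) amounts to clamping the per-axis change between frames; a minimal sketch, with the clamp thresholds as assumed parameters (swing and angle limiting, methods 2 and 3, would clamp the corresponding angles analogously):

```python
def limit_coordinate_change(curr, prev, max_delta):
    """Clamp each axis change between frames to [-m, +m].

    curr, prev: joint coordinates of this and the previous frame;
    max_delta: per-axis limits (dx, dy, dz). Returns the limited
    coordinate, so a joint can never move beyond the set range
    between consecutive frames.
    """
    return tuple(p + max(-m, min(m, c - p))
                 for c, p, m in zip(curr, prev, max_delta))
```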
Optionally, in S3 of the jitter-smoothing algorithm of step (5), the smaller the value of the smoothing parameter Smoothing, the less the smoothed value Y_t of the current frame is influenced by the previous frame.
Optionally, in S5 of the jitter-smoothing algorithm of step (5), the larger the value of the correction parameter Correction, the faster the deviation of the joint point is corrected.
Optionally, the method, based on machine-learning and deep-learning frameworks, starts from the monocular depth map, estimates the human joint-point coordinates in the image with the three-dimensional-information deep-learning network, feeds the estimated joint coordinates into the animated-character binding and recording system, smooths the joint points with a filtering algorithm, and finally binds the joint points to the animated character in the system.
The beneficial effects of the invention are as follows:
By estimating the human joint-point coordinates in the image with the three-dimensional-information deep-learning network and feeding the estimates into the animated-character binding and recording system, the estimates of the joint coordinates become more accurate, so that during binding and recording the human motion in the captured footage is faithfully reproduced on the animated character, achieving an accurate binding between the joint points and the animated character in the system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the structure diagram of DeepPrior++.
Fig. 2 illustrates the joint-point variation limits: horizontal stripes denote coordinate limiting, shown at (1) in Fig. 2, with parameters Δx, Δy, Δz denoting the respective coordinate changes; diagonal stripes denote movement limiting, shown at (2) in Fig. 2, with parameters ΔS and ΔT denoting the swing amplitude and the front-back rocking amplitude; dotted stripes denote angle limiting, shown at (3) in Fig. 2, with parameter ΔA denoting the angle change.
Fig. 3 is the structure diagram of the animated-character binding and recording system of the present invention.
Fig. 4 is the design diagram of the basic modules of the deep-learning network of the present invention.
Fig. 5 is the design diagram of the deep-learning network of the present invention.
Fig. 6 compares the results of the FeSHEN network provided by the invention and the V2V-PoseNet network on pose estimation for a compound human action sequence.
Fig. 7 compares the results of the FeSHEN network provided by the invention and the V2V-PoseNet network on pose estimation for a continuous human action sequence.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the invention are described in further detail below with reference to the drawings.
Embodiment one:
This embodiment provides a data processing method in an animated-character binding and recording system, used in the process of binding and recording a human and an animated character. Based on machine-learning and deep-learning frameworks, the method starts from a monocular depth map, estimates the human joint-point coordinates in the image with the three-dimensional-information deep-learning network FeSHEN, feeds the estimated joint coordinates into the animated-character binding and recording system, smooths the joint points with a filtering algorithm, and finally binds the joint points to the animated character in the system.
The monocular depth map is captured with a Microsoft Kinect V2 or an ASUS Xtion Pro.
The three-dimensional-information deep-learning network FeSHEN is an end-to-end network; its construction is introduced below from three aspects: the modules of the network, its structure, and its loss function.
(1) Modules of the network
The present application designs a network component for three-dimensional voxel estimation, the Monitored-Endecoder (ME module), whose structure is shown in Fig. 4. The ME module consists of four sub-modules: a pooling block, a residual block, a deconvolution block, and a supervision block; to simplify the figure, the z-axis information is not drawn in Fig. 4.
The two rows below each module indicate its input and output: the numbers give the spatial size and number of feature channels of the data processed by the module, the size of each cube represents the size of the feature map in the module, and its thickness represents the channel count.
The dashed box on the left of the figure is the encoder: the pooling block reduces the size of the feature map, and the residual block increases the number of feature channels. Increasing the channel count is equivalent to increasing the number of convolution-kernel types; the more kernel types there are, the more features can be learned, which benefits model performance.
The dashed box in the middle is the decoder, in which the deconvolution block enlarges the spatial size of the feature map while using fewer convolution kernels to fuse the feature maps and reduce the channel count, achieving compression and decoding. During encoding, using a smaller stride, compressing features, increasing channels, and enlarging the feature-map space makes the network features richer and makes it easier to converge on dense key-point positions, thereby localizing the joint points.
The dashed box on the right represents the monitor, composed mainly of residual blocks with different input/output channel parameters, denoted Out (the output parameter) and Moni (the supervision parameter). The monitor has two branches, each consisting of two residual blocks. The upper branch performs a channel transformation on the feature map; its residual blocks have channel number Out. The first residual block of the lower branch is the Intermediate Monitor Block: its input channel number is Out and its output channel number is Moni; it compresses the feature map and extracts nonlinear features. The second residual block of the lower branch has input channel number Moni and output channel number Out, and is mainly used to expand the channels of the compressed supervision features so they can be fused with the original output features. Different supervision sizes refine the learned features to different degrees, so different supervision parameters can be set for different image resolutions to reach the best prediction performance.
(2) Network structure
The basic units used to build the network are 3D convolution blocks, of four kinds in total.
The first kind, called the convolution block, consists of a voxel convolution, a voxel batch-normalization (Batch Normalize) layer, and an activation function (ReLU), and mainly performs the convolution computation on three-dimensional information.
The second kind, called the residual block, is mainly used to adjust the channel number of the three-dimensional feature map.
The third kind, called the pooling block, performs 3D voxel down-sampling on the same principle as 2D pooling, and is mainly used to reduce the three-dimensional size of the feature map.
The fourth kind, called the deconvolution block, up-samples 3D voxels and consists of voxel up-sampling (Upsampling), a voxel batch-normalization (Batch Normalize) layer, and an activation function (ReLU). The batch-normalization layer and the activation function used in the convolution and deconvolution blocks help simplify the learning process and speed up gradient descent. In the network design, the kernel size of the residual blocks is 3x3x3, and the kernel size of the convolution and deconvolution blocks is 2x2x2 with stride 2.
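The pooling block's voxel down-sampling can be illustrated with a plain-Python stand-in, a 2x2x2 max pool with stride 2 matching the stated kernel size and stride (a real network would implement this as a framework layer, not nested lists):

```python
def voxel_downsample(vol, k=2):
    """2x2x2 max pooling with stride 2 over a cubic nested list,
    halving each spatial dimension (e.g. 88^3 -> 44^3), as the
    pooling block does for three-dimensional feature maps.
    """
    n = len(vol)
    m = n // k
    return [[[max(vol[k * i + a][k * j + b][k * l + c]
                  for a in range(k) for b in range(k) for c in range(k))
              for l in range(m)]
             for j in range(m)]
            for i in range(m)]
```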
The main body of the supervised high-dimensional-information encoding-decoding network (FeSHEN) is composed of the four ME modules connected in series with different supervision parameters. Chaining the ME modules deepens the network, and using different supervision parameters extracts features at different degrees of refinement. The beginning and end are handled with convolution blocks; the specific network structure is shown in Fig. 5.
The numbers under the modules in Fig. 4 give the size and channel count of the feature map processed by each module: the red numbers are the supervision parameters Moni, which for the ME components are [2, 4, 8, 16] in order, and the blue numbers are the output parameters Out, which are [8, 16, 32, 64] in order.
(3) Network loss function
The three-dimensional-information deep-learning network FeSHEN uses as its loss function L the mean squared error of the Gaussian peak means,
where N is the number of joint points, Ĥ_n and H_n are respectively the Gaussian peak means of the ground truth and the prediction for the n-th joint point, and (i, j, k) are the three-dimensional coordinate values of the n-th joint point.
The Gaussian peak mean (mean of Gaussian peak) is used as the feature of a coordinate point in order to account for the prediction likelihood of each joint point: a Gaussian mean is taken over each predicted point and the target point and used as the feature of the target point,
where Ĥ_n is the Gaussian peak mean of the n-th joint point, (i_n, j_n, k_n) are the labeled coordinate values of the n-th joint point, and σ = 1.7 is the standard deviation of the Gaussian peak.
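The two formulas referred to above do not survive in this text. Under the stated definitions (mean squared error of Gaussian peak means over N joints, labeled coordinates (i_n, j_n, k_n), σ = 1.7), a plausible reconstruction, offered only as a reading aid and not as the patent's exact formulas, is:

```latex
% Loss: mean squared error between the ground-truth and predicted
% Gaussian peak means over the N joint points (reconstruction).
L = \frac{1}{N} \sum_{n=1}^{N} \left( \hat{H}_n - H_n \right)^2

% Gaussian peak mean of joint n, averaged over the voxel positions
% (i, j, k) in the response volume \Omega, with labeled coordinates
% (i_n, j_n, k_n) and standard deviation \sigma = 1.7 (assumed form).
\hat{H}_n = \frac{1}{\lvert \Omega \rvert} \sum_{(i,j,k) \in \Omega}
  \exp\!\left( -\frac{(i - i_n)^2 + (j - j_n)^2 + (k - k_n)^2}{2\sigma^2} \right)
```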
The structure diagram of the established three-dimensional-information deep-learning network FeSHEN is shown in Fig. 5.
After the three-dimensional-information deep-learning network FeSHEN has been established, the following steps complete the binding of the human joint points to the animated character:
A 2D monocular depth map containing a human body is captured with a Microsoft Kinect V2 or an ASUS Xtion Pro;
Step (1): process the 2D monocular depth map with the DeepPrior++ network and output the spatial offset xyz_Offset; the structure of the DeepPrior++ network is shown in Fig. 1.
For an introduction to the DeepPrior++ network, refer to "DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation", published in 2017.
Step (2), data augmentation: rotate, scale, and translate the 2D monocular depth map and map it into three-dimensional Euclidean space to form a point cloud. The rotation rotates the 2D monocular depth map within the angular range [-40, 40] degrees in the XY plane; the scaling uses scale factors in the range [0.8, 1.2]; the translation is within [-8, 8] voxels. The mapping equation for mapping into three-dimensional Euclidean space to form the point cloud is as follows:

z_c · [u, v, 1]^T = [[f/dx, 0, u_0], [0, f/dy, v_0], [0, 0, 1]] · [R | T] · [x_w, y_w, z_w, 1]^T

where (u, v) is an arbitrary coordinate point in the image coordinate system; (u_0, v_0) is the center coordinate of the image; (x_w, y_w, z_w) is the corresponding three-dimensional coordinate point in the world coordinate system; z_c is the z-axis value of the camera coordinate in the world coordinate system, i.e. the distance from the subject in the 2D monocular depth map to the camera; R and T are respectively the 3x3 rotation matrix and 3x1 translation vector of the extrinsic matrix; and f/dx and f/dy are the focal length expressed in pixel units along the x and y axes;
Step (3): correct the point cloud obtained in step (2) with the spatial offset xyz_Offset obtained in step (1), then trim the corrected point cloud with preset parameters to preliminarily form a set of points, called the voxel set Cubic; the voxel set Cubic is a cube of spatial size 88x88x88 in which positions containing a point are set to 1 and positions containing no point are set to 0;
Step (4): feed the voxel set Cubic obtained in step (3) into the three-dimensional-information deep-learning network FeSHEN established above to obtain the maximum-likelihood response positions of the joint points; these response positions are then mapped into the world coordinate system, finally predicting 18 joint points of the character and obtaining their spatial coordinates in the world coordinate system;
Step (5): process the world-coordinate positions of the 18 joint points with smoothing methods, including variation limiting and jitter smoothing; variation limiting prevents joint motions that exceed the limits of the human body, while jitter smoothing suppresses joint jitter caused by noise;
The variation limiting restricts the frame-to-frame variation of the joint points; referring to Fig. 2, it comprises three methods:
1. coordinate limiting, which limits the variation range of the coordinate value on each axis and thereby controls the angle change;
2. swing limiting, which limits the left-right and front-back angle changes of a joint point;
3. angle limiting, which limits the angle change of a joint point in all directions.
In Fig. 2, the limiting methods are marked on the joint points in different colors, together with the limiting parameters. Horizontal stripes denote coordinate limiting, shown at (1) in the figure, with parameters Δx, Δy, Δz denoting the coordinate changes; diagonal stripes denote movement limiting, shown at (2) in the figure, with parameters ΔS and ΔT denoting the swing amplitude and the front-back rocking amplitude; dotted stripes denote angle limiting, shown at (3) in the figure, with parameter ΔA denoting the angle change.
The jitter-smoothing algorithm is as follows:
Input: the coordinate input value X_t of the current frame; the input value X_{t-1} of the previous frame.
Output: the smoothed output value X̂_t.
S1: compute the Euclidean distance dis between X_t and X_{t-1};
S2: compare dis against a preset jitter limit Jitter:
if dis > Jitter, then X'_t = X_t;
if dis ≤ Jitter, X_t is judged to be jitter and is smoothed to obtain X'_t;
S3: compute the smoothed value Y_t of the current frame with the Holt double-exponential smoothing formula:
Y_t = X'_t × (1 - Smoothing) + (X_{t-1} + T_{t-1}) × Smoothing
where Smoothing is the smoothing parameter with value range [0, 1], and T_{t-1} is the trend value of the previous frame, computed by the trend formula of the previous frame. From this formula it can be seen that the smaller the smoothing parameter Smoothing, the less the smoothed value Y_t of the current frame is influenced by the previous frame;
S4: compute the difference Dis between the smoothed value Y_t and the input value X_{t-1}:
Dis = Y_t - X_{t-1}
S5: compute the trend value T_t of the current frame with the Holt double-exponential trend formula:
T_t = Dis × Correction + T_{t-1} × (1 - Correction)
where Correction is the correction parameter with value range [0, 1]. From this formula it can be seen that the larger the correction parameter Correction, the faster the deviation of the joint point is corrected;
S6: compute the final predicted value X̂_t with the Holt double-exponential prediction formula, where the parameter Prediction takes values in [0, n] and determines how strongly the prediction influences the next n frames;
S7: using the maximum filtering distance MaxDist, check the Euclidean distance DisOut between the predicted value X̂_t and the input value X_t; if DisOut > MaxDist, the predicted value is pulled back toward X_t.
The result is the smoothed output value X̂_t for the input X_t: a three-dimensional vector containing the joint's coordinate values on the x, y, and z axes.
The maximum filtering distance MaxDist is set according to the specific smoothness requirements: too large a value means DisOut is never filtered; too small a value makes the smoothed output X̂_t tend toward a single value.
Step (6): build the animated-character binding and recording system based on monocular depth maps from the smoothed output values X̂_t. The binding and recording system mainly comprises three modules, a human pose estimation module, a model binding module, and an editing and recording module; the specific structure is shown in Fig. 3:
(1) the output values X̂_t are output to the system's editing and recording module;
(2) an animated-character model is built in the model binding module, and the character's joint points are bound to the skeleton model to generate a humanoid animation model, which is output to the system's editing and recording module;
(3) the editing and recording module binds the human footage captured by the camera device and the smoothed joint parameters output by the human pose estimation module to the animated-character model built by the model binding module, completing the animated-character recording and binding task;
the editing and recording module is the main interface of the system: through it the user selects and previews models, and after choosing the animation scene and character, clicks to record the animated video with the embedded video-recording method, finally producing the recorded video.
In the above data-processing pipeline, the human joint-point coordinates in the image are estimated with the three-dimensional-information deep-learning network. Compared with estimating them in other ways, for example with the V2V-PoseNet network, whose structure can be found in "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map", published in 2018:
The comparison results are shown in Fig. 6 and Fig. 7. Fig. 6 shows a human motion sequence in which the motion takes the feet beyond the detection range of the depth camera; V2V-PoseNet then makes a wrong estimate and returns the foot key points to their initial positions, as is evident in columns (2) and (3) of the comparison in Fig. 6. The FeSHEN used in this application instead makes an appropriate adjustment and gives a prediction closer to the ground truth.
In columns (1) and (2) of Fig. 7, V2V-PoseNet predicts the two points corresponding to the person's two feet as a single point, whereas FeSHEN refines and separates them, showing that the FeSHEN provided by this application has stronger nonlinear prediction ability. Columns (3), (4), and (5) of Fig. 7 show that FeSHEN also has stronger robustness: for complex human motions, for example when the body occludes itself, FeSHEN responds more accurately to the changing depth-map information and adjusts its predictions closer to the ground truth.
Synthesis is it is found that using human joint points coordinate in the network-evaluated figure out of three-dimensional information deep learning provided by the present application
Closer true value, and it is non-linear more preferable, and robustness is stronger, may make prediction to tie to bind recording system in animated character
Fruit is more accurate.
Some of the steps in the embodiments of the present invention may be implemented in software, and the corresponding software programs may be stored in a readable storage medium such as an optical disc or a hard disk.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A data processing method in an animated-character binding and recording system, characterized in that the method comprises the following steps:
Step (1): processing the 2D monocular depth map with the deeprior++ network and outputting a spatial offset xyz_Offset;
Step (2): data enhancement: rotating, scaling and translating the 2D monocular depth map and mapping it into three-dimensional Euclidean space to form a point cloud; the mapping equation is as follows:
z_c · [u, v, 1]^T = K · [R | T] · [x_w, y_w, z_w, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]]
wherein u, v are an arbitrary coordinate point in the image coordinate system; u_0, v_0 are the center coordinates of the image; f_x, f_y are the focal lengths of the camera in pixels; x_w, y_w, z_w denote a three-dimensional coordinate point in the world coordinate system; z_c denotes the z-axis value of the camera coordinate in the world coordinate system, i.e. the distance from the animated person in the 2D monocular depth map to the camera; R and T are respectively the 3x3 rotation matrix and the 3x1 translation matrix of the extrinsic matrix;
Step (3): correcting the point cloud obtained in step (2) with the spatial offset xyz_Offset obtained in step (1), and then trimming the corrected point cloud with preset parameters to preliminarily form a set of points; the preliminarily formed set of points is collectively referred to as the voxel set Cubic, a cube of spatial size 88x88x88 in which positions containing a point are marked 1 and positions containing no point are marked 0;
Step (4): inputting the voxel set Cubic obtained in step (3) into the three-dimensional-information deep-learning network FeSHEN to obtain the maximum-likelihood response positions of the animated character's joint points; the maximum-likelihood response positions of the joint points are then mapped into the world coordinate system, finally predicting 18 joint points of the animated character and obtaining the spatial coordinates of the 18 joint points in the world coordinate system;
Step (5): processing the spatial coordinates of the 18 joint points in the world coordinate system with smoothing methods, including variation limiting and jitter smoothing; wherein variation limiting prevents a joint point from moving beyond the limits of the human body, and jitter smoothing avoids joint-point jitter caused by noise;
the jitter smoothing algorithm is as follows:
Input: the coordinate input value X_t of this frame; the input value X_{t-1} of the previous frame;
Output: the smoothed output value X̂_t;
S1: calculate the Euclidean distance dis between X_t and X_{t-1};
S2: judge the magnitude of dis against a set jitter limit value Jitter:
if dis > Jitter, then X'_t = X_t;
if dis ≤ Jitter, X_t is judged to be jitter, and X_t is smoothed by the following formula:
X'_t = X_t × (dis / Jitter) + X_{t-1} × (1 − dis / Jitter)
S3: calculate the smoothed value Y_t of this frame using the Holt double exponential smoothing formula:
Y_t = X'_t × (1 − Smoothing) + (X_{t-1} + T_{t-1}) × Smoothing
wherein Smoothing is the smoothing parameter, with value range [0, 1], and T_{t-1} is the trend value of the previous frame, calculated by the trend formula of the previous frame;
S4: calculate the difference Dis between the smoothed value Y_t and the input value X_{t-1}:
Dis = Y_t − X_{t-1}
S5: calculate the trend value T_t of this frame using the Holt double exponential trend formula:
T_t = Dis × Correction + T_{t-1} × (1 − Correction)
wherein Correction is the correction parameter, with value range [0, 1];
S6: calculate the final predicted value X̂_t using the Holt double exponential prediction formula:
X̂_t = Y_t + T_t × Prediction
wherein the value range of Prediction is [0, n];
S7: use the maximum filtering distance MaxDist to check the Euclidean distance DisOut between the predicted value X̂_t and the input value X_t; if DisOut > MaxDist, then
X̂_t = X̂_t × (MaxDist / DisOut) + X_t × (1 − MaxDist / DisOut)
The smoothed output value X̂_t for the input value X_t is thus obtained; X̂_t is a three-dimensional vector containing the coordinate values of the joint point on the x, y and z axes;
Step (6): establishing, according to the smoothed output value X̂_t, the animated-character binding and recording system based on the monocular depth map.
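The jitter-smoothing steps S1–S7 above follow the well-known Holt double exponential smoothing filter. A minimal per-joint sketch is given below. The closed forms used for the formulas of S2, S6 and S7 (whose images did not survive extraction) are assumptions based on the standard form of this filter, not values from the disclosure:

```python
import math

def holt_filter(x_t, x_prev, t_prev, jitter=0.05, smoothing=0.5,
                correction=0.5, prediction=0.5, max_dist=0.2):
    """One step of the jitter-smoothing algorithm S1-S7 for a 3D joint.

    x_t, x_prev : (x, y, z) input of this frame and the previous frame
    t_prev      : (x, y, z) trend value of the previous frame
    Returns (x_hat, t_t): smoothed output and this frame's trend value.
    """
    # S1: Euclidean distance between this frame's and the previous frame's input.
    dis = math.dist(x_t, x_prev)

    # S2: below the jitter limit, blend toward the previous input
    # (reconstructed formula -- an assumption).
    if dis > jitter:
        x_p = x_t
    else:
        w = dis / jitter
        x_p = tuple(a * w + b * (1 - w) for a, b in zip(x_t, x_prev))

    # S3: Holt double exponential smoothing.
    y_t = tuple(a * (1 - smoothing) + (b + c) * smoothing
                for a, b, c in zip(x_p, x_prev, t_prev))

    # S4 + S5: trend update from the deviation Dis = Y_t - X_{t-1}.
    t_t = tuple((y - b) * correction + c * (1 - correction)
                for y, b, c in zip(y_t, x_prev, t_prev))

    # S6: prediction (reconstructed formula -- an assumption).
    x_hat = tuple(y + t * prediction for y, t in zip(y_t, t_t))

    # S7: if the prediction drifted farther than max_dist from the raw
    # input, pull it back (reconstructed formula -- an assumption).
    dis_out = math.dist(x_hat, x_t)
    if dis_out > max_dist:
        w = max_dist / dis_out
        x_hat = tuple(h * w + a * (1 - w) for h, a in zip(x_hat, x_t))
    return x_hat, t_t

# One filtering step for a joint that moved 2 cm between frames.
x_hat, t_t = holt_filter((1.0, 1.0, 1.0), (1.02, 1.0, 1.0), (0.0, 0.0, 0.0))
```

Per claims 8 and 9, a smaller `smoothing` makes Y_t follow the raw input more closely, while a larger `correction` makes the trend react faster to deviations.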
2. The method according to claim 1, characterized in that step (6) of establishing, according to the smoothed output value X̂_t, the animated-character binding and recording system based on the monocular depth map comprises:
(1) outputting the value X̂_t to the editing and recording module of the system;
(2) establishing an animated character model in the model binding module, binding the animated character's joint points to the skeleton model, generating a humanoid animation model, and outputting it to the editing and recording module of the system;
(3) the editing and recording module binding the human-body picture captured by the camera device and the smoothed joint-point parameters output by the human-body estimation module to the animated character model established by the model binding module, completing the animated-character recording and binding task;
wherein the editing and recording module is the main interface of the system: after selecting and previewing a model and determining the animation scene and the animated character through the main interface, the user clicks to record an animated video using the embedded video-recording method, and a recorded video is ultimately generated.
3. The method according to claim 1, characterized in that the network structure of the three-dimensional-information deep-learning network FeSHEN in step (4) is: a convolution block followed by a pooling block, then four ME modules connected in series, then two residual blocks in succession, and finally a convolution block; each ME module consists of four sub-modules, namely a pooling block, a residual block, a deconvolution block and a supervision block, and is used for three-dimensional voxel estimation;
wherein the kernel size of the residual blocks is 3x3x3, the kernel size of the convolution blocks and deconvolution blocks is 2x2x2, each with a stride of 2; the supervision parameters of the four ME modules are successively [2, 4, 8, 16], and their output parameters are successively [8, 16, 32, 64].
4. The method according to claim 1, characterized in that when the 2D monocular depth map is rotated in step (2), it is rotated in the XY plane within the angle range [−40, 40] degrees.
5. The method according to claim 1, characterized in that when the 2D monocular depth map is scaled in step (2), the scaling factor is within the range [0.8, 1.2].
6. The method according to claim 1, characterized in that when the 2D monocular depth map is translated in step (2), it is translated within [−8, 8] in voxel space.
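The augmentation ranges of claims 4–6 (rotation in [−40, 40] degrees in the XY plane, scale factor in [0.8, 1.2], translation in [−8, 8] voxels) can be sampled and applied to a point as in the sketch below; the function names and the way the three transforms are composed are illustrative assumptions, not the disclosed implementation:

```python
import math
import random

def sample_augmentation(rng=random):
    """Sample one augmentation per claims 4-6: a rotation angle in degrees,
    a uniform scale factor, and a per-axis translation in voxels."""
    angle = rng.uniform(-40.0, 40.0)                    # claim 4: XY-plane rotation
    scale = rng.uniform(0.8, 1.2)                       # claim 5: scale factor
    shift = [rng.uniform(-8.0, 8.0) for _ in range(3)]  # claim 6: voxel translation
    return angle, scale, shift

def transform_point(p, angle_deg, scale, shift):
    """Apply the sampled augmentation to one (x, y, z) point: rotate about
    the z axis (i.e. in the XY plane), scale uniformly, then translate."""
    a = math.radians(angle_deg)
    x, y, z = p
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return (xr * scale + shift[0], yr * scale + shift[1], z * scale + shift[2])

angle, scale, shift = sample_augmentation(random.Random(0))
q = transform_point((1.0, 0.0, 0.0), angle, scale, shift)
```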
7. The method according to claim 1, characterized in that variation limiting in step (5) limits the joint points by setting variation limits between frames, including three methods:
1. coordinate limiting, which limits the variation range of the coordinate value on each coordinate axis and thereby controls the angle change;
2. swing limiting, which limits the left-right and front-back angle changes of a joint point;
3. angle limiting, which limits the angle change of a joint point in all directions.
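The first of these methods, coordinate limiting, amounts to clamping each axis of a joint coordinate to a per-joint allowed range. A minimal sketch follows; the limit values shown are hypothetical, not values from the disclosure:

```python
def limit_coordinates(joint, limits):
    """Coordinate limiting (method 1 of claim 7): clamp each axis of a
    joint position to its allowed [lo, hi] range."""
    return tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(joint, limits))

# Hypothetical per-axis limits for one joint, in metres.
elbow_limits = [(-0.5, 0.5), (0.8, 1.6), (1.0, 4.0)]
clamped = limit_coordinates((0.7, 1.2, 0.5), elbow_limits)
print(clamped)  # -> (0.5, 1.2, 1.0)
```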
8. The method according to claim 1, characterized in that in S3 of the jitter smoothing algorithm of step (5), the smaller the value of the smoothing parameter Smoothing, the less the smoothed value Y_t of this frame is influenced by the previous frame.
9. The method according to claim 1, characterized in that in S5 of the jitter smoothing algorithm of step (5), the larger the value of the correction parameter Correction, the faster the deviation of the joint point is corrected.
10. The method according to any one of claims 1 to 9, characterized in that the method is based on machine-learning and deep-learning frameworks: the human joint-point coordinates in the image are estimated from the monocular depth map using the three-dimensional-information deep-learning network, the estimated joint-point coordinates are introduced into the animated-character binding and recording system, and the joint points are smoothed using a filtering algorithm, finally realizing the binding of the joint points to the animated character in the animated-character binding and recording system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910256680.6A CN110009717B (en) | 2019-04-01 | 2019-04-01 | Animation figure binding recording system based on monocular depth map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110009717A true CN110009717A (en) | 2019-07-12 |
CN110009717B CN110009717B (en) | 2020-11-03 |
Family
ID=67169200
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910256680.6A Active CN110009717B (en) | 2019-04-01 | 2019-04-01 | Animation figure binding recording system based on monocular depth map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110009717B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321964A (en) * | 2019-07-10 | 2019-10-11 | 重庆电子工程职业学院 | Identification model update method and relevant apparatus |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102486816A (en) * | 2010-12-02 | 2012-06-06 | 三星电子株式会社 | Device and method for calculating human body shape parameters |
CN102622591A (en) * | 2012-01-12 | 2012-08-01 | 北京理工大学 | 3D (three-dimensional) human posture capturing and simulating system |
CN106846403A (en) * | 2017-01-04 | 2017-06-13 | 北京未动科技有限公司 | The method of hand positioning, device and smart machine in a kind of three dimensions |
US20170345183A1 (en) * | 2016-04-27 | 2017-11-30 | Bellus 3D, Inc. | Robust Head Pose Estimation with a Depth Camera |
CN108573231A (en) * | 2018-04-17 | 2018-09-25 | 中国民航大学 | Human bodys' response method based on the Depth Motion figure that motion history point cloud generates |
CN108665496A (en) * | 2018-03-21 | 2018-10-16 | 浙江大学 | A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method |
CN109003301A (en) * | 2018-07-06 | 2018-12-14 | 东南大学 | A kind of estimation method of human posture and rehabilitation training system based on OpenPose and Kinect |
US20190026548A1 (en) * | 2017-11-22 | 2019-01-24 | Intel Corporation | Age classification of humans based on image depth and human pose |
CN109492578A (en) * | 2018-11-08 | 2019-03-19 | 北京华捷艾米科技有限公司 | A kind of gesture remote control method and device based on depth camera |
Non-Patent Citations (2)
Title |
---|
YING CHEN: "Automatic Facial Feature Correspondence Based on Pose Estimation", 2010 Second International Workshop on Education Technology and Computer Science *
CHEN YING: "Markerless Human Pose Estimation from Monocular Depth Maps Based on Feature Regression", Journal of System Simulation (系统仿真学报) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||