CN110009717A - Animated character binding and recording system based on monocular depth maps - Google Patents

Animated character binding and recording system based on monocular depth maps

Info

Publication number
CN110009717A
CN110009717A (application CN201910256680.6A)
Authority
CN
China
Prior art keywords
value
animated character
joint point
coordinate
smoothing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910256680.6A
Other languages
Chinese (zh)
Other versions
CN110009717B (en)
Inventor
陈莹
沈栎
化春键
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201910256680.6A priority Critical patent/CN110009717B/en
Publication of CN110009717A publication Critical patent/CN110009717A/en
Application granted granted Critical
Publication of CN110009717B publication Critical patent/CN110009717B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Abstract

The invention discloses an animated character binding and recording system based on monocular depth maps, belonging to the technical field of video-based human pose estimation. In the system, data processing is built on machine-learning and deep-learning frameworks: starting from a monocular depth map, a three-dimensional-information deep learning network estimates the coordinates of the human joint points in the image; the estimated joint-point coordinates are introduced into the animated character binding and recording system and smoothed with a filtering algorithm, binding the joint points to the animated character in the system. Because the joint-point coordinates are estimated with a three-dimensional-information deep learning network, the estimates of the human joint-point coordinates in the image are more accurate. When binding and recording the animated character, the human motion in the captured footage is therefore accurately reproduced on the animated character, achieving accurate binding between the joint points and the animated character in the recording system.

Description

Animated character binding and recording system based on monocular depth maps
Technical field
The present invention relates to an animated character binding and recording system based on monocular depth maps, and belongs to the technical field of video-based human pose estimation.
Background art
Human pose estimation, that is, reconstructing a person's joints and limbs from images, together with image-based pose tracking and joint-point estimation, has enormous application potential and market value in human-computer interaction, security monitoring, motion analysis, augmented reality, virtual reality, healthcare, games, and animation. Current methods mainly fall into the following two categories:
(1) Top-down methods, also called model-driven methods, rely on a pre-built model or prior knowledge. By matching against the image sequence or solving a posterior probability, they compute the corresponding model feature variables and use those variables to correct the pre-built model so that it approaches the specific pose of the human body in the image. The main computational pipeline of such methods is predict, match, update. These methods start from the model and treat the data only as the model's driver, so the final accuracy is doubly affected by the model and the data; their accuracy is therefore limited.
(2) Bottom-up methods, also called data-driven methods, regress and extract the target joint points directly from the data through large amounts of matching computation on the image data. Their final accuracy fluctuates strongly, is affected by the specifics of the data, and their generalization ability is poor.
Summary of the invention
To solve the problem that the limited accuracy of existing human pose estimation methods prevents the human motion in captured footage from being accurately reproduced on an animated character when binding and recording animated characters from monocular depth maps, the present invention provides an animated character binding and recording system based on monocular depth maps.
The application provides a data processing method in an animated character binding and recording system, the method comprising the following steps:
Step (1): process the 2D monocular depth map with the DeepPrior++ network and output the spatial offset xyz_Offset;
Step (2), data augmentation: rotate, scale, and translate the 2D monocular depth map and map it into three-dimensional Euclidean space to form a point cloud. The mapping equation is as follows:
where u, v are arbitrary coordinates in the image coordinate system; u_0, v_0 are the center coordinates of the image; x_w, y_w, z_w denote a three-dimensional point in the world coordinate system; z_c denotes the z-axis value of the camera coordinate in the world coordinate system, i.e., the distance from the character in the 2D monocular depth map to the camera; R and T are the 3x3 rotation matrix and the 3x1 translation matrix of the extrinsic matrix; and f/dx and f/dy are the focal length divided by the pixel size in the x and y directions (the intrinsic focal lengths in pixels);
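As a minimal NumPy sketch of the mapping described above, the following back-projects a depth image into a 3D point cloud under the standard pinhole model implied by the listed symbols (f/dx and f/dy appear as fx and fy). The camera-to-world handling and the example values are illustrative assumptions, not the patent's exact formulation:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, u0, v0, R=None, T=None):
    """Back-project a depth image into a 3D point cloud.

    depth[v, u] holds z_c, the distance along the camera z-axis.
    fx, fy correspond to f/dx and f/dy (focal length in pixels);
    (u0, v0) is the image center. R (3x3) and T (3,) are assumed
    extrinsics mapping camera coordinates to world coordinates.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Invert the pinhole projection: u = fx * x/z + u0, v = fy * y/z + v0
    x = (u - u0) * z / fx
    y = (v - v0) * z / fy
    pts_cam = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    pts_cam = pts_cam[pts_cam[:, 2] > 0]  # drop invalid (zero-depth) pixels
    if R is None:
        return pts_cam
    # Camera -> world: x_w = R^T (x_c - T) for extrinsics x_c = R x_w + T
    return (pts_cam - T) @ R

# Tiny example: a 2x2 depth image, camera coordinates only
depth = np.array([[1.0, 1.0], [0.0, 2.0]])
cloud = depth_to_point_cloud(depth, fx=100.0, fy=100.0, u0=1.0, v0=1.0)
print(cloud.shape)  # (3, 3): three valid pixels, xyz each
```

Each valid depth pixel becomes one 3D point; zero-depth pixels (no measurement) are discarded before the point cloud is formed.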
Step (3): correct the point cloud obtained in step (2) with the spatial offset xyz_Offset from step (1), then trim the corrected point cloud with preset parameters to preliminarily form a set of points, referred to as the voxel set Cubic. The voxel set Cubic is a cube of spatial size 88x88x88 in which positions containing a point are set to 1 and positions without a point are set to 0;
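The trimming and voxelization of step (3) can be sketched as follows. The voxel edge length and the centering choice are assumptions for illustration, since the preset trimming parameters are not disclosed:

```python
import numpy as np

def voxelize(points, cubic_size=88, voxel_len=1.0, center=None):
    """Quantize a point cloud into a binary occupancy cube ("Cubic").

    Positions containing at least one point are set to 1 and empty
    positions to 0, matching the 88x88x88 voxel set of step (3).
    voxel_len and centering on the cloud mean are assumed parameters.
    """
    if center is None:
        center = points.mean(axis=0)
    # Shift so the cube is centered on the cloud, then quantize
    idx = np.floor((points - center) / voxel_len).astype(int) + cubic_size // 2
    # Trim everything that falls outside the cube
    keep = np.all((idx >= 0) & (idx < cubic_size), axis=1)
    cubic = np.zeros((cubic_size,) * 3, dtype=np.uint8)
    cubic[tuple(idx[keep].T)] = 1
    return cubic

pts = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [4.0, 0.0, 0.5]])
cubic = voxelize(pts)
print(cubic.shape, int(cubic.sum()))  # (88, 88, 88) 3
```

The resulting binary cube is the direct input tensor for the FeSHEN network in step (4).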
Step (4): feed the voxel set Cubic obtained in step (3) into the three-dimensional-information deep learning network FeSHEN to obtain the maximum-likelihood response positions of the animated character's joint points; then map the maximum-likelihood response positions of the joint points into the world coordinate system and finally predict the 18 joint points of the animated character, obtaining the spatial coordinates of the 18 joint points in the world coordinate system;
Step (5): process the spatial coordinates of the 18 joint points in the world coordinate system with smoothing methods, including variation limiting and jitter smoothing. Variation limiting prevents a joint point from moving beyond human limits; jitter smoothing avoids joint-point jitter caused by noise;
The jitter smoothing algorithm is as follows:
Input: the coordinate input value X_t of this frame; the input value X_{t-1} of the previous frame.
Output: the smoothed output value X̂_t.
S1: compute the Euclidean distance dis between X_t and X_{t-1};
S2: compare dis against the preset jitter limit Jitter:
if dis > Jitter, then X'_t = X_t;
if dis ≤ Jitter, X_t is judged to be jitter, and X_t is smoothed by the corresponding formula to obtain X'_t;
S3: compute the smoothed value Y_t of this frame with the Holt double exponential smoothing formula:
Y_t = X'_t × (1 - Smoothing) + (X_{t-1} + T_{t-1}) × Smoothing
where Smoothing is the smoothing parameter with value range [0, 1], and T_{t-1} is the trend value of the previous frame, computed from the previous frame's trend formula;
S4: compute the difference Dis between the smoothed value Y_t and the input value X_{t-1}:
Dis = Y_t - X_{t-1}
S5: compute the trend value T_t of this frame with the Holt double exponential trend formula:
T_t = Dis × Correction + T_{t-1} × (1 - Correction)
where Correction is the correction parameter with value range [0, 1];
S6: compute the final predicted value X̂_t with the Holt double exponential prediction formula,
where Prediction takes values in [0, n];
S7: use the maximum filtering distance MaxDist to check the Euclidean distance DisOut between the predicted value X̂_t and the input value X_t; if DisOut > MaxDist, the predicted value is corrected accordingly.
The final result is the smoothed output X̂_t for the input X_t: a three-dimensional vector containing the joint point's coordinate values on the x, y, and z axes;
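The steps S1-S7 above can be sketched as a single filter update. The blend used in S2 when dis ≤ Jitter, the S6 forecast X̂_t = Y_t + T_t × Prediction, and the S7 clamp are assumed forms consistent with standard Holt double exponential smoothing, since the corresponding formulas are given as images in the original:

```python
import numpy as np

def holt_smooth(x_t, x_prev, t_prev,
                jitter=0.05, smoothing=0.5, correction=0.5,
                prediction=0.0, max_dist=0.1):
    """One step of the S1-S7 jitter-smoothing filter (assumed forms)."""
    x_t, x_prev = np.asarray(x_t, float), np.asarray(x_prev, float)
    dis = np.linalg.norm(x_t - x_prev)                            # S1
    if dis > jitter:                                              # S2
        x_f = x_t
    else:  # treated as jitter: blend toward the previous frame (assumed form)
        x_f = x_prev + (x_t - x_prev) * (dis / jitter if jitter else 0.0)
    y_t = x_f * (1 - smoothing) + (x_prev + t_prev) * smoothing   # S3
    diff = y_t - x_prev                                           # S4
    t_t = diff * correction + t_prev * (1 - correction)           # S5
    x_hat = y_t + t_t * prediction                                # S6 (assumed forecast)
    d_out = np.linalg.norm(x_hat - x_t)                           # S7
    if d_out > max_dist:  # clamp the prediction back toward the raw input
        x_hat = x_t + (x_hat - x_t) * (max_dist / d_out)
    return x_hat, t_t

# Smooth a noisy 1D coordinate trajectory frame by frame
xs = [0.0, 0.1, 0.08, 0.12, 0.2]
prev, trend = np.zeros(1), np.zeros(1)
for x in xs[1:]:
    prev, trend = holt_smooth([x], prev, trend)
```

In practice each of the 18 joint points would carry its own (x_prev, t_prev) state, with the same parameters Jitter, Smoothing, Correction, Prediction, and MaxDist applied per frame.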
Step (6): establish the animated character binding and recording system based on the monocular depth map from the smoothed outputs X̂_t.
Optionally, the 2D monocular depth map is captured with a Microsoft Kinect V2 or an ASUS Xtion Pro.
Optionally, establishing the animated character binding and recording system based on the monocular depth map from the smoothed outputs X̂_t in step (6) comprises:
(1) the output values X̂_t are read by the system's human pose estimation module, in which the joint-point coordinates are smoothed with the filtering algorithm; the smoothed joint-point parameters are then output to the system's editing and recording module;
(2) an animated character model is built in the model binding module, and the character's joint points are bound to the skeleton model to generate a humanoid animation model, which is output to the system's editing and recording module;
(3) the editing and recording module binds the human images captured by the camera and the smoothed joint-point parameters output by the human pose estimation module to the animated character model built by the model binding module, completing the character recording and binding task;
The editing and recording module is the system's main interface. The user selects and previews a model through the main interface and, after choosing the animation scene and animated character, clicks to record the animation video with the embedded video recording method, ultimately generating the recorded video.
Optionally, the network structure of the three-dimensional-information deep learning network FeSHEN in step (4) is as follows:
a convolution block followed by a pooling block, then four ME modules in series, then two residual blocks in succession, and finally a convolution block. The ME module consists of four components, a pooling block, a residual block, a deconvolution block, and a supervision block, and is used for three-dimensional voxel estimation;
the convolution block consists of voxel convolution, a voxel batch-normalization layer, and an activation function, and performs the convolution computation over three-dimensional information; the pooling block consists of voxel down-sampling, a voxel batch-normalization layer, and an activation function, and reduces the size of the three-dimensional feature map; the residual block contains a main branch and a shortcut branch, the main branch containing two convolution blocks and the shortcut branch one convolution block, and adjusts the channel count of the three-dimensional feature map; the deconvolution block consists of voxel up-sampling, a voxel batch-normalization layer, and an activation function, and fuses three-dimensional feature maps while reducing the channel count to produce the output features; the supervision block contains an upper and a lower branch, each composed of two residual blocks, where the upper branch applies a channel transformation to the three-dimensional feature map, the first residual block of the lower branch compresses the three-dimensional feature map and extracts nonlinear features, and the second residual block of the lower branch expands the channels of the compressed supervision features so that they can be fused with the original output features;
the kernel size of the residual block is 3x3x3, and the kernel size of the convolution and deconvolution blocks is 2x2x2 with stride 2; the supervision parameters of the four ME modules are [2, 4, 8, 16] in order, and their output parameters are [8, 16, 32, 64] in order.
Optionally, when the 2D monocular depth map is rotated in step (2), it is rotated in the XY plane within an angular range of [-40, 40] degrees.
Optionally, when the 2D monocular depth map is scaled in step (2), the scaling factor range is [0.8, 1.2].
Optionally, the translation of the 2D monocular depth map in step (2) is within [-8, 8] voxels.
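Under stated assumptions (applying the transforms to the mapped point cloud rather than to the raw depth image, for simplicity), the three augmentation ranges of step (2) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(points):
    """Apply the step (2) augmentation ranges to a point cloud:
    rotation of [-40, 40] degrees in the XY plane, scaling by
    [0.8, 1.2], and translation within [-8, 8] voxels."""
    theta = np.deg2rad(rng.uniform(-40, 40))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])  # rotation about z, i.e. in the XY plane
    scale = rng.uniform(0.8, 1.2)
    shift = rng.uniform(-8, 8, size=3)
    return points @ rot.T * scale + shift

pts = np.zeros((5, 3))
aug = augment(pts)
print(aug.shape)  # (5, 3)
```

Randomizing all three transforms per training sample enlarges the effective dataset without changing the underlying joint-point labels (which would be transformed identically).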
Optionally, the variation limiting in step (5) imposes frame-to-frame limits on the joint points, using three methods:
1. coordinate limiting, which restricts the range of variation of the coordinate values on each axis and thereby controls the angular change;
2. swing limiting, which restricts the lateral and front-back angular change of a joint point;
3. angle limiting, which restricts the angular change of a joint point in all directions.
Optionally, in S3 of the jitter smoothing algorithm of step (5), the smaller the smoothing parameter Smoothing, the less this frame's smoothed value Y_t is influenced by the previous frame.
Optionally, in S5 of the jitter smoothing algorithm of step (5), the larger the correction parameter Correction, the faster deviations of the joint points are corrected.
Optionally, the method is based on machine-learning and deep-learning frameworks: starting from a monocular depth map, it uses the three-dimensional-information deep learning network to estimate the human joint-point coordinates in the image, introduces the estimated joint-point coordinates into the animated character binding and recording system, smooths the joint points with the filtering algorithm, and finally binds the joint points to the animated character in the recording system.
The present invention has the following beneficial effects:
By using the three-dimensional-information deep learning network to estimate the human joint-point coordinates in the image and introducing the estimates into the animated character binding and recording system, the estimates of the human joint-point coordinates in the image become more accurate. When binding and recording the animated character, the human motion in the captured footage can therefore be accurately reproduced on the animated character, achieving accurate binding between the joint points and the animated character in the recording system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is the structure diagram of DeepPrior++.
Fig. 2 shows the joint-point variation limits, where horizontal stripes indicate coordinate limiting, shown as (1) in Fig. 2, with parameters Δx, Δy, Δz indicating the variation of the respective coordinates; diagonal stripes indicate movement limiting, shown as (2) in Fig. 2, with parameters ΔS, ΔT indicating the swing amplitude and the front-back rocking amplitude; and dotted stripes indicate angle limiting, shown as (3) in Fig. 2, with parameter ΔA indicating the angular change.
Fig. 3 is the structure diagram of the animated character binding and recording system of the present invention.
Fig. 4 is the design drawing of the basic modules of the deep learning network of the present invention.
Fig. 5 is the design drawing of the deep learning network of the present invention.
Fig. 6 compares the results of the FeSHEN network provided by the invention and the V2V-PoseNet network when performing pose estimation on a compound human action sequence.
Fig. 7 compares the results of the FeSHEN network provided by the invention and the V2V-PoseNet network when performing pose estimation on a continuous human action sequence.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment one:
This embodiment provides a data processing method in an animated character binding and recording system, used in the process of binding a human body to an animated character during recording. The method is based on machine-learning and deep-learning frameworks: starting from a monocular depth map, it uses the three-dimensional-information deep learning network FeSHEN to estimate the human joint-point coordinates in the image, introduces the estimated joint-point coordinates into the animated character binding and recording system, smooths the joint points with a filtering algorithm, and finally binds the joint points to the animated character in the recording system.
The monocular depth map is captured with a Microsoft Kinect V2 or an ASUS Xtion Pro.
The three-dimensional-information deep learning network FeSHEN is an end-to-end network. Its construction is introduced below from three aspects: the network's modules, its structure, and its loss function:
(1) Modules of the network
The application designs a networking component for three-dimensional voxel estimation, the Monitored-Endecoder (ME) module, whose structure is shown in Fig. 4. The ME module consists of four components: a pooling block, a residual block, a deconvolution block, and a supervision block. For simplicity, the z-axis information is not drawn in Fig. 4.
The two rows below each module indicate its input and output; the numbers give the spatial size and the number of feature channels of the data the module processes. The size of each cube represents the size of the feature map in the module, and its thickness the channel count.
The dashed box on the left of the figure is the encoder. The pooling block reduces the size of the feature map, and the residual block increases the number of feature channels. Increasing the channel count is equivalent to increasing the number of convolution-kernel types; the more kernel types there are, the more features can be learned, which benefits model performance.
The middle dashed box is the decoder, in which the deconvolution block enlarges the spatial size of the feature map while using fewer convolution kernels to fuse the feature maps and reduce the channel count, realizing compression and decoding. During encoding, using a smaller stride, compressing features, increasing channels, and expanding the feature-map space makes the network features richer and makes it easier to converge on dense key-point positions and thus localize the joint points.
The dashed box on the right represents the monitor, composed mainly of residual blocks with different input/output channel parameters, denoted Out (the output parameter) and Moni (the supervision parameter). The monitor has two branches, each composed of two residual blocks. The upper branch applies a channel transformation to the feature map, its residual blocks having channel count Out. The first residual block of the lower branch is the Intermediate Monitor Block: its input channel count is Out and its output channel count is Moni; it compresses the feature map and extracts nonlinear features. The second residual block of the lower branch has input channel count Moni and output channel count Out; it expands the channels of the compressed supervision features, which are then fused with the original output features. Different supervision sizes refine the learned features to different degrees, so different supervision parameters can be set for different image resolutions to achieve the best prediction.
(2) Network structure
The basic units for constructing the network are 3D convolution basic blocks, of four kinds in total.
The first kind consists of voxel convolution, a voxel batch normalization (Batch Normalize) layer, and an activation function (ReLU); it is called the convolution block and mainly performs the convolution computation over three-dimensional information;
the second kind is called the residual block and is mainly used to adjust the channel count of the three-dimensional feature map;
the third kind performs down-sampling of 3D voxels, on the same principle as 2D pooling; it is called the pooling block and is mainly used to reduce the three-dimensional size of the feature map;
the fourth kind up-samples 3D voxels and consists of voxel up-sampling (Upsampling), a voxel batch normalization (Batch Normalize) layer, and an activation function (ReLU); it is called the deconvolution block. The batch normalization layers and activation functions used in the convolution and deconvolution blocks help simplify the learning process and accelerate the descent. In the network design, the kernel size of the residual block is 3x3x3, and the kernel size of the convolution and deconvolution blocks is 2x2x2 with stride 2.
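As a rough illustration of how the pooling and deconvolution blocks change feature-map shape, the following NumPy sketch mimics the 2x2x2 stride-2 down-sampling and the matching up-sampling on a (channels, D, H, W) tensor. The batch normalization, activation, and learned weights are omitted, so this shows shape behaviour only, not the actual layers:

```python
import numpy as np

def pool_block(x):
    """Pooling block (shape only): 3D down-sampling with a 2x2x2
    window and stride 2, halving each spatial dimension."""
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))

def deconv_block(x):
    """Deconvolution block (shape only): 3D up-sampling that doubles
    each spatial dimension, as the voxel Upsampling layer does
    (nearest-neighbour here; the learned transposed convolution is omitted)."""
    return x.repeat(2, axis=1).repeat(2, axis=2).repeat(2, axis=3)

x = np.random.rand(8, 44, 44, 44)        # 8 channels, 44^3 voxels
down = pool_block(x)
up = deconv_block(down)
print(down.shape, up.shape)  # (8, 22, 22, 22) (8, 44, 44, 44)
```

Starting from the 88x88x88 voxel set, a stride-2 block halves each axis per application, which is how the encoder compresses the cube while the decoder restores it.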
The main part of the supervised high-dimensional-information encoding-decoding network (FeSHEN) is composed of the four ME modules connected in series with different supervision parameters. Serially connecting the ME modules deepens the network, and using different supervision parameters extracts features at different degrees of refinement. The beginning and end of the network are handled with convolution blocks; the specific network structure is shown in Fig. 5.
The numbers below the modules in Fig. 4 give the size and channel count of the feature map each module processes. The red numbers are the supervision parameters Moni; for the ME components they are [2, 4, 8, 16] in order. The blue numbers are the output parameters Out; for the ME components they are [8, 16, 32, 64] in order.
(3) Network loss function
The three-dimensional-information deep learning network FeSHEN uses the mean squared error of the Gaussian peak means as the loss function L, as follows:
where n indexes the joint points, H̄_n and H_n are the Gaussian peak means of the ground-truth value and the predicted value of the n-th joint point, respectively, and i, j, k are the three-dimensional coordinate values of the n-th joint point.
The Gaussian peak mean (mean of Gaussian peak) is used as the feature of a coordinate point in order to take into account the prediction possibility of each joint point: the Gaussian mean between each predicted point and the target point is taken as the feature of the target point. The Gaussian peak mean is expressed as follows:
where H̄_n is the Gaussian peak mean of the n-th joint point, i_n, j_n, k_n are the labeled coordinate values of the n-th joint point, and σ = 1.7 is the standard deviation of the Gaussian peak.
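A minimal sketch of the Gaussian-peak supervision described above, assuming the standard form exp(-d²/(2σ²)) for the peak centred on the labelled voxel (i_n, j_n, k_n) — the patent's formula images are not reproduced here — together with a mean-squared-error loss over the heatmaps:

```python
import numpy as np

def gaussian_peak(center, size=88, sigma=1.7):
    """3D Gaussian peak H_n centred on a joint's voxel coordinates
    (i_n, j_n, k_n) with sigma = 1.7 (assumed standard form)."""
    i, j, k = np.ogrid[:size, :size, :size]
    ci, cj, ck = center
    return np.exp(-((i - ci) ** 2 + (j - cj) ** 2 + (k - ck) ** 2)
                  / (2 * sigma ** 2))

def loss(pred_heatmaps, true_centers, size=88):
    """Mean squared error between predicted heatmaps and the Gaussian
    peaks built from the labelled joint coordinates."""
    true = np.stack([gaussian_peak(c, size) for c in true_centers])
    return float(np.mean((pred_heatmaps - true) ** 2))

target = gaussian_peak((10, 20, 30), size=32)
print(loss(target[None], [(10, 20, 30)], size=32))  # 0.0 for a perfect prediction
```

In training, each of the 18 joints contributes one such heatmap per voxel cube, and the network's maximum-likelihood response position of step (4) corresponds to the argmax of the predicted heatmap.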
The structure of the established three-dimensional-information deep learning network FeSHEN is shown in Fig. 5.
After the three-dimensional-information deep learning network FeSHEN has been established, the following steps complete the binding between the human joint points and the animated character:
capture a 2D monocular depth map containing a human body with a Microsoft Kinect V2 or an ASUS Xtion Pro;
Step (1): process the 2D monocular depth map with the DeepPrior++ network and output the spatial offset xyz_Offset; the structure of the DeepPrior++ network is shown in Fig. 1.
For an introduction to the DeepPrior++ network, refer to "DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation", published in 2017.
Step (2), data augmentation: rotate, scale, and translate the 2D monocular depth map and map it into three-dimensional Euclidean space to form a point cloud. The rotation is within an angular range of [-40, 40] degrees in the XY plane; the scaling factor range is [0.8, 1.2]; the translation is within [-8, 8] voxels. The mapping equation for forming the point cloud in three-dimensional Euclidean space is as follows:
where u, v are arbitrary coordinates in the image coordinate system; u_0, v_0 are the center coordinates of the image; x_w, y_w, z_w denote a three-dimensional point in the world coordinate system; z_c denotes the z-axis value of the camera coordinate in the world coordinate system, i.e., the distance from the character in the 2D monocular depth map to the camera; R and T are the 3x3 rotation matrix and the 3x1 translation matrix of the extrinsic matrix; and f/dx and f/dy are the focal length divided by the pixel size in the x and y directions (the intrinsic focal lengths in pixels);
Step (3): correct the point cloud obtained in step (2) with the spatial offset xyz_Offset from step (1), then trim the corrected point cloud with preset parameters to preliminarily form a set of points, referred to as the voxel set Cubic. The voxel set Cubic is a cube of spatial size 88x88x88 in which positions containing a point are set to 1 and positions without a point are set to 0;
Step (4): feed the voxel set Cubic obtained in step (3) into the three-dimensional-information deep learning network FeSHEN established above to obtain the maximum-likelihood response positions of the animated character's joint points; then map the maximum-likelihood response positions of the joint points into the world coordinate system and finally predict the 18 joint points of the animated character, obtaining the spatial coordinates of the 18 joint points in the world coordinate system;
Step (5): process the spatial coordinates of the 18 joint points in the world coordinate system with smoothing methods, including variation limiting and jitter smoothing. Variation limiting prevents a joint point from moving beyond human limits; jitter smoothing avoids joint-point jitter caused by noise;
The variation limiting imposes frame-to-frame limits on the joint points; referring to Fig. 2, three methods are used:
1. coordinate limiting, which restricts the range of variation of the coordinate values on each axis and thereby controls the angular change;
2. swing limiting, which restricts the lateral and front-back angular change of a joint point;
3. angle limiting, which restricts the angular change of a joint point in all directions.
In Fig. 2, the limiting methods for joint-point movement are marked in different colors, and the limiting parameters are marked beside the joint points. Horizontal stripes indicate coordinate limiting, shown as (1) in the figure, with parameters Δx, Δy, Δz indicating the coordinate variation; diagonal stripes indicate movement limiting, shown as (2), with parameters ΔS, ΔT indicating the swing amplitude and the front-back rocking amplitude; dotted stripes indicate angle limiting, shown as (3), with parameter ΔA indicating the angular change.
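As a hypothetical illustration of coordinate limiting (method 1), the following sketch clamps a joint's per-axis change between frames. The Δx, Δy, Δz limits are per-joint parameters read from Fig. 2; the numeric values used here are placeholders, not values from the patent:

```python
import numpy as np

def coordinate_limit(joint, joint_prev, max_delta=(2.0, 2.0, 2.0)):
    """Coordinate limiting: clamp the frame-to-frame change of a joint
    on each axis to [-Δ, Δ] (max_delta values are placeholders)."""
    joint = np.asarray(joint, float)
    joint_prev = np.asarray(joint_prev, float)
    lim = np.asarray(max_delta, float)
    delta = np.clip(joint - joint_prev, -lim, lim)
    return joint_prev + delta

print(coordinate_limit([5.0, 0.0, 0.0], [0.0, 0.0, 0.0]))  # [2. 0. 0.]
```

Swing and angle limiting (methods 2 and 3) would clamp in angular rather than Cartesian coordinates, but follow the same pattern of bounding the per-frame change by a per-joint parameter.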
The jitter smoothing algorithm is as follows:
Input: the coordinate input value X_t of this frame; the input value X_{t-1} of the previous frame.
Output: the smoothed output value X̂_t.
S1: compute the Euclidean distance dis between X_t and X_{t-1};
S2: compare dis against the preset jitter limit Jitter:
if dis > Jitter, then X'_t = X_t;
if dis ≤ Jitter, X_t is judged to be jitter, and X_t is smoothed by the corresponding formula to obtain X'_t;
S3: compute the smoothed value Y_t of this frame with the Holt double exponential smoothing formula:
Y_t = X'_t × (1 - Smoothing) + (X_{t-1} + T_{t-1}) × Smoothing
where Smoothing is the smoothing parameter with value range [0, 1], and T_{t-1} is the trend value of the previous frame, computed from the previous frame's trend formula. From the Holt double exponential smoothing formula it can be seen that the smaller the smoothing parameter Smoothing, the less this frame's smoothed value Y_t is influenced by the previous frame;
S4: compute the difference Dis between the smoothed value Y_t and the input value X_{t-1}:
Dis = Y_t - X_{t-1}
S5: compute the trend value T_t of this frame with the Holt double exponential trend formula:
T_t = Dis × Correction + T_{t-1} × (1 - Correction)
where Correction is the correction parameter with value range [0, 1]. From the Holt double exponential trend formula it can be seen that the larger the correction parameter Correction, the faster deviations of the joint points are corrected.
S6: compute the final predicted value X̂_t with the Holt double exponential prediction formula,
where Prediction takes values in [0, n]; it is the parameter that controls how far ahead the prediction looks and influences the following n frames of images;
S7: use the maximum filtering distance MaxDist to check the Euclidean distance DisOut between the predicted value X̂_t and the input value X_t; if DisOut > MaxDist, the predicted value is corrected accordingly.
The final result is the smoothed output X̂_t for the input X_t: a three-dimensional vector containing the joint point's coordinate values on the x, y, and z axes.
The maximum filtering distance MaxDist is set according to the specific smoothness requirement: too large a value means DisOut cannot be filtered; too small a value makes the smoothed outputs X̂_t tend toward a single value.
In step (6), the animated character binding recording system based on the monocular depth map is built from the smoothed output value X̂t. The binding recording system mainly comprises three modules, a human pose estimation module, a model binding module and an editor recording module; the specific structure is shown in Fig. 3:
(1) the output value X̂t is output to the editor recording module of the system;
(2) the animated character model is built in the model binding module, which binds the animated character's joint points to the skeleton model to generate a humanoid animation model and outputs it to the editor recording module;
(3) the editor recording module binds the human body pictures captured by the camera and the smoothed joint-point parameters output by the human pose estimation module to the animated character model built by the model binding module, completing the animated character recording and binding task.
The editor recording module is the main interface of the system; after the user selects and previews a model through the main interface and determines the animation scene and animated character, the user clicks to record an animated video with the embedded video recording method, finally generating the recorded video.
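The data flow among the three modules just described can be sketched as follows; the class and method names (PoseEstimationModule.estimate, ModelBindingModule.bind, EditorRecordingModule.record_frame) are hypothetical, and the pose estimator is stubbed to return 18 dummy joints instead of running FeSHEN and the smoothing filter.

```python
class PoseEstimationModule:
    """Stand-in for the human pose estimation module."""
    def estimate(self, depth_frame):
        # In the real system: FeSHEN inference + variation limiting
        # + jitter smoothing. Here: 18 dummy smoothed joint points.
        return [(0.0, 0.0, 0.0)] * 18

class ModelBindingModule:
    """Binds smoothed joint points to the character skeleton model."""
    def bind(self, joints):
        return {"skeleton": joints}

class EditorRecordingModule:
    """Main interface: wires the other two modules and records frames."""
    def __init__(self, pose, binder):
        self.pose, self.binder = pose, binder
        self.frames = []

    def record_frame(self, depth_frame):
        joints = self.pose.estimate(depth_frame)
        self.frames.append(self.binder.bind(joints))
        return len(self.frames)  # number of frames recorded so far
```

The editor recording module owns the loop: each captured depth frame is passed to the pose estimator, and the resulting joints are bound to the character model before being appended to the recording.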
In the above data processing, the human joint-point coordinates in the image are estimated with the three-dimensional-information deep learning network. This is compared with estimating them by other means, for example with the V2V-PoseNet network, whose structure is described in "V2V-PoseNet: Voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map", published in 2018.
The comparison results are shown in Fig. 6 and Fig. 7. Fig. 6 shows a human motion sequence in which the motion carries the feet beyond the depth camera's detection range; V2V-PoseNet then makes a wrong estimate, returning the foot keypoints to their initial positions, as is evident in columns (2) and (3) of Fig. 6, whereas this application uses FeSHEN to make an appropriate adjustment and gives a prediction closer to the true value.
In columns (1) and (2) of Fig. 7, for the foot joint points, V2V-PoseNet predicts the two points corresponding to the two feet as a single point, while FeSHEN refines and separates them; this shows that the FeSHEN provided by this application has a stronger nonlinear prediction capability. Columns (3), (4) and (5) of Fig. 7 show that FeSHEN is also more robust: for complex human motion, for example when self-occlusion occurs, FeSHEN responds more accurately to the changing depth-map information and adjusts the prediction closer to the true value.
In summary, estimating the human joint-point coordinates in the image with the three-dimensional-information deep learning network provided by this application yields results closer to the true values, with better nonlinearity and stronger robustness, making the prediction results in the animated character binding recording system more accurate.
Some steps in the embodiments of the present invention may be implemented in software, and the corresponding software programs may be stored in a readable storage medium such as an optical disc or a hard disk.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A data processing method in an animated character binding recording system, characterized in that the method comprises the following steps:
Step (1): process the 2D monocular depth map with the deeprior++ network and output a spatial offset xyz_Offset;
Step (2), data augmentation: rotate, scale and translate the 2D monocular depth map and map it into three-dimensional Euclidean space to form a point cloud; the mapping formula is as follows:
where u, v is an arbitrary coordinate point in the image coordinate system; u0, v0 are the center coordinates of the image; xω, yω, zω denote a three-dimensional coordinate point in the world coordinate system; zc denotes the z-axis value of the camera coordinate in the world coordinate system, i.e. the distance from the animated character in the 2D monocular depth map to the camera; R and T are respectively the 3x3 rotation matrix and the 3x1 translation matrix of the extrinsic matrix;
Step (3): correct the point cloud obtained in step (2) with the spatial offset xyz_Offset obtained in step (1), then trim the corrected point cloud with preset parameters to preliminarily form a set of points; this preliminarily formed set of points is collectively called the voxel set Cubic, a cube of spatial size 88x88x88 in which positions containing a point are marked 1 and positions without a point are marked 0;
Step (4): input the voxel set Cubic obtained in step (3) into the three-dimensional-information deep learning network FeSHEN to obtain the maximum-likelihood response positions of the animated character's joint points; these response positions are then mapped into the world coordinate system, finally predicting the animated character's 18 joint points and obtaining their spatial coordinates in the world coordinate system;
Step (5): process the spatial coordinates of the 18 joint points in the world coordinate system with a smoothing method comprising variation limiting and jitter smoothing, where variation limiting prevents the joint points from moving beyond human limits and jitter smoothing avoids joint-point jitter caused by noise;
The jitter smoothing algorithm is as follows:
Input: the coordinate input value Xt of this frame; the input value Xt-1 of the previous frame;
Output: the smoothed output value X̂t;
S1: compute the Euclidean distance dis between Xt and Xt-1;
S2: judge the magnitude of dis against the set jitter limit Jitter;
if dis > Jitter, then X't = Xt;
if dis ≤ Jitter, Xt is judged to be jitter and is smoothed to obtain X't;
S3: compute the smoothing value Yt of this frame with the Holt double-exponential smoothing formula
Yt = X't × (1 − Smoothing) + (Xt-1 + Tt-1) × Smoothing
where Smoothing is a smoothing parameter with value range [0, 1], and Tt-1 is the trend value of the previous frame, computed from the previous frame's trend formula;
S4: compute the difference Dis between the smoothing value Yt and the input value Xt-1:
Dis = Yt − Xt-1
S5: compute the trend value Tt of this frame with the Holt double-exponential trend formula
Tt = Dis × Correction + Tt-1 × (1 − Correction)
where Correction is a correction parameter with value range [0, 1];
S6: compute the final predicted value X̂t with the Holt double-exponential prediction formula
X̂t = Yt + Tt × Prediction
where Prediction takes values in [0, n];
S7: check the Euclidean distance DisOut between the predicted value X̂t and the input value Xt against the maximum filtering distance MaxDist; if DisOut > MaxDist, clamp the prediction back toward Xt so that its deviation from Xt does not exceed MaxDist;
finally obtaining, for the input value Xt, the smoothed output X̂t: a three-dimensional vector containing the joint point's coordinate values on the x, y and z axes;
Step (6): build the animated character binding recording system based on the monocular depth map from the smoothed output value X̂t.
2. The method according to claim 1, characterized in that building the animated character binding recording system based on the monocular depth map from the smoothed output value X̂t in step (6) comprises:
(1) outputting the output value X̂t to the editor recording module of the system;
(2) building the animated character model in the model binding module, binding the animated character's joint points to the skeleton model to generate a humanoid animation model, and outputting it to the editor recording module;
(3) the editor recording module binds the human body pictures captured by the camera and the smoothed joint-point parameters output by the human pose estimation module to the animated character model built by the model binding module, completing the animated character recording and binding task;
the editor recording module is the main interface of the system; after the user selects and previews a model through the main interface and determines the animation scene and animated character, the user clicks to record an animated video with the embedded video recording method, finally generating the recorded video.
3. The method according to claim 1, characterized in that the network structure of the three-dimensional-information deep learning network FeSHEN in step (4) is: a convolution block followed by a pooling block, then four ME modules in series, then two residual blocks in succession, and finally a convolution block; each ME module consists of four sub-modules, a pooling block, a residual block, a deconvolution block and a supervision block, and is used for three-dimensional voxel estimation;
wherein the kernel size of the residual blocks is 3x3x3, the kernel size of the convolution blocks and deconvolution blocks is 2x2x2, both with stride 2; the supervision parameters of the four ME modules are [2, 4, 8, 16] in turn, and their output parameters are [8, 16, 32, 64] in turn.
4. The method according to claim 1, characterized in that when the 2D monocular depth map is rotated in step (2), it is rotated in the XY plane within the angle range [-40, 40].
5. The method according to claim 1, characterized in that when the 2D monocular depth map is scaled in step (2), the scaling factor is within the range [0.8, 1.2].
6. The method according to claim 1, characterized in that when the 2D monocular depth map is translated in step (2), the translation is within a [-8, 8] voxel space.
7. The method according to claim 1, characterized in that in step (5) variation limiting restricts the joint points by setting frame-to-frame variation limits, comprising three methods:
(1) coordinate limiting: limit the variation range of the coordinate value on each coordinate axis, thereby controlling the angle variation;
(2) swing limiting: limit the left-right and front-back angle variation of a joint point;
(3) angle limiting: limit the angle variation of a joint point in all directions.
8. The method according to claim 1, characterized in that in S3 of the jitter smoothing algorithm of step (5), the smaller the value of the smoothing parameter Smoothing, the less the smoothing value Yt of this frame is influenced by the previous frame.
9. The method according to claim 1, characterized in that in S5 of the jitter smoothing algorithm of step (5), the larger the value of the correction parameter Correction, the faster the joint-point deviation is corrected.
10. The method according to any one of claims 1-9, characterized in that the method, based on machine learning and deep learning frameworks, estimates the human joint-point coordinates in the image from the monocular depth map with the three-dimensional-information deep learning network, introduces the estimated human joint-point coordinate values into the animated character binding recording system, smooths the joint points with a filtering algorithm, and finally realizes the binding of the joint points to the animated character in the animated character binding recording system.
CN201910256680.6A 2019-04-01 2019-04-01 Animation figure binding recording system based on monocular depth map Active CN110009717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910256680.6A CN110009717B (en) 2019-04-01 2019-04-01 Animation figure binding recording system based on monocular depth map


Publications (2)

Publication Number Publication Date
CN110009717A true CN110009717A (en) 2019-07-12
CN110009717B CN110009717B (en) 2020-11-03

Family

ID=67169200


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321964A (en) * 2019-07-10 2019-10-11 重庆电子工程职业学院 Identification model update method and relevant apparatus

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102486816A (en) * 2010-12-02 2012-06-06 三星电子株式会社 Device and method for calculating human body shape parameters
CN102622591A (en) * 2012-01-12 2012-08-01 北京理工大学 3D (three-dimensional) human posture capturing and simulating system
CN106846403A (en) * 2017-01-04 2017-06-13 北京未动科技有限公司 The method of hand positioning, device and smart machine in a kind of three dimensions
US20170345183A1 (en) * 2016-04-27 2017-11-30 Bellus 3D, Inc. Robust Head Pose Estimation with a Depth Camera
CN108573231A (en) * 2018-04-17 2018-09-25 中国民航大学 Human bodys' response method based on the Depth Motion figure that motion history point cloud generates
CN108665496A (en) * 2018-03-21 2018-10-16 浙江大学 A kind of semanteme end to end based on deep learning is instant to be positioned and builds drawing method
CN109003301A (en) * 2018-07-06 2018-12-14 东南大学 A kind of estimation method of human posture and rehabilitation training system based on OpenPose and Kinect
US20190026548A1 (en) * 2017-11-22 2019-01-24 Intel Corporation Age classification of humans based on image depth and human pose
CN109492578A (en) * 2018-11-08 2019-03-19 北京华捷艾米科技有限公司 A kind of gesture remote control method and device based on depth camera


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YING CHEN: "Automatic Facial Feature Correspondence Based on Pose Estimation", 2010 Second International Workshop on Education Technology and Computer Science *
CHEN Ying: "Markerless human pose estimation from monocular depth maps based on feature regression", Journal of System Simulation *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant