CN110826500B - Method for estimating 3D human body posture based on antagonistic network of motion link space - Google Patents


Publication number
CN110826500B
CN110826500B · CN201911085729.2A
Authority
CN
China
Prior art keywords
human body
human
coordinates
skeleton
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911085729.2A
Other languages
Chinese (zh)
Other versions
CN110826500A (en)
Inventor
薛裕明
谢军伟
李�根
罗鸣
童同
高钦泉
Current Assignee
Fujian Imperial Vision Information Technology Co ltd
Original Assignee
Fujian Imperial Vision Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Imperial Vision Information Technology Co ltd filed Critical Fujian Imperial Vision Information Technology Co ltd
Priority to CN201911085729.2A priority Critical patent/CN110826500B/en
Publication of CN110826500A publication Critical patent/CN110826500A/en
Application granted granted Critical
Publication of CN110826500B publication Critical patent/CN110826500B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a method for estimating 3D human body pose based on an adversarial network in kinematic chain space. A convolutional neural network is used to estimate the three-dimensional coordinates of human key nodes from images collected by a monocular device: taking a monocular RGB image as input and adopting kinematic chain space and adversarial network techniques, the method alleviates over-fitting and improves the accuracy and precision of 3D human pose estimation.

Description

Method for estimating 3D human body posture based on antagonistic network of motion link space
Technical Field
The invention relates to image content understanding, and in particular to a method for estimating 3D human body pose based on an adversarial network in kinematic chain space.
Background
The current artificial intelligence technology brings huge breakthroughs in the fields of image content understanding, video enhancement, voice recognition and the like. Especially in the image content understanding, the 3D human body posture recognition technology has high application value in the fields of rehabilitation, video monitoring, advanced human-computer interaction and the like.
3D human body pose estimation refers to techniques for predicting the three-dimensional coordinates of a human pose from monocular or multi-view images. It can be roughly classified into the following three methods:
The first method uses mathematical computation or machine learning to build a spatial coordinate system from information such as the relative positions and shooting angles of multi-view cameras, predicts the corresponding depth map, and estimates a 2D image from an arbitrary angle. Its disadvantages are that it requires images collected by multi-view cameras and that the placement of the acquisition devices cannot be changed.
The second method uses only a single acquisition device: 2D human pose coordinates are first computed from a single image, and the corresponding 3D pose is then estimated by simple matrix multiplication or a lightweight learned network. However, without the original image as input, spatial information may be lost, so the accuracy of the 3D coordinates is poor; moreover, because this method relies only on the 2D pose input, its errors are amplified during 3D estimation.
The third method learns the end-to-end mapping from monocular RGB images to 3D coordinates with deep learning. Compared with the former two methods, it clearly improves both efficiency and performance.
Although 3D human pose estimation has made progress, additional acquisition-device information is still required, and deep neural networks are prone to over-fitting.
Therefore, the invention takes only a monocular RGB image as input and adopts kinematic chain space together with adversarial network techniques, which not only alleviates over-fitting but also improves the precision and accuracy of 3D human pose estimation.
Disclosure of Invention
The invention aims to provide a method for estimating 3D human body pose based on an adversarial network in kinematic chain space, which uses a convolutional neural network to estimate the three-dimensional coordinates of human key nodes from images collected by a monocular device, improving the accuracy and precision of 3D human pose estimation.
To achieve this purpose, the technical scheme of the invention is as follows. A method for estimating a 3D human body pose based on an adversarial network of kinematic chain space comprises the following steps:
s1, collecting a human body color image I by adopting monocular equipment, then carrying out image normalization, and labeling by utilizing 2D and 3D human body data sets to respectively obtainTaking 2D human body skeleton coordinate P and 3D human body skeleton coordinate M epsilon R 3×n (ii) a Adopting the original image and the human skeleton coordinate to carry out mirror image and cutting, and carrying out image data augmentation;
s2, generating a network by the 3D human body skeleton coordinates: weak supervision generation is adopted to resist network learning to solve the problem of data overfitting, wherein the following calculation formula is adopted in the feature extraction stage:
F=R(BN(W 1 *I g +B 1 )) (1)
wherein R represents a nonlinear activation function LeakyRelu, W 1 ,B 1 Respectively representing the weights and offsets of the convolutional layers in the feature extraction stage, BN representing the normalization function, I g Representing an input picture, and F representing an output result obtained in the characteristic extraction stage; then, the 3D human skeleton coordinates are obtained through the convolution block, the remodeling module and the two full-connection layers respectively;
s3, estimating a camera coordinate parameter K epsilon R by adopting a convolutional neural network 2×3 To assist in back projecting the layers;
s4, generating a 3D human body skeleton coordinate generated by a network based on the 3D human body skeleton coordinate obtained by labeling in the S1 and the 3D human body skeleton coordinate generated in the S2, calculating a link angle and a link length of a human body skeleton by adopting a Wassertein GAN discriminator of a motion link space, and simultaneously fusing and inputting the input image and the 3D human body skeleton coordinate into a convolutional neural network so as to improve the accuracy of a human body structure, namely the generation of the 3D human body skeleton coordinate;
s5, through a back projection layer, based on the camera coordinate parameter K belonging to R calculated in the step S3 2×3 Converting the 3D human skeleton coordinates into 2D human skeleton coordinates;
P'=KM (2)
wherein P' is the predicted 2D human skeletal coordinates;
s6, predicting a loss function of the key nodes of the 3D human body posture, wherein M belongs to R 3×n Representing 3D human skeleton coordinates, i.e. 3D human pose key node position, coordinate m i (x, y, z) represents one of the key node positions of the human body, i =1, \8230N, and carrying out reshape operation on the last output layer so as to obtain the 3D human body coordinate;
s7, a gradual training strategy: dividing the training process into a plurality of preset sub-training periods, and adopting a stepping increasing strategy to train the sub-training periods in sequence; when training is started, the original image is zoomed into a small picture and training is started with a large learning rate, and after each sub-training period is completed, the color original image is gradually increased and the learning rate is gradually reduced; when the 3D human skeleton coordinate generated after completing one sub-training period and the corresponding calibration data have large entries, continuing to perform backward propagation, updating the convolution weight parameter and the bias parameter by using a gradient descent optimization algorithm, and then executing the step S2; and when the 3D human body bone coordinates generated after one sub-training period is finished reach the expected times or all preset sub-training periods are finished, obtaining the final result.
In an embodiment of the present invention, the loss function of the 3D human pose key nodes is equal to:

W(P_r, P_g) + λ·L_cam

W(P_r, P_g) = sup_{‖f‖_L ≤ 1} E_{M∼P_r}[f(M)] − E_{M∼P_g}[f(M)]

where W(P_r, P_g) denotes the WGAN loss, whose input has two parts: P_g denotes a batch of generated data (containing images and the correspondingly generated 3D human skeleton coordinates) and P_r denotes a batch of real data (containing images and the corresponding real labeled 3D human skeleton coordinates); E_{M∼P_r}[f(M)] denotes the loss value of samples discriminated as real 3D human skeletons and E_{M∼P_g}[f(M)] the loss value of samples discriminated as generated; ‖f‖_L ≤ 1 means that the Lipschitz constant of the function f must not exceed 1, and the supremum of E_{M∼P_r}[f(M)] − E_{M∼P_g}[f(M)] is taken over all f satisfying this condition; L_cam denotes the loss function of the camera estimation network, λ is taken in [0, 1], trace(·) computes the trace of the corresponding matrix, ‖·‖_F is the Frobenius norm, K ∈ R^(2×3), and I_2 is the 2×2 identity matrix.
Compared with the prior art, the invention has the following beneficial effects:
The innovation of the method is mainly embodied in two aspects. First, a deep neural network model generates the 3D human skeleton in a weakly supervised manner; the generation is accurate and effective and can satisfy most human motion analysis requirements. Second, 3D coordinates are for the first time fused with the images, and a KCS network layer is introduced into the discriminator network, upgrading the discriminator and greatly assisting the generation of the 3D structure. With the generative adversarial network, the KCS network layer, and the camera back-projection network as auxiliary means, the invention provides an accurate and reliable method for estimating 3D human body pose.
Drawings
FIG. 1 is the network structure of the 3D human skeleton coordinate generation part of the method of the present invention;
FIG. 2 is the camera estimation network structure of the method of the present invention;
FIG. 3 is the discriminator part of the method of the present invention;
FIG. 4 is the basic flow chart of the method of the present invention;
FIG. 5 is an effect diagram of the method of the present invention.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to FIGS. 1-5.
As shown in fig. 4, the method of the present invention aims to estimate the three-dimensional coordinates of human key nodes from images collected by a monocular device using a convolutional neural network, and specifically comprises the following steps:
step 1:
To train the model, a large number of color human body images are selected as the input I, image normalization is performed, and the 2D and 3D human body data sets are used for labeling, giving the 2D and 3D coordinates P and M for each human body. The color originals and their labels are mirror-flipped, and brightness, hue, and saturation are randomly perturbed, yielding a large amount of augmented image data that is stored as matched data pairs to serve as the training data set for deep learning. At the same time, the 2D coordinates P = (p_1, p_2, …, p_n) and the 3D coordinates M = (m_1, m_2, …, m_n), M ∈ R^(3×n), on the training set are normalized, which further improves the convergence rate and precision of the model and prevents gradient explosion.
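For illustration, the mirror part of this augmentation can be sketched in numpy; the image size and key-node coordinates below are invented for the example, and a full implementation would also swap left/right joint labels after flipping:

```python
import numpy as np

def mirror_augment(img, P):
    # Horizontally mirror a colour image and the matching 2D skeleton
    # coordinates P (2 x n, pixel units). Brightness/saturation jitter and
    # the left/right joint-label swap are omitted for brevity.
    H, W = img.shape[:2]
    img_m = img[:, ::-1].copy()          # flip columns
    P_m = P.copy()
    P_m[0] = (W - 1) - P_m[0]            # flip x-coordinates only
    return img_m, P_m

img = np.zeros((4, 6, 3))                # toy 4x6 "colour image"
P = np.array([[0.0, 5.0],                # x-coordinates of two key nodes
              [1.0, 2.0]])               # y-coordinates
img_m, P_m = mirror_augment(img, P)
print(P_m[0])                            # → [5. 0.]
```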
Step 2:
The generator, part 1: the 3D human skeleton coordinate generation network. Compared with traditional methods, the invention adopts weakly-supervised generative adversarial network learning to solve the problem of data over-fitting; the specific steps are as follows:
The feature extraction stage consists of a convolutional layer, a batch normalization layer, and a LeakyReLU activation function, computed as:

F = R(BN(W1 * Ig + B1))   (1)

where R denotes the nonlinear activation function LeakyReLU, W1 and B1 denote the weights and biases of the convolutional layers in the feature extraction stage, BN denotes the normalization function, Ig denotes the input picture, and F denotes the output of the feature extraction stage; the corresponding 3D human skeleton coordinates are then obtained through a convolution block, a reshaping (flatten) module, and two fully connected layers;
Step 3:
The generator, part 2: to improve the accuracy of the estimated human pose, the invention uses a convolutional neural network to estimate the camera parameter matrix K ∈ R^(2×3). Its purpose is to assist the back-projection layer: the 3D human skeleton coordinates are back-projected to the corresponding 2D skeleton coordinates and compared with the 2D coordinates of the original input image, and the back-projection loss is computed, which prevents over-fitting. As a projection, K must have the following property:

K K^T = s^2 I_2   (2)

where s is the scaling factor of the projection and I_2 is the 2×2 identity matrix. Since s is an unknown quantity, the invention assigns to it the largest singular value of the K matrix:

s = σ_max(K)   (3)

The loss function of the camera estimation network is:

L_cam = ‖ (1/s^2) K K^T − I_2 ‖_F   (4)

where trace(·) computes the trace of the corresponding matrix (when (2) holds exactly, s^2 = trace(K K^T)/2), ‖·‖_F is the Frobenius norm, and K ∈ R^(2×3).
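Under this reading (s taken as the largest singular value of K), the camera loss can be sketched in a few lines of numpy; the example matrices are invented for illustration:

```python
import numpy as np

def camera_loss(K):
    # L_cam = || K K^T / s^2 - I_2 ||_F, with s the largest singular value of K
    s = np.linalg.svd(K, compute_uv=False)[0]
    return np.linalg.norm(K @ K.T / s**2 - np.eye(2), ord='fro')

# a perfect scaled projection (K K^T = s^2 I_2) gives a near-zero loss
K_good = 2.5 * np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
# a skewed matrix violates the property and is penalised
K_bad = np.array([[1.0, 0.2, 0.0],
                  [0.7, 1.0, 0.0]])
print(camera_loss(K_good))   # near 0
print(camera_loss(K_bad))    # clearly positive
```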
By training the network shown in fig. 2, i.e. obtaining the back-projection matrix K, the 3D human skeleton coordinates are converted to 2D skeleton coordinates:

P' = KM   (5)
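The back-projection itself is a single matrix product; a small numpy sketch with an invented camera matrix and joint count:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16                                   # assumed number of skeleton key nodes
M = rng.standard_normal((3, n))          # 3D skeleton coordinates, M in R^(3×n)
K = np.array([[1.0, 0.0, 0.1],           # invented camera matrix K in R^(2×3)
              [0.0, 1.0, 0.2]])
P_proj = K @ M                           # Eq. (5): P' = KM
print(P_proj.shape)                      # → (2, 16)
```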
Step 4:
The discriminator: as shown in fig. 3, to judge the accuracy of the generated human structure, the invention uses a Wasserstein GAN [1] discriminator in kinematic chain space [2] (KCS: kinematic chain space) for a more reasonable calculation of link angles and lengths. At the same time, the input image and the 3D human skeleton are fused and fed into the convolutional neural network, adding the feature of whether the 3D skeleton fits the original image.
The KCS layer is a network layer introduced by the invention that improves the representation of the human pose. The KCS matrix is an important representation of human pose, containing joint links and bone lengths. A bone b_k can be represented as the link between the r-th and t-th nodes:

b_k = p_r − p_t = Mc   (6)

c = (0, …, 0, 1, 0, …, 0, −1, 0, …, 0)^T   (7)

where the entry at position r is 1 and the entry at position t is −1. The complete human skeleton is then defined as:

B = (b_1, b_2, …, b_n)   (8)

Concatenating the corresponding c vectors into a matrix C, B can be written as:

B = MC   (9)

The KCS matrix is computed as:

Ψ = B^T B   (10)

By adding the Ψ matrix to the network layer, the squared length of each bone appears on the diagonal, while the off-diagonal entries encode the angles between pairs of bones. Compared with the Euclidean-distance matrices used in other methods, this algorithm is a pure matrix operation, which effectively improves the computation speed; this part is mainly used to extract bone features and to judge implausibly constructed bones as fast as possible.
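Eqs. (6)-(10) can be sketched directly in numpy; the 4-joint chain and bone indices below are invented for the example:

```python
import numpy as np

def kcs_matrix(M, bones):
    # Build C column by column (Eqs. 6-7): entry r is +1, entry t is -1,
    # so B = M C stacks the bone vectors b_k = p_r - p_t (Eqs. 8-9).
    n = M.shape[1]
    C = np.zeros((n, len(bones)))
    for k, (r, t) in enumerate(bones):
        C[r, k] = 1.0
        C[t, k] = -1.0
    B = M @ C
    return B.T @ B                       # Eq. (10): Psi = B^T B

# toy 4-joint chain, three bones of unit length
M = np.array([[0.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 2.0, 2.0],
              [0.0, 0.0, 0.0, 0.0]])    # columns are joint positions p_i
bones = [(0, 1), (1, 2), (2, 3)]
Psi = kcs_matrix(M, bones)
print(np.diag(Psi))                     # squared bone lengths → [1. 1. 1.]
```

The diagonal of Ψ holds the squared bone lengths, and an off-diagonal entry Ψ[i, j] is the inner product of bones i and j, from which their angle follows.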
To add the feature of whether the 3D skeleton fits the original image, the invention adds a second input: the original image and the 3D skeleton are combined as input, and features are extracted through a convolutional neural network. Specifically, the newly added 3D part is initialized as a width × height × depth floating-point matrix with all initial values 0.5, where the width and height equal those of the original image and the depth is the maximum depth value of the 3D human body; each point of the input 3D human body is then assigned the value 1.0, as shown in fig. 3.
The invention concatenates the features extracted by the two parts and adds two fully connected layers, each containing 90 neurons, to the following network. Finally, a judgment is made as to whether the 3D skeleton coordinates are real or generated.
Step 5:
Loss function: the loss function for predicting the 3D human pose key nodes is W(P_r, P_g) + λ·L_cam, where M ∈ R^(3×n) denotes the 3D pose key node positions, the coordinate m_i = (x, y, z) denotes one key node of the human body, and a reshape operation is performed at the last output layer to obtain the 3D human coordinates. The discriminator part adopts the Wasserstein loss [1]:

W(P_r, P_g) = sup_{‖f‖_L ≤ 1} E_{M∼P_r}[f(M)] − E_{M∼P_g}[f(M)]

where W(P_r, P_g) denotes the WGAN loss, whose input has two parts: P_g denotes a batch of generated data and P_r denotes a batch of real data; E_{M∼P_r}[f(M)] denotes the loss value of samples discriminated as real 3D human skeletons and E_{M∼P_g}[f(M)] the loss value of samples discriminated as generated; ‖f‖_L ≤ 1 means that the Lipschitz constant of the function f must not exceed 1, and the supremum of E_{M∼P_r}[f(M)] − E_{M∼P_g}[f(M)] is taken over all f satisfying this condition.

The loss function of the camera estimation network is:

L_cam = ‖ (1/s^2) K K^T − I_2 ‖_F

where trace(·) computes the trace of the corresponding matrix (when (2) holds exactly, s^2 = trace(K K^T)/2), ‖·‖_F is the Frobenius norm, K ∈ R^(2×3), and I_2 is the 2×2 identity matrix.
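The empirical form of the Wasserstein term for one batch can be sketched as follows; the critic scores are invented for the example:

```python
import numpy as np

def wgan_critic_objective(f_real, f_fake):
    # Empirical form of W(P_r, P_g) for one batch:
    # E_{M~P_r}[f(M)] - E_{M~P_g}[f(M)], where f is the (1-Lipschitz) critic.
    # The critic is trained to maximise this value; the generator to fool it.
    return np.mean(f_real) - np.mean(f_fake)

f_real = np.array([0.9, 1.1, 1.0])       # hypothetical critic scores on real skeletons
f_fake = np.array([-0.5, 0.1, 0.0])      # hypothetical scores on generated skeletons
print(wgan_critic_objective(f_real, f_fake))   # ≈ 1.133
```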
Step 6:
A progressive training strategy. The training process is divided into several preset sub-training periods, trained in sequence with a step-wise growth strategy: at the start of training, the original image is scaled down to a small picture and training begins with a large learning rate; after each sub-training period, the color image size is gradually increased while the learning rate is gradually decreased.
When the 3D human skeleton coordinates generated after a sub-training period differ significantly from the corresponding calibration data, back-propagation continues, the convolution weight and bias parameters are updated with a gradient-descent optimization algorithm, and step 2 is executed again; when the generated 3D human skeleton coordinates meet expectations or all preset sub-training periods are finished, the final result is obtained. The rationale is that training starts from the original picture scaled down to a small size, aided by a large learning rate; after each training period, the input picture is enlarged, the learning rate is reduced, and training resumes. In this way, precision on higher-resolution pictures is built on top of the low-resolution pictures, increasing the robustness of the network.
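Such a schedule might look like the following sketch; all concrete values (initial size, learning rate, growth and decay factors) are assumptions, not taken from the patent:

```python
def progressive_schedule(periods, size0=64, lr0=1e-3, growth=2, decay=0.1):
    # Illustrative step-wise schedule: each sub-training period enlarges
    # the input picture and shrinks the learning rate.
    sched = []
    size, lr = size0, lr0
    for _ in range(periods):
        sched.append((size, lr))
        size *= growth                   # gradually enlarge the input picture
        lr *= decay                      # gradually reduce the learning rate
    return sched

for size, lr in progressive_schedule(3):
    print(size, lr)
```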
FIG. 5 is a diagram illustrating the effect of the method for estimating 3D human body posture based on the antagonistic network of the motion link space.
Reference documents:
[1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein generative adversarial networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 214-223, International Convention Centre, Sydney, Australia, 06-11 Aug 2017. PMLR.
[2] B. Wandt, H. Ackermann, and B. Rosenhahn. A kinematic chain space for monocular motion capture. In ECCV Workshops, Sept. 2018.
The above are preferred embodiments of the present invention; all changes made according to the technical scheme of the present invention that produce functional effects, without exceeding the scope of the technical scheme, belong to the protection scope of the present invention.

Claims (2)

1. A method for estimating a 3D human body posture based on an adversarial network of kinematic chain space, characterized by comprising the following steps:
S1, collecting human body color images I with a monocular device, then performing image normalization, and labeling with 2D and 3D human body data sets to obtain the 2D human skeleton coordinates P and the 3D human skeleton coordinates M ∈ R^(3×n); the original images and the human skeleton coordinates are mirrored and cropped to augment the image data;
S2, the 3D human skeleton coordinate generation network: weakly-supervised generative adversarial network learning is adopted to solve the problem of data over-fitting, where the feature extraction stage uses the following formula:

F = R(BN(W1 * Ig + B1))   (1)

where R denotes the nonlinear activation function LeakyReLU, W1 and B1 denote the weights and biases of the convolutional layers in the feature extraction stage, BN denotes the normalization function, Ig denotes the input picture, and F denotes the output of the feature extraction stage; the 3D human skeleton coordinates are then obtained through a convolution block, a reshaping module, and two fully connected layers;
S3, estimating the camera parameter matrix K ∈ R^(2×3) with a convolutional neural network to assist the back-projection layer;
S4, based on the labeled 3D human skeleton coordinates from step S1 and the 3D human skeleton coordinates generated in step S2, computing the link angles and link lengths of the human skeleton with a Wasserstein GAN discriminator in kinematic chain space, while fusing the input image and the 3D human skeleton coordinates into a convolutional neural network to improve the accuracy of the generated 3D human skeleton coordinates;
S5, through the back-projection layer, converting the 3D human skeleton coordinates into 2D human skeleton coordinates based on the camera parameter matrix K ∈ R^(2×3) computed in step S3:

P' = KM   (2)

where P' is the predicted 2D human skeleton coordinates;
S6, predicting the loss function of the 3D human pose key nodes, where M ∈ R^(3×n) denotes the 3D human skeleton coordinates, i.e. the 3D pose key node positions, the coordinate m_i = (x, y, z) denotes one key node of the human body, i = 1, …, n, and a reshape operation is performed at the last output layer to obtain the 3D human coordinates;
S7, a progressive training strategy: the training process is divided into several preset sub-training periods, trained in sequence with a step-wise growth strategy; at the start of training the original image is scaled down to a small picture and training begins with a large learning rate, and after each sub-training period the color image size is gradually increased while the learning rate is gradually decreased; when the 3D human skeleton coordinates generated after a sub-training period differ significantly from the corresponding calibration data, back-propagation continues, the convolution weight and bias parameters are updated with a gradient-descent optimization algorithm, and step S2 is executed again; when the generated 3D human skeleton coordinates meet expectations or all preset sub-training periods are finished, the final result is obtained.
2. The method for estimating the 3D human body posture based on the adversarial network of kinematic chain space of claim 1, wherein the loss function of the 3D human pose key nodes is equal to:

W(P_r, P_g) + λ·L_cam

W(P_r, P_g) = sup_{‖f‖_L ≤ 1} E_{M∼P_r}[f(M)] − E_{M∼P_g}[f(M)]

wherein W(P_r, P_g) denotes the WGAN loss, whose input has two parts: P_g denotes a batch of generated data and P_r denotes a batch of real data; E_{M∼P_r}[f(M)] denotes the loss value of samples discriminated as real 3D human skeletons and E_{M∼P_g}[f(M)] the loss value of samples discriminated as generated; ‖f‖_L ≤ 1 means that the Lipschitz constant of the function f must not exceed 1, and the supremum of E_{M∼P_r}[f(M)] − E_{M∼P_g}[f(M)] is taken over all f satisfying this condition; L_cam denotes the loss function of the camera estimation network, λ is taken in [0, 1], trace(·) computes the trace of the corresponding matrix, ‖·‖_F is the Frobenius norm, K ∈ R^(2×3), and I_2 is the 2×2 identity matrix.
CN201911085729.2A 2019-11-08 2019-11-08 Method for estimating 3D human body posture based on antagonistic network of motion link space Active CN110826500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911085729.2A CN110826500B (en) 2019-11-08 2019-11-08 Method for estimating 3D human body posture based on antagonistic network of motion link space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911085729.2A CN110826500B (en) 2019-11-08 2019-11-08 Method for estimating 3D human body posture based on antagonistic network of motion link space

Publications (2)

Publication Number Publication Date
CN110826500A CN110826500A (en) 2020-02-21
CN110826500B true CN110826500B (en) 2023-04-14

Family

ID=69553460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911085729.2A Active CN110826500B (en) 2019-11-08 2019-11-08 Method for estimating 3D human body posture based on antagonistic network of motion link space

Country Status (1)

Country Link
CN (1) CN110826500B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598954A (en) * 2020-04-21 2020-08-28 哈尔滨拓博科技有限公司 Rapid high-precision camera parameter calculation method
CN111462274A (en) * 2020-05-18 2020-07-28 南京大学 Human body image synthesis method and system based on SMP L model
CN111914618B (en) * 2020-06-10 2024-05-24 华南理工大学 Three-dimensional human body posture estimation method based on countermeasure type relative depth constraint network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549876A (en) * 2018-04-20 2018-09-18 重庆邮电大学 Sitting posture detection method based on object detection and human pose estimation
CN109949368A (en) * 2019-03-14 2019-06-28 郑州大学 3D human pose estimation method based on image retrieval
CN110135375A (en) * 2019-05-20 2019-08-16 中国科学院宁波材料技术与工程研究所 Multi-person pose estimation method based on global information integration

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8971597B2 (en) * 2005-05-16 2015-03-03 Intuitive Surgical Operations, Inc. Efficient vision and kinematic data fusion for robotic surgical instruments and other applications
US8994790B2 (en) * 2010-02-25 2015-03-31 The Board Of Trustees Of The Leland Stanford Junior University Motion capture with low input data constraints

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549876A (en) * 2018-04-20 2018-09-18 重庆邮电大学 Sitting posture detection method based on object detection and human pose estimation
CN109949368A (en) * 2019-03-14 2019-06-28 郑州大学 3D human pose estimation method based on image retrieval
CN110135375A (en) * 2019-05-20 2019-08-16 中国科学院宁波材料技术与工程研究所 Multi-person pose estimation method based on global information integration

Also Published As

Publication number Publication date
CN110826500A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
Li et al. Deepim: Deep iterative matching for 6d pose estimation
CN110826500B (en) Method for estimating 3D human body posture based on antagonistic network of motion link space
CN111968217B (en) SMPL parameter prediction and human body model generation method based on picture
CN109978021B (en) Double-flow video generation method based on different feature spaces of text
CN108154104A (en) A kind of estimation method of human posture based on depth image super-pixel union feature
CN111695523B (en) Double-flow convolutional neural network action recognition method based on skeleton space-time and dynamic information
CN110135277B (en) Human behavior recognition method based on convolutional neural network
Yin et al. Bridging the gap between semantic segmentation and instance segmentation
CN113221647A (en) 6D pose estimation method fusing point cloud local features
CN112183675B (en) Tracking method for low-resolution target based on twin network
CN112819951A (en) Three-dimensional human body reconstruction method with shielding function based on depth map restoration
CN116030498A (en) Virtual garment running and showing oriented three-dimensional human body posture estimation method
CN115063717B (en) Video target detection and tracking method based on real scene modeling of key area
CN114463492A (en) Adaptive channel attention three-dimensional reconstruction method based on deep learning
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN114743273A (en) Human skeleton behavior identification method and system based on multi-scale residual error map convolutional network
CN117252928B (en) Visual image positioning system for modular intelligent assembly of electronic products
Peng et al. RGB-D human matting: A real-world benchmark dataset and a baseline method
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN111507276B (en) Construction site safety helmet detection method based on hidden layer enhanced features
CN112288812A (en) Mobile robot real-time positioning method based on visual features
CN117152829A (en) Industrial boxing action recognition method of multi-view self-adaptive skeleton network
CN116524601A (en) Self-adaptive multi-stage human behavior recognition model for assisting in monitoring of pension robot
CN116152926A (en) Sign language identification method, device and system based on vision and skeleton information fusion
Gao et al. Study of improved Yolov5 algorithms for gesture recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant