CN111914618A - Three-dimensional human body posture estimation method based on adversarial relative depth constraint network - Google Patents

Three-dimensional human body posture estimation method based on adversarial relative depth constraint network

Info

Publication number
CN111914618A
Authority
CN
China
Prior art keywords
human body
dimensional
depth
body posture
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010521352.7A
Other languages
Chinese (zh)
Other versions
CN111914618B (en)
Inventor
刘阳温
李桂清
韦国栋
聂勇伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010521352.7A priority Critical patent/CN111914618B/en
Publication of CN111914618A publication Critical patent/CN111914618A/en
Application granted granted Critical
Publication of CN111914618B publication Critical patent/CN111914618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Neurology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Social Psychology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a three-dimensional human body posture estimation method based on an adversarial relative depth constraint network, which comprises the following steps: 1) inputting the two-dimensional pixel coordinates of 16 joint points of a human body and performing normalization preprocessing; 2) inputting the two-dimensional pixel coordinates into a depth prediction network and outputting the depth values of the 16 joint points; 3) reconstructing the three-dimensional coordinates of the joint points from the depth values and the two-dimensional pixel coordinates; 4) inputting the reconstructed three-dimensional human body posture into the discriminator of a generative adversarial network to calculate an authenticity error, and calculating a relative depth error from the reconstructed posture and the relative depth information between the joint points in the corresponding image; 5) adding the authenticity error calculated by the discriminator and the relative depth error to obtain a total error, and feeding the total error back to the depth prediction network so as to obtain a more accurate three-dimensional human body posture. The invention alleviates the shortage of outdoor three-dimensional human body posture data and solves the problem that the results of existing generative adversarial network methods are inconsistent with the relative depth relationships between the joint points in the picture.

Description

Three-dimensional human body posture estimation method based on adversarial relative depth constraint network
Technical Field
The invention relates to the technical field of three-dimensional human body posture estimation, in particular to a three-dimensional human body posture estimation method based on an adversarial relative depth constraint network.
Background
Three-dimensional human body posture estimation refers to the process of estimating the three-dimensional coordinates of the main joint points of a human body from an image and using them to represent the three-dimensional posture of the body in that image. In recent years, as technological progress has opened up new application scenarios, three-dimensional human body posture estimation has found wide application in human-computer interaction, motion estimation, animation, virtual reality and other fields, and has become a fundamental and challenging research topic.
Owing to the development of deep learning and the easy acquisition of two-dimensional human body posture data, the field of two-dimensional human body posture estimation has made great progress. In three-dimensional human body posture estimation, however, data acquisition is difficult and costly, so little three-dimensional posture data is available for network learning. Most existing three-dimensional human body posture data are collected indoors with precise instruments. Consequently, existing three-dimensional estimation methods perform poorly on outdoor images because abundant outdoor three-dimensional posture data are lacking.
Because two-dimensional posture estimation is mature while three-dimensional human body posture data are difficult to acquire, existing three-dimensional estimation methods tend to estimate the three-dimensional posture from the two-dimensional posture in a weakly supervised manner. Weak supervision aims to make the neural network learn prior attributes of the three-dimensional posture, such as bone lengths and the angles between bones, without requiring full supervision by three-dimensional posture data paired with each picture, thereby alleviating the shortage of outdoor three-dimensional posture data. To make the weakly supervised network generate more reasonable three-dimensional postures, existing methods adopt a generative adversarial network for weakly supervised learning. The generative adversarial approach uses the already collected three-dimensional posture data so that, under weak supervision, a neural network called the generator produces three-dimensional postures that conform to the distribution of the collected data. In this way the generator learns reasonable postures: for example, the left and right arms are of equal, symmetrical length, the angles between bones are plausible, and the reprojection coincides with the two-dimensional posture. However, existing generative adversarial methods focus on constraining the distribution of the collected three-dimensional posture data and neglect the constraint of the relative depth between the joint points of the human body in the image, so the estimated posture may match the collected data distribution while violating the relative depth relationships between the corresponding joint points in the image. Relative depth refers to the relative ordering of the distances from the camera to the joint points of the human body in the image. It can be obtained by human observation of the image and is easy to acquire compared with capturing true three-dimensional coordinates, so it can serve as a form of weakly supervised information.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a three-dimensional human body posture estimation method based on an adversarial relative depth constraint network. A weakly supervised approach addresses the difficulty of acquiring three-dimensional human body posture data, and the combination of a generative adversarial network with a relative depth constraint remedies the defect that the three-dimensional postures estimated by existing generative adversarial methods do not conform to the relative depth relationships in the corresponding image.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a three-dimensional human body posture estimation method based on an adversarial relative depth constraint network comprises the following steps:
1) inputting the two-dimensional pixel coordinates of 16 joint points of a human body and performing normalization preprocessing;
2) inputting the normalized two-dimensional pixel coordinates of the 16 joint points into a depth prediction network and outputting the depth values of the 16 joint points;
3) reconstructing the three-dimensional coordinates of the joint points from the depth values and the two-dimensional pixel coordinates of the 16 joint points to obtain a reconstructed three-dimensional human body posture;
4) inputting the reconstructed three-dimensional human body posture into the discriminator of a generative adversarial network to calculate an authenticity error, and simultaneously calculating a relative depth error from the reconstructed posture and the relative depth information between the joint points in the corresponding image;
5) adding the authenticity error calculated by the discriminator of the generative adversarial network and the relative depth error to obtain a total error, feeding the total error back to the depth prediction network, and constraining the depth prediction network to predict more accurate depth values so as to reconstruct a more accurate three-dimensional human body posture.
In step 1), for each human body, the mean of the two-dimensional pixel coordinates of its 16 joint points is subtracted from the two-dimensional pixel coordinates of each joint point, and the result is divided by the standard deviation of the two-dimensional pixel coordinates of the 16 joint points, yielding the normalized two-dimensional pixel coordinates.
In step 2), the normalized two-dimensional pixel coordinates of the joint points obtained in the previous step are input into a depth prediction network composed of three modules to predict the depth values of the 16 joint points of the human body, comprising the following steps:
2.1) inputting the normalized two-dimensional pixel coordinates of the joint points into a feature extraction module, which consists of a fully connected layer containing 1024 neurons and a linear rectification activation function layer, to extract features;
2.2) inputting the features extracted by the feature extraction module into a residual network module for feature learning, wherein the residual network module consists of two residual blocks; each residual block passes the output of the previous layer through a fully connected layer containing 1024 neurons and a linear rectification activation function layer to obtain a preliminary feature value, passes the preliminary feature value through another fully connected layer containing 1024 neurons to obtain a further feature value, adds the further feature value to the block input, and finally passes the sum through a linear rectification activation function layer, outputting the block feature value to the next layer of the network;
2.3) inputting the output features of the residual network module into a depth value regression module, which consists of a fully connected layer containing 16 neurons and takes the output features of the residual network module to output the depth values of the 16 joint points of the human body.
In step 3), the three-dimensional coordinates of the joint points are reconstructed from the depth values and the two-dimensional pixel coordinates of the 16 joint points, as follows:
assume that the two-dimensional pixel coordinate of a certain joint point of the human body is (u, v), where u is the horizontal coordinate and v the vertical coordinate of the joint point in the image; assume that the depth value predicted for the joint point in the previous step is H and the focal length corresponding to the image is f; the three-dimensional coordinate of the joint point is then
\left( \frac{uH}{f},\; \frac{vH}{f},\; H \right)
The three-dimensional coordinates of each joint point are reconstructed in this way, giving the three-dimensional coordinates of the 16 joint points of the human body, which together constitute the three-dimensional posture of the human body.
In step 4), the reconstructed three-dimensional human body posture is input into the discriminator of a generative adversarial network for authenticity error calculation, and the relative depth error is simultaneously calculated from the reconstructed posture and the relative depth information between the joint points in the corresponding image, comprising the following steps:
4.1) the three-dimensional human body posture reconstructed in the previous step is taken as a fake sample and the already collected three-dimensional human body posture data as real samples, and the samples are input into the discriminator of the generative adversarial network, so that the reconstructed posture conforms to the distribution of the collected real data and a more reasonable three-dimensional posture is obtained; the discriminator consists of an upper and a lower fully connected feature extraction module and a fully connected real/fake prediction module; the three-dimensional posture sample is first input into the upper and lower fully connected feature extraction modules for feature extraction, the features extracted by the two modules are concatenated into a merged feature, the merged feature is input into the fully connected real/fake prediction module to judge whether the sample is real or fake, a judgment value is output, and the authenticity error of the three-dimensional posture is calculated from the judgment value with the loss function of the generative adversarial network; the upper and lower fully connected feature extraction modules have the same structure, each consisting of the feature extraction module of the depth prediction network and a residual network module composed of one residual block, and the fully connected real/fake prediction module consists of a fully connected layer containing 1024 neurons, a linear rectification activation function layer and a fully connected layer containing 1 neuron;
4.2) the relative depth error is calculated from the reconstructed three-dimensional posture and the relative depth information between the joint points in the corresponding image; the relative depth information between the joint points of the human body in the image is obtained by human observation of the image and stored in a matrix of 16 rows and 16 columns, specifically: if, from the image, the i-th joint point of the human body is closer to the camera than the j-th joint point, the element r(i, j) in row i and column j of the matrix is 1; if the i-th joint point is farther from the camera than the j-th joint point, r(i, j) is -1; if the difference between the distances of the i-th and j-th joint points from the camera is within a set range, r(i, j) is 0; here i and j are integers in the interval [1, 16], r is the matrix storing the relative depth information between the joint points, and r(i, j) is the element in row i and column j representing the relative depth relationship between the i-th and j-th joint points;
the relative depth error between each pair of joint points in the three-dimensional human body posture reconstructed in step 3) is calculated with the obtained relative depth matrix, specifically:
L_{i,j} = \left| r(i,j) \right| \, \log\!\left( 1 + \exp\!\left( r(i,j)\,(H_i - H_j) \right) \right) + \left( 1 - \left| r(i,j) \right| \right) (H_i - H_j)^2
where L_{i,j} denotes the relative depth error of the point pair formed by the i-th and j-th joint points in the three-dimensional human body posture; r(i, j) denotes the relative depth relationship between the i-th and j-th joint points, taking a value in {1, -1, 0}; |r(i, j)| denotes the absolute value of r(i, j); H_i and H_j denote the depth values of the i-th and j-th joint points obtained from the depth prediction network; finally, the sum of the relative depth errors of the 256 point pairs formed pairwise by the 16 joint points of the human body is calculated from the relative depth errors of the individual pairs, specifically:
L_{rank} = \sum_{(i,j) \in B} L_{i,j}
where L_rank denotes the sum of the relative depth errors of the 256 point pairs formed pairwise by the 16 joint points of the human body, (i, j) denotes the point pair formed by the i-th and j-th joint points, and B denotes the set of the 256 point pairs formed pairwise by the 16 joint points; the calculated sum of the relative depth errors of the 256 point pairs is taken as the relative depth error of the three-dimensional posture of the human body.
In step 5), the authenticity error calculated by the discriminator of the generative adversarial network is added to the relative depth error to obtain the total error of the reconstructed three-dimensional posture in terms of authenticity and relative depth; the error is fed back to the depth prediction network by back-propagation with gradient descent and the parameters of the depth prediction network are updated, so that the network learns both the authenticity of the three-dimensional posture and the relative depth information between the joint points in the picture, predicts more accurate joint depths, and reconstructs a more accurate three-dimensional human body posture.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method uses a generative adversarial network for weak supervision and only needs the already collected three-dimensional human body posture data for training, without collecting three-dimensional posture data paired with each image for full supervision, thereby alleviating the difficulty of acquiring three-dimensional posture data and broadening the range of application.
2. The invention combines the generative adversarial network with a relative depth constraint; on the basis of the more reasonable three-dimensional postures obtained through the adversarial network, it makes full use of the relative depth information between the joint points in the picture, so that the estimated three-dimensional posture better matches the true posture of the human body in the image and higher accuracy is obtained.
3. The network uses simple fully connected layers, has a simple structure and computes quickly and efficiently, so real-time performance can be achieved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of 16 joint points of a human body.
FIG. 3 is a structural diagram of the depth prediction network; in the figure, Linear denotes a fully connected layer, the number below it denotes the number of neurons in that layer, RELU denotes a linear rectification activation function layer, the contents of the large box show the structure of a residual block, and the ×2 at the upper right corner of the box indicates that there are two residual blocks.
FIG. 4 is a structural diagram of the discriminator of the generative adversarial network; in the figure, Linear denotes a fully connected layer, the number below it denotes the number of neurons in that layer, RELU denotes a linear rectification activation function layer, FCnet denotes a fully connected feature extraction module network, and Concat denotes the concatenation of the features extracted by the upper and lower fully connected feature extraction modules.
FIG. 5 is a network structure diagram of the fully connected feature extraction module in the discriminator of the generative adversarial network. In the figure, Linear denotes a fully connected layer, the number below it denotes the number of neurons in that layer, and RELU denotes a linear rectification activation function layer.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The complete flow of the three-dimensional human body posture estimation method based on the adversarial relative depth constraint network provided by this embodiment is shown in FIG. 1. First, the two-dimensional pixel coordinates of 16 joint points of a human body are input and normalized; second, the normalized two-dimensional pixel coordinates are input into the depth prediction network, which outputs the depth values of the 16 joint points; then the three-dimensional coordinates of the joint points are reconstructed from the depth values and the two-dimensional pixel coordinates; next, the reconstructed three-dimensional posture is input into the discriminator of the generative adversarial network to calculate the authenticity error, while the relative depth error is calculated from the reconstructed posture and the relative depth information between the joint points in the corresponding image; finally, the authenticity error calculated by the discriminator and the relative depth error are added to obtain a total error, which is fed back to the depth prediction network and constrains it to predict depth values with a smaller total error, so that a more accurate three-dimensional posture is reconstructed. The details are as follows:
1) The two-dimensional pixel coordinates of the human body joint points are input and normalized, specifically: for each human body, the mean of the two-dimensional pixel coordinates of its 16 joint points is subtracted from the two-dimensional pixel coordinates of each joint point, and the result is divided by the standard deviation of the two-dimensional pixel coordinates of the 16 joint points, yielding the normalized two-dimensional pixel coordinates. The 16 joint points of the human body are shown in Fig. 2. A sketch of this preprocessing is given below.
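As an illustration only, a minimal NumPy sketch of this normalization follows; it assumes the coordinates of one person are stored as a 16×2 array and that a single standard deviation over all coordinates is used, since the text does not specify whether the deviation is taken per axis:

```python
import numpy as np

def normalize_joints_2d(joints_2d):
    """Normalize the 2D pixel coordinates of the 16 joint points of one person:
    subtract the per-person mean and divide by the per-person standard deviation."""
    joints_2d = np.asarray(joints_2d, dtype=np.float32)  # shape (16, 2): one (u, v) per joint
    mean = joints_2d.mean(axis=0, keepdims=True)         # mean over the 16 joints
    std = joints_2d.std()                                 # assumption: one std over all coordinates
    return (joints_2d - mean) / (std + 1e-8)              # epsilon guards against division by zero
```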
2) The structure of the depth prediction network is shown in FIG. 3. In the figure, Linear denotes a fully connected layer and the number below it denotes the number of neurons in that layer; RELU denotes a linear rectification activation function layer; the contents of the large box show the structure of a residual block, and the ×2 at its upper right corner indicates that there are two residual blocks. The normalized two-dimensional pixel coordinates of the 16 joint points are input into the depth prediction network, which outputs the depth values of the 16 joint points. The normalized coordinates obtained in the previous step are input into a depth prediction network composed of three modules to predict the depth values of the 16 joint points, comprising the following steps (a sketch of the network follows the module descriptions below):
2.1) The normalized two-dimensional pixel coordinates of the joint points are input into the feature extraction module to extract features. The feature extraction module consists of a fully connected layer containing 1024 neurons and a linear rectification activation function layer.
2.2) The features extracted by the feature extraction module are input into the residual network module for feature learning. The residual network module consists of two residual blocks. Each residual block passes the output of the previous layer through a fully connected layer containing 1024 neurons and a linear rectification activation function layer to obtain a preliminary feature value, passes the preliminary feature value through another fully connected layer containing 1024 neurons to obtain a further feature value, adds the further feature value to the block input, and finally passes the sum through a linear rectification activation function layer, outputting the block feature value to the next layer of the network.
2.3) The output features of the residual network module are input into the depth value regression module, which consists of a fully connected layer containing 16 neurons; it takes the output features of the residual network module and outputs the depth values of the 16 joint points of the human body.
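A minimal PyTorch sketch of the depth prediction network described in steps 2.1)–2.3) follows; the class names, the flattening of the 16 (u, v) pairs into a 32-dimensional input vector, and the batch-first layout are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block of FIG. 3: Linear(1024)+ReLU, Linear(1024), skip connection, final ReLU."""
    def __init__(self, width=1024):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.relu(self.fc1(x))   # preliminary feature value
        y = self.fc2(y)              # further feature value
        return self.relu(x + y)      # add the block input, then activate

class DepthPredictionNet(nn.Module):
    """Feature extraction module, two residual blocks, and a 16-neuron depth regression layer."""
    def __init__(self, num_joints=16, width=1024):
        super().__init__()
        self.extract = nn.Sequential(nn.Linear(num_joints * 2, width), nn.ReLU())
        self.residual = nn.Sequential(ResidualBlock(width), ResidualBlock(width))
        self.regress = nn.Linear(width, num_joints)

    def forward(self, joints_2d_norm):        # (batch, 32) flattened normalized coordinates
        h = self.extract(joints_2d_norm)
        h = self.residual(h)
        return self.regress(h)                # (batch, 16) predicted depth values
```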
3) The three-dimensional coordinates of the joint points are reconstructed from the depth values and the two-dimensional pixel coordinates of the 16 joint points, as follows:
Assume that the two-dimensional pixel coordinate of a certain joint point of the human body is (u, v), where u is the horizontal coordinate and v the vertical coordinate of the joint point in the image. Assume that the depth value predicted for the joint point in the previous step is H and the focal length corresponding to the image is f; the three-dimensional coordinate of the joint point is then
\left( \frac{uH}{f},\; \frac{vH}{f},\; H \right)
The three-dimensional coordinates of each joint point are reconstructed in this way, giving the three-dimensional coordinates of the 16 joint points of the human body. The three-dimensional coordinates of the 16 joint points constitute the three-dimensional posture of the human body.
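A short sketch of this back-projection is given below; it assumes a pinhole camera whose principal point coincides with the origin of the pixel coordinates used here, i.e. the reconstructed coordinate is taken to be (uH/f, vH/f, H):

```python
import torch

def reconstruct_3d(joints_2d_px, depths, focal_length):
    """Back-project each joint: pixel coordinates (u, v), depth H and focal length f
    give the assumed 3D coordinate (u*H/f, v*H/f, H)."""
    u = joints_2d_px[..., 0]
    v = joints_2d_px[..., 1]
    x = u * depths / focal_length
    y = v * depths / focal_length
    return torch.stack([x, y, depths], dim=-1)   # shape (..., 16, 3)
```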
4) The structure of the discriminator of the generative adversarial network is shown in FIG. 4. In the figure, Linear denotes a fully connected layer and the number below it denotes the number of neurons in that layer; RELU denotes a linear rectification activation function layer; FCnet denotes a fully connected feature extraction module network; Concat denotes the concatenation of the features extracted by the upper and lower fully connected feature extraction modules. The network structure of the fully connected feature extraction module in the discriminator is shown in FIG. 5. Error calculation is performed with the discriminator of the generative adversarial network and the relative depth information: the reconstructed three-dimensional human body posture is input into the discriminator for authenticity error calculation, while the relative depth error is calculated from the reconstructed posture and the relative depth information between the joint points in the corresponding image, comprising the following steps:
4.1) The three-dimensional human body posture reconstructed in the previous step is taken as a fake sample and the already collected three-dimensional human body posture data as real samples, and the samples are input into the discriminator of the generative adversarial network, so that the reconstructed posture conforms to the distribution of the collected real data and a more reasonable three-dimensional posture is obtained. The discriminator consists of an upper and a lower fully connected feature extraction module and a fully connected real/fake prediction module. The three-dimensional posture sample is first input into the upper and lower fully connected feature extraction modules for feature extraction; the features extracted by the two modules are then concatenated into a merged feature, which is input into the fully connected real/fake prediction module to judge whether the sample is real or fake; a judgment value is output, and the authenticity error of the three-dimensional posture is calculated from the judgment value with the loss function of the generative adversarial network. The upper and lower fully connected feature extraction modules have the same structure, each consisting of the feature extraction module of the depth prediction network and a residual network module composed of one residual block. The fully connected real/fake prediction module consists of a fully connected layer containing 1024 neurons, a linear rectification activation function layer and a fully connected layer containing 1 neuron. A sketch of this discriminator is given below.
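The following is a minimal PyTorch sketch of the two-branch discriminator; feeding the same flattened 48-dimensional posture to both branches, and reading the single output neuron as a real/fake logit, are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FCFeatureNet(nn.Module):
    """Fully connected feature extraction module of FIG. 5:
    Linear(1024)+ReLU followed by one residual block (two Linear(1024) layers with a skip)."""
    def __init__(self, in_dim, width=1024):
        super().__init__()
        self.extract = nn.Sequential(nn.Linear(in_dim, width), nn.ReLU())
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.relu = nn.ReLU()

    def forward(self, x):
        h = self.extract(x)
        y = self.fc2(self.relu(self.fc1(h)))
        return self.relu(h + y)                  # residual connection, then activation

class PoseDiscriminator(nn.Module):
    """Upper and lower FCnet branches, feature concatenation (Concat), and the real/fake head."""
    def __init__(self, num_joints=16, width=1024):
        super().__init__()
        self.upper = FCFeatureNet(num_joints * 3, width)
        self.lower = FCFeatureNet(num_joints * 3, width)
        self.head = nn.Sequential(nn.Linear(2 * width, width), nn.ReLU(), nn.Linear(width, 1))

    def forward(self, pose_3d_flat):             # (batch, 48) flattened 3D posture
        feat = torch.cat([self.upper(pose_3d_flat), self.lower(pose_3d_flat)], dim=1)
        return self.head(feat)                    # real/fake judgment value (logit)
```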
4.2) The relative depth error is calculated from the reconstructed three-dimensional posture and the relative depth information between the joint points in the corresponding image. The relative depth information between the joint points of the human body in the image can be obtained by human observation of the image. The invention stores this relative depth information in a matrix of 16 rows and 16 columns, specifically: if, from the image, the i-th joint point of the human body is clearly closer to the camera than the j-th joint point, the element r(i, j) in row i and column j of the matrix is 1; if the i-th joint point is clearly farther from the camera than the j-th joint point, r(i, j) is -1; if the distances of the i-th and j-th joint points from the camera do not differ by a large margin, r(i, j) is 0. Here i and j are integers in the interval [1, 16], r is the matrix storing the relative depth information between the joint points, and r(i, j) is the element in row i and column j representing the relative depth relationship between the i-th and j-th joint points. A sketch of how such a matrix can be built is given below.
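As a minimal sketch, the matrix r can also be filled automatically whenever annotated joint depths are available (the invention obtains it by human observation of the image); the threshold value and the millimetre unit below are assumptions:

```python
import numpy as np

def relative_depth_matrix(joint_depths, margin=50.0):
    """Build the 16x16 relative depth matrix r (0-based indices here):
    r[i, j] = 1 if joint i is closer to the camera than joint j,
             -1 if it is farther, 0 if the difference is within `margin` (assumed in mm)."""
    n = len(joint_depths)
    r = np.zeros((n, n), dtype=np.int8)
    for i in range(n):
        for j in range(n):
            diff = joint_depths[j] - joint_depths[i]   # positive when joint i is closer
            if diff > margin:
                r[i, j] = 1
            elif diff < -margin:
                r[i, j] = -1
    return r
```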
The relative depth error between each pair of joint points in the three-dimensional human body posture reconstructed in step 3) is calculated with the obtained relative depth matrix, specifically:
L_{i,j} = \left| r(i,j) \right| \, \log\!\left( 1 + \exp\!\left( r(i,j)\,(H_i - H_j) \right) \right) + \left( 1 - \left| r(i,j) \right| \right) (H_i - H_j)^2
where i and j are integers in the range [1, 16]; L_{i,j} denotes the relative depth error of the point pair formed by the i-th and j-th joint points in the three-dimensional human body posture; r(i, j) denotes the relative depth relationship between the i-th and j-th joint points, taking a value in {1, -1, 0}; |r(i, j)| denotes the absolute value of r(i, j); H_i and H_j denote the depth values of the i-th and j-th joint points obtained from the depth prediction network. Finally, the sum of the relative depth errors of the 256 point pairs formed pairwise by the 16 joint points of the human body is calculated from the relative depth errors of the individual pairs, specifically:
L_{rank} = \sum_{(i,j) \in B} L_{i,j}
where i and j are integers in the range [1, 16]; L_{i,j} denotes the relative depth error of the point pair formed by the i-th and j-th joint points; L_rank denotes the sum of the relative depth errors of the 256 point pairs formed pairwise by the 16 joint points of the human body; (i, j) denotes the point pair formed by the i-th and j-th joint points, and B denotes the set of the 256 point pairs formed pairwise by the 16 joint points. The calculated sum is taken as the relative depth error of the three-dimensional posture of the human body. A sketch of this loss is given below.
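A minimal PyTorch sketch of this relative depth error follows. The softplus ranking penalty for ordered pairs and the squared penalty for pairs labelled 0 follow common ordinal-depth losses and are assumptions, since the formula images of the original document are not reproduced here:

```python
import torch

def relative_depth_loss(pred_depths, r):
    """Relative depth error summed over all 16x16 joint pairs.
    pred_depths: (batch, 16) depths from the depth prediction network.
    r: (16, 16) tensor of relative depth labels in {1, -1, 0}."""
    H_i = pred_depths.unsqueeze(2)                 # (batch, 16, 1)
    H_j = pred_depths.unsqueeze(1)                 # (batch, 1, 16)
    diff = H_i - H_j
    r = r.to(pred_depths.dtype)
    ordered = torch.log1p(torch.exp(r * diff))     # penalize violating the annotated ordering
    equal = diff ** 2                              # penalize any gap when r(i, j) == 0
    loss = torch.abs(r) * ordered + (1 - torch.abs(r)) * equal
    return loss.sum(dim=(1, 2)).mean()             # sum over the 256 pairs, average over the batch
```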
5) The authenticity error calculated by the discriminator of the generative adversarial network and the relative depth error are added to obtain a total error, which is fed back to the depth prediction network, constraining it to predict more accurate depth values so that a more accurate three-dimensional human body posture is reconstructed, specifically:
and adding the authenticity error and the relative depth error calculated by a discriminator of the generating countermeasure network to obtain the total error of the reconstructed three-dimensional human body posture in the two aspects of authenticity and relative depth, feeding the error back to the depth prediction network through the backward gradient descent propagation of the neural network, and updating parameters in the depth prediction network, so that the neural network can learn the authenticity of the three-dimensional human body posture and the relative depth information between the joint points corresponding to the picture, predict more accurate joint point depth, and reconstruct to obtain more accurate three-dimensional human body posture.
In conclusion, the invention provides a new weakly supervised method for three-dimensional human body posture estimation. By combining a generative adversarial network with a relative depth constraint, and by using the relative depth relationship information between the joint points in the picture on top of the more reasonable three-dimensional postures obtained through the adversarial network, the estimated three-dimensional posture better matches the true posture of the human body in the image, so higher accuracy is obtained; the method has practical application value and is worth popularizing.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A three-dimensional human body posture estimation method based on an adversarial relative depth constraint network, characterized by comprising the following steps:
1) inputting the two-dimensional pixel coordinates of 16 joint points of a human body and performing normalization preprocessing;
2) inputting the normalized two-dimensional pixel coordinates of the 16 joint points into a depth prediction network and outputting the depth values of the 16 joint points;
3) reconstructing the three-dimensional coordinates of the joint points from the depth values and the two-dimensional pixel coordinates of the 16 joint points to obtain a reconstructed three-dimensional human body posture;
4) inputting the reconstructed three-dimensional human body posture into the discriminator of a generative adversarial network to calculate an authenticity error, and simultaneously calculating a relative depth error from the reconstructed posture and the relative depth information between the joint points in the corresponding image;
5) adding the authenticity error calculated by the discriminator of the generative adversarial network and the relative depth error to obtain a total error, feeding the total error back to the depth prediction network, and constraining the depth prediction network to predict more accurate depth values so as to reconstruct a more accurate three-dimensional human body posture.
2. The three-dimensional human body posture estimation method based on the adversarial relative depth constraint network according to claim 1, characterized in that: in step 1), for each human body, the mean of the two-dimensional pixel coordinates of its 16 joint points is subtracted from the two-dimensional pixel coordinates of each joint point, and the result is divided by the standard deviation of the two-dimensional pixel coordinates of the 16 joint points, yielding the normalized two-dimensional pixel coordinates.
3. The three-dimensional human body posture estimation method based on the adversarial relative depth constraint network according to claim 1, characterized in that: in step 2), the normalized two-dimensional pixel coordinates of the joint points obtained in the previous step are input into a depth prediction network composed of three modules to predict the depth values of the 16 joint points of the human body, comprising the following steps:
2.1) inputting the normalized two-dimensional pixel coordinates of the joint points into a feature extraction module, which consists of a fully connected layer containing 1024 neurons and a linear rectification activation function layer, to extract features;
2.2) inputting the features extracted by the feature extraction module into a residual network module for feature learning, wherein the residual network module consists of two residual blocks; each residual block passes the output of the previous layer through a fully connected layer containing 1024 neurons and a linear rectification activation function layer to obtain a preliminary feature value, passes the preliminary feature value through another fully connected layer containing 1024 neurons to obtain a further feature value, adds the further feature value to the block input, and finally passes the sum through a linear rectification activation function layer, outputting the block feature value to the next layer of the network;
2.3) inputting the output features of the residual network module into a depth value regression module, which consists of a fully connected layer containing 16 neurons and takes the output features of the residual network module to output the depth values of the 16 joint points of the human body.
4. The three-dimensional human body posture estimation method based on the adversarial relative depth constraint network according to claim 1, characterized in that: in step 3), the three-dimensional coordinates of the joint points are reconstructed from the depth values and the two-dimensional pixel coordinates of the 16 joint points, as follows:
assume that the two-dimensional pixel coordinate of a certain joint point of the human body is (u, v), where u is the horizontal coordinate and v the vertical coordinate of the joint point in the image; assume that the depth value predicted for the joint point in the previous step is H and the focal length corresponding to the image is f; the three-dimensional coordinate of the joint point is then
\left( \frac{uH}{f},\; \frac{vH}{f},\; H \right)
The three-dimensional coordinates of each joint point are reconstructed in this way, giving the three-dimensional coordinates of the 16 joint points of the human body, which together constitute the three-dimensional posture of the human body.
5. The three-dimensional human body posture estimation method based on the adversarial relative depth constraint network according to claim 1, characterized in that: in step 4), the reconstructed three-dimensional human body posture is input into the discriminator of the generative adversarial network for authenticity error calculation, and the relative depth error is simultaneously calculated from the reconstructed posture and the relative depth information between the joint points in the corresponding image, comprising the following steps:
4.1) the three-dimensional human body posture reconstructed in the previous step is taken as a fake sample and the already collected three-dimensional human body posture data as real samples, and the samples are input into the discriminator of the generative adversarial network, so that the reconstructed posture conforms to the distribution of the collected real data and a more reasonable three-dimensional posture is obtained; the discriminator consists of an upper and a lower fully connected feature extraction module and a fully connected real/fake prediction module; the three-dimensional posture sample is first input into the upper and lower fully connected feature extraction modules for feature extraction, the features extracted by the two modules are concatenated into a merged feature, the merged feature is input into the fully connected real/fake prediction module to judge whether the sample is real or fake, a judgment value is output, and the authenticity error of the three-dimensional posture is calculated from the judgment value with the loss function of the generative adversarial network; the upper and lower fully connected feature extraction modules have the same structure, each consisting of the feature extraction module of the depth prediction network and a residual network module composed of one residual block, and the fully connected real/fake prediction module consists of a fully connected layer containing 1024 neurons, a linear rectification activation function layer and a fully connected layer containing 1 neuron;
4.2) the relative depth error is calculated from the reconstructed three-dimensional posture and the relative depth information between the joint points in the corresponding image; the relative depth information between the joint points of the human body in the image is obtained by human observation of the image and stored in a matrix of 16 rows and 16 columns, specifically: if, from the image, the i-th joint point of the human body is closer to the camera than the j-th joint point, the element r(i, j) in row i and column j of the matrix is 1; if the i-th joint point is farther from the camera than the j-th joint point, r(i, j) is -1; if the difference between the distances of the i-th and j-th joint points from the camera is within a set range, r(i, j) is 0; here i and j are integers in the interval [1, 16], r is the matrix storing the relative depth information between the joint points, and r(i, j) is the element in row i and column j representing the relative depth relationship between the i-th and j-th joint points;
the relative depth error between each pair of joint points in the three-dimensional human body posture reconstructed in step 3) is calculated with the obtained relative depth matrix, specifically:
L_{i,j} = \left| r(i,j) \right| \, \log\!\left( 1 + \exp\!\left( r(i,j)\,(H_i - H_j) \right) \right) + \left( 1 - \left| r(i,j) \right| \right) (H_i - H_j)^2
where L_{i,j} denotes the relative depth error of the point pair formed by the i-th and j-th joint points in the three-dimensional human body posture; r(i, j) denotes the relative depth relationship between the i-th and j-th joint points, taking a value in {1, -1, 0}; |r(i, j)| denotes the absolute value of r(i, j); H_i and H_j denote the depth values of the i-th and j-th joint points obtained from the depth prediction network; finally, the sum of the relative depth errors of the 256 point pairs formed pairwise by the 16 joint points of the human body is calculated from the relative depth errors of the individual pairs, specifically:
L_{rank} = \sum_{(i,j) \in B} L_{i,j}
where L_rank denotes the sum of the relative depth errors of the 256 point pairs formed pairwise by the 16 joint points of the human body, (i, j) denotes the point pair formed by the i-th and j-th joint points, and B denotes the set of the 256 point pairs formed pairwise by the 16 joint points; the calculated sum of the relative depth errors of the 256 point pairs is taken as the relative depth error of the three-dimensional posture of the human body.
6. The three-dimensional human body posture estimation method based on the adversarial relative depth constraint network according to claim 1, characterized in that: in step 5), the authenticity error calculated by the discriminator of the generative adversarial network is added to the relative depth error to obtain the total error of the reconstructed three-dimensional posture in terms of authenticity and relative depth; the error is fed back to the depth prediction network by back-propagation with gradient descent and the parameters of the depth prediction network are updated, so that the network learns both the authenticity of the three-dimensional posture and the relative depth information between the joint points in the picture, predicts more accurate joint depths, and reconstructs a more accurate three-dimensional human body posture.
CN202010521352.7A 2020-06-10 2020-06-10 Three-dimensional human body posture estimation method based on adversarial relative depth constraint network Active CN111914618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010521352.7A CN111914618B (en) 2020-06-10 2020-06-10 Three-dimensional human body posture estimation method based on adversarial relative depth constraint network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010521352.7A CN111914618B (en) 2020-06-10 2020-06-10 Three-dimensional human body posture estimation method based on adversarial relative depth constraint network

Publications (2)

Publication Number Publication Date
CN111914618A true CN111914618A (en) 2020-11-10
CN111914618B CN111914618B (en) 2024-05-24

Family

ID=73237497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010521352.7A Active CN111914618B (en) Three-dimensional human body posture estimation method based on adversarial relative depth constraint network

Country Status (1)

Country Link
CN (1) CN111914618B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113066169A (en) * 2021-04-14 2021-07-02 湘潭大学 Human body three-dimensional posture reconstruction method and system based on skeleton length constraint
CN113239892A (en) * 2021-06-10 2021-08-10 青岛联合创智科技有限公司 Monocular human body three-dimensional attitude estimation method based on data enhancement architecture
CN113506131A (en) * 2021-06-29 2021-10-15 安徽农业大学 Personalized recommendation method based on generative confrontation network
CN117456612A (en) * 2023-12-26 2024-01-26 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427877A (en) * 2019-08-01 2019-11-08 大连海事大学 A method of the human body three-dimensional posture estimation based on structural information
CN110647991A (en) * 2019-09-19 2020-01-03 浙江大学 Three-dimensional human body posture estimation method based on unsupervised field self-adaption
CN110826500A (en) * 2019-11-08 2020-02-21 福建帝视信息科技有限公司 Method for estimating 3D human body posture based on antagonistic network of motion link space

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427877A (en) * 2019-08-01 2019-11-08 大连海事大学 A method of the human body three-dimensional posture estimation based on structural information
CN110647991A (en) * 2019-09-19 2020-01-03 浙江大学 Three-dimensional human body posture estimation method based on unsupervised field self-adaption
CN110826500A (en) * 2019-11-08 2020-02-21 福建帝视信息科技有限公司 Method for estimating 3D human body posture based on antagonistic network of motion link space

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG Bin et al.: "Video-based three-dimensional human pose estimation", Journal of Beijing University of Aeronautics and Astronautics, vol. 45, no. 12, 31 December 2019 (2019-12-31), pages 2463 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113066169A (en) * 2021-04-14 2021-07-02 湘潭大学 Human body three-dimensional posture reconstruction method and system based on skeleton length constraint
CN113066169B (en) * 2021-04-14 2022-06-07 湘潭大学 Human body three-dimensional posture reconstruction method and system based on skeleton length constraint
CN113239892A (en) * 2021-06-10 2021-08-10 青岛联合创智科技有限公司 Monocular human body three-dimensional attitude estimation method based on data enhancement architecture
CN113506131A (en) * 2021-06-29 2021-10-15 安徽农业大学 Personalized recommendation method based on generative confrontation network
CN113506131B (en) * 2021-06-29 2023-07-25 安徽农业大学 Personalized recommendation method based on generated type countermeasure network
CN117456612A (en) * 2023-12-26 2024-01-26 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system
CN117456612B (en) * 2023-12-26 2024-03-12 西安龙南铭科技有限公司 Cloud computing-based body posture automatic assessment method and system

Also Published As

Publication number Publication date
CN111914618B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN111914618B (en) Three-dimensional human body posture estimation method based on adversarial relative depth constraint network
Gomez-Donoso et al. Lonchanet: A sliced-based cnn architecture for real-time 3d object recognition
CN110310317A (en) A method of the monocular vision scene depth estimation based on deep learning
CN111652966A (en) Three-dimensional reconstruction method and device based on multiple visual angles of unmanned aerial vehicle
CN112580515B (en) Lightweight face key point detection method based on Gaussian heat map regression
CN101877143A (en) Three-dimensional scene reconstruction method of two-dimensional image group
CN112598775B (en) Multi-view generation method based on contrast learning
CN109544666A (en) A kind of full automatic model deformation transmission method and system
CN111062326A (en) Self-supervision human body 3D posture estimation network training method based on geometric drive
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN114666564A (en) Method for synthesizing virtual viewpoint image based on implicit neural scene representation
CN112489198A (en) Three-dimensional reconstruction system and method based on counterstudy
CN112819951A (en) Three-dimensional human body reconstruction method with shielding function based on depth map restoration
Lutz et al. Jointformer: Single-frame lifting transformer with error prediction and refinement for 3d human pose estimation
CN114333002A (en) Micro-expression recognition method based on deep learning of image and three-dimensional reconstruction of human face
Zhang et al. Fchp: Exploring the discriminative feature and feature correlation of feature maps for hierarchical dnn pruning and compression
Peng et al. Attention-guided fusion network of point cloud and multiple views for 3D shape recognition
WO2023214093A1 (en) Accurate 3d body shape regression using metric and/or semantic attributes
CN116797640A (en) Depth and 3D key point estimation method for intelligent companion line inspection device
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
CN116091762A (en) Three-dimensional target detection method based on RGBD data and view cone
CN110543845A (en) Face cascade regression model training method and reconstruction method for three-dimensional face
CN116091793A (en) Light field significance detection method based on optical flow fusion
CN113192186B (en) 3D human body posture estimation model establishing method based on single-frame image and application thereof
CN110517307A (en) The solid matching method based on laser specklegram is realized using convolution

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant