CN112861699A - Method for estimating height of human body in any posture based on single depth image and multi-stage neural network - Google Patents

Info

Publication number
CN112861699A
CN112861699A CN202110150551.6A CN202110150551A CN112861699A CN 112861699 A CN112861699 A CN 112861699A CN 202110150551 A CN202110150551 A CN 202110150551A CN 112861699 A CN112861699 A CN 112861699A
Authority
CN
China
Prior art keywords
image
neural network
module
height
depth image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110150551.6A
Other languages
Chinese (zh)
Inventor
尹富坤
周世哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202110150551.6A
Publication of CN112861699A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for estimating the height of a human body in any posture from a single depth image using a multi-stage neural network. A 4-stage neural network framework, together with a "developing neural network" architecture, enables accurate height estimation for a human body in any posture and at any position. The method mainly comprises the following steps: human body segmentation, in which a neural network segments the human torso image from a single depth image, with high-frequency edge detail extracted and supplied as an additional input so that the segmentation boundary is finer; construction of an intermediate representation, in which the torso image is further divided into four parts with little bending, namely the head, upper body, thigh and calf, and the length of each part is predicted separately, thereby making full use of the strong local perception of convolutional neural networks; and model design, in which a network architecture and training method called the developing neural network is designed and combined with a mixed pooling strategy to estimate the human height, further improving network performance and reducing training time. The average height prediction accuracy of the invention is above 99.1%.

Description

Method for estimating height of human body in any posture based on single depth image and multi-stage neural network
Technical Field
The invention relates to the fields of computer vision and machine learning, and provides a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network. The techniques involved are depth image segmentation, image feature extraction, image semantic information extraction, neural network architecture design and neural network training. Using these techniques, a 4-stage neural network framework is constructed and a training method called the developing neural network is proposed, so that human height is estimated accurately from a single depth image.
Background
In fields such as three-dimensional human body reconstruction, virtual reality, medical care and clothing design, height is essential data. Conventional methods generally require the person being measured to stand upright while a measurer reads the height from a meter rule or height gauge. This approach is time-consuming and labor-intensive, requires the active cooperation of the person being measured, and is limited in where it can be used. In particular, conventional height measurement cannot be performed if the measurer has no meter rule or height gauge, or if the person being measured cannot stand upright because of injury or illness.
In recent years, some methods have attempted to measure human height without contact by extracting information from images or videos. These methods solve the problems of the conventional approach to some extent, but still have limitations. Most can only handle simple postures such as standing and walking, or require the subject to stand at a specified position, which greatly restricts their use. Some require manual calibration of the head and feet, so they cannot be fully automated and need a large amount of manual labeling. Still others require multiple photographs or multiple devices, which adds time and cost.
To address these problems, the present method needs only one depth image to output a reliable result within milliseconds, estimating human height from the image fully automatically and saving a great deal of labor and time. The subject may adopt any posture, such as standing, walking, bending or sitting, and is not required to be at a fixed position in the image: the subject can be anywhere within the acquisition range of the depth camera in any of these postures. The method therefore has good adaptability and robustness.
Disclosure of Invention
(I) Technical problem to be solved
The invention aims to provide a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, which accurately estimates the height of a human body at any position and in any posture by fully automatic, non-contact measurement.
(II) Technical scheme
1. The invention provides a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, which comprises the following steps:
S1, acquiring data: a human height data set of 2136 depth images is acquired with a depth camera. The subject may be at any position within the acquisition range of the depth camera and in any posture, including non-upright postures such as sitting, bending and walking. The height of each volunteer and the lengths of four approximately rigid parts, namely the head, upper body, thigh and calf, are measured and recorded. For each depth image, the ground-truth torso image and the ground-truth body part image are annotated.
S2, extracting the edge image: an edge detection operator is used to extract high-frequency edge information from the original depth image.
S3, segmenting the human torso: a convolutional neural network f1(x) is designed to extract the torso image from the original depth image and the edge image.
S4, recognizing body parts: a convolutional neural network f2(x) is designed to further segment the torso image into four approximately rigid body parts (head, upper body, thigh and calf), yielding the body part image.
S5, predicting body part lengths: a convolutional neural network f3(x) is designed to predict the lengths of the four approximately rigid body parts, i.e., the head, upper body, thigh and calf, from the body part image and the original depth image.
S6, predicting the human height: a convolutional neural network f4(x) is designed to predict the human height from the original depth image, the body part image and the body part lengths. In addition, different pooling strategies are adopted according to the characteristics of the different input data, and a skip connection structure feeds the raw input data into each convolutional layer.
S7, designing the developing neural network: based on the characteristics of the height prediction task and the architecture of the convolutional neural network f4(x), a training method is designed in which the network structure changes with the number of iterations, repeatedly destroying the fitted state until the neural network finds a globally optimal solution.
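The data flow of steps S2 to S6 can be sketched as a chain of function calls. The following is a minimal illustration only: the function name `estimate_height`, the `Image` type alias and all toy stand-in stages are hypothetical, and each callable stands in for a trained model that this sketch does not implement.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # hypothetical stand-in for an image tensor


def estimate_height(
    depth: Image,
    edge_op: Callable[[Image], Image],
    f1: Callable[[Image, Image], Image],
    f2: Callable[[Image, Image], Image],
    f3: Callable[[Image, Image], Sequence[float]],
    f4: Callable[[Image, Image, Sequence[float]], float],
) -> float:
    """Chain stages S2-S6; every callable is a placeholder for a trained model."""
    edges = edge_op(depth)            # S2: edge image E from depth image X
    torso = f1(depth, edges)          # S3: torso image T'
    parts = f2(depth, torso)          # S4: body part image L'
    lengths = f3(depth, parts)        # S5: four body part lengths
    return f4(depth, parts, lengths)  # S6: height estimate


# Toy stand-ins so the data flow can be exercised without any trained network.
# Summing the part lengths in f4 is purely illustrative; the real f4 is learned.
toy_height = estimate_height(
    depth=[[0.0, 1.0], [1.0, 0.0]],
    edge_op=lambda x: x,
    f1=lambda x, e: x,
    f2=lambda x, t: t,
    f3=lambda x, parts: [0.25, 0.60, 0.45, 0.40],
    f4=lambda x, parts, lengths: sum(lengths),
)
```

Note that the original depth image is passed to every stage, which mirrors the stated aim of limiting error accumulation between stages.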
2. The method of any pose human height estimation based on a single depth image and a multi-stage neural network of claim 1, wherein: only a single depth image is used to predict the human height without contact. The input data used in every step of the invention are either the original depth image or intermediate representations derived from it. Only a low-cost commercial-grade depth camera is needed to collect the depth data that serve as the original depth image, which reduces equipment cost and makes the method easy to use and to deploy widely. The depth image is obtained with infrared technology, so performance is unaffected even without an external light source, and the method can be applied in fields such as night-time security monitoring. Height is predicted from the image fully automatically and without contact: only one image needs to be captured and the measurer does not need to touch the subject, so the method is suitable for measuring height during periods of routine epidemic prevention and control, and in settings such as physical examinations, passenger transport and ticket sales. Because a depth image contains no facial features or clothing texture, using only a depth image prevents the neural network from learning identity features of the subject, which would otherwise interfere with height estimation.
3. The method of any pose human height estimation based on a single depth image and a multi-stage neural network of claim 1, wherein: a multi-stage neural network extracts the features relevant to height prediction and converts the height estimation problem into several small local problems. The torso image is obtained through the convolutional neural network f1(x), the body part image through the convolutional neural network f2(x), and the body part lengths through the convolutional neural network f3(x). Height estimation is decomposed into four approximately rigid parts (head, upper body, thigh and calf), which are predicted separately, and the four results are then integrated into the height of the subject. The benefits are as follows: predicting four rigid parts is an easier problem than predicting the height directly; the lengths of the four body parts and the topological relationship between them indicate the posture of the human body and so provide useful clues for height estimation; and dividing height prediction into four small local problems makes full use of the strong local perception of convolutional neural networks.
4. The arbitrary pose human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: the convolutional neural network f1(x), which is used to segment the human torso image from the single depth image. In height measurement, height is defined as the distance between the top of the head and the sole of the foot when the body is upright, so locating these two points is very important. Existing human body segmentation methods are often inaccurate at the body edge, which affects the selection of the head-top and foot-sole points. The present method uses the Canny operator to extract edge information from the depth image and thereby sharpens the edges of the human body segmentation image.
E=Canny(X)
X represents the original depth image acquired by the camera and E represents the corresponding edge image extracted.
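To illustrate what an edge image of a depth image looks like, the sketch below thresholds the Sobel gradient magnitude in pure Python. This is a deliberately simplified stand-in and not the Canny operator itself: it omits Gaussian smoothing, non-maximum suppression and hysteresis thresholding. The function name `sobel_edges`, the threshold value and the toy depth values are all assumptions made for the example.

```python
from typing import List

Image = List[List[float]]


def sobel_edges(depth: Image, threshold: float) -> Image:
    """Binary edge map: 1.0 where the Sobel gradient magnitude exceeds threshold.

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    h, w = len(depth), len(depth[0])
    edges = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (depth[r - 1][c + 1] + 2 * depth[r][c + 1] + depth[r + 1][c + 1]
                  - depth[r - 1][c - 1] - 2 * depth[r][c - 1] - depth[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (depth[r + 1][c - 1] + 2 * depth[r + 1][c] + depth[r + 1][c + 1]
                  - depth[r - 1][c - 1] - 2 * depth[r - 1][c] - depth[r - 1][c + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1.0
    return edges


# Flat background at depth 3.0 with a nearer 2x2 "body" at depth 1.0.
X = [[3.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        X[r][c] = 1.0
E = sobel_edges(X, threshold=1.0)
```

The depth discontinuity between the body and the background produces a strong gradient, which is the high-frequency information that the edge image E is meant to carry into f1(x).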
f1(x) includes five downsampling modules and five upsampling modules. In the downsampling modules, module one and module two each contain 2 convolutional layers and activation functions, and modules three, four and five each contain 3 convolutional layers and activation functions. The upsampling modules mirror the downsampling modules: modules six, seven and eight each contain 3 convolutional layers and activation functions, and modules nine and ten each contain 2 convolutional layers and activation functions. The method takes the original depth image X and the edge image E as input and passes them through the convolutional neural network f1(x) to obtain the predicted torso segmentation image T′.
T′=f1(X,E)
The loss value Loss1 of the convolutional neural network f1(x) is the mean of the squared pixel-wise differences between the predicted torso image and the real torso image; Adam is used as the optimizer.
Loss1 = \frac{1}{N} \sum_{i=1}^{N} (T'_i - T_i)^2
N is the total number of pixels in the image, i is the loop variable indexing a pixel in the image, T′ is the predicted torso image and T is the real torso image. Feeding the high-frequency edge information of the human body into the neural network noticeably improves edge accuracy in the torso segmentation map.
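The mean of squared pixel-wise differences described for Loss1 can be computed as in the following minimal sketch. The function name `pixelwise_mse` and the 2x2 toy masks are hypothetical; a real implementation would operate on full-resolution tensors inside a training framework.

```python
from typing import List

Image = List[List[float]]


def pixelwise_mse(pred: Image, target: Image) -> float:
    """Mean of squared per-pixel differences between two same-sized images."""
    total, n = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            n += 1
    return total / n


T_pred = [[1.0, 0.0], [0.5, 0.0]]  # hypothetical predicted torso mask T'
T_true = [[1.0, 0.0], [1.0, 0.0]]  # hypothetical ground-truth torso mask T
loss1 = pixelwise_mse(T_pred, T_true)  # one pixel differs by 0.5: 0.25 / 4
```

The same function applies unchanged to the body part images of f2(x), since Loss2 is defined in the same way.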
5. The arbitrary pose human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: the convolutional neural network f2(x), which is used to further divide the torso image into four approximately rigid parts, namely the head, upper body, thigh and calf, so as to obtain the body part image. f2(x) has the same network structure as f1(x): 5 downsampling modules and 5 upsampling modules, each containing several convolutional layers and activation functions. The predicted torso segmentation image T′ is input into f2(x) to obtain the predicted body part segmentation image L′. Because the torso segmentation image T′ produced by f1(x) contains errors, the original depth image X is input into f2(x) at the same time to avoid error accumulation.
L′=f2(X,T′)
The loss value Loss2 of the convolutional neural network f2(x) is the mean of the squared pixel-wise differences between the predicted body part image and the real body part image.
Loss2 = \frac{1}{N} \sum_{i=1}^{N} (L'_i - L_i)^2
N is the total number of pixels in the image, i is the loop variable indexing a pixel in the image, L′ is the predicted body part image and L is the real body part image. The optimizer is Adam, which minimizes the error between the predicted and real images until the network converges robustly.
6. The arbitrary pose human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: the convolutional neural network f3(x), which is used to predict the lengths of four approximately rigid parts, namely the head, upper body, thigh and calf. f3(x) comprises 13 convolutional layers, 6 fully connected layers and the corresponding activation functions. The 13 convolutional layers form 5 convolution modules: module one and module two each contain 2 convolutional layers and activation functions, and modules three, four and five each contain 3 convolutional layers and activation functions. The 6 fully connected layers output a 1 × 4 vector representing the lengths of the 4 body parts; their node counts are 4096, 4096, 1000, 256, 64 and 4, respectively. The body part image is input into f3(x); at the same time, to reduce accumulated error, the original depth image X is also input into the network, yielding length estimates for the 4 body parts: head, upper body, thigh and calf.
[H_{head}, H_{upperbody}, H_{thigh}, H_{calf}]_{1 \times 4} = f3(X, L)
H_{head} is the head length, H_{upperbody} is the upper body length, H_{thigh} is the thigh length, and H_{calf} is the calf length.
The loss value Loss3 is the sum of the squared differences between the four predicted part lengths and their true values.
Loss3 = |H_{head} - TH_{head}|^2 + |H_{upperbody} - TH_{upperbody}|^2 + |H_{thigh} - TH_{thigh}|^2 + |H_{calf} - TH_{calf}|^2
TH_{head}, TH_{upperbody}, TH_{thigh} and TH_{calf} are the true lengths of the head, upper body, thigh and calf, respectively. The optimizer is Adam, and the lengths of the 4 approximately rigid parts are predicted by minimizing the loss value.
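As a worked illustration of Loss3, take hypothetical predicted and true part lengths in meters. These numbers are invented for the example and are not results of the invention.

```latex
\begin{aligned}
(H_{head},\ H_{upperbody},\ H_{thigh},\ H_{calf}) &= (0.24,\ 0.62,\ 0.44,\ 0.41) \\
(TH_{head},\ TH_{upperbody},\ TH_{thigh},\ TH_{calf}) &= (0.25,\ 0.60,\ 0.45,\ 0.40) \\
Loss3 &= |{-0.01}|^2 + |0.02|^2 + |{-0.01}|^2 + |0.01|^2 \\
      &= 0.0001 + 0.0004 + 0.0001 + 0.0001 = 0.0007
\end{aligned}
```

Because the four terms are summed rather than averaged, the upper body error, which is twice as large as the others, contributes four times as much to the loss.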
7. The arbitrary pose human height estimation method based on a single depth image and a multi-stage neural network of claim 1, further comprising: the convolutional neural network f4(x), which estimates the human height from the body part image, the body part lengths and the original depth image. f4(x) comprises 13 convolutional layers, 7 fully connected layers and the corresponding activation functions. The 13 convolutional layers form 5 convolution modules: module one and module two each contain 2 convolutional layers and activation functions, and modules three, four and five each contain 3 convolutional layers and activation functions. The output of the last convolutional layer is flattened into a one-dimensional vector, and the estimated height is output through the 7 fully connected layers, whose node counts are given as 4096, 1000, 256, 64, 16 and 1. To reduce accumulated error and improve prediction accuracy, the original depth image X and the predicted body part segmentation image L′ are input into f4(x) together. The invention also adopts a skip structure with different pooling strategies chosen according to the characteristics of the input data: average pooling for the depth image and max pooling for the body part segmentation image, so that the input data at different scales are fed directly into each convolutional layer.
[H_{human}] = f4(X, L′, H_{4-part})
H_{human} is the estimated human height, and H_{4-part} denotes the predicted lengths of the 4 approximately rigid body parts from step S5.
The loss value Loss4 of the convolutional neural network f4(x) is the square of the difference between the estimated height and the true height.
Loss4 = |H_{human} - TH_{human}|^2
TH_{human} is the true human height. The optimizer is Adam, and the height is estimated by minimizing the loss value.
8. The arbitrary pose human height estimation method based on a single depth image and a multi-stage neural network of claim 7, further comprising: a skip connection structure with a mixed pooling strategy. The convolutional neural network f4(x) has 13 convolutional layers. Adding skip connections to f4(x) alleviates the vanishing gradient problem that arises in deep networks, eases back-propagation of the gradient and speeds up training. In steps S3, S4 and S5, three neural networks are used to obtain length estimates of the head, upper body, thigh and calf, which are then input into f4(x). Because each network's estimate deviates from the true value, and that error is passed on to the network of the next stage, the error grows from stage to stage. The skip connection structure therefore inputs the original depth image X and the body part image L directly into f4(x) to minimize the accumulated error.
The original depth image X and the body part image L have different characteristics. The original depth image X contains many noise points, which appear in the depth image as extreme values; if max pooling were simply applied, these noise points would be retained and depth information lost, interfering with learning and causing larger errors. The body part image L is smoother; if average pooling were applied, the gradient at the body part boundaries would be reduced, introducing errors. Therefore, average pooling is applied to the original depth image X, which smooths the noise so that an accurate depth image is fed into the network at each scale, and max pooling is applied to the body part image L, which preserves the accuracy of the segmentation boundary and retains gradient information as the image size is reduced.
L_{next} = Maxpool(L_{now})
X_{next} = Avgpool(X_{now})
L_{now} is the body part image at the current scale and L_{next} is the body part image at the next scale. X_{now} is the depth image at the current scale and X_{next} is the depth image at the next scale.
Feeding the original information into each convolution module through the skip connection structure with the mixed pooling strategy keeps the original images as undistorted as possible at every scale and improves the accuracy of the network's prediction.
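The effect of the mixed pooling strategy can be seen on a small example. The sketch below applies 2x2 pooling with stride 2 in pure Python; the helper names `pool2x2` and `avg`, the window size and the toy images are assumptions made for illustration, since the text does not state the pooling window.

```python
from typing import Callable, List

Image = List[List[float]]


def pool2x2(img: Image, reduce: Callable[[List[float]], float]) -> Image:
    """Downsample by 2 in each dimension, reducing every 2x2 window with `reduce`."""
    return [
        [
            reduce([img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1]])
            for c in range(0, len(img[0]) - 1, 2)
        ]
        for r in range(0, len(img) - 1, 2)
    ]


def avg(window: List[float]) -> float:
    return sum(window) / len(window)


# Depth image X with one noise spike (9.0) on an otherwise flat 2.0 surface.
X_now = [[2.0, 2.0, 2.0, 2.0],
         [2.0, 9.0, 2.0, 2.0],
         [2.0, 2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0, 2.0]]
# Body part label image L with a boundary between label 1 and label 2.
L_now = [[1.0, 1.0, 1.0, 2.0],
         [1.0, 1.0, 1.0, 2.0],
         [1.0, 1.0, 2.0, 2.0],
         [1.0, 1.0, 2.0, 2.0]]

X_next = pool2x2(X_now, avg)   # average pooling damps the spike
L_next = pool2x2(L_now, max)   # max pooling keeps label values intact
X_maxed = pool2x2(X_now, max)  # for contrast: max pooling keeps the spike
```

In this toy case, max pooling carries the noise spike of the depth image through at full strength while average pooling spreads it out, and max pooling of the label image yields only valid label values, whereas averaging the top-right window would produce the in-between value 1.5.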
9. The method for arbitrary pose body height estimation based on a single depth image and a multi-stage neural network of claim 7, further comprising: the architecture and training method of the developing neural network. The invention proposes the developing neural network and applies it to the convolutional neural network f4(x) to improve network accuracy, reduce training time and prevent overfitting.
The main idea of the developing neural network is as follows: during training, when the network tends to converge, the architecture of the convolutional part of the network is adjusted so that the network escapes the local minimum and continues to search for a globally optimal solution.
The specific method is as follows. While training the convolutional neural network f4(x), when the number of iterations is less than 4 × 10^4, the network is pre-trained and all 13 convolutional layers in the 5 modules of f4(x) are active. When the number of iterations equals 4 × 10^4, the pre-trained model is saved. When the number of iterations is greater than 4 × 10^4 and less than 6 × 10^4, each convolution module of f4(x) retains only its first convolutional layer, so 5 convolutional layers in total are trained. When the number of iterations is greater than 6 × 10^4 and less than 8 × 10^4, module one and module two each restore one convolutional layer from the pre-trained model, so 7 convolutional layers take part in training. When the number of iterations is greater than 8 × 10^4 and less than 1 × 10^5, modules three, four and five each restore one convolutional layer from the pre-trained model, so 10 convolutional layers take part in training. When the number of iterations is greater than 1 × 10^5, modules three, four and five each restore one more convolutional layer from the pre-trained model, so that all 13 original convolutional layers are restored.
The developing neural network alternates between fitting and destroying the fitted state until the network finds a globally optimal solution. Because only some of the convolutional layers are trained while the number of iterations is between 4 × 10^4 and 1 × 10^5, the training time is reduced. When the network tends to converge, removing and restoring convolutional layers prevents overfitting and lets the network escape local optima in search of the global optimum, which effectively improves the accuracy of height estimation.
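The iteration schedule of the developing neural network can be written as a lookup from iteration count to the number of active convolutional layers per module. This is a minimal sketch: the names `FULL_LAYERS` and `active_conv_layers` are hypothetical, and because the text does not say which phase the exact boundary iterations (6 × 10^4, 8 × 10^4, 1 × 10^5) belong to, the sketch assigns each boundary to the later phase as an assumption.

```python
from typing import List

FULL_LAYERS = [2, 2, 3, 3, 3]  # conv layers per module in f4(x), 13 in total


def active_conv_layers(iteration: int) -> List[int]:
    """Conv layers trained in each of the five modules at a given iteration.

    Boundary iterations are assigned to the later phase (an assumption).
    """
    if iteration < 40_000:    # pre-training: all 13 layers
        return list(FULL_LAYERS)
    if iteration < 60_000:    # first layer of each module only: 5 layers
        return [1, 1, 1, 1, 1]
    if iteration < 80_000:    # modules 1-2 restore one layer each: 7 layers
        return [2, 2, 1, 1, 1]
    if iteration < 100_000:   # modules 3-5 restore one layer each: 10 layers
        return [2, 2, 2, 2, 2]
    return list(FULL_LAYERS)  # modules 3-5 restore their last layer: 13 layers
```

Summing the returned list reproduces the layer totals stated in the text (13, 5, 7, 10, 13), which is a convenient consistency check when wiring such a schedule into a training loop.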
Drawings
FIG. 1 is a schematic diagram of the framework of the arbitrary posture human height estimation method based on a single depth image and a multi-stage neural network according to an embodiment of the present invention. An edge image is first extracted from the acquired depth image, and both are input into the neural network f1(x) to obtain the torso image; the torso image and the depth image are then input into the neural network f2(x) to obtain the body part image; the body part image and the depth image are input into the neural network f3(x) to obtain the predicted body part lengths; finally, the body part lengths, the body part image and the depth image are input into the neural network f4(x), which outputs the estimated human height.
FIG. 2 is the network structure diagram of the convolutional neural network f1(x) according to an embodiment of the present invention, which takes the depth image and the edge image as input and outputs the torso image. It contains 10 convolution modules and 26 convolutional layers. The first 5 convolution modules extract features, and the last 5 convolution modules generate the torso image.
FIG. 3 is the network structure diagram of the convolutional neural network f2(x) according to an embodiment of the present invention, which takes the depth image and the torso image as input and outputs the body part image. It contains 10 convolution modules and 26 convolutional layers. The first 5 convolution modules extract features, and the last 5 convolution modules generate the body part image.
FIG. 4 is the network structure diagram of the convolutional neural network f3(x) according to an embodiment of the present invention, which takes the depth image and the body part image as input and outputs the body part length estimates. It contains 5 convolution modules, 13 convolutional layers and 6 fully connected layers. The 5 convolution modules extract features, and the 6 fully connected layers produce length estimates for the 4 body parts: head, upper body, thigh and calf.
FIG. 5 is the network structure diagram of the convolutional neural network f4(x) according to an embodiment of the present invention, which takes the depth image, the body part lengths and the body part image as input and outputs the human height. It contains 5 convolution modules, 13 convolutional layers and 7 fully connected layers. The 5 convolution modules extract features, and the 7 fully connected layers produce the estimated human height.
FIG. 6 shows how the network structure of the convolutional neural network f4(x) changes with the number of iterations according to an embodiment of the present invention.
FIG. 7 is an example of some experimental results according to an embodiment of the present invention.
Detailed Description
The invention provides a method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, which achieves accurate height estimation for a human body at any position and in any posture through a 4-stage neural network framework and the developing neural network architecture. First, a large number of depth images of multiple people are collected, the height of each subject and the length of each body part are recorded, and the torso image and body part image are annotated as ground truth to construct a data set. Then the networks and model are designed: height estimation is converted into length prediction for four approximately rigid body parts, and a 4-stage convolutional neural network is designed to carry out this process. Finally, the network architecture and training method are improved by introducing an architecture called the developing neural network, which reduces training time, prevents overfitting and further improves prediction accuracy.
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
FIG. 1 is a schematic diagram of the framework of the proposed method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network. As shown in FIG. 1, the invention estimates the human height accurately through a 4-stage neural network framework comprising the neural networks f1(x), f2(x), f3(x) and f4(x). The neural network f1(x) estimates the torso image of the subject from the depth image and the edge image. The neural network f2(x) estimates the segmentation image of the four approximately rigid body parts (head, upper body, thigh and calf) from the depth image and the torso image output by f1(x). The neural network f3(x) estimates the lengths of the 4 body parts from the depth image and the body part image output by f2(x). The neural network f4(x) estimates the human height from the depth image, the body part image and the body part lengths output by f3(x). The invention also designs a mixed pooling strategy and the developing neural network and applies them to f4(x), further improving the accuracy of height estimation.
The specific technical steps of the above framework are described below with reference to FIGS. 2 to 6.
S1: Segment the human torso image from the depth image and the edge image.
In this step, an edge image is first obtained from the depth image using the Canny operator. Specifically, a Gaussian filter is used to smooth the image, and non-maximum suppression is then applied to obtain the edge information of the depth image. The depth image and the edge image are then input into convolutional neural network f1(x). Five downsampling modules extract the features of the input image and represent the image as a low-dimensional feature vector; modules one and two each comprise 2 convolutional layers, and modules three, four and five each comprise 3 convolutional layers. The feature vector is then input into five upsampling modules, in which modules six, seven and eight each comprise 3 convolutional layers and modules nine and ten each comprise 2 convolutional layers. The five upsampling modules output the corresponding human torso image from the input feature vector.
The network architecture of convolutional neural network f1(x) is shown in FIG. 2.
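To illustrate what an edge image derived from a depth image looks like, here is a simplified sketch. It is not the Canny operator: it keeps only a Sobel gradient-magnitude threshold and omits the Gaussian smoothing, non-maximum suppression and hysteresis stages. The function name `sobel_edges` and the threshold value are assumptions made for this example.

```python
def sobel_edges(depth, threshold):
    """Binary edge map: 1 where the Sobel gradient magnitude exceeds threshold.

    Simplified stand-in for Canny (assumption): no smoothing, no
    non-maximum suppression, no hysteresis. Border pixels are left at 0.
    """
    h, w = len(depth), len(depth[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (depth[r - 1][c + 1] + 2 * depth[r][c + 1] + depth[r + 1][c + 1]
                  - depth[r - 1][c - 1] - 2 * depth[r][c - 1] - depth[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (depth[r + 1][c - 1] + 2 * depth[r + 1][c] + depth[r + 1][c + 1]
                  - depth[r - 1][c - 1] - 2 * depth[r - 1][c] - depth[r - 1][c + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges


if __name__ == "__main__":
    # A vertical depth step between columns 2 and 3.
    step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
    for row in sobel_edges(step, threshold=20):
        print(row)
```

In practice a library implementation of the full Canny operator would be used; the sketch only shows that depth discontinuities at the body boundary become high-response edge pixels.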
In this step, the loss value Loss1 quantifies the error between the image estimated by neural network f1(x) and the real image. Specifically, it is the mean of the pixel-by-pixel squared differences between the predicted torso image and the real torso image:
$$\mathrm{Loss1} = \frac{1}{N}\sum_{i=1}^{N}\left(f_1(X, E)_i - T_i\right)^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, X is the depth image, E is the edge image, and T is the real torso image. The initial learning rate is set to 0.0001 and is reduced by a factor of 0.8 every 5 epochs. The Adam optimizer is used to compute the update step size, and the estimated human torso image is obtained by minimizing Loss1.
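The loss described above is a per-pixel mean squared error. A minimal sketch, assuming images are given as equally sized lists of rows (the function name `pixelwise_mse` is chosen for this example):

```python
def pixelwise_mse(predicted, target):
    """Mean over all pixels of the squared difference between two images."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    predicted_torso = [[1, 0], [0, 1]]
    real_torso = [[1, 1], [0, 0]]
    print(pixelwise_mse(predicted_torso, real_torso))  # 0.5
```

Two of the four pixels differ by 1, so the mean squared error is 2/4 = 0.5.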
S2: Obtain the body part image from the depth image and the torso image.
This step further segments the torso image by body part to obtain the body part image, while the depth image is input again to eliminate accumulated error. Human height estimation is thereby converted into estimating the lengths of four approximately rigid parts: the head, the upper body, the thigh and the calf. Specifically, the depth image and the torso image are input into convolutional neural network f2(x). Five downsampling modules extract the features of the input image and represent the image as a low-dimensional feature vector; modules one and two each comprise 2 convolutional layers, and modules three, four and five each comprise 3 convolutional layers. The feature vector is then input into five upsampling modules, in which modules six, seven and eight each comprise 3 convolutional layers and modules nine and ten each comprise 2 convolutional layers. The five upsampling modules output the corresponding body part image from the input feature vector.
The network architecture of convolutional neural network f2(x) is shown in FIG. 3.
In this step, the loss value Loss2 quantifies the error between the image estimated by neural network f2(x) and the real image. Specifically, it is the mean of the pixel-by-pixel squared differences between the predicted body part image and the real body part image:
$$\mathrm{Loss2} = \frac{1}{N}\sum_{i=1}^{N}\left(f_2(X, T)_i - L_i\right)^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, X is the depth image, T is the human torso image, and L is the real body part image. The initial learning rate is set to 0.0001 and is reduced by a factor of 0.8 every 5 epochs. The Adam optimizer is used to compute the update step size, and the estimated body part image is obtained by minimizing Loss2.
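The learning-rate schedule mentioned above can be read as a step decay. The sketch below assumes that "a reduction of 0.8 every 5 rounds" means the rate is multiplied by 0.8 once every 5 epochs; that reading, and the function name `step_decay_lr`, are assumptions rather than something the text states explicitly.

```python
def step_decay_lr(epoch, initial_lr=1e-4, factor=0.8, step=5):
    """Learning rate after `epoch` epochs under multiplicative step decay.

    Assumption: the rate is multiplied by `factor` once every `step` epochs.
    """
    return initial_lr * factor ** (epoch // step)


if __name__ == "__main__":
    for epoch in (0, 4, 5, 10):
        print(epoch, step_decay_lr(epoch))
```

The same function covers the schedule stated for f3(x) and f4(x) by calling it with `factor=0.5` and `step=50`.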
S3: Obtain the body part lengths from the depth image and the body part image.
This step estimates the lengths of four approximately rigid body parts (the head, the upper body, the thigh and the calf) from the depth image and the body part image. Specifically, the depth image and the body part image are input into convolutional neural network f3(x). Five downsampling modules first extract image features; modules one and two each comprise 2 convolutional layers, and modules three, four and five each comprise 3 convolutional layers. The output features of the last layer are then flattened into a 25088-dimensional vector and passed through 6 fully-connected layers in sequence, which output a four-dimensional vector representing the lengths of the head, the upper body, the thigh and the calf. The process can be expressed as:
$$[H_{head}\;\; H_{upperbody}\;\; H_{thigh}\;\; H_{calf}]_{1\times 4} = f_3(X, L)$$
$H_{head}$ is the head length, $H_{upperbody}$ is the upper body length, $H_{thigh}$ is the thigh length, $H_{calf}$ is the calf length, X is the depth image, and L is the body part image.
The network structure of convolutional neural network f3(x) is shown in FIG. 4, where the numbers of input and output nodes of each fully-connected layer are marked below the layer.
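The 25088-dimensional flattened vector can be checked with a short calculation. The text does not state the input resolution or the channel widths. The sketch assumes a 224 × 224 input, a halving of spatial resolution in each of the five downsampling modules, and 512 output channels. These are the values of the well-known VGG-16 configuration, which has the same 2-2-3-3-3 layout of 13 convolutional layers.

```python
def flattened_size(input_size, num_modules, out_channels):
    """Length of the flattened feature vector after the downsampling modules.

    Assumptions (not stated in the source): square input, each module
    halves the spatial resolution, final feature map has `out_channels`.
    """
    side = input_size
    for _ in range(num_modules):
        side //= 2  # one 2x downsampling per module
    return out_channels * side * side


if __name__ == "__main__":
    print(flattened_size(224, 5, 512))  # 512 * 7 * 7 = 25088
```

Under these assumptions the final feature map is 7 × 7 × 512, which matches the 25088 inputs of the first fully-connected layer.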
In this step, the loss value Loss3 quantifies the error between the lengths estimated by neural network f3(x) and the real lengths. Specifically, it is the sum of the squared differences between the estimated and real lengths of the four parts:
$$\mathrm{Loss3} = |H_{head}-TH_{head}|^2 + |H_{upperbody}-TH_{upperbody}|^2 + |H_{thigh}-TH_{thigh}|^2 + |H_{calf}-TH_{calf}|^2$$
$TH_{head}$, $TH_{upperbody}$, $TH_{thigh}$ and $TH_{calf}$ are the real lengths of the head, the upper body, the thigh and the calf, respectively. The initial learning rate is set to 0.0001 and is reduced by a factor of 0.5 every 50 epochs. The Adam optimizer is used to compute the update step size, and the 4 estimated body part lengths are obtained by minimizing Loss3.
S4: Obtain the height of the human body from the depth image, the body part lengths and the body part image.
This step estimates the height of the human body by combining the body part lengths and the body part image, and inputs the original depth image again to eliminate accumulated error. Specifically, the depth image, the body part image and the body part lengths are input into convolutional neural network f4(x). Five downsampling modules first extract features relevant to height estimation; modules one and two each comprise 2 convolutional layers, and modules three, four and five each comprise 3 convolutional layers. A skip connection is added before each convolutional layer, through which the depth image and the body part segmentation image at the corresponding scale are input. This alleviates the vanishing-gradient problem in deep networks, facilitates backpropagation of the gradient, and accelerates training. Different pooling strategies are chosen according to the characteristics of each input image, so as to avoid introducing noise or losing the gradients of the original image: average pooling is applied to the depth image and max pooling is applied to the body part image. Finally, the output features of the last layer are flattened into a 25088-dimensional vector and passed through 7 fully-connected layers in sequence, which output a one-dimensional vector representing the height of the human body. The process can be expressed as:
$$[H_{human}] = f_4(X, L, H_{4\text{-}part})$$
$H_{human}$ is the estimate of the body height, and $H_{4\text{-}part}$ denotes the predicted lengths of the 4 approximately rigid body parts from step S3.
The network structure of convolutional neural network f4(x) is shown in FIG. 5, where the numbers of input and output nodes of each fully-connected layer are marked below the layer.
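The mixed pooling strategy of this step can be sketched with plain 2 × 2 pooling. This is an illustration under assumptions: a 2 × 2 window with stride 2 is assumed (the text does not give the window size), images are lists of rows, and all function names are chosen for this example.

```python
def pool2x2(image, reduce_fn):
    """Apply `reduce_fn` to each non-overlapping 2x2 block (stride 2)."""
    h, w = len(image), len(image[0])
    return [
        [reduce_fn([image[r][c], image[r][c + 1],
                    image[r + 1][c], image[r + 1][c + 1]])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]


def avg_pool(image):
    return pool2x2(image, lambda block: sum(block) / len(block))


def max_pool(image):
    return pool2x2(image, max)


def next_scale(depth, parts):
    """Average-pool the depth image, max-pool the body part image."""
    return avg_pool(depth), max_pool(parts)


if __name__ == "__main__":
    depth = [[1, 1, 1, 1], [1, 9, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]  # 9 = noise spike
    parts = [[0, 0, 0, 2], [0, 0, 0, 2], [0, 0, 1, 1], [0, 0, 1, 1]]  # label image
    print(next_scale(depth, parts))
```

Average pooling spreads the isolated spike in the depth image (9 becomes 3.0), whereas max pooling would carry it unchanged to the next scale. For the label image, max pooling keeps valid label values at the part boundary, while averaging would produce intermediate values that correspond to no body part.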
In this step, the loss value Loss4 quantifies the error between the height estimated by neural network f4(x) and the real height. Specifically, it is the square of the difference between the estimated height and the real height:
$$\mathrm{Loss4} = |H_{human} - TH_{human}|^2$$
$TH_{human}$ is the real height of the human body. The initial learning rate is set to 0.0001 and is reduced by a factor of 0.5 every 50 epochs. The Adam optimizer is used to compute the update step size, and the height estimate is obtained by minimizing Loss4.
S5: Optimize convolutional neural network f4(x) using the developing neural network framework.
Directly using convolutional neural network f4(x) from step S4 to estimate human height is prone to overfitting, which harms prediction accuracy. This step applies the developing neural network to f4(x) to improve network accuracy, reduce training time and prevent overfitting. During training of f4(x):
When the number of iterations is less than 4×10^4, the network is pre-trained and all 13 convolutional layers in the 5 modules are active.
When the number of iterations equals 4×10^4, the pre-trained model is saved.
When the number of iterations is greater than 4×10^4 and less than 6×10^4, each module keeps only its first convolutional layer, so 5 convolutional layers in total are trained.
When the number of iterations is greater than 6×10^4 and less than 8×10^4, modules one and two each restore one convolutional layer from the pre-trained model, giving the network 7 convolutional layers in total.
When the number of iterations is greater than 8×10^4 and less than 1×10^5, modules three, four and five each restore one convolutional layer from the pre-trained model, giving the network 10 convolutional layers.
When the number of iterations is greater than 1×10^5, modules three, four and five each restore one more convolutional layer from the pre-trained model, so that all 13 original convolutional layers are restored.
By iteratively reaching a fitted state and then breaking it, training continues until the network finds a globally optimal solution.
FIG. 5 is a schematic diagram of how the convolutional layers of f4(x) change with the number of iterations after the developing neural network framework is applied.
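The iteration schedule of the developing neural network can be written as a lookup from the iteration count to the number of active convolutional layers in each of the five modules. The text does not say which stage the exact boundary iterations (6×10^4, 8×10^4, 1×10^5) belong to; the sketch assigns each boundary to the earlier stage, which is an assumption, as is the function name `active_conv_layers`.

```python
def active_conv_layers(iteration):
    """Active convolutional layers per module (modules one to five).

    Assumption: each boundary iteration belongs to the earlier stage.
    """
    if iteration <= 40_000:
        return (2, 2, 3, 3, 3)  # pre-training, all 13 layers
    if iteration <= 60_000:
        return (1, 1, 1, 1, 1)  # only the first layer of each module
    if iteration <= 80_000:
        return (2, 2, 1, 1, 1)  # modules one and two restored
    if iteration <= 100_000:
        return (2, 2, 2, 2, 2)  # modules three to five restore one layer
    return (2, 2, 3, 3, 3)      # all 13 layers restored


if __name__ == "__main__":
    for it in (10_000, 50_000, 70_000, 90_000, 120_000):
        layers = active_conv_layers(it)
        print(it, layers, sum(layers))
```

The totals follow the sequence 13, 5, 7, 10, 13 given in the description. In an actual training loop the restored layers would be loaded from the checkpoint saved at iteration 4×10^4.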
FIG. 6 is an example of the experimental results of the present invention.
Experiments prove that the technology can accurately estimate the height of the human body from a single depth image.
The techniques of the present invention may be implemented in computer software, for example written in Python; the development environment may be, for example, Windows 10 with PyCharm version 2018.3.
The hardware support required is:
CPU: Intel Core i7-7700K processor
GPU: NVIDIA GeForce RTX 2080 Ti Founders Edition
The required deep learning environments are:
PyTorch 1.1.0
NVIDIA CUDA 10.1.120 driver
cuDNN-10.0-windows10-x64 v7.3.1.20
Experiments show that the invention can accurately predict the height of a human body from only a single depth image, and that the subject may be at any position in the image and in any posture.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the present invention, and any modifications, equivalent substitutions, improvements and the like within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network, capable of accurately measuring the height of a human body at any position and in any posture, with an average prediction accuracy of 99.1%. The technique can be summarized as the following steps.
S1: Data acquisition. A human height data set of 2136 depth images is acquired with a depth camera. The subject may be at any position within the acquisition range of the depth camera and may take any posture, including non-upright postures such as sitting, bending and walking. The height of each volunteer and the lengths of four approximately rigid parts (the head, the upper body, the thigh and the calf) are measured and recorded. The ground-truth torso image and the ground-truth body part image are annotated for each depth image.
S2: Edge image extraction. An edge detection operator is used to extract the high-frequency edge information from the original depth image.
S3: Human torso segmentation. A convolutional neural network f1(x) is designed to extract the human torso image from the original depth image and the edge image.
S4: Body part recognition. A convolutional neural network f2(x) is designed to further segment the torso image into four approximately rigid body parts (the head, the upper body, the thigh and the calf), yielding the body part image.
S5: Body part length prediction. A convolutional neural network f3(x) is designed to predict the lengths of the four approximately rigid body parts (the head, the upper body, the thigh and the calf) from the body part image and the original depth image.
S6: Human height prediction. A convolutional neural network f4(x) is designed to predict the human height from the original depth image, the body part image and the body part lengths. Different pooling strategies are adopted according to the characteristics of the different input data, and the original input data is fed into each convolutional layer through a skip connection structure.
S7: Design of the developing neural network. Based on the characteristics of the height prediction task and the architecture of convolutional neural network f4(x), a training method in which the network structure changes with the number of iterations is designed, and the fitted state is repeatedly broken until the neural network finds a globally optimal solution.
2. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, wherein only a single depth image is used to predict the height of the human body in a non-contact manner. The input data used in all steps of the invention are the original depth image or intermediate representations derived from it. Only a low-cost, consumer-grade depth camera is needed to collect the depth data used as the original depth image, which reduces equipment cost and makes the method easy to use, practical and easy to popularize. The depth image is obtained using infrared technology, and performance is unaffected even without an external light source, so the method can be applied in fields such as night-time security monitoring. The body height is predicted from the image in a non-contact, fully automatic manner: only one image needs to be captured and the measurer does not need to be in direct contact with the subject, so the method is suitable for measuring human height during periods of routine epidemic prevention and control, in physical examinations, and in transport and ticketing scenarios. The depth image contains neither facial features nor clothing texture, so using only a single depth image prevents the neural network from learning identity features of the subject, which would interfere with height estimation.
3. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, wherein a multi-stage neural network is used to extract features relevant to height prediction, and the height estimation problem is converted into several small local problems. The human torso image is obtained through convolutional neural network f1(x), the body part image is obtained through convolutional neural network f2(x), and the body part lengths are obtained through convolutional neural network f3(x). Human height estimation is thereby decomposed into separate predictions for four approximately rigid parts (the head, the upper body, the thigh and the calf), and the four results are integrated into the height of the subject. The benefits are as follows: predicting the lengths of four rigid parts is an easier problem than predicting height directly; the lengths of the four body parts and the topological relationship between them indicate the posture of the human body, providing a useful cue for height estimation; and dividing the height prediction problem into four small local problems makes full use of the strong local perception of convolutional neural networks.
4. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: convolutional neural network f1(x), used to segment the human torso image from the single depth image. In the field of height measurement, height is defined as the distance between the top of the head and the sole of the foot when the body is upright, so locating the top of the head and the sole of the foot is of great importance. In existing human body segmentation methods, segmentation at the body edge is often inaccurate, which affects the selection of these two points. The method uses the Canny operator to extract edge information from the depth image and thereby sharpens the edges of the human body segmentation image.
$$E = \mathrm{Canny}(X)$$
X is the original depth image acquired by the camera and E is the corresponding extracted edge image.
f1(x) comprises five downsampling modules and five upsampling modules. In the downsampling part, modules one and two each comprise 2 convolutional layers with activation functions, and modules three, four and five each comprise 3 convolutional layers with activation functions. The upsampling part is symmetric to the downsampling part: its first, second and third modules each comprise 3 convolutional layers with activation functions, and its fourth and fifth modules each comprise 2 convolutional layers with activation functions. The method takes the original depth image X and the edge image E as input and obtains the predicted human torso segmentation image T′ through convolutional neural network f1(x).
$$T' = f_1(X, E)$$
The loss value Loss1 of convolutional neural network f1(x) is the mean of the pixel-by-pixel squared differences between the predicted torso image and the real torso image, and the optimizer is Adam.
$$\mathrm{Loss1} = \frac{1}{N}\sum_{i=1}^{N}\left(T'_i - T_i\right)^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, and T is the real torso image. Inputting the high-frequency information of the human body edge into the neural network significantly improves the accuracy of the edges in the human torso segmentation image.
5. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: convolutional neural network f2(x), used to further segment the human torso image into four approximately rigid parts (the head, the upper body, the thigh and the calf) so as to obtain the body part image. Convolutional neural networks f2(x) and f1(x) have the same network structure: each comprises 5 downsampling modules and 5 upsampling modules, and each module comprises several convolutional layers and activation functions. The predicted human torso segmentation image T′ is input into convolutional neural network f2(x) to obtain the predicted body part segmentation image L′. Because the torso segmentation image T′ obtained by convolutional neural network f1(x) contains errors, the original depth image X is input into f2(x) at the same time to avoid accumulation of errors.
$$L' = f_2(X, T')$$
The loss value Loss2 of convolutional neural network f2(x) is the mean of the pixel-by-pixel squared differences between the predicted body part image and the real body part image.
$$\mathrm{Loss2} = \frac{1}{N}\sum_{i=1}^{N}\left(L'_i - L_i\right)^2$$
N is the total number of pixels in the image, i is a loop variable indexing a pixel in the image, and L is the real body part image. The Adam optimizer minimizes the error between the predicted and real images until the network converges robustly.
6. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: convolutional neural network f3(x), used to predict the lengths of four approximately rigid parts: the head, the upper body, the thigh and the calf. Convolutional neural network f3(x) comprises 13 convolutional layers and 6 fully-connected layers with the corresponding activation functions. The 13 convolutional layers form 5 downsampling modules: modules one and two each comprise 2 convolutional layers with activation functions, and modules three, four and five each comprise 3 convolutional layers with activation functions. A 1 × 4 vector representing the lengths of the 4 body parts is output through the 6 fully-connected layers, whose numbers of nodes are 4096, 4096, 1000, 256, 64 and 4, respectively. The body part image is input into convolutional neural network f3(x); at the same time, to reduce accumulated error, the original depth image X is also input into the network, yielding length estimates for the 4 body parts: the head, the upper body, the thigh and the calf.
$$[H_{head}\;\; H_{upperbody}\;\; H_{thigh}\;\; H_{calf}]_{1\times 4} = f_3(X, L)$$
$H_{head}$ is the head length, $H_{upperbody}$ is the upper body length, $H_{thigh}$ is the thigh length, and $H_{calf}$ is the calf length.
The loss value Loss3 is the sum of the squared differences between the predicted lengths of the four parts and their true values.
$$\mathrm{Loss3} = |H_{head}-TH_{head}|^2 + |H_{upperbody}-TH_{upperbody}|^2 + |H_{thigh}-TH_{thigh}|^2 + |H_{calf}-TH_{calf}|^2$$
$TH_{head}$, $TH_{upperbody}$, $TH_{thigh}$ and $TH_{calf}$ are the real lengths of the head, the upper body, the thigh and the calf, respectively. The Adam optimizer is used to minimize the loss value and thereby predict the lengths of the 4 approximately rigid parts separately.
7. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 1, further comprising: convolutional neural network f4(x), which estimates the height of the human body from the body part image, the body part lengths and the original depth image. Convolutional neural network f4(x) comprises 13 convolutional layers and 7 fully-connected layers with the corresponding activation functions. The 13 convolutional layers form 5 downsampling modules: modules one and two each comprise 2 convolutional layers with activation functions, and modules three, four and five each comprise 3 convolutional layers with activation functions. The output of the last convolutional layer is flattened into a one-dimensional vector, and the estimated height value is output through the 7 fully-connected layers, whose numbers of nodes are 4096, 1000, 256, 64, 16 and 1. To reduce accumulated error and improve prediction accuracy, the original depth image X and the predicted body part segmentation image L′ are input together into convolutional neural network f4(x). The invention also adopts a skip connection structure and applies different pooling strategies according to the characteristics of the different input data: average pooling for the depth image and max pooling for the body part segmentation image, so that the input data at different scales is fed directly into each convolutional layer.
$$[H_{human}] = f_4(X, L', H_{4\text{-}part})$$
$H_{human}$ is the estimate of the body height, and $H_{4\text{-}part}$ denotes the predicted lengths of the 4 approximately rigid body parts from step S5.
The loss value Loss4 of convolutional neural network f4(x) is the square of the difference between the estimated height and the real height.
$$\mathrm{Loss4} = |H_{human} - TH_{human}|^2$$
$TH_{human}$ is the real height of the human body. The Adam optimizer is used to minimize the loss value and thereby estimate the body height.
8. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 7, further comprising: a skip connection structure with a mixed pooling strategy. Convolutional neural network f4(x) has 13 convolutional layers; adding skip connections to f4(x) alleviates the vanishing-gradient problem in deep networks, facilitates backpropagation of the gradient, and accelerates training. In steps S3, S4 and S5, three neural networks are used to obtain length estimates for the head, the upper body, the thigh and the calf, which are then input into convolutional neural network f4(x). Because the estimate predicted by each neural network deviates from the true value, and that error is passed into the neural network of the next stage, the error grows from stage to stage. The skip connection structure is therefore used to input the original depth image X and the body part image L directly into convolutional neural network f4(x), minimizing the accumulated error.
The original depth image X and the body part image L have different characteristics. The original depth image X contains many noise points, which appear in the depth image as extreme values; if max pooling were used, these noise points would be retained and depth information would be lost, interfering with network learning and causing larger errors. The body part image L is smoother; if average pooling were used, the gradient at body part boundaries would be reduced, introducing errors. Therefore, average pooling is applied to the original depth image X, which smooths the noise of the depth image and feeds an accurate depth image at each scale into the network. Max pooling is applied to the body part image L, which preserves the accuracy of the segmentation boundaries and retains gradient information as the image size is reduced.
$$L_{next} = \mathrm{Maxpool}(L_{now})$$
$$X_{next} = \mathrm{Avgpool}(X_{now})$$
$L_{now}$ is the body part image at the current scale and $L_{next}$ is the body part image at the next scale. $X_{now}$ is the depth image at the current scale and $X_{next}$ is the depth image at the next scale.
Feeding the original information into each convolutional module through the skip connection structure with the mixed pooling strategy keeps the original image as undistorted as possible at every scale and improves the accuracy of network prediction.
9. The method for estimating the height of a human body in any posture based on a single depth image and a multi-stage neural network of claim 7, further comprising: a developing neural network architecture and training method. The invention proposes the developing neural network and applies it to convolutional neural network f4(x) to improve network accuracy, reduce training time and prevent overfitting.
The main idea of the developing neural network is as follows: during training, when the network tends to converge, the architecture of the convolutional part of the network is adjusted so that the network escapes the local minimum and searches for a globally optimal solution.
The specific method is as follows. During training of convolutional neural network f4(x), when the number of iterations is less than 4×10^4, the network is pre-trained and all 13 convolutional layers in the 5 modules of f4(x) are active. When the number of iterations equals 4×10^4, the pre-trained model is saved. When the number of iterations is greater than 4×10^4 and less than 6×10^4, each convolutional module of f4(x) keeps only its first convolutional layer, so 5 convolutional layers in total are trained. When the number of iterations is greater than 6×10^4 and less than 8×10^4, modules one and two each restore one convolutional layer from the pre-trained model, so that 7 convolutional layers participate in training. When the number of iterations is greater than 8×10^4 and less than 1×10^5, modules three, four and five each restore one convolutional layer from the pre-trained model, so that 10 convolutional layers participate in training. When the number of iterations is greater than 1×10^5, modules three, four and five each restore one more convolutional layer from the pre-trained model, so that all 13 original convolutional layers are restored.
The developing neural network works by iteratively reaching a fitted state and then breaking it until the network finds a globally optimal solution. Because only part of the convolutional layers are trained when the number of iterations is greater than 4×10^4 and less than 1×10^5, the network training time is reduced. When the network tends to converge, removing or restoring convolutional layers prevents overfitting, so that the network escapes the locally optimal solution and searches for a globally optimal one, effectively improving the accuracy of height estimation.
CN202110150551.6A 2021-02-03 2021-02-03 Method for estimating height of human body in any posture based on single depth image and multi-stage neural network Pending CN112861699A (en)


Publications (1)

Publication Number Publication Date
CN112861699A true CN112861699A (en) 2021-05-28

Family

ID=75987778


Country Status (1)

Country Link
CN (1) CN112861699A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109285139A (en) * 2018-07-23 2019-01-29 同济大学 A kind of x-ray imaging weld inspection method based on deep learning
CN109522924A (en) * 2018-09-28 2019-03-26 浙江农林大学 A kind of broad-leaf forest wood recognition method based on single photo
CN110020606A (en) * 2019-03-13 2019-07-16 北京工业大学 A kind of crowd density estimation method based on multiple dimensioned convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIN FUKUN et al.: "Accurate Estimation of Body Height from a Single Depth Image via a Four-Stage Developing Network", 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, pages 8264 - 8273 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023047157A1 (en) * 2021-09-23 2023-03-30 Sensetime International Pte. Ltd. Image generating methods and apparatuses, and detecting methods and apparatuses
AU2021240272A1 (en) * 2021-09-23 2023-04-06 Sensetime International Pte. Ltd. Image generating methods and apparatuses, and detecting methods and apparatuses

Similar Documents

Publication Publication Date Title
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
CN110827342B (en) Three-dimensional human body model reconstruction method, storage device and control device
Paragios et al. Non-rigid registration using distance functions
WO2017133009A1 (en) Method for positioning human joint using depth image of convolutional neural network
US20070196007A1 (en) Device Systems and Methods for Imaging
Chaudhari et al. Yog-guru: Real-time yoga pose correction system using deep learning methods
CN109978037A (en) Image processing method, model training method, device and storage medium
CN110853111B (en) Medical image processing system, model training method and training device
JP2022517769A (en) 3D target detection and model training methods, equipment, equipment, storage media and computer programs
CN111160111B (en) Human body key point detection method based on deep learning
CN113111767A (en) Fall detection method based on deep learning 3D posture assessment
Loureiro et al. Using a skeleton gait energy image for pathological gait classification
CN111507184B (en) Human body posture detection method based on parallel cavity convolution and body structure constraint
Yan et al. Cine MRI analysis by deep learning of optical flow: Adding the temporal dimension
CN111462274A (en) Human body image synthesis method and system based on SMPL model
CN106846372A (en) Human motion quality visual A+E system and method
CN113229807A (en) Human body rehabilitation evaluation device, method, electronic device and storage medium
CN114387317B (en) CT image and MRI three-dimensional image registration method and device
CN113822792A (en) Image registration method, device, equipment and storage medium
CN115346272A (en) Real-time tumble detection method based on depth image sequence
CN117274599A (en) Brain magnetic resonance segmentation method and system based on combined double-task self-encoder
CN112861699A (en) Method for estimating height of human body in any posture based on single depth image and multi-stage neural network
CN116843725B (en) River surface flow velocity measurement method and system based on deep learning optical flow method
CN113822323A (en) Brain scanning image identification processing method, device, equipment and storage medium
CN117392746A (en) Rehabilitation training evaluation assisting method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination